On joining a cluster, generate appropriate join presences to reconcile the MUC participant visibility
Description
Environment
Activity
Guus der Kinderen August 5, 2021 at 2:25 PM (edited)
Ugh. Most of the above assumes that there will always be a call to joinedCluster() (“I have joined a cluster”) on one node, and to joinedCluster(NodeID) (“someone joined our cluster”) on the others. I don’t know if that’s true. We’ve seen with cluster breakages that each part of the cluster only calls leftCluster(NodeID) (“someone left our cluster”), whereas I had naively assumed that the one isolated cluster node would also call leftCluster() (“I have left the cluster”). That turned out to be a false assumption, in the sense that it does not always occur (it does sometimes, e.g. when disabling clustering, or when recovering from a split-brain scenario).
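For reference, a minimal sketch of the callbacks being discussed, assuming Openfire’s org.jivesoftware.openfire.cluster.ClusterEventListener interface; the caveat above is that the no-argument variants are not guaranteed to fire on every topology change:

import org.jivesoftware.openfire.cluster.ClusterEventListener;

// Sketch of a listener for the callbacks discussed above. Caveat: leftCluster()
// (no argument) is not reliably invoked when a node becomes isolated; the
// remaining nodes may only see leftCluster(nodeID).
public class MucClusterListener implements ClusterEventListener {

    @Override
    public void joinedCluster() {
        // "I have joined a cluster": runs on the node that just joined.
    }

    @Override
    public void joinedCluster(byte[] nodeID) {
        // "Someone joined our cluster": runs on every other node.
    }

    @Override
    public void leftCluster() {
        // "I have left the cluster": observed when disabling clustering or
        // recovering from a split-brain, but not on every breakage.
    }

    @Override
    public void leftCluster(byte[] nodeID) {
        // "Someone left our cluster": runs on the remaining nodes.
    }

    @Override
    public void markedAsSeniorClusterMember() {
        // This node became the senior cluster member.
    }
}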
Guus der Kinderen August 5, 2021 at 2:13 PM
The second option from my last comment is probably not ideal for another reason: when the syncing of data is initiated by the other nodes of the cluster, can we be sure that the local node has finished reconciling inconsistencies between locally known data and possibly conflicting remote data?
Maybe the best approach is to have the joining node pull data from the cluster, either by using option 1 from my last comment, or by implementing a new cluster task. The latter could possibly also be used for pushing the local data to the other cluster nodes.
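A rough sketch of what such a pull could look like, assuming Openfire’s ClusterTask/CacheFactory API; the task name and the way local occupants are collected are placeholders, not existing code:

import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.util.ArrayList;
import java.util.List;

import org.jivesoftware.util.cache.ClusterTask;

// Hypothetical task: executed on every other cluster node, it returns the MUC
// occupants connected to that node, so the joining node can pull a snapshot.
public class RequestLocalOccupantsTask implements ClusterTask<List<String>> {

    private List<String> occupants = new ArrayList<>();

    @Override
    public void run() {
        // Placeholder: collect the occupant JIDs (or richer data) that are
        // local to the node executing this task. No such registry exists yet.
        // occupants = ...;
    }

    @Override
    public List<String> getResult() {
        return occupants;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeObject(occupants);
    }

    @Override
    @SuppressWarnings("unchecked")
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        occupants = (List<String>) in.readObject();
    }
}

The joining node could then pull the snapshots with something like CacheFactory.doSynchronousClusterTask(new RequestLocalOccupantsTask(), false), which, if I read the API correctly, returns one result per other cluster member; the same task could be turned around (doClusterTask directed at a single node) to push local data instead.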
Guus der Kinderen August 5, 2021 at 1:16 PM
To know about all occupants on all cluster nodes by node ID, there are again at least two approaches:
We can iterate over the clustered cache (which at that point can be used to access all clustered data). This, however, iterates over rooms, and not so much over occupants. Also, this cache does not include a reference to which node ID provided which data - and trying to maintain that entity-to-cluster-node relationship is one of the sources of trouble that we aim to take away with the MUC rewrite. (A sketch of this approach follows below.)
We can have each node send its list of occupants (with its own node ID), so that we have a snapshot of the data. At first glance, this involves some data duplication (and will cause the joining node to be bombarded with tasks on a larger cluster).
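To illustrate the first approach and its limitation, a minimal sketch assuming the pre-rewrite MUC API (MultiUserChatService, MUCRoom, MUCRole); exact method names may differ slightly:

import org.jivesoftware.openfire.XMPPServer;
import org.jivesoftware.openfire.muc.MUCRole;
import org.jivesoftware.openfire.muc.MUCRoom;
import org.jivesoftware.openfire.muc.MultiUserChatService;

public class OccupantCacheWalk {

    // Walks all rooms of all MUC services. Note the limitation described above:
    // this tells us *who* occupies each room, but the data carries no reference
    // to *which cluster node* each occupant is connected to.
    public static void listOccupants() {
        for (MultiUserChatService service :
            XMPPServer.getInstance().getMultiUserChatManager().getMultiUserChatServices()) {
            for (MUCRoom room : service.getChatRooms()) {
                for (MUCRole occupant : room.getOccupants()) {
                    // The occupant's real JID is available, but no node ID is.
                    System.out.println(occupant.getUserAddress());
                }
            }
        }
    }
}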
Guus der Kinderen August 5, 2021 at 12:38 PM
I just realized that we’ll also need to know about all occupants on all cluster nodes, by node ID, to repopulate the local copy as introduced by OF-2224.
Guus der Kinderen August 5, 2021 at 10:15 AM
The solution should take into account that we probably need to avoid sending presence stanzas for occupants that have not yet been added to the clustered cache (I’m not sure that we can guarantee the order of events when a node is joining a cluster).
Both solutions in the description make the joining node responsible for sending presence for all local users to all users on remote nodes, so that seems to be covered.
When remote nodes receive the ‘remote node joined’ event, it is not certain that the users on that node have already been added to the clustered cache; the nodes that receive the event might not see the new users yet.
Implementation-wise, I don’t think we can re-use the existing ‘broadcast’ functionality, as that will send data to all occupants of the room, including the occupants that already have the information. It seems that we somehow need to work with stanzas directed at individual users.
I’m seeing two ways to implement the distribution of the stanza:
Have a node generate stanzas, addressed from the room, to individual occupants (that could include users on remote nodes), and send those ‘locally’ using the normal routing API (e.g. XMPPServer.getInstance().getPacketRouter().route(presence)). The existing implementation should make sure that the stanzas are properly distributed over the cluster.
Have a node generate a clustered task, directed at the node(s) on which the intended recipients are connected, and have those nodes (during execution of the task) generate stanzas and send them to the recipients that are on the local node only (again, using something like the normal routing API).
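A minimal sketch of the first option, assuming the routing call quoted above and the Tinder Presence class; the helper class and parameter names are illustrative only:

import org.jivesoftware.openfire.XMPPServer;
import org.xmpp.packet.JID;
import org.xmpp.packet.Presence;

public class DirectedJoinPresence {

    // Builds a join presence from an occupant's in-room address (room@service/nick)
    // to a single recipient, and hands it to the normal routing API. If the
    // recipient is connected to another cluster node, routing should take care
    // of delivering it there.
    public static void send(JID occupantRoomJid, JID recipientFullJid) {
        final Presence presence = new Presence();
        presence.setFrom(occupantRoomJid);
        presence.setTo(recipientFullJid);
        // A real implementation would also add the MUC user extension
        // (http://jabber.org/protocol/muc#user) with role/affiliation data.
        XMPPServer.getInstance().getPacketRouter().route(presence);
    }
}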
At first glance, the second option seems to be more efficient (less data across the cluster), but more complex (more room for bugs).
An issue that also comes into play is that of code duplication. It is probably best not to have many methods that do roughly the same thing. If we can find an implementation that we can re-use, all the better.
When a node is temporarily disconnected from a cluster, the state of MUC membership local to each node will have been reconciled according to reachability, and leave presences will have been sent (see OF-2229, OF-2230).
On rejoining the cluster (or joining for the first time), caches will be updated (dealt with in another task), which reconciles the cluster’s view of who is in each room. Join presences need to be sent so that all participants share that full list of occupants.
Join presences for all MUC participants on both sides need to be sent across the “join boundary” between the joining node and the existing cluster nodes. That is to say, if a 3rd node joins a cluster, then all occupants on the first 2 nodes require a join presence from all occupants on the 3rd, AND all occupants on the 3rd node need a join presence from all occupants on the first 2 nodes.
Sending a user a duplicate join presence for an occupant that they already regard as “present” should be avoided.
This could be the responsibility of the joining node (a rough sketch of this approach follows after these alternatives):
Broadcast a join presence for all local users to all other nodes in the cluster
Iterate the clustered cache to get the list of non-node-local occupants and broadcast a join presence for each to the local node only
Alternatively, this could be the responsibility of all nodes:
Joining node broadcasts a join presence for all local users to all other nodes in the cluster
Existing nodes broadcast a join presence for their local users to the joining node
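A high-level sketch of the first alternative, building on the directed-presence routing shown earlier. Everything except the routing call is a hypothetical placeholder (there is no such registry of local/remote occupants yet), and for brevity the sketch ignores that only occupants of the same room should be paired:

import java.util.Collections;
import java.util.List;

import org.jivesoftware.openfire.XMPPServer;
import org.xmpp.packet.JID;
import org.xmpp.packet.Presence;

public class JoinBoundaryReconciler {

    // Invoked on the joining node once its caches have been reconciled.
    public void onJoinedCluster() {
        // 1. Announce every occupant connected to this node to every occupant
        //    that lives on the other cluster nodes.
        for (JID local : localOccupants()) {
            for (JID remote : remoteOccupants()) {
                route(inRoomAddress(local), remote);
            }
        }
        // 2. Announce every remote occupant (learned from the clustered cache)
        //    to the occupants connected to this node only.
        for (JID remote : remoteOccupants()) {
            for (JID local : localOccupants()) {
                route(inRoomAddress(remote), local);
            }
        }
    }

    // Builds and routes one directed join presence (see the earlier sketch).
    private void route(JID fromRoomJid, JID to) {
        final Presence presence = new Presence();
        presence.setFrom(fromRoomJid);
        presence.setTo(to);
        XMPPServer.getInstance().getPacketRouter().route(presence);
    }

    // Placeholders, not existing Openfire API:
    private List<JID> localOccupants()  { return Collections.emptyList(); }
    private List<JID> remoteOccupants() { return Collections.emptyList(); }
    private JID inRoomAddress(JID occupant) { return occupant; } // should be room@service/nick
}

The second alternative (all nodes responsible) would replace step 2 with each existing node performing the equivalent of step 1 towards the joining node.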