Boris Grozev of Jitsi reports that they discovered a race condition which leads to MUCs becoming unjoinable.
We ran into the same issue with Jitsi/Jicofo after updating to Smack 4.4.3, and I believe the root cause is a race condition in MultiUserChat. Here’s the race condition we believe is possible:
Thread A executes the MultiUserChat#presenceListener for a stanza containing self-presence. It gets interrupted before it updates occupantsMap (note that it does not hold a lock on the MultiUserChat object).
A user Thread B executes leave(), which clears occupantsMap. It also removes presenceListener from the connection, but it doesn’t affect Thread A which is already running.
Thread A continues execution and places an entry in occupantsMap for the local occupant.
At this stage, until the MultiUserChat object gets garbage collected, it is impossible to join the same MUC with the same nickname. Getting the cached MultiUserChat object from MultiUserChatManager and calling createOrJoin(nickname) leads to the thread being stuck in the state Guus reported. This is because enter waits for the processedReflectedSelfPresence flag to be set, which never happens because the self-presence was mistakenly handled as an update to an existing occupant.
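The interleaving described above can be illustrated with a toy model. This is only a sketch and not Smack's actual code: ToyRoom, handleSelfPresence, beginEnter, rejoinCompletes and the nickname "focus" are hypothetical names, and only occupantsMap, leave and processedReflectedSelfPresence are taken from the report. It assumes the listener decides between "join" and "update" based on whether the map already held an entry for the nickname, and it uses latches to force Thread A to pause just before it writes to the map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class MucRaceSketch {

    /** Toy stand-in for MultiUserChat; NOT Smack's real implementation. */
    static class ToyRoom {
        final Map<String, String> occupantsMap = new ConcurrentHashMap<>();
        volatile boolean processedReflectedSelfPresence;

        /** Simplified listener body; runs without holding a lock on the room. */
        void handleSelfPresence(String nick, Runnable beforePut) {
            beforePut.run(); // hook that lets the demo pause the thread here
            String old = occupantsMap.put(nick, "available");
            if (old == null) {
                // No previous entry: treated as our own join being reflected.
                processedReflectedSelfPresence = true;
            }
            // Otherwise: treated as an update to an existing occupant,
            // so the flag is left untouched.
        }

        /** Assumed start of enter(): reset the flag before waiting on it. */
        void beginEnter() { processedReflectedSelfPresence = false; }

        void leave() { occupantsMap.clear(); }
    }

    /** Returns true if a second join would see the flag set, false if it would hang. */
    static boolean rejoinCompletes(boolean withRace) {
        ToyRoom room = new ToyRoom();
        String nick = "focus"; // hypothetical nickname

        // Initial join succeeds normally.
        room.beginEnter();
        room.handleSelfPresence(nick, () -> { });

        if (withRace) {
            CountDownLatch paused = new CountDownLatch(1);
            CountDownLatch resume = new CountDownLatch(1);
            // Thread A: handles a self-presence update and pauses before the put.
            Thread a = new Thread(() -> room.handleSelfPresence(nick, () -> {
                paused.countDown();
                try {
                    resume.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
            a.start();
            try {
                paused.await();
                room.leave();       // Thread B: clears occupantsMap
                resume.countDown(); // Thread A continues and re-adds the entry
                a.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        } else {
            room.leave();
        }

        // Second join: the reflected self-presence arrives again.
        room.beginEnter();
        room.handleSelfPresence(nick, () -> { });
        return room.processedReflectedSelfPresence;
    }

    public static void main(String[] args) {
        System.out.println("without race: " + rejoinCompletes(false)); // true
        System.out.println("with race: " + rejoinCompletes(true));     // false
    }
}
```

In the racy run the stale entry left behind by Thread A makes the second self-presence look like an update, so the flag stays false, which in this model corresponds to enter waiting forever.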
We believe we’re running into this problem in practice, because in the current Jicofo code we send a presence update right before we leave the room, and the XMPP server runs on the local machine. Such presence updates were seen shortly before we saw a thread hung in enter() for the same MUC.
A pull request with the proposed fix is at https://github.com/igniterealtime/Smack/pull/494