NullPointerException in S2S when cluster node is switched off
Description
This issue was found in Openfire 4.8.0-beta, in a clustered environment.
When one of the cluster nodes gets shut down, users on the other cluster node can no longer communicate with federated domains.
On the admin console of the remaining cluster nodes, the remote server sessions are listed as one-sided (incoming only).
This NullPointerException is logged:
Environment
None
Activity
Guus der Kinderen January 12, 2024 at 2:17 PM
The cleanup routine does not account for the possibility of having more than one session from a remote domain, which is causing an Exception to be thrown. This in turn prevents the ‘cluster leave’ event from being properly processed, which ends up leaving invalid data in various caches. This is what then results in the observed NullPointerExceptions.
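To illustrate the class of bug described above, here is a minimal sketch. It is not the Openfire code: `SessionCleanupSketch`, `IncomingSession` and both methods are hypothetical names made up for this example. It assumes a lookup that maps each remote domain to exactly one session. That assumption breaks as soon as a second session from the same domain shows up, whereas a lookup that keeps a list per domain does not.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of these types exist in Openfire.
final class SessionCleanupSketch {

    /** Hypothetical stand-in for an incoming server session record. */
    static final class IncomingSession {
        final String streamId;
        final String remoteDomain;

        IncomingSession(String streamId, String remoteDomain) {
            this.streamId = streamId;
            this.remoteDomain = remoteDomain;
        }
    }

    /**
     * Fragile variant: assumes at most one session per remote domain.
     * Collectors.toMap throws IllegalStateException on a duplicate key.
     */
    static Map<String, String> streamIdByDomainFragile(List<IncomingSession> sessions) {
        return sessions.stream()
                .collect(Collectors.toMap(s -> s.remoteDomain, s -> s.streamId));
    }

    /** Tolerant variant: keeps every session of a domain in a list. */
    static Map<String, List<String>> streamIdsByDomain(List<IncomingSession> sessions) {
        Map<String, List<String>> result = new HashMap<>();
        for (IncomingSession s : sessions) {
            result.computeIfAbsent(s.remoteDomain, k -> new ArrayList<>()).add(s.streamId);
        }
        return result;
    }

    public static void main(String[] args) {
        List<IncomingSession> sessions = List.of(
                new IncomingSession("a1", "example.org"),
                new IncomingSession("a2", "example.org"), // second session, same domain
                new IncomingSession("b1", "example.com"));

        System.out.println(streamIdsByDomain(sessions));
        try {
            streamIdByDomainFragile(sessions);
        } catch (IllegalStateException e) {
            System.out.println("Fragile lookup failed: " + e.getMessage());
        }
    }
}
```

Whether the actual cleanup routine fails in this exact way is not stated here; the sketch only shows how a one-session-per-domain assumption can turn into an exception.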
Guus der Kinderen January 12, 2024 at 12:52 PM
I suspect that the unexpected null values are a result of RemoteIncomingServerSession not being able to successfully execute a RemoteSessionTask.
When reproducing, I find that the Incoming Server Session Info Cache keeps referring to the cluster node that has been disconnected.
In the logs, this error is logged:
This suggests that the invocation of SessionManager.leftCluster(SessionManager.java:1751), the method that processes a cluster node leaving, was terminated prematurely (it errored out). That could easily lead to invalid state.
(we do not need to restore the info for sessions on other nodes, as those will be dropped right after invoking this method anyway).
If an exception is thrown, then the code responsible for doing whatever needs doing ‘right after invoking this method’ probably does not get executed.
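A minimal sketch of that concern, and of one generic way to guard against it. This is not how Openfire implements or fixed this: `ClusterLeaveSketch` and `runAll` are hypothetical names. The idea is to run each cleanup step in isolation, so that one failing step is recorded but does not stop the steps that come after it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Openfire.
final class ClusterLeaveSketch {

    /**
     * Runs every step in order. A RuntimeException thrown by one step is
     * collected and returned instead of aborting the remaining steps.
     */
    static List<RuntimeException> runAll(List<Runnable> steps) {
        List<RuntimeException> failures = new ArrayList<>();
        for (Runnable step : steps) {
            try {
                step.run();
            } catch (RuntimeException e) {
                failures.add(e); // remember the failure, keep going
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<RuntimeException> failures = runAll(List.of(
                () -> System.out.println("step 1: restore local cache entries"),
                () -> { throw new IllegalStateException("step 2 failed"); },
                () -> System.out.println("step 3: drop sessions of the departed node")));
        System.out.println("failures: " + failures.size());
    }
}
```

Without the per-step `try`/`catch`, the exception from the second step would propagate out of the loop and the third step would never run, which is the situation described above.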
Anno van Vliet December 12, 2023 at 4:30 PM
Closing sessions in the admin console does not fix the issue.
Additionally, clicking on the entries in the Server Session generates a JSP error: