Failure to recover from cluster restart

Description

As reported in https://discourse.igniterealtime.org/t/openfire-cluster-unable-to-recover-from-nodes-crashing/76594, the following steps lead to NullPointerException and ClassCastException errors.

  1. Send a message from client A (connected to node A) to client B (connected to node B)

  2. Client B receives the message

  3. Send a SIGTERM to the Openfire process running on node A

  4. Restart Openfire on node A

  5. Reconnect client A

  6. Send a message from client A (connected to node A) to client B (connected to node B)

  7. Client B receives the message

  8. Send a SIGTERM to the Openfire process running on node B

  9. Restart Openfire on node B

  10. Reconnect client B

  11. Send a message from client A (connected to node A) to client B (connected to node B)
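
The stop/restart part of the steps above (3-4 and 8-9) could be scripted roughly as follows. This is only a sketch for reproducing the problem, not part of Openfire: `restart_node` and its `start_node` argument are hypothetical names, the caller has to supply the PID and a way to launch the node again, and the script does not wait for the process to actually exit. The `down_seconds` parameter is there because a comment on this issue reports that the length of the downtime affects whether the cluster recovers.

```python
import os
import signal
import time


def restart_node(pid, start_node, down_seconds=0.0,
                 kill=os.kill, sleep=time.sleep):
    """Send SIGTERM to a node's process, wait, then start it again.

    pid          -- process ID of the running Openfire JVM (found by the caller)
    start_node   -- hypothetical callable that launches Openfire again
    down_seconds -- how long to leave the node down before restarting
    kill, sleep  -- injectable for dry runs; default to the real calls
    """
    kill(pid, signal.SIGTERM)   # steps 3 and 8: SIGTERM to the process
    sleep(down_seconds)         # 0 means a "fast" restart
    return start_node()         # steps 4 and 9: restart the node
```

Running it once with `down_seconds=0` and once with a value above 30 should make it possible to compare the two cases on the same cluster.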

Logs from Node B:

Environment

None

Activity

Nathan Neulinger
August 30, 2016, 8:35 PM

FYI, in case it helps you any - I found that a "fast" restart of Openfire causes this every time, but if you leave it down for a while (over 30 seconds by default, I think) between steps 3 and 4 and between steps 8 and 9, it is generally able to come back online.

I thought I had opened a jira issue on this, but can't seem to find it now.

Nathan Neulinger
August 30, 2016, 8:57 PM
Edited
Greg Thomas
January 23, 2019, 2:23 PM
Moved to GH

Assignee

Dave Cridland

Reporter

Guus der Kinderen

Labels

None

Priority

Major