When one cluster node disconnects, users cannot connect to other nodes for 60 seconds

Description

If Openfire runs in a cluster and the cluster loses one of its nodes, at least some (possibly all) users cannot connect to the nodes that remain in the cluster for approximately 60 seconds.

It is undesirable for a cluster to stop accepting new connections when one of its nodes fails.

Environment

None

Activity

Guus der Kinderen
November 19, 2020, 11:32 AM

This occurs only when a node breaks from the cluster uncleanly. I believe that this is unavoidable to a large degree: when running an async cluster and detecting network issues, there's a gray area in which a remote node might be only briefly unresponsive (e.g. during garbage collection) or actually disconnected. We should probably rely on Hazelcast's expertise here, instead of trying to tweak the parameters for this detection.
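
To illustrate the gray area described above, here is a minimal sketch of a deadline-style failure detector. It is not Hazelcast's actual implementation: the class `HeartbeatDeadlineDetector` and its methods are hypothetical, and the 60-second timeout is an assumption chosen to mirror the delay observed in this issue. The point is only that a detector of this kind cannot tell a paused node from a disconnected one until the timeout has elapsed.

```java
import java.time.Duration;

/**
 * Hypothetical deadline-style failure detector (illustration only,
 * not Hazelcast code). A remote node is suspected only after it has
 * been silent for longer than the configured timeout.
 */
final class HeartbeatDeadlineDetector {
    private final long timeoutMillis;
    private long lastHeartbeatMillis;

    HeartbeatDeadlineDetector(Duration timeout, long nowMillis) {
        this.timeoutMillis = timeout.toMillis();
        this.lastHeartbeatMillis = nowMillis;
    }

    /** Records a heartbeat received from the remote node. */
    void heartbeat(long nowMillis) {
        lastHeartbeatMillis = nowMillis;
    }

    /** True once the node has been silent for longer than the timeout. */
    boolean isSuspected(long nowMillis) {
        return nowMillis - lastHeartbeatMillis > timeoutMillis;
    }

    public static void main(String[] args) {
        // Assumed timeout of 60 seconds, matching the delay in this report.
        HeartbeatDeadlineDetector detector =
                new HeartbeatDeadlineDetector(Duration.ofSeconds(60), 0L);

        // A 10-second silence (for example a long GC pause) is tolerated.
        System.out.println("suspected after 10s silence: " + detector.isSuspected(10_000L));

        // The node recovers and sends a heartbeat, then disconnects for real.
        detector.heartbeat(10_000L);

        // The disconnect is only noticed once the full timeout has passed.
        System.out.println("suspected 30s after last heartbeat: " + detector.isSuspected(40_000L));
        System.out.println("suspected 61s after last heartbeat: " + detector.isSuspected(71_000L));
    }
}
```

A shorter timeout would detect a real disconnect sooner, but it would also evict nodes that are merely slow, which is the trade-off that argues against tuning these parameters by hand.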

Not a bug

Assignee

Guus der Kinderen

Reporter

Guus der Kinderen