Fixed
Details
Assignee: Guus der Kinderen
Reporter: Guus der Kinderen
Fix versions
Priority: Critical
Created March 10, 2025 at 3:57 PM
Updated March 10, 2025 at 7:14 PM
Resolved March 10, 2025 at 7:14 PM
IgniteRealtime (running 5.0.0-SNAPSHOT) was in a deadlock-like state. No actual deadlock was reported: locks held across the cluster contributed to the erroneous state, but such locks are not reported as a deadlock.
This was logged on one of the cluster nodes:
What is confusing is that an outgoing server session is routing a stanza to a local entity (in this case, a MUC room).
Judging from the line numbers in the stack trace, this stanza must have been an error stanza that is triggered by LocalSession#canProcess returning false.
A few things are notable here:
- The else block of the canProcess() check assumes that the false value is strictly caused by a Privacy List (XEP-0016)-based condition. That’s not the case for every implementation of the canProcess() method.
- The LocalSession that’s in play in this instance must have been a LocalOutgoingServerSession, as it’s triggered by OutgoingSessionPromise. The canProcess() implementation in this class already sends out its own error. It appears that another error is sent in LocalSession.
- In LocalOutgoingServerSession’s implementation, we’ve previously found that it was a good idea to send the error asynchronously to prevent deadlocks (OF-2341, OF-2342). The ‘second’ error that’s being reflected by LocalSession is not sent asynchronously (and seems to trigger a deadlock).

It appears to be desirable to modify the processing of the return value of canProcess. The resulting error must be consistent with the type of problem (e.g. Privacy Lists require strict processing, different from connectivity issues), and it must not be sent more than once.
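
One possible shape for such a change is sketched below. This is a standalone illustration and not Openfire code: the Verdict enum, the CanProcessSketch class and the string-based "stanzas" are all hypothetical stand-ins. The idea is that canProcess reports why a stanza was refused, so that the caller only generates a Privacy List error when a Privacy List was the cause, sends nothing when the implementation has already sent its own error, and hands any error it does send to an executor instead of sending it on the calling thread.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class CanProcessSketch {

    /** Hypothetical replacement for the boolean that canProcess() returns. */
    enum Verdict {
        ALLOWED,                     // stanza may be delivered
        BLOCKED_BY_PRIVACY_LIST,     // caller is responsible for the error
        REJECTED_ERROR_ALREADY_SENT  // implementation already reported the problem
    }

    private final ExecutorService errorSender;
    // Stand-ins for the real routing layer: they only record what would be sent.
    private final List<String> delivered = new CopyOnWriteArrayList<>();
    private final List<String> errors = new CopyOnWriteArrayList<>();

    CanProcessSketch(ExecutorService errorSender) {
        this.errorSender = errorSender;
    }

    /** Handles a stanza according to the verdict that canProcess() produced. */
    void process(String stanzaId, Verdict verdict) {
        switch (verdict) {
            case ALLOWED:
                delivered.add(stanzaId);
                break;
            case BLOCKED_BY_PRIVACY_LIST:
                // Sent from another thread, so the caller never routes the
                // error while it may still be holding locks.
                errorSender.execute(() -> errors.add("privacy-list error for " + stanzaId));
                break;
            case REJECTED_ERROR_ALREADY_SENT:
                // Nothing to do: a second error would be a duplicate.
                break;
        }
    }

    /** Runs one stanza through each verdict and returns the errors that were sent. */
    public static List<String> runDemo() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CanProcessSketch session = new CanProcessSketch(executor);
        session.process("msg-1", Verdict.ALLOWED);
        session.process("msg-2", Verdict.BLOCKED_BY_PRIVACY_LIST);
        session.process("msg-3", Verdict.REJECTED_ERROR_ALREADY_SENT);
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return List.copyOf(session.errors);
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

In this sketch only msg-2 results in an error, and that error is produced on the executor thread. Whether an enum, an exception, or a callback is the better fit for the real LocalSession hierarchy is left open here.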