After adding Hazelcast to the XMPP domain at igniterealtime.org, I noticed stack traces like these in the error log:
The root cause lies in org.jivesoftware.openfire.pep.PEPServiceManager#pepServices, which is a cache of PEPService instances. When used in a cluster, it tries to serialize PEPService instances, which fails. The AdHocCommandManager that is part of the instance (org.jivesoftware.openfire.pep.PEPService#adHocCommandManager) is not Serializable.
The obvious way out would be to make that class Serializable. I do wonder whether that adds too much to an already rather bloated cached entity.
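For illustration, here is a minimal sketch of the failure mode, assuming the clustered cache falls back to plain Java serialization for its values. The classes are hypothetical stand-ins (CommandManager for AdHocCommandManager, the two Service classes for PEPService), not Openfire code. The sketch also shows one alternative to making the manager Serializable: marking the field transient so it is skipped. Whether that suits PEPService is an open question, since the field would then have to be rebuilt after deserialization.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class PepSerializationSketch {

    // Hypothetical stand-in for AdHocCommandManager: deliberately NOT Serializable.
    static class CommandManager { }

    // Hypothetical stand-in for the cached entity: Serializable itself,
    // but it drags a non-serializable field along.
    static class ServiceWithPlainField implements Serializable {
        private static final long serialVersionUID = 1L;
        final String owner = "user@example.org";
        final CommandManager manager = new CommandManager();
    }

    // Same entity, but the manager is transient and therefore skipped on write.
    // After deserialization the field is null and would need to be recreated.
    static class ServiceWithTransientField implements Serializable {
        private static final long serialVersionUID = 1L;
        final String owner = "user@example.org";
        transient CommandManager manager = new CommandManager();
    }

    // Returns true if Java serialization accepts the value, false if it
    // hits a non-serializable object somewhere in the object graph.
    static boolean canSerialize(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new ServiceWithPlainField()));     // false
        System.out.println(canSerialize(new ServiceWithTransientField())); // true
    }
}
```

Either route (Serializable manager or transient field) gets the entity through the cache; the difference is how much state travels between cluster nodes.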
Looks like a timing issue during cluster startup. I have added some synchronization to the CacheFactory that should fix this (SVN 13382).
That didn't appear to work, and another bug appeared.
Yikes! Hope there's no "3-strikes" rule in effect ... apologies for the churn. How quickly race conditions turn into steeplechases.
Anyway, it seems we were switching to the clustered factory strategy class prematurely (after initiating the cluster start, but before the server had actually joined the cluster). I have applied another small change to the CacheFactory (SVN 13383) to correct this timing problem.
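To make the ordering concrete, here is a rough sketch of the idea, not the actual CacheFactory change. All names (StrategySwitchSketch, startClustering, joinedCluster, leftCluster) are made up for illustration. The point is only that initiating the cluster start leaves the local strategy in place, and the switch to the clustered strategy happens when the join has actually completed.

```java
public class StrategySwitchSketch {

    enum Strategy { LOCAL, CLUSTERED }

    // Guarded by "this"; all access goes through synchronized methods.
    private Strategy active = Strategy.LOCAL;
    private boolean startRequested = false;

    // Initiating the cluster start must NOT switch strategies yet:
    // caches keep using the local strategy while the join is in progress.
    synchronized void startClustering() {
        startRequested = true;
    }

    // Called once the node has actually joined the cluster.
    // Only now is it safe to hand caches over to the clustered strategy.
    synchronized void joinedCluster() {
        if (startRequested) {
            active = Strategy.CLUSTERED;
        }
    }

    // Leaving the cluster falls back to the local strategy.
    synchronized void leftCluster() {
        startRequested = false;
        active = Strategy.LOCAL;
    }

    synchronized Strategy activeStrategy() {
        return active;
    }

    public static void main(String[] args) {
        StrategySwitchSketch factory = new StrategySwitchSketch();
        factory.startClustering();
        System.out.println(factory.activeStrategy()); // LOCAL: join still pending
        factory.joinedCluster();
        System.out.println(factory.activeStrategy()); // CLUSTERED
    }
}
```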
Ready for dogfooding ... if you dare!
Thanks, dogfooded just now and did not get any errors. Will mark as fixed.