Over the last few weeks, we've noticed CPU spikes on the server that runs the XMPP domain on igniterealtime.org.
In several thread dumps, it appears that a number of threads are busy updating a hashmap in the class IQPEPHandler. One example (of many) of such a thread is this one:
In a heap dump, I found that this map was almost 10MB in size (holding only textual data).
Given the above, it appears that this map is misused.
When looking at the code, it becomes clear that the map is intended to be a cache that holds all presence from all JIDs on domains other than the local domain. This in itself is a travesty: apart from the fact that the data will never be complete and up-to-date, an attempt to maintain such a cache easily consumes too many resources (the CPU spikes being a symptom of this).
The cache should be removed.
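To illustrate the pattern being objected to, here is a minimal sketch of such a cache. It is not the actual IQPEPHandler code: the class name, method names and JIDs are hypothetical, and it assumes entries are only dropped when an explicit unavailable presence arrives. Every remote presence stanza causes a write to the shared map, and any unavailable presence that is missed leaves a stale entry behind.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for the kind of cache described above.
// This is NOT the real IQPEPHandler implementation.
class RemotePresenceCacheSketch {

    // Maps a local user to the remote full JIDs last seen as available.
    private final Map<String, Set<String>> knownRemotePresences = new ConcurrentHashMap<>();

    // Invoked for every presence stanza received from a remote domain,
    // so every such stanza results in a write to the shared map.
    public void onRemotePresence(String localUser, String remoteJid, boolean available) {
        Set<String> online = knownRemotePresences.computeIfAbsent(
                localUser, key -> ConcurrentHashMap.newKeySet());
        if (available) {
            online.add(remoteJid);
        } else {
            // Assumption: this is the only cleanup path. If the unavailable
            // presence never arrives, the entry stays in the map.
            online.remove(remoteJid);
        }
    }

    // Total number of remote JIDs currently held in the cache.
    public int entryCount() {
        return knownRemotePresences.values().stream().mapToInt(Set::size).sum();
    }

    public static void main(String[] args) {
        RemotePresenceCacheSketch cache = new RemotePresenceCacheSketch();
        cache.onRemotePresence("alice", "bob@remote.example/phone", true);
        cache.onRemotePresence("alice", "carol@remote.example/laptop", true);
        cache.onRemotePresence("alice", "bob@remote.example/phone", false);
        System.out.println("cached remote JIDs: " + cache.entryCount());
    }
}
```

In this sketch, carol's entry remains until an unavailable presence for that exact JID is processed, which is one way such a map can end up both incomplete and out of date.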
The cache is used in an attempt to send pubsub notifications only to remote JIDs that are online (and not to those that are offline).
It appears that the pubsub specification allows this check to be skipped, as presence-based delivery is optional (MAY):
Implementations of pubsub MAY deliver event notifications only when the subscriber is online. In these cases, the option may be a node configuration option as shown in the examples above. To facilitate this, the pubsub service needs to subscribe to the subscriber's presence and check the subscriber's current presence information before sending any event notifications (as described in RFC 3921). Presence subscriptions MUST be based on the subscribed JID.
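The difference between presence-based delivery and skipping the check can be sketched roughly as follows. All names here are hypothetical and do not correspond to Openfire classes; the `believedOnline` predicate stands in for whatever presence lookup a service might perform.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of these names exist in Openfire.
class NotificationRecipientsSketch {

    // Presence-based delivery (the optional behaviour): notify only the
    // subscribers that the service believes to be online.
    public static List<String> presenceBased(List<String> subscribers,
                                             Predicate<String> believedOnline) {
        return subscribers.stream()
                .filter(believedOnline)
                .collect(Collectors.toList());
    }

    // Delivery without the presence check: notify every subscriber.
    public static List<String> unconditional(List<String> subscribers) {
        return new ArrayList<>(subscribers);
    }

    public static void main(String[] args) {
        List<String> subscribers = Arrays.asList(
                "bob@remote.example", "carol@remote.example");
        // Pretend the (possibly stale) presence data only knows about bob.
        Predicate<String> believedOnline = jid -> jid.startsWith("bob@");

        System.out.println(presenceBased(subscribers, believedOnline));
        System.out.println(unconditional(subscribers));
    }
}
```

With incomplete presence data, the presence-based variant can silently skip a subscriber who is in fact online, whereas the unconditional variant does not depend on that data at all.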
I have removed all traces of the cache, as well as the infrastructure that was pretending that it'd be able to supply all relevant data. Data provided by that interface is incomplete at best - it should not be used.