Examination of recent Openfire heap dumps shows some HTTPSessions with thousands of pendingElements inside. I suspect the bug is with an HTTPConnection not getting closed properly, but regardless, I propose this patch to place a cap on the pendingElements. It introduces a new JiveGlobals property, xmpp.httpbind.client.maxpending, which defaults to 99999.
Patch attached.
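For illustration only, a cap like the one proposed might look roughly like the sketch below. It is not the attached patch. The class name BoundedPendingQueue is hypothetical. The limit is passed in directly instead of being read from JiveGlobals. The overflow policy (reject the new element once the cap is hit) is an assumption, because the comment does not say what the patch does on overflow.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of a capped pending-element queue for one HTTP-bind session.
 * The cap mirrors the proposed xmpp.httpbind.client.maxpending property, but is
 * passed in directly here rather than read from JiveGlobals.
 */
class BoundedPendingQueue {
    /** Assumed default, matching the 99999 proposed in the patch description. */
    public static final int DEFAULT_MAX_PENDING = 99999;

    private final Deque<String> pendingElements = new ArrayDeque<>();
    private final int maxPending;

    BoundedPendingQueue(int maxPending) {
        if (maxPending <= 0) {
            throw new IllegalArgumentException("maxPending must be positive");
        }
        this.maxPending = maxPending;
    }

    BoundedPendingQueue() {
        this(DEFAULT_MAX_PENDING);
    }

    /**
     * Queues an element for later delivery. Assumed policy: once the cap is reached
     * the new element is rejected (returns false), so a session whose connection
     * never drains cannot grow without bound.
     */
    public synchronized boolean offer(String element) {
        if (pendingElements.size() >= maxPending) {
            return false;
        }
        pendingElements.addLast(element);
        return true;
    }

    /** Removes and returns the oldest pending element, or null if the queue is empty. */
    public synchronized String poll() {
        return pendingElements.pollFirst();
    }

    public synchronized int size() {
        return pendingElements.size();
    }
}
```

Rejecting new elements keeps memory bounded but silently loses stanzas; dropping the oldest element or closing the session would be equally plausible policies.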
So, I ran this patch in production to see what happens, and look at this:
2011.04.13 01:16:52 org.jivesoftware.openfire.http.HttpSessionManager - Closing HTTP sessionID 99065ea1 due to inactivity
2011.04.13 01:17:22 org.jivesoftware.openfire.http.HttpSessionManager - Closing HTTP sessionID 99065ea1 due to inactivity
2011.04.13 01:17:52 org.jivesoftware.openfire.http.HttpSessionManager - Closing HTTP sessionID 99065ea1 due to inactivity
2011.04.13 01:18:22 org.jivesoftware.openfire.http.HttpSessionManager - Closing HTTP sessionID 99065ea1 due to inactivity
2011.04.13 01:18:52 org.jivesoftware.openfire.http.HttpSessionManager - Closing HTTP sessionID 99065ea1 due to inactivity
2011.04.13 01:19:22 org.jivesoftware.openfire.http.HttpSessionManager - Closing HTTP sessionID 99065ea1 due to inactivity
2011.04.13 01:19:52 org.jivesoftware.openfire.http.HttpSessionManager - Closing HTTP sessionID 99065ea1 due to inactivity
2011.04.13 01:20:22 org.jivesoftware.openfire.http.HttpSessionManager - Closing HTTP sessionID 99065ea1 due to inactivity
This keeps repeating every 30 seconds. isClosed must be getting set without the session actually being closed. Hmmmm.
Oye, now I am up to 186 sessions that are not being removed. My patch makes info.log turn over quickly :/
Hmmmm, I don't recall what I ended up doing with this. Taking it off the timeline for now; it can be considered for inclusion in a future release.
I've been looking at this one, and I believe the problem is actually that the session close listeners are not being called if there is a delivery problem with the pending elements during session shutdown. This prevents the invalid session(s) from being removed from the session map.
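The failure mode described here can be sketched as follows. This is a simplified illustration of the idea, not the fix that was committed. SketchSession, SketchSessionManager, SessionCloseListener and deliver() are all hypothetical names. The key point is that close listeners are notified from a finally block. A delivery failure while flushing pending elements therefore can no longer leave a session marked closed but still registered in the session map, which would match the "Closing HTTP sessionID ... due to inactivity" message repeating every 30 seconds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical callback fired once a session has finished closing. */
interface SessionCloseListener {
    void sessionClosed(SketchSession session);
}

/** Hypothetical, heavily simplified stand-in for an HTTP-bind session. */
class SketchSession {
    final String id;
    private final boolean deliveryFails; // simulates a dead HTTP connection
    private final List<String> pendingElements = new ArrayList<>();
    private final List<SessionCloseListener> closeListeners = new ArrayList<>();
    private boolean closed;

    SketchSession(String id, boolean deliveryFails) {
        this.id = id;
        this.deliveryFails = deliveryFails;
    }

    synchronized void addPending(String element) { pendingElements.add(element); }

    synchronized void addCloseListener(SessionCloseListener listener) { closeListeners.add(listener); }

    synchronized boolean isClosed() { return closed; }

    private void deliver(String element) {
        if (deliveryFails) {
            throw new IllegalStateException("cannot deliver " + element + ": connection gone");
        }
        // A real session would write the element to the client here.
    }

    /** Marks the session closed, tries to flush pending elements, then always notifies listeners. */
    synchronized void close() {
        if (closed) {
            return;
        }
        closed = true;
        try {
            for (String element : pendingElements) {
                deliver(element);
            }
        } catch (RuntimeException e) {
            // A delivery failure is logged but must not block cleanup.
            System.err.println("Delivery failed while closing " + id + ": " + e.getMessage());
        } finally {
            // Runs even if delivery threw, so listeners (and map removal) are never skipped.
            pendingElements.clear();
            for (SessionCloseListener listener : closeListeners) {
                listener.sessionClosed(this);
            }
        }
    }
}

/** Hypothetical manager that removes a session from its map only via the close listener. */
class SketchSessionManager {
    final Map<String, SketchSession> sessions = new ConcurrentHashMap<>();

    SketchSession create(String id, boolean deliveryFails) {
        SketchSession session = new SketchSession(id, deliveryFails);
        session.addCloseListener(closed -> sessions.remove(closed.id));
        sessions.put(id, session);
        return session;
    }
}
```

If the listener loop sat after the delivery loop instead of in finally, an uncaught exception from deliver() would skip it. The session would then stay in the map with closed already set, and an inactivity sweep would keep finding it.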
Also, in my opinion, putting a limit on the pending elements queue does nothing to address this issue (per Daryl's comments above), so the proposed patch will not be applied.
I am testing a potential fix and will commit shortly.
Assumed fixed based on initial testing, although regression testing continues.