Improve BOSH robustness when connections drop for unknown reasons
In one of our clients' environments, HTTP connections were being randomly dropped (probably by a proxy server somewhere along the way). This caused long-polling connections to be dropped and our chat client to re-request the connection.
The re-request wasn't working in openfire, and we kept getting the error "Could not locate connection: xxxxx".
The cause of this seemed to be twofold:
The BOSH spec allows a client to re-request a previous RID, and the createConnection method of HttpSession has code to handle this, but it wasn't adding the re-requested connection to the connectionQueue (and so HttpSession.getResponse was throwing the "Could not locate connection" exception). Adding the re-requested connection to the connection queue seemed to solve this.
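To illustrate the first cause, here is a minimal sketch of the idea, not the patch itself. SketchSession and SketchConnection are hypothetical, heavily simplified stand-ins for HttpSession and its connections; only the names createConnection, getResponse and connectionQueue come from the description above, and everything else (fields, the deliver helper, the response buffer) is assumed for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a BOSH connection; not Openfire's real class.
class SketchConnection {
    final long rid;
    String body; // null until a response has been written

    SketchConnection(long rid) {
        this.rid = rid;
    }
}

// Hypothetical stand-in for HttpSession, reduced to the RID bookkeeping.
class SketchSession {
    private final List<SketchConnection> connectionQueue = new ArrayList<>();
    private final Map<Long, String> sentResponses = new HashMap<>();
    private long lastRid;

    SketchSession(long initialRid) {
        this.lastRid = initialRid;
    }

    SketchConnection createConnection(long rid) {
        SketchConnection connection = new SketchConnection(rid);
        if (rid <= lastRid) {
            // Re-request of a previous RID: replay the buffered response.
            String previous = sentResponses.get(rid);
            if (previous == null) {
                throw new IllegalStateException("RID " + rid + " is no longer buffered");
            }
            connection.body = previous;
            // The step described above: without this add, getResponse
            // cannot find the re-requested connection.
            connectionQueue.add(connection);
            return connection;
        }
        lastRid = rid;
        connectionQueue.add(connection);
        return connection;
    }

    // Writes a response to the waiting connection and buffers it for replay.
    void deliver(long rid, String body) {
        for (SketchConnection c : connectionQueue) {
            if (c.rid == rid && c.body == null) {
                c.body = body;
                sentResponses.put(rid, body);
                return;
            }
        }
        throw new IllegalStateException("No waiting connection for RID " + rid);
    }

    // Looks the connection up in the queue, as getResponse is described doing.
    String getResponse(long rid) {
        Iterator<SketchConnection> it = connectionQueue.iterator();
        while (it.hasNext()) {
            SketchConnection c = it.next();
            if (c.rid == rid && c.body != null) {
                it.remove();
                return c.body;
            }
        }
        throw new IllegalStateException("Could not locate connection: " + rid);
    }
}
```

In this sketch, removing the connectionQueue.add call in the re-request branch makes the second getResponse for the same RID fail with the "Could not locate connection" error.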
The other problem was that the client was detecting the dropped connection BEFORE openfire was, so the client was re-requesting a long-polling connection while openfire still thought the current one was active. This case isn't actually mentioned in the BOSH XEP, but the sensible solution seemed to me to be that, if a currently active connection is re-requested, openfire should do the following:
Assume that there has been a problem.
Respond immediately to the existing connection (this will almost certainly be an empty response, and will likely fail, but not necessarily).
Immediately return the new connection with a copy of the response from the previous connection.
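The three steps above can be sketched roughly as follows. This is an illustration under assumptions, not the attached patch: PendingConnection and RerequestSketch are hypothetical names, the RID-keyed map is assumed bookkeeping, and the empty body is simply a bare BOSH body element.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a long-polling connection held open by the server.
class PendingConnection {
    final long rid;
    private String response; // null while the connection is still waiting

    PendingConnection(long rid) {
        this.rid = rid;
    }

    boolean isClosed() {
        return response != null;
    }

    // Writes the response once; later calls are ignored.
    void respond(String body) {
        if (response == null) {
            response = body;
        }
    }

    String getResponse() {
        return response;
    }
}

// Hypothetical session fragment handling a re-request of an active RID.
class RerequestSketch {
    static final String EMPTY_BODY = "<body xmlns='http://jabber.org/protocol/httpbind'/>";

    private final Map<Long, PendingConnection> byRid = new HashMap<>();

    PendingConnection open(long rid) {
        PendingConnection existing = byRid.get(rid);
        PendingConnection fresh = new PendingConnection(rid);
        if (existing != null) {
            // Step 1: the same RID arriving again means something went wrong.
            // Step 2: answer the existing connection right away; this is a
            // no-op if it has already been answered.
            existing.respond(EMPTY_BODY);
            // Step 3: return the new connection with a copy of that response.
            fresh.respond(existing.getResponse());
        }
        byRid.put(rid, fresh);
        return fresh;
    }
}
```

In a real server the write to the existing connection may fail because the socket is already gone; the sketch leaves that out and only shows the ordering of the steps.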
I have attached a patch with my implementation of this against the current openfire trunk.
I can't find anything to add to the reasoning of the author. I can see how it would affect robustness. I've applied the patch to trunk. Let's test-run it on Ignite.
I applied this patch and am running igniterealtime with it now! Very interested to test it out...