This case is hard to reproduce, but using optimistic caches, and using a separate cache only for locks instead of locking the cache being modified, is not a good approach. Therefore, we need to:
1) Review the type of each cache according to how it is used
2) Get rid of optimistic caches, since they cannot be locked
3) Lock the cache being modified instead of using a separate cache of locks
4) Fix some NPEs in ClusterListener and make it more robust against errors
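Point 3 can be illustrated with a minimal single-JVM sketch. `LockableCache`, `getLock` and `addResource` are hypothetical names used only for illustration; they are not the server's actual cache API, and a real clustered cache would hand out a cluster-wide lock instead of a local `ReentrantLock`. The idea shown is that the lock is obtained from the same cache whose entry is being read and rewritten, so every writer of that key contends on the same lock.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a cache that can lock its own keys.
// Not the server's real API; a clustered cache would return a cluster-wide lock.
class LockableCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Map<K, Lock> keyLocks = new ConcurrentHashMap<>();

    V get(K key) { return entries.get(key); }

    void put(K key, V value) { entries.put(key, value); }

    // The lock belongs to this cache, so all writers of `key` here share it.
    Lock getLock(K key) {
        return keyLocks.computeIfAbsent(key, k -> new ReentrantLock());
    }
}

public class Main {
    // Read-modify-write of a user's online resources while holding the lock
    // of the cache that is being modified.
    static void addResource(LockableCache<String, Set<String>> cache, String user, String resource) {
        Lock lock = cache.getLock(user);
        lock.lock();
        try {
            Set<String> current = cache.get(user);
            Set<String> updated = (current == null) ? new HashSet<>() : new HashSet<>(current);
            updated.add(resource);
            cache.put(user, updated);
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockableCache<String, Set<String>> cache = new LockableCache<>();
        Thread[] writers = new Thread[4];
        for (int t = 0; t < writers.length; t++) {
            final int id = t;
            writers[t] = new Thread(() -> {
                for (int i = 0; i < 500; i++) {
                    addResource(cache, "alice", "res-" + id + "-" + i);
                }
            });
            writers[t].start();
        }
        for (Thread w : writers) {
            w.join();
        }
        // Without the lock, concurrent writers could overwrite each other's updates.
        System.out.println(cache.get("alice").size()); // 2000
    }
}
```

Because the read, the copy and the write all happen under the entry's own lock, no update is lost even with four concurrent writers. With an optimistic cache there is no such lock to take, which is why point 2 removes them.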
It is hard to give a test case here. Put heavy concurrent load on the server, kill nodes, etc., and make sure that the presence of the contacts in your roster is always correct on all nodes.
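The "concurrent load" part of that procedure can be approximated locally with a small harness. This is only a sketch under stated assumptions: `hammer` is a hypothetical helper, the presence counters are a made-up stand-in for real roster state, and it exercises contention inside one JVM only. It does not simulate killing cluster nodes. A `CountDownLatch` start gate releases all threads at once to maximize contention, and an invariant is checked once every thread has finished.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

public class Main {
    // Hypothetical helper: runs op(threadId) `iterations` times on each of
    // `threads` threads, all released together by a start gate.
    static void hammer(int threads, int iterations, IntConsumer op) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch startGate = new CountDownLatch(1);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                try {
                    startGate.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < iterations; i++) {
                    op.accept(id);
                }
            });
        }
        startGate.countDown(); // release every worker at the same moment
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("load did not finish in time");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Made-up state: number of presence updates recorded per contact.
        Map<String, Integer> updates = new HashMap<>();
        int threads = 8;
        int iterations = 10_000;
        hammer(threads, iterations, id -> {
            synchronized (updates) { // guard the whole read-modify-write
                updates.merge("contact-" + (id % 4), 1, Integer::sum);
            }
        });
        // Invariant: 4 contacts, each updated by 2 threads * iterations times.
        boolean consistent = updates.size() == 4
                && updates.values().stream().allMatch(v -> v == 2 * iterations);
        System.out.println("consistent: " + consistent); // consistent: true
    }
}
```

Removing the `synchronized` block typically makes the invariant fail or corrupts the `HashMap`, which is the kind of inconsistency the real multi-node test is meant to surface.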