Pubsub items (persistent) may be dropped in certain cases
Pubsub items that are published to persistent nodes may be dropped in certain cases:
I/O error during flush on the write cache
Slow response from the cluster during shutdown
We would like to add retry logic where appropriate and slim down the pubsub module's shutdown process to minimize the potential for dropped pubsub items.
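As a rough illustration of the first failure mode (an I/O error during flush), the sketch below shows a write cache that removes items only after the batch write succeeds, so a failed flush leaves them queued for the next attempt instead of dropping them. `ItemWriteCache`, `ItemStore` and `writeBatch` are hypothetical names, not Openfire's actual classes, and items are reduced to plain string IDs.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ItemWriteCache {

    /** Hypothetical stand-in for the database layer; a write may fail with an I/O error. */
    public interface ItemStore {
        void writeBatch(List<String> itemIds) throws IOException;
    }

    private final Deque<String> pending = new ArrayDeque<>();
    private final ItemStore store;

    public ItemWriteCache(ItemStore store) {
        this.store = store;
    }

    public synchronized void add(String itemId) {
        pending.addLast(itemId);
    }

    public synchronized int pendingCount() {
        return pending.size();
    }

    /**
     * Writes everything queued so far as one batch. Items are removed only after
     * the write succeeds; on an I/O error they stay queued for the next flush
     * instead of being dropped. Returns true if the batch was written.
     */
    public synchronized boolean flush() {
        if (pending.isEmpty()) {
            return true;
        }
        List<String> batch = new ArrayList<>(pending);
        try {
            store.writeBatch(batch);
        } catch (IOException e) {
            return false; // keep the items; a later flush retries them
        }
        pending.clear();
        return true;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // The first write fails, the second succeeds.
        ItemWriteCache cache = new ItemWriteCache(batch -> {
            if (calls[0]++ == 0) {
                throw new IOException("simulated flush failure");
            }
        });
        cache.add("item-1");
        cache.add("item-2");
        System.out.println(cache.flush() + " " + cache.pendingCount()); // false 2
        System.out.println(cache.flush() + " " + cache.pendingCount()); // true 0
    }
}
```

This sketch retries indefinitely and holds the lock during the write, which blocks publishers while the database is slow. A real implementation would likely cap the number of attempts per item and swap in a fresh buffer before writing.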
Improved pubsub persistence and ensured proper shutdown from launcher app
Hi Robin -
Thanks for reviewing the patch. I agree with your recommendation and will take another swing at pulling the retry counter out of the PublishedItem class sometime over the next few days.
The retry count really shouldn't be in PublishedItem, since PublishedItem shouldn't contain persistence information.
I think it would be better to create a persistence decorator that holds this information and is created in the savePublishedItem(item) method. The persistence manager can then deal with this extension of PublishedItem directly without 'polluting' the real PublishedItem.
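A minimal sketch of that idea, assuming a simplified `PublishedItem` with only an ID and a payload: the persistence layer wraps each item (by composition) in a hypothetical `RetryableItem` inside `savePublishedItem`, so the retry counter lives only in the wrapper. `RetryableItem`, `PersistenceDecoratorSketch`, `markFailedAttempt` and `pending` are illustrative names, not the code from the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class PersistenceDecoratorSketch {

    /** Simplified stand-in for the real PublishedItem: it carries no persistence state. */
    public static final class PublishedItem {
        private final String id;
        private final String payload;

        public PublishedItem(String id, String payload) {
            this.id = id;
            this.payload = payload;
        }

        public String getId() { return id; }
        public String getPayload() { return payload; }
    }

    /** Hypothetical wrapper that holds persistence-only state next to the item. */
    public static final class RetryableItem {
        private final PublishedItem item;
        private int retryCount;

        RetryableItem(PublishedItem item) { this.item = item; }

        public PublishedItem getItem() { return item; }
        public int getRetryCount() { return retryCount; }
        void incrementRetryCount() { retryCount++; }
    }

    private final List<RetryableItem> itemsToAdd = new ArrayList<>();

    /** The wrapper is created here, so callers never see the retry counter. */
    public synchronized void savePublishedItem(PublishedItem item) {
        itemsToAdd.add(new RetryableItem(item));
    }

    /** Called by the (hypothetical) flush logic after a failed DB write. */
    public synchronized void markFailedAttempt() {
        for (RetryableItem wrapped : itemsToAdd) {
            wrapped.incrementRetryCount();
        }
    }

    /** Snapshot of the wrapped items still waiting to be written. */
    public synchronized List<RetryableItem> pending() {
        return new ArrayList<>(itemsToAdd);
    }
}
```

Because the wrapper is built inside the persistence manager, the public `PublishedItem` class stays unchanged, and the retry bookkeeping can be discarded as soon as an item is written.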
Improved PubSub persistence to retry DB writes rather than dropping published items on error/rollback. Also replaced various timer threads with the TaskEngine, and made the remaining workers daemon threads.
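To illustrate why daemon workers matter at shutdown, the sketch below schedules a periodic flush on a daemon thread and, on shutdown, stops the scheduler and runs one last flush on the caller's thread. It uses a plain `ScheduledExecutorService` as a stand-in for Openfire's TaskEngine; `DaemonFlushScheduler`, `startFlushTask` and the "final flush on shutdown" step are assumptions for illustration, not the patch's actual code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DaemonFlushScheduler {

    /** Creates named daemon threads so a pending flush timer never blocks JVM exit. */
    static ThreadFactory daemonFactory(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
    }

    /** Runs the flush periodically, waiting periodMillis between the end of one run and the next. */
    public static ScheduledExecutorService startFlushTask(Runnable flush, long periodMillis) {
        ScheduledExecutorService executor =
                Executors.newSingleThreadScheduledExecutor(daemonFactory("pubsub-flush"));
        executor.scheduleWithFixedDelay(flush, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return executor;
    }

    /** Stops the periodic task, waits briefly for a running flush, then flushes once more. */
    public static void shutdown(ScheduledExecutorService executor, Runnable flush) {
        executor.shutdown(); // pending periodic runs are cancelled by default
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        flush.run(); // final flush so queued items are not lost on exit
    }
}
```

Marking the worker as a daemon means the JVM can exit even if the timer is still scheduled, so the explicit final flush in `shutdown` is what keeps queued items from being lost.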