Persistent pubsub items may be dropped in certain cases

Description

Items published to persistent pubsub nodes may be dropped in the following cases:

  • An I/O error while flushing the write cache

  • A slow response from the cluster during shutdown

We would like to add retry logic where appropriate and slim down the pubsub module's shutdown process to minimize the potential for dropped items.
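For the first failure mode, one possible shape for the retry logic is a write cache that removes items only after a successful write. If the flush hits an I/O error, the batch stays queued for the next flush instead of being discarded. This is a minimal sketch, not Openfire's actual implementation. `ItemSink` and `RetryingWriteCache` are hypothetical names, and the sink stands in for the real database write.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for the database write; may fail with an I/O error. */
interface ItemSink {
    void writeBatch(List<String> items) throws IOException;
}

/**
 * Hypothetical write cache: items leave the cache only after a successful
 * write, so an I/O error during flush keeps them queued rather than dropping them.
 */
class RetryingWriteCache {
    private final Deque<String> pending = new ArrayDeque<>();
    private final ItemSink sink;

    RetryingWriteCache(ItemSink sink) {
        this.sink = sink;
    }

    synchronized void add(String item) {
        pending.addLast(item);
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    /** Returns true if the batch was written; on failure the items stay queued. */
    synchronized boolean flush() {
        if (pending.isEmpty()) {
            return true;
        }
        List<String> batch = new ArrayList<>(pending);
        try {
            sink.writeBatch(batch);
            pending.clear();   // discard only after the write succeeded
            return true;
        } catch (IOException e) {
            return false;      // keep the items for the next flush attempt
        }
    }
}
```

A periodic flush task would call `flush()` and could log a failure. Because nothing is removed until the write succeeds, a transient I/O error delays items instead of losing them.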

Environment

None

Activity

Tom Evans
May 15, 2013, 12:41 AM

Improved pubsub persistence and ensured proper shutdown from launcher app

Tom Evans
May 13, 2013, 9:55 PM

OK, I took another stab at this and would appreciate another code review. Have a look at SVN 13651 and let me know.

Tom Evans
May 6, 2013, 4:54 PM

Hi Robin -

Thanks for reviewing the patch. I agree with your recommendation and will take another swing at pulling the retry counter out of the PublishedItem class over the next few days.

Cheers,
Tom

Robin Collier
May 4, 2013, 12:10 PM

The retry count stuff really shouldn't be in the PublishedItem, since PublishedItem shouldn't contain persistence information.

I think it would be better to create a persistence decorator that contains this information which is created in the savePublishedItem(item) method. Then the persistence manager can deal with this extension of the PublishedItem directly without 'polluting' the real PublishedItem.
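One reading of this suggestion is sketched below. The persistence side wraps each PublishedItem in its own object that carries the retry counter, and that wrapper is created inside savePublishedItem(item). The item class itself holds no persistence state. This is a minimal sketch under stated assumptions. `PublishedItem` here is a simplified stand-in, not the real Openfire class. `PersistentItem`, `PersistenceManagerSketch`, `onWriteFailed`, and the retry limit of 3 are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Objects;
import java.util.Queue;

/** Simplified stand-in for PublishedItem: carries no persistence state. */
final class PublishedItem {
    private final String id;
    private final String payload;

    PublishedItem(String id, String payload) {
        this.id = id;
        this.payload = payload;
    }

    String getId() { return id; }
    String getPayload() { return payload; }
}

/** Hypothetical persistence-side wrapper that adds a retry counter to an item. */
final class PersistentItem {
    private final PublishedItem item;
    private int retryCount;

    PersistentItem(PublishedItem item) {
        this.item = Objects.requireNonNull(item);
    }

    PublishedItem getItem() { return item; }
    int getRetryCount() { return retryCount; }
    int incrementRetryCount() { return ++retryCount; }
}

/** Hypothetical persistence manager that works with the wrapper, not the raw item. */
class PersistenceManagerSketch {
    static final int MAX_RETRIES = 3;   // assumed retry budget
    final Queue<PersistentItem> queue = new ArrayDeque<>();

    /** The wrapper is created here, so PublishedItem never sees retry state. */
    void savePublishedItem(PublishedItem item) {
        queue.add(new PersistentItem(item));
    }

    /** On a failed write, requeue the item unless its retry budget is spent. */
    boolean onWriteFailed(PersistentItem p) {
        if (p.incrementRetryCount() > MAX_RETRIES) {
            return false;   // caller decides how to report the give-up
        }
        queue.add(p);
        return true;
    }
}
```

The sketch uses composition rather than subclassing. The retry bookkeeping stays private to the persistence layer, and any code that receives a PublishedItem is unaffected.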

Tom Evans
May 3, 2013, 5:33 PM

Improved PubSub persistence to retry DB write(s) rather than dropping published items on error/rollback. Also swapped out various timer threads to use the TaskEngine instead, and set remaining workers to be daemon threads.
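The daemon-thread change can be illustrated with standard java.util.concurrent classes. Openfire's TaskEngine is not reproduced here, so this is a swapped-in technique rather than the code in SVN. Daemon threads do not keep the JVM alive at shutdown. The trade-off is that the JVM can also stop them mid-write, so the sketch cancels the periodic flush and then runs one last flush on the caller's thread. `DaemonThreadFactory`, `FlushScheduler`, and the 5-second wait are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Creates named daemon threads so background workers never block JVM exit. */
final class DaemonThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger();

    DaemonThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, prefix + "-" + counter.incrementAndGet());
        t.setDaemon(true);   // the JVM may exit while this thread is still running
        return t;
    }
}

final class FlushScheduler {
    /** Hypothetical: run a periodic flush on a daemon-backed scheduler. */
    static ScheduledExecutorService start(Runnable flush, long periodMillis) {
        ScheduledExecutorService ses =
                Executors.newSingleThreadScheduledExecutor(new DaemonThreadFactory("pubsub-flush"));
        ses.scheduleWithFixedDelay(flush, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return ses;
    }

    /** Hypothetical shutdown: stop the timer, then flush once on the caller's thread. */
    static void stop(ScheduledExecutorService ses, Runnable flush) {
        ses.shutdown();   // cancels the periodic task; no new runs start
        try {
            ses.awaitTermination(5, TimeUnit.SECONDS);   // let an in-flight flush finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        flush.run();   // final synchronous flush so nothing is left only in memory
    }
}
```

Running the final flush synchronously during shutdown keeps the daemon setting from reintroducing the "dropped on shutdown" case described above.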

Fixed

Assignee

Tom Evans

Reporter

Tom Evans