Pubsub items (persistent) may be dropped in certain cases

Description

Pubsub items that are published to persistent nodes may be dropped in certain cases:

  • I/O error during flush on the write cache

  • Slow response from the cluster during shutdown

We would like to add retry logic where appropriate and slim down the pubsub module's shutdown process to minimize the potential for dropped pubsub items.
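A minimal sketch of the retry idea in Java, assuming a write cache that is drained on each flush: if the write fails, the batch is put back into the cache for the next flush instead of being discarded. WriteCacheSketch, ItemStore, and saveAll are hypothetical names, not Openfire's API, and items are reduced to plain string IDs for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

class WriteCacheSketch {
    /** Hypothetical persistence target standing in for the real DB write. */
    interface ItemStore {
        void saveAll(List<String> itemIds) throws Exception;
    }

    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    private final ItemStore store;

    WriteCacheSketch(ItemStore store) {
        this.store = store;
    }

    void publish(String itemId) {
        pending.add(itemId);
    }

    int pendingCount() {
        return pending.size();
    }

    /**
     * Drains the cache and writes the batch. On failure the batch is put back
     * so the next flush retries it instead of dropping it. Returns true on success.
     */
    boolean flush() {
        List<String> batch = new ArrayList<>();
        String itemId;
        while ((itemId = pending.poll()) != null) {
            batch.add(itemId);
        }
        if (batch.isEmpty()) {
            return true;
        }
        try {
            store.saveAll(batch);
            return true;
        } catch (Exception e) {
            // Re-queue rather than drop; ordering relative to items
            // published during this flush is not preserved.
            pending.addAll(batch);
            return false;
        }
    }
}
```

Re-queuing without a limit means a permanently failing batch would cycle forever, so a real implementation would also need a bounded retry count.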

Environment

None

Activity


Tom Evans May 15, 2013 at 12:41 AM

Improved pubsub persistence and ensured proper shutdown from launcher app

Tom Evans May 13, 2013 at 9:55 PM

OK - I took another stab at this and would appreciate another code review. Have a look at SVN 13651 and let me know.

Tom Evans May 6, 2013 at 4:54 PM

Hi Robin -

Thanks for reviewing the patch. I agree with your recommendation and will take another swing at pulling the retry counter out of the PublishedItem class over the next few days.

Cheers,
Tom

Robin Collier May 4, 2013 at 12:10 PM

The retry count stuff really shouldn't be in the PublishedItem, as it shouldn't contain persistence information.

I think it would be better to create a persistence decorator that contains this information which is created in the savePublishedItem(item) method. Then the persistence manager can deal with this extension of the PublishedItem directly without 'polluting' the real PublishedItem.
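One way to read this suggestion, sketched in Java under stated assumptions: the retry counter lives in a decorator that implements the same (hypothetical) PubsubItem interface as the item and delegates to it, and the decorator is created inside savePublishedItem(item), so PublishedItem itself carries no persistence state. PubsubItem, RetryableItem, PersistenceManagerSketch, onWriteFailed, and MAX_RETRIES are illustrative names; the PublishedItem below is a simplified stand-in, not Openfire's class.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical read-only view shared by the item and its decorator. */
interface PubsubItem {
    String getId();
    String getPayload();
}

/** Simplified stand-in for PublishedItem: pubsub data only, no retry state. */
final class PublishedItem implements PubsubItem {
    private final String id;
    private final String payload;

    PublishedItem(String id, String payload) {
        this.id = id;
        this.payload = payload;
    }

    @Override public String getId() { return id; }
    @Override public String getPayload() { return payload; }
}

/** Persistence-side decorator: delegates item data, adds retry bookkeeping. */
final class RetryableItem implements PubsubItem {
    private final PubsubItem delegate;
    private int retryCount;

    RetryableItem(PubsubItem delegate) { this.delegate = delegate; }

    @Override public String getId() { return delegate.getId(); }
    @Override public String getPayload() { return delegate.getPayload(); }

    int getRetryCount() { return retryCount; }

    /** Records one failed write; returns true while another attempt is allowed. */
    boolean recordFailure(int maxRetries) {
        retryCount++;
        return retryCount <= maxRetries;
    }
}

class PersistenceManagerSketch {
    static final int MAX_RETRIES = 3; // hypothetical retry budget
    final Queue<RetryableItem> writeQueue = new ConcurrentLinkedQueue<>();

    /** The decorator is created here, at save time, so callers never see it. */
    void savePublishedItem(PublishedItem item) {
        writeQueue.add(new RetryableItem(item));
    }

    /** Called when a DB write fails: re-queue until the retry budget is spent. */
    boolean onWriteFailed(RetryableItem entry) {
        if (entry.recordFailure(MAX_RETRIES)) {
            writeQueue.add(entry);
            return true;
        }
        return false; // budget exhausted; a real implementation would log this
    }
}
```

Because only the persistence manager ever handles RetryableItem, the rest of the pubsub code keeps working with the plain item.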

Tom Evans May 3, 2013 at 5:33 PM

Improved PubSub persistence to retry DB write(s) rather than dropping published items on error/rollback. Also swapped out various timer threads to use the TaskEngine instead, and set remaining workers to be daemon threads.
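The daemon-thread and shutdown side of this change can be illustrated with the JDK's ScheduledExecutorService standing in for Openfire's TaskEngine, whose API is not shown here. A ThreadFactory marks workers as daemons so they cannot keep the JVM alive, and shutdown stops periodic flushing before running one last flush so queued items are written rather than dropped. DaemonFlushScheduler and its methods are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

class DaemonFlushScheduler {
    /** Marks every worker as a daemon so an idle flush thread cannot keep the JVM alive. */
    static final ThreadFactory DAEMON_FACTORY = runnable -> {
        Thread worker = new Thread(runnable, "pubsub-flush");
        worker.setDaemon(true);
        return worker;
    };

    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(DAEMON_FACTORY);
    private final Runnable flushTask;

    DaemonFlushScheduler(Runnable flushTask) {
        this.flushTask = flushTask;
    }

    /** Runs the flush periodically, first after one full period. */
    void start(long periodMillis) {
        executor.scheduleWithFixedDelay(flushTask, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    /**
     * Stops periodic flushing, waits (bounded) for an in-progress flush to finish,
     * then runs one final flush on the calling thread so pending items are written.
     */
    void shutdown(long timeoutMillis) throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        flushTask.run();
    }
}
```

Bounding the wait keeps a slow database from stalling shutdown indefinitely; the trade-off is that if the wait times out, the final flush may overlap one still running, so the flush task should tolerate concurrent calls.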

Fixed

Details

Created May 2, 2013 at 12:27 AM
Updated May 15, 2013 at 12:41 AM
Resolved May 15, 2013 at 12:41 AM
