Pubsub items (persistent) may be dropped in certain cases

Description

Pubsub items that are published to persistent nodes may be dropped in certain cases:

  • I/O error during flush on the write cache

  • Slow response from the cluster during shutdown

We would like to add retry logic where appropriate and slim down the pubsub module shutdown process to minimize the potential for dropped pubsub items.

Environment

None

Acceptance Test - Entry

None

Activity

Tom Evans
May 3, 2013, 5:33 PM

Improved PubSub persistence to retry DB write(s) rather than dropping published items on error/rollback. Also swapped out various timer threads to use the TaskEngine instead, and set remaining workers to be daemon threads.
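As a rough illustration of "retry rather than drop", here is a minimal sketch of a write cache that keeps items queued when a flush fails. It is not the code from the patch: `ItemStore` and `RetryingWriteCache` are hypothetical names, and items are plain strings to keep the example self-contained.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for the database layer; not an Openfire API. */
interface ItemStore {
    void writeBatch(List<String> items) throws Exception;
}

/** Hypothetical write cache that keeps items queued when a flush fails. */
class RetryingWriteCache {
    private final Deque<String> pending = new ArrayDeque<>();
    private final ItemStore store;

    RetryingWriteCache(ItemStore store) {
        this.store = store;
    }

    /** Queues an item for the next flush. */
    synchronized void publish(String item) {
        pending.addLast(item);
    }

    /** Attempts to write everything queued; returns true if the batch was stored. */
    synchronized boolean flush() {
        if (pending.isEmpty()) {
            return true;
        }
        List<String> batch = new ArrayList<>(pending);
        try {
            store.writeBatch(batch);
        } catch (Exception e) {
            // Leave the items queued so the next flush retries them.
            return false;
        }
        // Discard items only after the store has accepted them.
        pending.clear();
        return true;
    }

    /** Number of items still waiting to be written. */
    synchronized int pendingCount() {
        return pending.size();
    }
}
```

A periodic task would call `flush()` on a schedule; a failed attempt simply leaves the batch in place for the next run instead of losing it.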

Robin Collier
May 4, 2013, 12:10 PM

The retry count stuff really shouldn't be in the PublishedItem, as it shouldn't contain persistence information.

I think it would be better to create a persistence decorator that contains this information which is created in the savePublishedItem(item) method. Then the persistence manager can deal with this extension of the PublishedItem directly without 'polluting' the real PublishedItem.
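A minimal sketch of what this suggestion could look like, under stated assumptions: the `PublishedItem` below is a simplified stand-in for the real class, `RetryablePublishedItem` and `PersistenceManagerSketch` are hypothetical names, the wrapper uses composition rather than a subclass, and the retry limit of 3 is arbitrary.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified, hypothetical stand-in for the real PublishedItem. */
class PublishedItem {
    private final String id;

    PublishedItem(String id) {
        this.id = id;
    }

    String getId() {
        return id;
    }
}

/** Persistence-only wrapper; the retry counter lives here, not in PublishedItem. */
class RetryablePublishedItem {
    private final PublishedItem item;
    private int retryCount;

    RetryablePublishedItem(PublishedItem item) {
        this.item = item;
    }

    PublishedItem getItem() {
        return item;
    }

    int getRetryCount() {
        return retryCount;
    }

    void incrementRetryCount() {
        retryCount++;
    }
}

/** Hypothetical persistence manager showing where the wrapper is created. */
class PersistenceManagerSketch {
    private static final int MAX_RETRIES = 3; // assumed limit, for illustration only
    private final Deque<RetryablePublishedItem> queue = new ArrayDeque<>();

    /** Wraps the item on entry so callers never see retry state. */
    void savePublishedItem(PublishedItem item) {
        queue.addLast(new RetryablePublishedItem(item));
    }

    /** Returns the next wrapped item to write, or null if none is queued. */
    RetryablePublishedItem poll() {
        return queue.pollFirst();
    }

    /** Re-queues a wrapper after a failed write; returns false once retries are exhausted. */
    boolean requeueAfterFailure(RetryablePublishedItem wrapped) {
        wrapped.incrementRetryCount();
        if (wrapped.getRetryCount() > MAX_RETRIES) {
            return false;
        }
        queue.addLast(wrapped);
        return true;
    }
}
```

With this shape, only the persistence manager touches the retry counter, and the `PublishedItem` handed to the rest of the pubsub code stays free of persistence details.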

Tom Evans
May 6, 2013, 4:54 PM

Hi Robin -

Thanks for reviewing the patch ... I agree with your recommendation and will take another swing to pull the retry counter out of the PublishedItem class sometime over the next few days.

Cheers,
Tom

Tom Evans
May 13, 2013, 9:55 PM

OK - I took another stab at this and would appreciate another code review. Have a look at SVN 13651 and let me know.

Tom Evans
May 15, 2013, 12:41 AM

Improved pubsub persistence and ensured proper shutdown from launcher app

Assignee

Tom Evans

Reporter

Tom Evans

Labels

None

Expected Effort

None

Ignite Forum URL

None

Components

Fix versions

Affects versions

Priority

Major