So here's an interesting issue and I believe this is a bug in clearing the Drupal queue.

http://www.momblognetwork.com/

We have about 5333 blogs that are being processed. This is backed up by the node table and, more importantly, the schedule table:

mysql> select count(*) from feeds_schedule;                                     
+----------+
| count(*) |
+----------+
|     5333 | 
+----------+
1 row in set (0.00 sec)

When using Drupal queue, Feeds appears to take each entry from the schedule table and add it to the queue table for processing.

You should never have more items in Drupal queue than in the Feeds schedule, right? (Assuming no other modules are using drupal_queue.)

mysql> select count(*) from queue;                                              
+----------+
| count(*) |
+----------+
|    58994 | 
+----------+
1 row in set (0.00 sec)

mysql> select distinct(name) from queue;                                        
+-------------+
| name        |
+-------------+
| feeds_queue | 
+-------------+
1 row in set (0.00 sec)

Seems like something is off here :)

It seems old items are never being deleted out of the queue.

I believe the error is within protected function unschedule($job). It should do the following:

The full queue item returned by DrupalQueueInterface::createItem() needs to be passed to DrupalQueueInterface::deleteItem() once processing is completed.
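To illustrate why the counts above can drift apart, here is a minimal sketch in Python of a queue whose items are only removed when the worker loop explicitly deletes them. This is a toy in-memory model, not the Drupal queue API: `ToyQueue`, `drain`, and the `delete_after` flag are hypothetical names made up for this example, and the assumption that a skipped delete is what is happening in Feeds is unconfirmed.

```python
import itertools


class ToyQueue:
    """In-memory stand-in for a queue table (hypothetical, not the Drupal API)."""

    def __init__(self):
        self._items = {}
        self._ids = itertools.count(1)

    def create_item(self, data):
        # Store a new, unclaimed item and return its id.
        item_id = next(self._ids)
        self._items[item_id] = {"item_id": item_id, "data": data, "claimed": False}
        return item_id

    def claim_item(self):
        # Hand out the first unclaimed item, or None if there is none.
        for item in self._items.values():
            if not item["claimed"]:
                item["claimed"] = True
                return item
        return None

    def delete_item(self, item):
        # The row only goes away when the full item is passed back here.
        self._items.pop(item["item_id"], None)

    def number_of_items(self):
        return len(self._items)


def drain(queue, worker, delete_after=True):
    """Claim items until none are left; optionally skip the delete step."""
    while (item := queue.claim_item()) is not None:
        worker(item["data"])
        if delete_after:
            queue.delete_item(item)
```

In this model, calling `drain` with `delete_after=False` processes every item but leaves all of them in the queue, so repeated scheduling makes the queue grow past the size of the schedule.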

Comments

Scott Reynolds

subscribe.

alex_b

Not sure what's going on.

A queue can legitimately hold more items than its schedule has entries: a schedule (e.g. feeds_schedule) can easily spawn more queued jobs than it has entries.

That said, the number of items in the queue you're posting doesn't look healthy :-/ The system is backing up badly, the question is why.

DrupalQueueInterface::deleteItem() should be called fine from drupal_queue_cron_run() - maybe something is causing the worker in Feeds to crap out and never return, thus keeping scheduled items in the queue forever... I'd start debugging in drupal_queue_cron_run().
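A rough sketch of that failure mode, in Python rather than PHP so it can be run standalone. Everything here is hypothetical: `cron_run`, `flaky_worker`, and `simulate` are made-up names, the `try`/`except` merely stands in for a worker that dies before the delete step, and the model assumes the schedule re-queues a feed even while an earlier job for it is still stuck.

```python
def cron_run(queue, worker):
    """Process each queued item once; keep items whose worker never returned."""
    remaining = []
    for item in queue:
        try:
            worker(item)
        except Exception:
            # The worker died, so the delete step is never reached.
            remaining.append(item)
    return remaining


def flaky_worker(feed_url):
    """Hypothetical worker that fails on one kind of feed."""
    if "broken" in feed_url:
        raise RuntimeError("worker died on " + feed_url)


def simulate(schedule, worker, runs):
    """Re-queue the whole schedule before each cron run; return the final queue size."""
    queue = []
    for _ in range(runs):
        queue.extend(schedule)
        queue = cron_run(queue, worker)
    return len(queue)
```

Under these assumptions a single failing feed adds one stuck item per cron run, so the queue grows without bound even though the schedule stays the same size.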

OT:

Regardless of this issue, with 5333 feeds you must be having a hard time getting decent performance with polling, right? Do you run into any wild spikes as the queue tries to process in parallel? Did you try using PubSubHubbub with superfeedr as a dedicated hub? You could also look at how many of the 5k feeds support PuSH natively. Note: supporting a scenario where some feeds are PuSHed and others are polled will require #721428: Make scheduler next scheduled time based to be addressed.

m3avrck

Polling has been working quite well for 5000 feeds. We have not noticed any spikes (using an EC2 M1-Large instance for Web and Master/Slave DBs) and have not had any reported issues of feeds not updating.

Using drupal_cron was indeed causing far too many issues with 5000 feeds.

Next up is to switch to PubSub which should indeed be the fastest/smoothest.

alex_b

m3avrck, Scott Reynolds: is this resolved?

alex_b

Status: Active » Postponed (maintainer needs more info)

David Goode

Status: Postponed (maintainer needs more info) » Closed (fixed)