Status: Closed (fixed)
Project: Search API
Version: 7.x-1.x-dev
Component: Framework
Priority: Minor
Category: Task
Assigned: Unassigned
Issue tags:
Reporter:
Created: 19 Oct 2010 at 20:58 UTC
Updated: 3 Jan 2014 at 02:04 UTC
Comments
Comment #1
drunken monkey: Wow, the oldest open issue in the Search API, finally about to be resolved – well, at least half of it. Patch coming soon.
See #1225620: Use the Batch API for the "Index now" functionality for the other half, which might well take another nine months …
Comment #2
drunken monkey: Patch here, please test/review.
One thing I couldn't really pin down: when do the items in the queue get processed? I thought this was controlled by the "time" setting, and that items would get processed automatically on page requests, but after a bit of testing, that doesn't seem to be the case. When running cron, the items would be inserted into the queue; then some of them would immediately be processed, and the rest would just lie in the queue, unprocessed. Needless to say, this has to be cleared up before this can be committed.
Does anyone know more about this? Documentation for this is kind of useless, once again.
Comment #3
fago:
> And then the other items would just lie in the queue, unprocessed.
They are processed on the next cron runs.
> + 'time' => 10,
Why only 10 seconds? Any reason not to stay with the default?
Apart from that, the patch looks good to me, though I haven't tested it yet.
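For reference, here is a minimal sketch of how a Drupal 7 module declares such a cron queue via hook_cron_queue_info() – module, queue, and callback names are hypothetical, and 'time' is the per-cron-run budget being discussed:

/**
 * Implements hook_cron_queue_info().
 */
function mymodule_cron_queue_info() {
  $queues['mymodule_indexing'] = array(
    // Called once per claimed item, with the item's data as argument.
    'worker callback' => 'mymodule_indexing_worker',
    // Maximum number of seconds to spend processing this queue per
    // cron run; core defaults to 15 if this is omitted.
    'time' => 15,
  );
  return $queues;
}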
Comment #4
drunken monkey: Oh, OK, seems like I misunderstood the concept a bit. No wonder, without any real documentation that doesn't assume you already know what it's talking about …
Corrected patch coming soon (the current one would put items into the queue several times).
Comment #5
drunken monkey: Hm, still thinking about how best to correct this …
1) Store items which are already in the queue in a variable, or in the database.
2) Mark items as indexed when inserting them into the queue, and just mark them as dirty again if they couldn't be indexed.
3) Make sure the items still need to be indexed in the worker callback.
4) Don't include items in the task, but only the batch size, and retrieve the concrete items in the worker callback.
I first went with 2), but then realized that it would essentially make the progress display on the "Status" tab useless, as it would hit 100% as soon as the first cron run starts, even though it could still take weeks until the items are actually indexed. I'm not sure about 1), and 3) would necessitate a new API function for checking whether specific items still need to be indexed – something we didn't need before, which is especially bad in light of #1064884: Add support for indexing non-entities.
At the moment, 4) sounds best to me – it doesn't seem to have any real disadvantages, other than that there might be several no-op tasks lying in the queue (which, I guess, wouldn't do any harm). It just seems a bit against the spirit of cron queues to me. Any opinions? (A rough sketch of 4) follows below.)
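To make 4) concrete, a sketch under the assumption of hypothetical helper functions: the queue task carries only an index ID and a batch size, and the worker looks up the concrete items only when it runs:

/**
 * Implements hook_cron().
 */
function mymodule_cron() {
  $queue = DrupalQueue::get('mymodule_indexing');
  foreach (mymodule_load_enabled_indexes() as $index_id => $index) {
    // The task names no concrete items, only a batch size.
    $queue->createItem(array('index_id' => $index_id, 'limit' => 50));
  }
}

/**
 * Queue worker callback: fetch the items to index only now.
 */
function mymodule_indexing_worker($data) {
  // Hypothetical helper returning IDs that still need indexing.
  $ids = mymodule_get_items_to_index($data['index_id'], $data['limit']);
  if ($ids) {
    mymodule_index_items($data['index_id'], $ids);
  }
  // If nothing is left to index, the task is simply a no-op.
}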
Comment #6
drunken monkey: OK, this one implements 4), and seems to work fine.
Comment #7
fago: Uhm, no, 4) doesn't work for a queue, as two parallel workers would process the same items – ouch.
To keep the status information up to date, I'd propose adding a new "queued" flag to the table, then only removing entries once queue processing was successful, and only adding entries if there is no entry at all. That should do it, right? (Sketch below.)
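A sketch of that "queued" flag idea, assuming a hypothetical tracking table {mymodule_item} gains a new 'queued' column:

/**
 * Enqueues items and marks them as queued, so subsequent cron runs
 * will skip them (hypothetical helper).
 */
function mymodule_queue_items($index_id, array $ids) {
  $queue = DrupalQueue::get('mymodule_indexing');
  $queue->createItem(array('index_id' => $index_id, 'ids' => $ids));
  db_update('mymodule_item')
    ->fields(array('queued' => REQUEST_TIME))
    ->condition('item_id', $ids, 'IN')
    ->execute();
}

The worker would clear the flag after successful indexing, and reset it on failure so the items are re-queued on a later cron run.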
Comment #8
drunken monkey: As long as cron runs don't overlap, this should never happen, I think. The workers aren't called in parallel (how would they be, in PHP?) by drupal_cron_run(). Also, even if this did happen in some rare instances, there's really little harm done, only a slight performance loss.
Your suggestion would be what I listed as 1). While possible, I guess this would also require changes to the data source controller in #1064884: Add support for indexing non-entities, which I'd like to avoid if possible. And if we had to change it, I'd rather go with 3), I think (as that at least wouldn't need another field).
Comment #9
fago:
> As long as cron runs don't overlap, this should never happen, I think.
That's the main plus of using the queue: items may safely be processed in parallel. Look at drupal_cron_run() – it releases the lock before it starts working through the queues.
So cron itself should finish quickly, and the long-running operations can then happen in the queue. If the next cron run starts in the meantime, everything is still fine...
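For context, the relevant part of Drupal 7's drupal_cron_run() looks roughly like this (paraphrased, not a verbatim copy of core):

// Near the end of drupal_cron_run(), after hook_cron() has run:
lock_release('cron');

// The queues are processed only after the lock is released, so a
// second cron run could start while this loop is still working.
foreach ($queues as $queue_name => $info) {
  $function = $info['worker callback'];
  $end = time() + (isset($info['time']) ? $info['time'] : 15);
  $queue = DrupalQueue::get($queue_name);
  while (time() < $end && ($item = $queue->claimItem())) {
    $function($item->data);
    $queue->deleteItem($item);
  }
}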
Comment #10
drunken monkey: It's not long-running, it only runs for 15 seconds. I don't think parallel processing is the aim of cron queues, but rather that the operations can safely be interrupted and continued later, without the danger of a timeout.
I don't really see this happening unless a) some module sometimes has very long-running items in the queue (and the PHP timeout is set to an absurdly high value), or b) the cron interval is less than a minute. Oh, or c) the user sets the cron batch size to a very large value (or -1) – but that's a configuration error then, and the cron interval would still have to be less than the PHP timeout.
Comment #11
drunken monkey: Patch adapted to latest changes in HEAD.
Comment #12
drunken monkey: Oops, missed a merge conflict …
Comment #13
fago: The point is that a), b), and c) may well happen – plus d) it's 15 seconds per queue, not in total.
And the docs of hook_cron_queue_info() say...
Comment #14
drunken monkey:
Comment #15
fago: Patch looks good, though I haven't tested it yet.
Another remark: what happens if an entity is deleted while it is queued? Presumably indexing fails, but will it wrongly get re-queued, too?
Comment #16
drunken monkey: Deleted items are removed from the indexing table, and therefore can't be marked as queued/changed/indexed anymore. The worker callback still attempts to index them, but that shouldn't have any effect. (See the guard sketched below.)
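In other words, the worker can simply try to load the items and skip whatever no longer exists – a minimal guard, with hypothetical helper and data layout:

function mymodule_indexing_worker($data) {
  // Entities deleted since the task was queued simply won't load.
  $items = entity_load($data['type'], $data['ids']);
  if ($items) {
    // Hypothetical helper that indexes the loaded entities.
    mymodule_index_items($data['index_id'], $items);
  }
}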
Comment #17
drunken monkey: I've had another idea: a hidden variable for the runtime of the queue per cron run. Patch upcoming. (Sketch of the idea below.)
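Presumably this just means reading the 'time' value from a variable when declaring the queue – the variable name here is hypothetical:

// In hook_cron_queue_info() (compare the sketch after #3), the fixed
// 'time' value becomes configurable via a hidden variable: no UI, but
// changeable with variable_set() or from settings.php.
'time' => variable_get('mymodule_index_queue_time', 15),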
Comment #18
fago: Oops, sorry, I forgot to give feedback! It works fine for me; a hidden setting (variable) for the maximum queue time would be good to have.
Comment #19
fago: :P
Comment #20
drunken monkey:
Comment #21
fago: Looks good to me.
Comment #22
drunken monkey: You and your sloppy reviews …
But after I fixed the patch, I think it now works. Committed.