The built-in cron runner just runs at the end of the request, which is horrible. See #1554872: "Run cron every" setting (cron_safe_threshold) is not an interval but a Boolean for the kind of amusing things you can run into due to this.
Cron was originally triggered from a 1px gif, but this was reverted later since that 1px gif meant a full Drupal bootstrap every time it was requested (including when that 1px gif was served from a cached page, so you'd hit Drupal even if you were serving pages from Varnish).
However there's new tricks now, so we don't need to do either. If we have this, then it'll also be most of the necessary infrastructure for #1189464: Add an 'instant' queue runner.
This depends on #1447736: Adopt Guzzle library to replace drupal_http_request() since one of the criteria for finding a new http client is that it can handle non-blocking HTTP requests. So marking postponed, but leaving at major since this is a nasty gotcha on sites that don't want it, and crappy performance for those users who are unlucky enough to be the ones to trigger the cron run.
We could also fire each cron hook in its own separate PHP runner to keep memory usage down and to prevent an error in one cron job from killing the others, though that doesn't necessarily need to be done here.
Comment | File | Size | Author |
---|---|---|---|
#81 | interdiff_80-81.txt | 903 bytes | ranjith_kumar_k_u |
#81 | 1599622-81.patch | 4.42 KB | ranjith_kumar_k_u |
#80 | reroll_diff_65-80.txt | 2.33 KB | ravi.shankar |
#80 | 1599622-80.patch | 4.43 KB | ravi.shankar |
#65 | interdiff-1599622.patch | 1.88 KB | dawehner |
Comments
Comment #2
gielfeldt CreditAttribution: gielfeldt commented
For inspiration, Ultimate Cron already does this for D6+D7 using a non-blocking version of drupal_http_request() via Background Process.
Currently mikeytown2 and I are discussing merging the HTTP Parallel Request & Threading Library with Background Process to produce a cleaner separation and a better API for, respectively, HTTP requests and starting background processes.
I know improving the http request API in D8 is in the works, which among other things would probably include non-blocking request possibilities. Perhaps this task also warrants a library for running arbitrary code in the background?
Regarding the individual cron hooks, an extension of this API is being discussed here: #1442434: Do not port Elysia Cron, recommend Ultimate Cron for Drupal 8
EDIT: forgot the link to the background process discussion #1357652: Merge Projects: Replace Background Process's HTTP client with HTTPRL's.
Comment #3
Berdir
Guzzle has been added to core, so this should now be possible...
Comment #4
Berdir
Here we go. Seems to be working fine for me. The API is not very intuitive, but it works.
Had to add the async plugin and copied the version definition (*) from guzzle itself, but I think this is wrong; it seems to do a git clone?
Comment #5
David_Rothstein CreditAttribution: David_Rothstein commented
Doesn't this need to acquire a lock to avoid a stampede scenario? (If the site is busy and slow such that HTTP requests are having trouble connecting, then it seems like triggering a new HTTP request off of each one, which itself will not connect, could be a problem.)
I'm also wondering what happens here on servers that are configured to not be able to make HTTP requests to themselves (see #245990: HTTP request testing unreliable and may be undesirable). As I recall there are reports of 30-second delays in those scenarios, and I assume that would be even before any of this non-blocking stuff kicks in (though I'm not sure).
Comment #6
rbayliss CreditAttribution: rbayliss commented
Regarding the async plugin, I think you can add that plugin to the request rather than the client. So you'd have something like this:
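A sketch of that approach, assuming Guzzle 3's AsyncPlugin from the guzzle/plugin-async package; the base URL and cron key below are placeholders, not the actual patch:

```php
use Guzzle\Http\Client;
use Guzzle\Plugin\Async\AsyncPlugin;

// Placeholder base URL and cron key; the real code would derive these
// from the current site.
$client = new Client('http://example.com');
$request = $client->get('cron/' . $cron_key);
// Attach the async subscriber to this single request instead of
// registering the plugin on the shared client.
$request->addSubscriber(new AsyncPlugin());
$request->send();
```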
Comment #7
effulgentsia CreditAttribution: effulgentsia commented
See #1834594-28: Update dependencies (Symfony and Twig) follow up fixes for Composer. At some point, there may be a Guzzle release we can use instead, but at least as of a few weeks ago, there wasn't.
Comment #8
Berdir
Thanks for the reviews!
- Yes, adding the subscriber to the request is *much* easier, thanks for the pointer.
- Added a lock around the thing, agreed that this makes sense.
- The whole function should probably be moved into a separate listener (it is only invoked by RequestCloseSubscriber) that gets all the things injected. That is, however, not trivial as "the things" means CacheFactory, KeyValueFactory for state, Lock*, guzzle client and there's the do-not-execute-this-on-ajax-requests @todo in RequestCloseSubscriber. Separate issue/follow-up?
* which is broken as it is missing the shutdown function/releaseAll() part as opposed to lock(), and yes, there is an issue for it.
@effulgentsia: Thanks, that explains it. Looking at https://github.com/guzzle/guzzle/commits/master, it seems that the fix is included in 3.0.6 and 3.0.7, so it should be safe to change to that version in yet another separate issue?
Comment #9
YesCT CreditAttribution: YesCT commented
Is this the issue referred to in #8?
#1825450: Use lock service in lock()
Follow-up (needs clarification/correction on what function is to be moved):
#1883896: Move cron invocation to separate listener
Follow-up:
#1883904: Update to Guzzle 3.0.7
Comment #10
YesCT CreditAttribution: YesCT commented
#8: cron-async-request-1599622-8.patch queued for re-testing.
Comment #12
YesCT CreditAttribution: YesCT commented
oops.
#1875818: Update composer.json to use the latest stable 3.0.x version of Guzzle is the one that should be the follow-up.
closed as a duplicate of that: #1883904: Update to Guzzle 3.0.7
Comment #13
Berdir
Re-roll, no other changes.
The patch is a bit bigger due to strange changes in the composer files. I updated my local composer version and it now seems to order things differently...
Comment #15
Berdir
That was a re-roll against an old 8.x version, this should be better.
Comment #16
Damien Tournoud CreditAttribution: Damien Tournoud commented
As implemented, the locking is pointless. ->send() is asynchronous, so this code path is always really fast. What would make sense is to try to acquire the cron lock, so that we don't try to reschedule a cron run while cron is actually already running (we update system.cron_last at the end of the run).
Comment #17
Berdir
Right, that makes sense.
There is still a window between checking that and executing the HTTP request where we could try to start cron multiple times. Wondering if we don't care (any bigger site should not use this anyway) or if we should add another state flag or something like that. The locking system unfortunately doesn't allow us to acquire a lock and keep it for the requested time even if the process is finished.
Comment #18
gielfeldt CreditAttribution: gielfeldt commented
IMO it all comes down to whether we can reacquire locks across requests.
There's another lock api thread in progress: http://drupal.org/node/1225404
Comment #19
mikeytown2 CreditAttribution: mikeytown2 commented
Fixing the locks would be nice. In HTTPRL I hack around core to make the magic happen. As a result of this I also have a cron job to clean up orphaned locks.
Comment #20
catch
Cross-request locking is also needed for #2205527: Move configuration import lock to lock.persistent service, since a lock cannot exist beyond a single request.
Comment #21
mikeytown2 CreditAttribution: mikeytown2 commented
What if we allowed contrib to handle/override call_user_func_array() in _drupal_shutdown_function()?
#2246659: Use drupal_register_shutdown_function() in a really unique way OR allow for alt call to call_user_func_array() in _drupal_shutdown_function()
That way a 3rd party module could be called instead of call_user_func_array and run the callback in a new process if configured to do so.
Comment #22
dawehner
Afaik this issue can be marked as solved? Cron now runs on KernelEvents::terminate(), so after the response is sent, and Cron.php has a lock. @catch, do we need more?
Comment #23
catch
We have cross-request locking now.
@dawehner if KernelEvents::terminate() runs after the page is actually served I'd be very surprised.
Comment #24
Berdir
It kind of does, but I don't know what exactly that means to a normal browser.
It is a fact that we had weird test failures in the past because there was code that ran in terminate() while the parent was already dropping tables, which has to mean that it got the response.
I guess we need some sort of test to make this explicit, not sure if automating that is possible, or if we can do it manually, e.g. by putting a sleep(10) into a cron hook and then executing a request on the parent, with poormanscron enabled.
Comment #25
Berdir
Ok, reproduced like this:
1. add a sleep(10) to system_cron() or a similar place that runs then.
2. Delete last cron run timestamp: drush ev "\Drupal::state()->delete('system.cron_last');"
3. Request the frontpage with a browser: It waits 10s, network tab does not show any response before it is done
4. Repeat 2, request the frontpage with curl (make sure that page cache is disabled): The complete HTML output is immediately delivered, then the connection stays open for 10s before the bash input comes back.
Conclusion: This doesn't seem to work the way we intended it to at the moment.
However, based on http://codehackit.blogspot.ch/2011/07/how-to-kill-http-connection-and.html, adding a flush() in index.php after send() is called, has a very interesting effect:
a) The browser starts to build the page immediately
b) Network tab shows headers, but not yet the response
c) The browser still considers the page as loading
d) After 10 seconds, loading is completed, and only then the javascript is triggered, and e.g. the toolbar pops up.
I'm not sure that's what we want...
To verify if cron was executed, run drush ev "var_dump(\Drupal::state()->get('system.cron_last', 0));"
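The index.php tweak described above would look roughly like this (a simplified sketch of the D8 front controller of the time; variable names are assumptions):

```php
// Simplified front controller; the flush() after send() is the
// addition discussed above, so the client can start rendering while
// cron work continues in terminate().
$response = $kernel->handle($request);
$response->send();
flush();
$kernel->terminate($request, $response);
```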
Comment #26
mikeytown2 CreditAttribution: mikeytown2 commented
What about http://php.net/fastcgi-finish-request
If that call doesn't do it, then the only way I see this working is by using Guzzle to do an async request to self to run cron: http://guzzle.readthedocs.org/en/latest/clients.html#asynchronous-requests It's one of the main reasons I pushed for Guzzle in D8 (#1447736: Adopt Guzzle library to replace drupal_http_request()); this exact situation.
Comment #27
catch
Yes, I've been assuming we'd use Guzzle for this issue.
We tried things like running cron in a shutdown function and ob_flush() towards the end of Drupal 7 and didn't get anywhere reliable; Berdir's testing suggests not a lot has changed there.
Comment #28
mikeytown2 CreditAttribution: mikeytown2 commented
You can try out this function: httprl_background_processing(). I did get it working with some browsers but not all, even before I added fastcgi_finish_request to it some time ago. In httprl it can cut the connection at both ends when executing a function in a new background process; it can also return data back, thus allowing for multiple processes running and then combining the results back together. Both modes require non-blocking (async) HTTP requests.
Comment #29
Fabianx CreditAttribution: Fabianx commented
So what about if we do:
a) We use an internal callback to cron via a http request (async) - if we can detect where Drupal lives. If that fails, fall back to b).
AND
b) We use what we have now and see why the response does not arrive for cron().
I also had interesting effects with b) with the big pipe implementation.
Especially when the JS is in the footer, those browsers seem to react like this. However, if just one JS file is in the header, at least Chrome did start executing the JS immediately.
My initial implementation of big pipe collected JS from all assets and added it after all fragments had loaded; however, the browser then never rendered the page, as explained by Berdir.
Comment #30
Berdir
Note that I've already implemented a) a while ago, and it's pretty easy with guzzle, we just need an additional composer package and then it's just a few lines of code. And that seemed to work but was then postponed on locking issues...
We just went back to evaluate if it really is necessary, and it looks like the answer is yes, so we need a re-roll of that patch and then add the locking.
Comment #31
Wim Leers
Related reading:
http://php.net/manual/en/function.connection-status.php#43273 and http://stackoverflow.com/questions/138374/close-a-connection-early contain two interesting things that could offer alternatives to #25.
Comment #32
Wim Leers
@catch also pointed to #566494-60: Running cron in a shutdown function breaks the site, which is related to #31.
Comment #33
klausi
Note that the current AutomaticCron implementation on kernel.terminate works just fine on FastCGI with Apache + mod_proxy_fcgi + PHP-FPM, for example. The reason is that the Symfony Response class calls fastcgi_finish_request() for us, which closes the connection.
So this is only a problem on Apache + mod_php, don't use that on D8 prod sites.
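The behavior klausi describes comes from Symfony's Response::send(), which does something like this (simplified; the real method has a few more branches):

```php
// Simplified from Symfony HttpFoundation's Response::send().
public function send()
{
    $this->sendHeaders();
    $this->sendContent();

    if (\function_exists('fastcgi_finish_request')) {
        // Under PHP-FPM this flushes the output and closes the client
        // connection, so kernel.terminate work (like cron) runs after
        // the response has been delivered.
        fastcgi_finish_request();
    }

    return $this;
}
```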
Comment #34
catch
So I still think we should do this, and we added cross-request locking for config batch, so that bit should be doable.
Comment #35
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commented
The possibility for Drupal to reach itself should be in core, regardless of whether we do it here or elsewhere.
If it's in core, it is properly supported and contrib (e.g. ultimate_cron) can rely on it.
If it is not in core, we get the same problems with background_process, ultimate_cron, ... as in D7.
Comment #36
mikeytown2 CreditAttribution: mikeytown2 commented
Having Drupal reach itself isn't easy. I've had some success in the httprl module; see _httprl_build_drupal_root() for the hoops I have to jump through.
First step is to find the Drupal root dir; this is mainly a D6 compatibility thing as the D6 & D7 code are the same. Next is handling basic auth. Then there is a lot of logic to try and figure out what IP address can be used to connect to itself; 127.0.0.1 doesn't always work. HTTP vs HTTPS. And finally clean URLs.
If this was baked into core that would be very nice. In the latest dev version of httprl I'll try a couple of different variations for the hostname when doing a self connection, as well as waiting X ms before closing an async connection; see httprl_install_try_different_settings_checker()
Comment #37
Wim Leers
Comment #38
Damien Tournoud CreditAttribution: Damien Tournoud at Centarro commented
I have to reiterate klausi: "So this is only a problem on Apache + mod_php, don't use that on D8 prod sites." FastCGI is the only decent production infrastructure for PHP.
Comment #39
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commented
#38: That is not true, streaming works nicely with Apache, too.
Maybe the response sending misses a flush() in the end?
I still think non-blocking HTTP request to self is better though ...
Comment #40
geerlingguy CreditAttribution: geerlingguy as a volunteer commented
@Damien / @klausi - the problem with that statement is that, by and large, the majority of hosts running Apache are still using mod_php; and probably 90% or more of the "build your own LAMP server" guides on the Internet (even recent ones in DO's and Linode's libraries) only mention how to use mod_php. I don't think we can ship with features that don't work on the majority of Apache installs...
Comment #41
dawehner
#2507031: Optimize automatic cron subscriber by moving automatic cron to a module is something on top of it.
Comment #42
chx CreditAttribution: chx commented
> We use an internal callback to cron via a http request (async) - if we can detect where Drupal lives. If that fails,
Didn't we learn already that this fails more often than not, and usually horribly? I can do some issue digging, but I remember attempting this and failing badly. (And we released it like that. Whether it was D6 or D7 I can't remember.)
Comment #44
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
Since we now have cross-request locking and the Guzzle library that we're using (6.2.X) has built-in async capabilities, the patch from #15 becomes incredibly simple and straightforward.
Comment #45
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
Ahem.. how about actually removing the inline cron execution :)
Comment #48
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
Oops.
Comment #49
Berdir
Interesting. But isn't doing it async and then still waiting on the response to release the lock going to keep the process/request active, and therefore doesn't really change much?
Comment #50
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
@Berdir, according to http://docs.guzzlephp.org/en/latest/faq.html#can-guzzle-send-asynchronou... , if we're not calling $promise->wait() we're not keeping the current request active.
Comment #52
Berdir
How async exactly works is above my head, but PHP does have limits, like no threads. It still has to operate within the current process/thread eventually. So while we can do other things until it is done, we can't actually end the process.
Which brings us back to the fun topic of how sending a response actually works, which is different in mod_php and CGI-based approaches, for example.
But maybe I'm all wrong :)
To try it, add a sleep(60) or so to system_cron() and trigger it. Does the request finish (completely) or does it continue to run? See #24 and the following comments.
Comment #53
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commented
#52: Usually a PHP process is not aborted, however, even if the "browser" that started the request goes away.
So yes, it needs some testing, but httprl e.g. has used this method successfully for quite some time, so there is some "async" knowledge already.
Comment #54
dawehner
The way Guzzle interprets async is a bit different from what one might be used to from Node.js.
This feature allows you to send multiple requests at the same time, and then, when stuff is executed, aka waited on, all curls are executed in parallel, but we still wait on all of them.
Reading the Guzzle code, I don't directly see where they could execute the request when no wait is called but the objects are destroyed.
Comment #55
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented
@dawehner is right, if we don't call wait() on the promise, the request is never executed. And if we call it, the script will wait for it to finish, so it's not "non-blocking" anymore. Even the library's author says that Guzzle is not meant for "fire and forget" type of requests: https://github.com/guzzle/guzzle/issues/1429#issuecomment-197152914
Are there any other options for implementing this? I couldn't really find any decent solution on the interwebs, maybe @mikeytown2's httprl library could help?
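A minimal illustration of the limitation described in the last few comments (Guzzle 6 promises; the URL is a placeholder):

```php
use GuzzleHttp\Client;

$client = new Client();
// requestAsync() only queues the transfer; nothing goes over the wire
// until the promise is waited on.
$promise = $client->requestAsync('GET', 'http://example.com/cron');

// Without this, the request is never actually executed; with it, the
// script blocks until the response arrives. Either way, it is not
// "fire and forget".
$promise->wait();
```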
Comment #56
catch
I think we should look at httprl for this, although I've had trouble with that and advagg, so I think we should also consider just moving this to an AJAX request. I.e. have a library that we conditionally add when cron needs running, that executes the AJAX request; when cron doesn't need running, don't add it.
The original issue ruled this out (see #566494: Running cron in a shutdown function breaks the site and #331611: Add a poormanscron-like feature to core) because they wanted hook_exit() on cached pages to also trigger this, but for me it's a completely reasonable restriction that this only runs on auth/cache-miss pages with JS enabled. The page cache used to be cleared on cron runs when the feature was originally added; that's no longer the case.
Comment #57
Berdir
Not running on cached pages seems fine to me as well*, but not sure if we can make that change in a minor release? But yes, if you need reliable cron, don't use poormanscron. It's that simple :)
But what I actually meant in #49 is that persistent locking and fire-and-forget rule each other out *on principle*. We can't have both, no matter what we actually use :)
* I guess one problem there is that on simple sites where nobody logs in or creates content and has everything cached, cron might not be running over longer periods of time?
Comment #58
dawehner
But is that in itself a problem? Which tasks would require cron to run when there is no user changing anything? One example might be aggregator, but it is the only thing I could think of.
Comment #59
Berdir
update.module, for example: it wouldn't check for new updates then and wouldn't send out mails.
Comment #60
dawehner
Just a really rough idea. Could we expose the last executed cron run somehow as a JS setting or so, and then do the HTTP request just if it's time to do so?
This unix timestamp could be rendered into the page cache using some token, as the usual levels won't work. Everyone who is running on a proper reverse proxy shouldn't use the automatic_cron module anyway.
Comment #61
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commented
AJAX has the trouble that many users could at the same time see:
- Oh, it's time to run cron, let's flood Drupal with 1000s of requests ;). (Though it's possible to add some randomness to that to limit the effect of stampeding.)
Also, ultimatecron used the fire-and-forget method quite successfully to run things via background_process in the background.
I am not seeing why we cannot try to send off a request to Drupal; if it succeeds (background process was created) we stop the request, and if not we do the current cron() call.
Could be as simple as:
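One hedged way to sketch that fallback; the URL parsing and the cron service call are assumptions, not the actual proposal:

```php
// Try a fire-and-forget loopback request; fall back to inline cron.
$parts = parse_url($cron_url);
$fp = @fsockopen($parts['host'], $parts['port'] ?? 80, $errno, $errstr, 2);
if ($fp) {
  // Write the request and close without reading the response; the
  // receiving end should call ignore_user_abort() and do the locking.
  fwrite($fp, "GET {$parts['path']} HTTP/1.0\r\nHost: {$parts['host']}\r\n\r\n");
  fclose($fp);
}
else {
  // Could not reach ourselves: run cron inline as we do today.
  \Drupal::service('cron')->run();
}
```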
Comment #62
twistor CreditAttribution: twistor as a volunteer commented
It's simple enough to open an HTTP connection with fsockopen() or whatever. Then it's just a matter of closing the connection without waiting on a response. On cron's side, do the locking and ignore_user_abort().
I thought the difficult part was that it's unreliable to make HTTP requests to the local host.
Comment #63
catch
That's expected to run once per week though; how many sites don't get a single uncached request in a week?
Yes that's one reason for #965078: HTTP request checking is unreliable and should be removed in favor of watchdog() calls and the reason I brought up AJAX again.
We could use the lock system there though: acquire a 2-minute lock on the first request that triggers the AJAX. If the AJAX request gets fired and runs cron, the timestamp will get updated in that time. If not, you get another chance 2 minutes later when the lock expires.
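That throttle could be sketched with the core lock service; the lock name, library name, and threshold variable are hypothetical:

```php
// Only attach the cron-triggering AJAX library if we can take a
// 2-minute lock; never release it, so it simply expires.
$lock = \Drupal::lock();
if (REQUEST_TIME - \Drupal::state()->get('system.cron_last', 0) > $threshold
    && $lock->acquire('automated_cron_ajax_trigger', 120.0)) {
  // Hypothetical library that fires the AJAX request to run cron.
  $page['#attached']['library'][] = 'automated_cron/trigger';
}
```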
Comment #65
dawehner
Here is an experiment with implementing the async HTTP request. To be honest, I'm not sure whether the localhost limitation is a problem.
If you are running in a weird environment, not being able to use automated_cron is probably the least of your problems.
Comment #67
mikeytown2 CreditAttribution: mikeytown2 commented
Looping back and connecting to yourself can be surprisingly hard over a wide range of server configurations; a good example is SSL terminated higher up, which will sometimes result in a redirect loop, or in the web server not being able to do SSL as it's not expecting it.
Comment #68
dawehner
@mikeytown2
In other words, you propose using JS to do the execution?
Comment #79
bradjones1
Dropping in here to point at the question, posed at #3295790-7: Post-response task running (destructable services) are actually blocking; add test coverage and warn for common misconfiguration, of whether we need to do a flush() after Response::send(), more generally.
Comment #80
ravi.shankar CreditAttribution: ravi.shankar at OpenSense Labs commented
Added reroll of patch #65 on Drupal 9.5.x.
Comment #81
ranjith_kumar_k_u CreditAttribution: ranjith_kumar_k_u at Zyxware Technologies commented