When I run drush cc all I get more than 60 messages of the following kind:
cache_menu(*) was not cleared. APC cli uses a different memory storage then the webserver. For more info see: http://drupal.org/node/1278232
This is rather annoying... especially because there are a lot of commands which automatically execute a cache clear at some point. Would it be possible to reduce those ~60 messages to one per invocation of drush?
Also, would it be possible to actually "fix" the cache clearing instead? I'm no expert, but a quick google search revealed a possible solution on stackoverflow. What do the experts think?
For reference, the message was introduced in #1278232: Provide documentation on clearing the cache when running cache clear in CLI (Drush) mode.
Comment | File | Size | Author |
---|---|---|---|
#150 | 123-248.interdiff.txt | 542 bytes | Mixologic |
#148 | 1565716_drush_apc_clear.patch | 4.98 KB | R.Muilwijk |
#123 | 120-123interdiff.txt | 431 bytes | Mixologic |
#123 | 1565716-apc-xmlrpc-cc-123.patch | 4.8 KB | Mixologic |
#122 | 1565716-apc-extra-bootstrap-cc-122.patch | 4.67 KB | torotil |
Comments
Comment #1
regilero CreditAttribution: regilero commented: One problem: the trick used in the Stack Overflow link means having a script available on the localhost IP, which implies things that may seem obvious (but aren't), like an HTTP server listening on localhost and a PHP process connected to that webserver for this IP (and it could be an external PHP process if it's php-fpm). So the domain name & IP to use should be settings (command line options?), and the webserver requirements (name of the script, access restriction on that script) should be well documented. Not a simple thing.
In the case of multi-server installations, APC caches have one specificity: they're server-specific. So clearing the cache with drush would only clear the APC cache on that server. We could use the fix (if any) for that very annoying message to fix this problem at the same time.
For example we could maybe set a cache validity timestamp when running the apc flush from php-cli (drush). And we could reuse that validity timestamp when reading from cache to return a cache miss for records older than this setting. This way every server retrieving outdated data would know it's outdated. The problem is that we cannot use a variable if cache_variable is stored on APC... This could only be stored on a shared storage (database? session!? other).
If there's any genius out there with a nice idea, please save us from this very annoying apc-cli limitation.
Comment #2
bforchhammer CreditAttribution: bforchhammer commented: Well, what I was thinking after reading the entry on Stack Overflow is the following: if cache clearing works in the web interface (on admin/config/performance), then it should also be possible to call that page via wget or curl from the command line, and therefore have the right process take care of executing the cache clear.
For this to work, drush of course needs to know which URL to call (e.g. http://example.com/apc/cc?bin=...); and instead of submitting the form on admin/config/performance, I'd imagine that it would be better to add a rather simplistic custom page handler which avoids page rendering logic etc.
- A new configuration variable in settings.php. The variable should be set to the public-facing base URL of the site (e.g. http://example.com); it could default to $base_url, "localhost", ..? or simply fall back to displaying an error message.
- A new menu callback (e.g. apc/clear) which clears the cache for a specified cache bin; allow parameters for cache bins, cids etc., and make it inaccessible for regular users by checking for an apc_key parameter (re-use cron_key?).
- Drush would then request something like $conf[apc_cc_baseurl]/apc/clear?bin=cache_menu&apc_key=12345
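The settings.php side of this idea might look as follows; a minimal sketch with hypothetical values, using the apc_cc_baseurl variable name from the comment:

```php
<?php
// settings.php sketch (hypothetical value): the public-facing base URL
// drush should call to reach the apc/clear menu callback.
$conf['apc_cc_baseurl'] = 'http://example.com';

// Drush would then issue a request such as:
// http://example.com/apc/clear?bin=cache_menu&apc_key=12345
```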
I'm not sure what you mean by "multi-server installations", but if every server has a public-facing URL we could also allow apc_cc_baseurl to be an array and issue multiple wget/curl requests...
Comment #3
R.Muilwijk CreditAttribution: R.Muilwijk commented: Something that might be possible is a solution like this:
- Introduce a $conf['apc_stores'] = array('webnode1' => 'http://webnode1.site.com', 'webnode2' => 'http://webnode2.site.com');
- Check whether the apc module is enabled. If it is, generate a string for the cache clear command and encrypt it somehow with Drupal's hash. For each entry in apc_stores, call a callback defined in the apc module, like http://webnode1.site.com/apc_clear?command=$encrypted_string.
- Module callback would decrypt using drupal hash and run the actual cache clear on webnode's APC user cache.
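A minimal sketch of that idea, with hypothetical function names, using an HMAC as a stand-in for the "crypt it somehow with drupal hash" step (in a real module Drupal's private key or hash salt would supply $key):

```php
<?php
// Hypothetical sketch of #3: build one signed clear URL per web node.
function apc_sign_command($command, $key) {
  // Stand-in for "crypt it somehow with drupal hash".
  return hash_hmac('sha256', $command, $key);
}

function apc_build_clear_urls(array $stores, $command, $key) {
  $urls = array();
  foreach ($stores as $name => $base_url) {
    $urls[$name] = $base_url . '/apc_clear?command=' . rawurlencode($command)
      . '&token=' . apc_sign_command($command, $key);
  }
  return $urls;
}

// Example configuration from the comment:
// $conf['apc_stores'] = array(
//   'webnode1' => 'http://webnode1.site.com',
//   'webnode2' => 'http://webnode2.site.com',
// );
```

The callback on each node would recompute the HMAC over the received command and compare it to the token before running the clear.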
Comment #4
R.Muilwijk CreditAttribution: R.Muilwijk commented: @bforchhammer, yes, 'Cache clear all' works, but only on the CURRENT webserver node. When using multiple webserver nodes behind a load balancer you would have to run a cache clear on ALL nodes!
Comment #5
regilero CreditAttribution: regilero commented: I like the idea of having an array of front servers to reach, requesting an APC cache flush. +1
Comment #6
MrHaroldA CreditAttribution: MrHaroldA commented: Reverting a couple of features using drush spawns hundreds of APC warnings. Really annoying ...
Comment #7
regilero CreditAttribution: regilero commented: @bforchhammer:
When you have such configuration:
You have 3 APC caches, one on each front server. (More if each front server handles several domains, but that's another problem.)
Using drush cc all on front server 1 would only clear the APC cache on front server 1, not on the 2 others. That's why the array solution from comment #3 would also be nice.
Comment #8
geerlingguy CreditAttribution: geerlingguy commented: I have numerous drush actions triggered via cron, and each one kicks back an email with these errors displayed. My system had about 42,000 emails when I just checked after about a week (there are a lot of cron jobs that interact with drush).
Maybe you could use a variable to determine whether or not the error messages should be displayed?
So, like:
(Note the addition of 'apc_show_cli_error' in there). If this approach is accepted, I'll gladly write a patch really quick. I just commented out the drush_log() call on my site.
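The snippet itself was stripped from the comment; a hedged sketch of the idea (the variable name 'apc_show_cli_error' comes from the comment, the helper function is hypothetical) is a guard around the drush_log() call:

```php
<?php
// Hypothetical helper: decide whether the APC CLI warning should be shown.
// Defaults to TRUE, so behaviour only changes when a site opts out, e.g.
// with $conf['apc_show_cli_error'] = FALSE; in settings.php.
function apc_show_cli_error(array $conf) {
  return !array_key_exists('apc_show_cli_error', $conf)
    || !empty($conf['apc_show_cli_error']);
}

// In the module, the drush_log() call would then be wrapped:
// if (apc_show_cli_error($GLOBALS['conf'])) { drush_log(...); }
```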
Comment #9
MrHaroldA CreditAttribution: MrHaroldA commented: @geerlingguy, I would rather have Drupal clear the APC cache on the first web-based request than hide the message. Clearing the cache with APC enabled can't be done through the CLI now, which kinda sucks.
Comment #10
regilero CreditAttribution: regilero commented: Taking #8 into account, I think this bug needs some higher priority.
While waiting for a working cache clear we could at least try to avoid message floods, which can have some dangerous side effects on working installations. Maybe store a timestamp when emitting this drush_log message and allow at most one message per minute.
Comment #11
pounard commented: My favorite solution is the validity timestamp proposed in #1; it is also applicable to all other backends and ensures that it will work with the multiple-frontends configuration. It is the approach taken today for checking tag validity in the Drupal 8 database cache backend; instead of a timestamp they use a more complex tag checksum, but it's still a good approach IMO.
I can see only one problem with it: if some cache keys are never read and have no expiration time, they will stay in the APC shared memory indefinitely.
I can see an alternative based upon this solution, but that will work with APC only:
1) Store the latest flush timestamp for each bin in a variable
2) When instantiating the APC backend:
2.1) Load a special APC user key such as "$host_latest_flush_$bin"; if not present, set it to now
2.2) If this key exists and is prior to the variable, flush APC locally as normal and reset this user key to now
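In code, the comparison in 2.2) boils down to something like this sketch (names hypothetical; $shared_flush would come from the shared variable, $local_flush from the APC user key):

```php
<?php
// Sketch of the check in 2.2): the local APC data must be wiped when its
// per-server flush marker predates the shared flush timestamp.
function apc_local_cache_is_outdated($shared_flush, $local_flush) {
  // No local marker yet: per 2.1) it is set to now, nothing to flush.
  if ($local_flush === NULL) {
    return FALSE;
  }
  return $local_flush < $shared_flush;
}
```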
Comment #12
R.Muilwijk CreditAttribution: R.Muilwijk commented: One of the main reasons for using the APC cache is that it's the closest cache to PHP and faster than, for example, Memcache. One of the downsides is that the CLI's shared memory is not the same as the webserver's shared memory. When using multiple webnodes, the shared memory of the different webnodes is not shared either. When using APC it's important you understand this.
So there are two types of cache clearing: expiration, and clearing because of an action. APC already handles expiration; implementing it twice is a waste of processing time, so I'm not willing to make reads slower by implementing a custom system for this.
The second type, cache clearing because of an action, is harder: hitting the 'Cache clear all' button, using Drush, saving a node or any other action should invalidate the cache. The question is what a proper way is to invalidate this for all the different shared memory instances without making reads slower. At the moment I still think #3 is our best bet.
Comment #13
pounard commented: Actually, both #1 and #11 would work knowing all that. The only detail I forgot for #11 is that variables are cached too. Grr.
EDIT: Both are based upon the comparison of two timestamps. The shared variable (not in cache) would be the same for all contexts of the same site, while the local variable (in APC user space) would be unique for each context (whether it is CLI or web doesn't matter) and would allow knowing, with a simple comparison, whether the local cache is outdated.
Comment #14
R.Muilwijk CreditAttribution: R.Muilwijk commented: Could you show me the concept in some dummy code (it doesn't need to actually work)?
Comment #15
pounard commented: Yes of course, but you'll have to wait until I get home; I don't have the environment for this @work.
Regarding #2: it would work, but I'm afraid that on some environments it might fail for any reason (for example, php-fpm or CGI having different DNS resolution than the proxy in front, or any other possible reason). Implementing it would be nice, but accompanied by a reasonable fallback.
Regarding #2, I think the same environmental restriction would apply.
I'm against just ignoring the drush warnings, they all reveal pragmatic errors and should not be hidden, but solved instead.
Further, I think the memcache module might have some pieces of the solution for us.
Comment #16
regilero CreditAttribution: regilero commented: @pounard, one problem with #1 and #11 is the multiple-server configuration as shown in #7.
Invalidating APC cache content is a task each server will have to do, so the instruction (whether it's a variable or something else) will have to be available for all servers, even after some of them have already performed the wipeout.
Another problem: if the variables are stored in APC, you have no way to alter a variable from the CLI and have the web part know about it.
You need something which is always checked when the APC cache is used, and which keeps returning something until every backend is known to have performed the requested action. This will have some speed impact (maybe small, maybe not) and could involve file or database access.
Solution #3 is not a bad one, I think. Even with a php-fpm multi-front install we should be able to list all front servers with a custom URL, allowing direct drush HTTP requests to each server. It may have some impact on firewall configuration, module documentation and local name resolution, but at least it would work.
Comment #17
pounard commented: Yes, you're right, variables are cached, and if they are in APC we're doomed. It can still work if the 'cache_bootstrap' table is not in APC, but I don't think that will happen a lot...
Comment #18
seandunaway CreditAttribution: seandunaway commented: Yes, this is a bit annoying and I would love a workaround.
I've been hiding the message with this (a CLI check) in settings.php:
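The snippet itself was stripped; presumably it was something along these lines. A sketch, assuming the D7 apc module's DrupalAPCCache backend (adjust the path and bin names to your setup):

```php
<?php
// settings.php sketch: only route cache bins to APC for web requests, so
// the CLI (drush) never touches the APC backend and never emits the
// "was not cleared" warning.
if (PHP_SAPI !== 'cli') {
  $conf['cache_backends'][] = 'sites/all/modules/apc/drupal_apc_cache.inc';
  $conf['cache_class_cache'] = 'DrupalAPCCache';
  $conf['cache_class_cache_bootstrap'] = 'DrupalAPCCache';
}
```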
But of course there is still the inability to clear the APC cache from the command line.
I never had this problem with CacheRouter in D6 with APC user cache. How did they manage it?
Comment #19
seandunaway CreditAttribution: seandunaway commented: Maybe we can write a drush command to clear the APC cache using curl, or some other way via the server, to access the other memory segment?
Comment #20
sun commented: Tagging. (I realize this is a strange issue tag for this project, but I'm trying to gather/collect interesting issues+approaches across all d.o projects.)
Comment #21
Anonymous (not verified) CreditAttribution: Anonymous commented: From my reading of CacheRouter's code, it doesn't look like it would work with the use case of multiple backend webservers either.
I'll be glad to work on and test a fix for this. I'm leaning toward an array of IPs for the backends and using cURL. The idea is that we can leverage cURL to override the DNS resolution. (From what I can tell, that won't work with drupal_http_request(), because it forces the 'Host' header to come from the URL given.)
I could probably have a patch implementing this method rolled Sunday. Users that only allow PHP to execute specific files would need to be sure to add the clearing PHP file to their filters as a file that is allowed to be run through PHP.
At least that's my thinking.
Comment #22
Anonymous (not verified) CreditAttribution: Anonymous commented: What this does is check for an array of server addresses. These can be set by simply adding $conf['apc_servers'][] = 'your.server.ip.address'; to your settings.php. It loops over these addresses and uses cURL to make a request to each of your defined backends for the clear.php file that is included. If you don't have a backend set, it defaults to 127.0.0.1. You can use hostnames if you wish.
Why did I not use drupal_http_request()? Simple: it forces the Host header to be the same as the site that you are connecting to, which would not work in this case. A simple patch to core can fix it, but that needs to be fixed in 8 before it can be backported to 7, and the current direction there seems somewhat unclear - #1664784: drupal_http_request() has problems, so allow it to be overridden. It's pretty likely anyway that if you're using APC, you have the access or a host that's willing to run cURL.
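A sketch of the described loop; the function names are hypothetical, only the cURL usage and the $conf['apc_servers'] key come from the comment:

```php
<?php
// Build the list of clear.php URLs from $conf['apc_servers'], falling
// back to 127.0.0.1 as described in #22.
function apc_clear_urls(array $conf) {
  $servers = !empty($conf['apc_servers']) ? $conf['apc_servers'] : array('127.0.0.1');
  $urls = array();
  foreach ($servers as $server) {
    $urls[] = 'http://' . $server . '/clear.php';
  }
  return $urls;
}

// Hit each backend with cURL; unlike drupal_http_request(), cURL lets us
// force the Host header so each backend serves the right vhost.
function apc_request_clears(array $urls, $host) {
  foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Host: ' . $host));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_exec($ch);
    curl_close($ch);
  }
}
```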
Comment #23
pounard commented: I have multiple questions:
Comment #24
MrHaroldA CreditAttribution: MrHaroldA commented: It also looks like clear.php isn't protected by anything (like a cron key or something), so everybody can flush all APC caches for an entire server ...
Comment #25
Anonymous (not verified) CreditAttribution: Anonymous commented: drupal_http_request() would have been my first choice, but it resets the 'Host' header whether you set it or not. I personally patch core to fix that in some dev environments when I don't want to rely on /etc/hosts or can't rely on DNS when testing APIs. So until that is actually fixed in core, we are stuck using cURL, fsockopen, or stream sockets. The main reasons I used cURL are that I'm more comfortable with cURL and find it more readable (personal preference).
Re: menu router item - We can do that instead. This was just a 20 minute quickie to prove that the concept would work.
Re: Access checks - We could have it check for a key that is set similar to the cron key. Or we could even just use the cron key. I wasn't sure what way others would want to go there. Once again, proof of concept that we can clear APC through Drush.
Re: Clearing all of APC - once again, just an attempt to prove the concept of clearing APC through cache would work. I admit I didn't look deep to see how the module normally cleared APC. I was hoping to spark some more serious discussion toward a resolution rather than just continuing in circles.
I can make some more time later this week (probably Wednesday or Thursday) to work on it if no one else has the time. I'm staying pretty swamped, and this doesn't really seem too difficult to resolve.
Comment #26
pounard commented: Oh OK, sure, this is a nice PoC.
Whether we choose a router item or a PHP file, we must add access control. A cron-key-like security might be enough, though we must ensure it can't be caught by external eyes; trying to enforce HTTPS if present (disabled per default but configurable) could be a good idea.
The reason I prefer the menu router item is that the PHP file can be moved out; using a router item will ensure that Drupal is properly bootstrapped, instead of us having to bootstrap it in our own clear.php file. This will be less error-prone.
Comment #27
sun commented: However, you cannot bootstrap to a level/phase in which the router would work, because that already involves cache lookups on various layers, and the entire purpose of this is to flush potentially stale caches.
Thus, if you want to make it bullet-proof, I'm fairly sure you'll need a custom bootstrap that circumvents all caches.
In light of that, going with an apc_clear.php, potentially even including the prefix to clear, as well as a public/private key hash/token validation, sounds more reasonable.
Lastly, just a question, feel free to ignore me: if this is about Drush only, then why not use Drush's built-in remote site commands? (Which operate over SSH, if I'm not mistaken -- I didn't use them myself yet.)
Comment #28
pounard commented: This is not only for Drush; the problem exists when you hit the "clear cache" button via the UI too. This must be Drush-agnostic.
The router item is prone to stale cache, but this specific router item is not supposed to change, so even considering that we might bootstrap an outdated Drupal, it'd still work.
I agree that having a plain PHP file instead could be better, but it must be in the Drupal root, side by side with index.php, else the session handling (or potentially other stuff) won't work, because some core internals deeply misuse PHP superglobals and rely on SCRIPT_PATH to set up various hashes in various places (session_id() and session_name() are two of them).
So, until that is fixed, we cannot allow ourselves to attempt a Drupal bootstrap from a custom PHP file.
Comment #29
MrHaroldA CreditAttribution: MrHaroldA commented: @Brian: I took some time to familiarize myself a bit with the APC module and its workings, so if you need any assistance please let me know!
Comment #30
Sylvain_G CreditAttribution: Sylvain_G commented: Just ran into this issue with a
LB
- Web 1
- Web 2
configuration: annoying messages and some side effects.
AFAIK APC is the closest cache to PHP but not a service; what you are trying to achieve is bypassing a hard limitation of APC.
I'm considering moving my cache to a memcache instance on the backend MySQL server.
Comment #31
MrHaroldA CreditAttribution: MrHaroldA commented: Here's a proof-of-concept and work-in-progress patch to enable Drush to clear caches on your web node(s).
It's full of @todo's, missing error checks, plausible security holes(!) ... but it seems to work! ;)
settings.php example, note the 'apc_nodes' entry:
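The example was stripped from the comment; it presumably resembled this sketch (the 'apc_nodes' key name comes from the comment, the values are hypothetical):

```php
<?php
// settings.php sketch: web nodes whose APC caches drush should flush
// remotely when clearing a cache bin.
$conf['apc_nodes'] = array(
  'http://web1.example.com',
  'http://web2.example.com',
);
```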
Please have a look at it and tell me if I'm heading in the right direction.
Comment #32
MrHaroldA CreditAttribution: MrHaroldA commented: One issue I already found: when the site is in maintenance mode, it reports a "503 Service Unavailable" when trying to clear the cache.
Comment #33
Jorrit CreditAttribution: Jorrit commented: I think the functionality to loop over nodes should also be invoked when clearing the cache from the site itself. If I clear an APC cache on server 1, it would be great if the APC caches on the other servers were cleared as well.
Comment #34
MrHaroldA CreditAttribution: MrHaroldA commented: @Jorrit: good idea!
I will also make the setting configurable so that you can capture it in a feature.
Comment #35
MrHaroldA CreditAttribution: MrHaroldA commented: XMLRPC keeps working even when the site is in maintenance mode... Shall I rewrite my patch to use that?
Comment #36
vinmassaro CreditAttribution: vinmassaro commented: Any progress on this? It is a blocker for us to use the APC module in our university installation when these messages are output by Drush. Thanks.
Comment #37
MrHaroldA CreditAttribution: MrHaroldA commented: @vinmassaro: I'm waiting for the maintainer to chip in on the XMLRPC vs. menu callback issue. Once that's settled, a patch can be made pretty quickly, maybe even next week ...
Comment #38
vinmassaro CreditAttribution: vinmassaro commented: @MrHaroldA: Ok, hope R.Muilwijk can chime in so this can move forward and a patch can be tested. Thanks!
Comment #39
likewhoa CreditAttribution: likewhoa commented: @MrHaroldA just do it and reviews/revisions/RTBC will come ;)
Comment #40
MrHaroldA CreditAttribution: MrHaroldA commented: @likewhoa ;)
Comment #41
MrHaroldA CreditAttribution: MrHaroldA commented: Here's a patch to flush APC caches with XMLRPC. Configuration is the same as in #31, and even with the same @todo's, warnings and errors!
Where Art Thou, R.Muilwijk?!? ;)
(keeping it at 'needs work')
Comment #42
Anonymous (not verified) CreditAttribution: Anonymous commented: Just looking at the code (I haven't actually run it yet... staying extremely busy recently), this requires Drupal to answer for each "APC node" by hostname, correct?
Comment #43
pounard commented: I'm a bit surprised to see code in the APC class destructor; could this be a Drupal shutdown handler instead, leaving the cache backend class untouched/untainted?
Comment #44
MrHaroldA CreditAttribution: MrHaroldA commented: @pounard: I've added the code in the APC class destructor to gather all bins that should be flushed and only perform one request to each web node to flush them.
Comment #45
pounard commented: I still think this is doable (and generalizable to backends other than APC) in a more generic fashion outside of the cache object. I'll try to do a PoC later.
Comment #46
MrHaroldA CreditAttribution: MrHaroldA commented: @pounard: Thanks! Looking forward to it!
Comment #47
vinmassaro CreditAttribution: vinmassaro commented: @pounard: thanks, looking forward to testing a patch.
Comment #48
pounard commented: I'm working on it, coming soon! :)
If I understand well, there are two main problems:
1) Clearing the APC cache locally (APC only)
2) Syncing all web nodes for "local" caches (potentially all backends)
I did 2) in a separate module using the XMLRPC patch above. 1) works with this patch too (I guess).
Comment #49
MrHaroldA CreditAttribution: MrHaroldA commented: @pounard: the main issue was that you can't clear the APC cache from the command line; the webserver needs to clear its own cache. If the site is served by multiple web nodes, the cache should also be cleared on those servers.
Comment #50
crea CreditAttribution: crea commented: Using APC on multiple web nodes is a very bad idea (because of consistency issues caused by high concurrency). Supporting that is supporting bad design. Don't do that.
Comment #51
pounard commented: While I agree as long as we continue down the remote-hit path, it could be safe, nice and fast with a different solution for invalidating caches than remote hits: something as simple as storing a timestamp in a shared storage, which serves to invalidate everything created before it, would be very efficient. The problem is that current Drupal doesn't really allow this; it would need to be implemented not in the cache backend but in the cache API underneath.
Comment #52
Sylvain_G CreditAttribution: Sylvain_G commented: #50 is right; clearing another web node's cache can have some side effects, especially regarding the Form Build Id.
Comment #53
regilero CreditAttribution: regilero commented: #50 & #51: of course, using APC as a cache backend when using several servers cannot and mustn't be done for all caches. But some very early Drupal bootstrap caches can fit quite well in the APC cache, e.g. cache_class_cache or cache_class_cache_bootstrap.
Comment #54
crea CreditAttribution: crea commented: #53
If you still need standalone cache storage, there's no point in making the whole configuration more complex just for some marginal speed gains. Simplicity wins every time.
Comment #55
regilero CreditAttribution: regilero commented: #54: I wish you were right about simplicity. But hey, this is Drupal. There is no point in having separate cache bins if it is not to apply separate cache policies to these bins. And of course, having a multi-front Drupal installation means you are running a quite complex instance which requires some complex fine-tuning. Among all the small gains you can get with Drupal fine-tuning, targeting local file caches in APC doesn't seem like a bad idea.
Of course I would prefer a simpler CMS which doesn't try to replace APC's job, doesn't do IO directly on the filesystem, doesn't try to manage a filesystem cache map, uses PHP autoloaders to find files, etc. But simplicity is not a Drupal word.
Comment #56
pounard commented: Still working on it, but I don't have much time @work to polish my work. I will post a patch as soon as I can; it could be in a few days or maybe next week.
Comment #57
crea CreditAttribution: crea commented: The separate cache bins feature is mostly useful for separating things like the "form cache", which is not actually a cache. Otherwise, in my experience it was always better to have all the bins in the same place.
But we don't have to make it more complex just because it already is. Do you have proof that keeping those 2 bins in APC, compared to, say, memcache, brings significant performance gains?
Comment #58
MrHaroldA CreditAttribution: MrHaroldA commented: I do!
Memcache didn't boost any of our websites' performance (it even slightly lowered the requests/sec). It did however reduce the load on the database, so I guess high-traffic sites with lots of logged-in users will benefit most from memcache. Using APC for cache_class_cache_bootstrap and cache_class_cache boosted requests/sec by 8-12%, depending on the size of the site. Unfortunately we can't use APC because we can't clear the cache from Drush ;)
Comment #59
Anonymous (not verified) CreditAttribution: Anonymous commented: I can confirm what MrHaroldA says about the performance increase for those two bins, although I can't offer benchmark data. We currently host 12 sites, most with single web servers, some with dual web servers, and one with 3. We only saw a reduced database load from a Memcache-only setup, and even a minor reduction in performance that was more pronounced when using multiple servers. Remember, Memcache requires a TCP connection if you're going distributed, and TCP connections have additional overhead to process as well as latency. We did not see the minor performance reduction when using a local Unix socket, but that's only good if you've only got one web server, or if you're going to run multiple Memcache daemons (one on a TCP port, and one on a local Unix socket).
When placing those two bins in APC, we got around a 10% boost. Those bins probably change the least often, which is why I like putting them in APC. For the other bins which change more often and need to be consistent from web server to web server, I prefer having those in Memcache.
Comment #60
TwoD commented: I gave the #41 patch a test run but had a few issues.
Only actually clearing the cache in the class destructor risks keeping old data cached too long. For example: I create a cache instance for some table, read from the cache, and update a couple of rows in the table. I naturally make sure to invalidate/delete the cached version of the updated rows. But then I hold on to the cache instance while doing more stuff which needs to fetch more rows from the same table, potentially including the rows I updated earlier. To be efficient in case someone else already fetched some of the data sets I'm interested in, I ask the cache first...
That could be solved by perhaps making the $drush_flush array keyed by bin and cid, and always looking in there before returning any cached rows. I'd love to get around to trying that, but I couldn't right now and needed a quicker solution. So I simply moved everything from the destructor into the clear method, along with everything related to $drush_flush.
I also made a few smaller tweaks to make debugging a bit easier and the checks less verbose: !empty($GLOBALS['conf']['apc_nodes']) is enough, no need for isset() first.
I'm currently using the code provided in the patch here on a production server, but I would not recommend anyone else do that before a more thorough review has been done.
Btw, for some reason, clearing some user caches, like the menu cache, in APC takes several minutes and almost always times out when done over XMLRPC. I can't use Admin menu's shortcut to clear all caches either, that always times out... Anyone else having this issue with APC?
EDIT: I forgot to mention that the #41 patch also logged the "@todo" warning message on every Drush command which accessed the cache, because Drush was available but there was nothing to flush in the destructor.
Comment #61
chrisjlee CreditAttribution: chrisjlee commented
Comment #62
arosboro CreditAttribution: arosboro commented: I applied the patch in #60, and it works on a single server with APC running locally.
However, when I disable/re-enable a module using drush pm-enable, I get:
It's appearing multiple times when I use drush up too.
Comment #63
chadsten CreditAttribution: chadsten commented: Subscribe.
Comment #64
helmo CreditAttribution: helmo commented: One quick minor remark.
This variable could use a better name, e.g. 'has_drush'.
Comment #65
R.Muilwijk CreditAttribution: R.Muilwijk commented: Just reviewed the patch from #60 and these are my thoughts:
- The patch naming is based on using Drush, and the clearing is also limited to Drush ('if ($this->drush)'). Is this really necessary? For example, if we have 4 webnodes and on webnode 3 some cid is cleared, wouldn't we want this to happen on all nodes as well?
- The XMLRPC callbacks are called apc_drush_, though it's possible to also clear non-APC caches.
- Instead of using Drupal's API cache_clear_all(), we are directly retrieving the object with _cache_get_object($bin). Is there a specific reason to do so?
- For going through the nodes, $GLOBALS['conf']['apc_nodes'] is used. Is there a reason we cannot use variable_get('apc_nodes', array())?
- When a particular webnode is temporarily unavailable, what would the default timeout for an XMLRPC request be?
Comment #66
MrHaroldA CreditAttribution: MrHaroldA commented: @R.Muilwijk, what about doing a code sprint somewhere in the (near) future? We meet somewhere and code for a couple of hours? This issue is taking far too long to complete this way.
Comment #67
R.Muilwijk CreditAttribution: R.Muilwijk commented: MrHaroldA, fine with me. Though if I have answers to my questions I'm willing to commit ;)
Comment #68
Mixologic commented: The problem I'm having is that Drush/CLI PHP cannot clear the APC cache that's tied to the Apache process, as they are in separate memory spaces, and as a result the apc module goes crazy with error messages on all my drushing.
So, to re-state what I think the problem is:
In order for drush to clear any caches, it needs to reach out to the webserver via HTTP, as that's the only method to get access to APC's opcode/user variable memory space.
A method whereby one can remotely clear a cache could potentially be useful for another use case: cache invalidation across multiple web servers.
Challenges
1. For the site I'm building right now, a 'drush cc all' invokes the clear method 56 times. Every time a variable is set or deleted, that's another cache clear. So that's a substantial number of HTTP requests, and that's only if we're trying to propagate all these granular cache clearing events from drush to HTTP on a single webserver. If we extend that to multiple web heads, we'll have N extra HTTP requests for N web heads. All of that extra network traffic starts to look like the same performance tradeoff you'd get with memcached vs APC.
2. If we want to selectively empty caches, we have to have a way to determine whether a cache clearing event should propagate to other servers, or whether we are responding to an event that was propagated *from* another server and therefore should not be re-propagated.
If we're using any Drupal bootstrap at all (and we should, in order to have authentication and security), then we run into other circular issues. Example:
If we stick to only managing CLI->HTTP, it's fairly easy to tell which environment we're in.
The use case for clearing APC-based caches on multiple web heads is arguably more of an edge case (only useful for certain infrequently changing caches like cache_class_cache_bootstrap), and is broken now for both front-end and CLI uses (any cache clear action that uses APC as its backend is only going to hit whichever web head responds to the request). Making this work for a multi-head APC environment basically involves re-inventing what memcached already does.
Attached is a modification of #60, that
Still a little concerned that *any* drush command that happens to trigger a cache-clearing event in any form would trash the cache on a production server (for example, a drush command on a cronjob running every five minutes)... so this might not be exactly what we want without *some* sort of caveat.
Comment #69
pounardI'm still not comfortable with hitting all frontends with HTTP requests; this forces us to maintain a list of frontends, while using a notification system in a common storage would avoid that.
Comment #70
MixologicSo rather than using xmlrpc to flush the Drupal caches, we could instead use the DrupalQueue to store up any cache-clearing events that are happening on the drush side, and then, as part of hook_init in apc.module, we could pop the cache-clearing events off the queue.
This only really works for one frontend, because 'common storage' that all frontends can access = memcached.
APC is not architected to be a user-data object store in a multi-headed environment, and while certain caches that are less affected by staleness and syncing issues can be implemented on top of it, I think it's out of scope for this issue.
I went ahead and re-rolled using this method, and it's a *lot* cleaner. Much less code, and drush commands will only dump the caches they're supposed to.
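The flow described in #70 can be sketched without any Drupal APIs. Everything below is illustrative, not the module's actual code: SplQueue stands in for DrupalQueue, a plain array stands in for APC, and the function names are made up.

```php
<?php
// Illustrative stand-in for a shared queue (DrupalQueue in the real module).
$queue = new SplQueue();

// A toy cache: bin name => array of cid => data.
$cache = [
    'cache_menu' => ['main' => 'stale menu'],
    'cache_page' => ['/' => 'stale page'],
];

// Drush side: instead of touching APC directly, record the clear event.
function record_clear(SplQueue $queue, $bin, $cid) {
    $queue->enqueue(['bin' => $bin, 'cid' => $cid]);
}

// Web side: drain pending clear events early in the request (hook_init in #70).
function drain_clears(SplQueue $queue, array &$cache) {
    while (!$queue->isEmpty()) {
        $event = $queue->dequeue();
        if ($event['cid'] === '*') {
            $cache[$event['bin']] = [];   // wildcard: drop the whole bin
        } else {
            unset($cache[$event['bin']][$event['cid']]);
        }
    }
}

record_clear($queue, 'cache_menu', '*');   // happens during 'drush cc'
drain_clears($queue, $cache);              // happens on the next web request
```

In the real module the queue survives between processes because DrupalQueue is backed by the database, which is exactly what lets the CLI and the web process communicate.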
Comment #71
MrHaroldA CreditAttribution: MrHaroldA commented@Mixologic, that actually looks very promising! If, and when APC should cover multiple web nodes, it could even be expanded to store and claim in node-specific queues.
I wonder what R.Muilwijk has to say about this approach ...
Comment #72
MixologicShould be needs review
Comment #73
crea CreditAttribution: crea commentedSounds like an async cache clear. Really bad idea for consistency reasons.
Comment #74
Jorrit CreditAttribution: Jorrit commentedThe memcache module uses a 'generation' integer that is increased every time a cache bin is cleared, stored in a variable. It is used as part of the cache key, so increasing the integer invalidates the cache immediately. The downside is that the cache is actually not cleared, but using APC's expiration system this may be not a problem. Has this been considered?
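For reference, the memcache-style "generation" trick Jorrit describes can be modeled in a few lines. This is a toy sketch: the arrays stand in for the variables table and the APC user cache, and the names are illustrative.

```php
<?php
// The generation counter lives in shared storage (ultimately the database),
// and is folded into every cache key.
$shared = ['cache_menu_generation' => 1];   // stand-in for the variables table
$apc = [];                                   // stand-in for the APC user cache

function gen_key(array $shared, $bin, $cid) {
    return $bin . ':' . $shared[$bin . '_generation'] . ':' . $cid;
}

// Normal set/get go through the generation-qualified key.
$apc[gen_key($shared, 'cache_menu', 'main')] = 'menu tree';
$hit = isset($apc[gen_key($shared, 'cache_menu', 'main')]);

// "Clearing" the bin from drush just bumps the shared counter; the old
// entries stay in APC until they expire, but can never be read again.
$shared['cache_menu_generation']++;
$miss = isset($apc[gen_key($shared, 'cache_menu', 'main')]);
```

The downside noted in the comment is visible here: nothing actually frees the stale entry; it is merely orphaned until APC's expiration evicts it.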
Comment #75
MrHaroldA CreditAttribution: MrHaroldA commentedIt also adds a query to every page load to check the queue.
Where is that integer stored? Or where should APC store that integer?
Comment #76
Jorrit CreditAttribution: Jorrit commentedBy that I mean the variable_get kind of variable. So ultimately: the database.
Comment #77
crea CreditAttribution: crea commentedI would like to vote for #74 as the most promising solution
Comment #78
Mixologic@crea I'm not sure what you mean by
Drush makes changes to the underlying database, thus making the cache in APC stale and inconsistent with the underlying data. So when drush does this, it needs to trigger APC to invalidate its caches. In all instances where a cache is cleared as a result of a drush command, *not* clearing the cache in APC means you are already inconsistent: what's in the cache differs from what *should* be in the cache.
Relying on a variable is problematic.
1. Variables themselves are cached, so drush could update that variable and it would not be reflected on the front end.
2. That variable cannot be stored in APC, because the problem we're trying to solve here is that drush cannot access what's in APC.
3. Memcache can use a variable because drush is able to communicate with memcache.
4. Memcache is using the generation integer to solve the multiple node issue, not to solve 'how drush can communicate with memcache'.
Also
It only adds about .5 ms to all my page loads. There are only two fundamental ways that drush can signal to the webserver that its caches are stale: passively, with webserver polling (a database query on every pageload), or actively, with HTTP connections. XMLRPC/HTTP/SOAP or what have you is so terribly non-performant as to make drush useless, which is why I threw up that earlier patch that just dumped everything all at once.
Comment #79
crea CreditAttribution: crea commentedMixologic
async cache clear is bad, because you trigger it in one request and process it in another. And between the requests you leave the system in an unpredictable state: drush got a report of the cache clear, but continues to work with the old cache.
Comment #80
Mixologic@crea - I think you're not understanding the problem here.
Drush does not use the cache. Because we've configured the cache to use APC, every drush request will create and destroy its own APC cache, on every invocation of drush.
Drush makes changes to the database, which means that the http front end is continuing to work with the 'old/stale' cache.
Asynchronous cache clearing is *not* bad and is not a problem whatsoever. Asynchronous updates to the underlying database, without clearing all caches built from that database, *are* bad, and that is the problem we're trying to solve here.
Finally, this isn't even asynchronous. As soon as drush reports that a cache bin needs to be cleared, the very next http invocation will clear that cache bin before it gets a chance to use the stale data.
Comment #81
btopro CreditAttribution: btopro commentedNeed to do some more testing on #70 tomorrow, but this seems viable. Initial tests seemed promising; I will need to fill cache bins and then try to destroy them via drush w/ data that I know will be stale.
Comment #82
btopro CreditAttribution: btopro commentedConfirmed this morning that #70 works. +1 committal, and if the number-increment-on-bins approach ever gets a patch, that seems like a separate issue. I could see either method being used, though I'm not sure if the number incrementing would cause fragmentation issues, since you wouldn't be releasing the cache bin data; you'd just be creating new entries.
Comment #83
MrHaroldA CreditAttribution: MrHaroldA commentedIsn't it true that the cache is only emptied when a user requests an uncached page (so that hook_init() is called)? Which in theory could take up to a couple of hours?
And what if you're using multiple servers that have their own APC cache? Only the first server will get its cache purged, right?
Just picking my brain on this ...
Comment #84
btopro CreditAttribution: btopro commentedyeah, but this provides a possible solution of:
Run drush cc all
go to site via browser, login and bins will be cleared
vs currently there is no solution.
No clue on multiple servers; would need someone to step up and confirm whether that works or not. Guessing no, since it's just using the built-in PHP APC functions for clearing data, but I don't understand multi-server APC setups.
Another thought: is Drupal bootstrapped enough in hook_boot to be able to run the init portion? I don't think this is an end-all-be-all patch, I just think it actually provides a potential solution.
Comment #85
MrHaroldA CreditAttribution: MrHaroldA commentedIn #70 Mixologic already mentioned "This only really works for one frontend"
#74 sounded perfect, but #78 noted that variables are cached too, so that's also not viable ...
I frankly don't know what to do next.
Comment #86
btopro CreditAttribution: btopro commentedMaybe allow for capabilities like this via sub-modules, this being one possible solution?
Replace the current part that queues items w/ a call to a new hook like module_invoke_all('apc_cache_clean_up', $this, $cid, $wildcard), then move the hook_init() part of the patch to a new sub-module called apc_drush_cc_webnode (better name needed). Then other approaches discussed in this thread could be added to other, small sub-modules as potential solutions present themselves.
Limitations of this approach: it needs to be hit w/ an auth'ed call to trigger the clear, it doesn't always hit all bins (I think it did in my testing though), and it doesn't clear across environments. Other approaches may have additional limitations, but I feel like this is so specific to the type of environment you are deploying Drupal / APC in that a hook-based approach may be the most flexible solution.
Comment #87
pounardI still think that using some kind of variable (or a single SQL query) to check for the last purge timestamp would be a better solution than sending HTTP requests to all nodes, and moreover would be really failsafe. This would cost an extra query to the SQL backend (because we actually cannot use the variable table), but considering the time we can save at bootstrap by using APC, this could be a good compromise.
The patch in #70 is not good because you can store and read cache entries way before it is called, the cache_bootstrap bin for example.
Comment #88
MrHaroldA CreditAttribution: MrHaroldA commentedThat could actually be hidden behind a setting for those who don't need/want it!
Comment #89
pounardYes of course
Comment #90
twistor CreditAttribution: twistor commentedI have to agree that using the DB is the only sane way to do this. It adds a single db query, but it's fully indexed, so it won't even hit the table on InnoDB.
Here's a quick idea of how this could work. Rather than add another setting to disable it, we could just check whether the module is enabled or not.
I'm not sure if I followed all the logic for NULL, *, empty, or whatever that Drupal does.
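A minimal model of that last-purge check may help make it concrete. This is a sketch only: the {last_purge} table is simulated with an array, and all function names are illustrative rather than the actual patch's API.

```php
<?php
// drush records a purge timestamp per bin in the database; the web process
// treats any APC entry created before that timestamp as a miss.
$last_purge = [];   // stand-in for an indexed {last_purge} table
$apc = [];          // stand-in for APC: cid => ['data' => ..., 'created' => ts]

function cache_set_sketch(array &$apc, $cid, $data) {
    $apc[$cid] = ['data' => $data, 'created' => microtime(TRUE)];
}

function cache_get_sketch(array $apc, array $last_purge, $bin, $cid) {
    if (!isset($apc[$cid])) {
        return FALSE;
    }
    $purged_at = isset($last_purge[$bin]) ? $last_purge[$bin] : 0;
    // Entry predates the last purge recorded in the DB: treat as a miss.
    return $apc[$cid]['created'] > $purged_at ? $apc[$cid]['data'] : FALSE;
}

cache_set_sketch($apc, 'router', 'old router');
usleep(1000);
$last_purge['cache_menu'] = microtime(TRUE);   // 'drush cc' from the CLI
$stale = cache_get_sketch($apc, $last_purge, 'cache_menu', 'router');
usleep(1000);
cache_set_sketch($apc, 'router', 'new router');
$fresh = cache_get_sketch($apc, $last_purge, 'cache_menu', 'router');
```

The single indexed SELECT per request is the cost; the stale entry is never read again, though (as with the generation approach) it is not actually freed from APC's memory.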
Comment #91
twistor CreditAttribution: twistor commentedOf course that was too simple, module_exists() isn't available yet.
Comment #92
pounardWe can reduce the number of queries by SELECT * FROM {last_purge}'ing directly, only once during bootstrap, instead of querying for each bin. There are between 10 and 20 bins, which makes 20 SELECT'ed lines.
[EDIT: I am stupid, I misread your patch, sorry...]
We need an option to be able to deport this SQL query into another backend (cache or variable) depending on how the system is set up. For example, if the site admin does not use APC for cache_bootstrap, we can use variables and therefore eliminate this query. This system must also be optional (case where there is only one frontend).
[EDIT: I misread twice, I have to buy me new eyes!]
Patch looks good.
Comment #93
twistor CreditAttribution: twistor commentedMy patch has exposed another bug.
If you clear cache from drush, then refresh, you'll see a lot of messages like "Warning: Invalid argument supplied for foreach() in field_attach_load()" and "Notice: Trying to get property of non-object in field_attach_load()".
This is because getMultiple() is setting all of the cache keys, even when they're false. This is not correct, or at least not what people are expecting.
Comment #94
pounardNice catch.
Comment #95
Jooblay.net CreditAttribution: Jooblay.net commentedtesting
Comment #96
Anonymous (not verified) CreditAttribution: Anonymous commentedHello folks, glad to see this issue is progressing. Looking forward to it.
Fengtan recently published Drush Webservice and I realized there might be some interest in what options that module might provide here. I haven't tried it yet, but I plan to use webservice for one project soon and I'm thinking I might run some tests to see if it would accomplish the same thing.
Comment #97
RaulMuroc CreditAttribution: RaulMuroc commentedAfter applying the latest patch (comment #93) the warning disappeared for me. I guess it is correctly clearing the cache.
Comment #98
Jibus CreditAttribution: Jibus commentedSame as #97
Patch from #93 applied and worked nicely! Thanks!
(Tested on a PGSQL database)
Comment #99
MixologicUsing a timestamp to determine whether or not a cache should be cleared can be problematic. If your CLI timezone variable is PST and your webserver is using GMT, you may run into an issue where caches stay stale for 8 hours, for example. In drush we're explicitly clearing a cache; why don't we carry that explicit directive over to the front end rather than depend on something unstable like a timestamp?
Edit: timestamps are not subject to timezone issues. They are, by default, in GMT.
@pounard's insight in #87 was spot on:
Attached is a patch that still leverages the database to record necessary cache clearing events (using drupal queue), and moves the cache clearing check into the cache constructor, much like the patch in #93.
Note that this patch is specific to drush: APC should *not* be used on multiple web heads. I mentioned in #70 that "This only really works for one frontend" because APC really only works for one frontend. APC is not a distributed key-value store. Memcached and Redis are both great examples of a distributed key-value store, and if you *need* to run a key-value store on multiple web heads, then you should use them instead of APC. But APC's key/value is faster! It's faster precisely because it does not have all of the logic and network overhead required to make it distributed. So, by the time you bake in everything necessary to make APC work with multiple web heads, you'll likely end up with equivalent or worse performance than either memcached or Redis.
Comment #100
MixologicBad comments.
Comment #101
Mixologic
Comment #102
Mixologic
Comment #103
pounardUsing a queue for this is a very nice way of achieving it. I do like it because it looks like a nice journaling mechanism. Though it could be optimized a bit by clearing the queue once you've claimed at least one wildcard clear order.
This behavior should still be enabled or disabled by a variable as the #93 patch does, instead of relying on drush presence, this would avoid people not using drush to end up with useless queue queries.
Another note: it would be simpler to claim only one queue item and drop the full cache bin when you have one. I doubt that people would switch huge cache bins such as field or page to APC: they would consume their RAM so quickly that APC would spend its time purging least-recently-used entries. Considering this, we'd better not even bother trying to do things right by reading queued orders and replaying them, but just clear the bin completely each time you get any order. That'd probably need some profiling, but I seriously doubt anyone sane would give GBs of their RAM to APC.
Even though I agree APC should not be used when you have multiple frontends, I continue to think that people might use it anyway, in which case the queue method won't work anymore, because you drop the queued items when consuming them, making them disappear for other frontends. And we're back to #93 for such environments.
Comment #104
MixologicTrue. I was watching it do 130 queue pulls to flush all the caches. At the same time, this isn't something that happens often; it just makes the next page after a drush clearing event take a few more milliseconds.
If so, then this needs to be something on a settings page, not a hidden variable that you would only find by looking at the code (apc_show_debug is a good example of a hidden feature baked in).
That's not a decision we should make. I have a site with minimal content on a server with 32 GB of RAM. I have plenty of RAM to support putting the entire cache in APC.
- why should invalid configurations be supported? People can also try to run APC using fastcgi or fcgid instead of php-fpm, and they would be shooting themselves in the foot, with a user cache per process. The only situation I can see where you might get away with using the APC user cache with multiple processes or frontends is where you don't care about cache consistency at all. In which case, how would any of these patches help?
Comment #105
pounardAs I recall, FPM processes share the same opcode and user cache; the same goes for PHP CGI, I suppose, since it's the ancestor (not that different) of FPM. So it's actually not an invalid configuration.
It's not invalid to use a cache at the frontend level even if you have multiple frontends; the only thing you need is a good sync mechanism, and #93 was not that bad in that regard. In that use case, with this mechanism, you have 1 extra SQL query per page, but a lot of other remote queries could go to APC instead of a remote server. I mean, if the sysadmin decided it was worth the shot after doing some measurements, then it doesn't sound illegal at all to me.
Comment #106
MixologicThat's not true. PHP-FPM fixed the problem that both fastcgi and fcgid had of having one opcode cache and user cache per process. The bug report is here: https://bugs.php.net/bug.php?id=57825 (note the comments, particularly the one from rasmus@php.net on 2011-07-08). More info on this stackoverflow thread: http://stackoverflow.com/questions/598444/how-to-share-apc-cache-between...
If you are trying to use an APC user cache on an Apache system with fastcgi or fcgid, you are going to have data-consistency issues between processes, with multiple opcode caches (bad for memory consumption) and multiple user caches (bad for consistency). This is not something we should try to support.
Other than this thread, I can find no examples of people advocating the use of APC on multiple front ends. In fact, one of the ways people *do* ensure consistency in APC caches is to wrap it in another distributed cache (http://fourkitchens.com/blog/2009/11/30/intelligent-memcached-apc-intera...). Every example I see all say if you want to use a key/value datastore, you cannot use apc if you need more than one webserver.
We cannot prevent people from doing foolish things and misconfiguring their webservers, but we certainly shouldn't give people the impression that this type of configuration is reasonable by adding weak support for it.
#93 only causes the front end to reset its cached data if the cid = '*'. There are many cases where a cache would get cleared out by drush yet not hit the * cid. This would result in cache data inconsistency.
Comment #107
pounardOk so I didn't recall correctly, I'm used to Nginx when using FPM, so I don't have that kind of problem on my own.
#93 does not handle anything other than wildcard flushes, that's true, but it could just send a full purge each time; that doesn't sound that bad if you use it on small caches such as the bootstrap bin.
Comment #108
torotil CreditAttribution: torotil commentedHere is a patch based on #60 and addressing the issues raised in #68:
This is still drush specific but it is rather easy to generalize. The following steps would be needed:
Comment #109
torotil CreditAttribution: torotil commentedThis one also applies to 7.x-1.x.
Comment #110
MixologicThis is great. It solves the performance crush of too many xmlrpc requests, eliminates the need for any excessive hits to the database, and keeps the clears tightly synced.
Couple of things:
$GLOBALS['conf']['apc_nopropagate'] would only ever get set during an XMLRPC callback, and as such would never be true. So this setting doesn't actually do anything, and continues to promote the idea that you can use APC on multiple servers.
This introduces the problem mentioned in #69: namely that now we have to maintain a list of nodes somewhere, which could differ between development and production, and there's no way to manage that list with this patch. It's just a hidden setting one has to put in settings.php.
This has to go in apc.drush.inc; hook_drush_exit doesn't get called in .module files.
Comment #111
MixologicWhoops. Forgot to include the apc.drush.inc file in the patch.
Comment #112
Exploratus CreditAttribution: Exploratus commentedTried #111. Seems to be working, notices are not showing up.
Comment #113
Ravenight CreditAttribution: Ravenight commented#111 seems to work for me too.
Comment #114
jewseppi CreditAttribution: jewseppi commented#93 resolves the CLI warning without any issue.
#109 resolves CLI warning but causes a new error.
'all' cache was cleared in /var/www/drupal/htdocs#default [success]
xmlrpc() error: (-11001) php_network_getaddresses: getaddrinfo [error]
failed: Name or service not known
Comment #115
torotil CreditAttribution: torotil commented#111 relies on the base_url being set correctly (i.e. being the correct URL for calling the site). Without any additional configuration drush will generate a temporary base_url http://sitename that won't resolve. That's why you get the network error.
#109 solves this by requiring the user to actively set the correct base_url (or optionally more than one).
Configuring drush to know the correct base_url is not such a bad idea in the first place. At least your cron-jobs should have that anyway. To do so you can either use the --uri/-l option of drush or configure a site alias.
In general I'd prefer to have a configurable list of urls to send the xmlrpc-requests to and let it fall back to array($base_url). That would open the door again for a multi-server solution (which IMHO isn't even that far away).
Comment #116
MixologicPerhaps all that is needed is to add a check to see if $base_url is not set (returns http://sitename) and add back in a drush error message that says "base_url is not configured, drush cannot clear the cache"
Can we please keep this issue on track and not continue derailing it with ideas of 'multi servers'? The scope of this issue is to bridge the gap between drush and the webserver. Let's get that working and committed first.
There are myriad issues with having multiple instances of APC user caches, and these patches are miles away from addressing all the synchronization, race-condition, and data-integrity issues that can result from having multiple copies of the cache. All of those issues exist whether you are clearing the cache from drush or from the front end, so realistically they can/should be solved independently. If somebody wants to tackle all of that, can we open a separate issue for it?
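The guard proposed in #116 might look roughly like this; the function name and error message are illustrative, not what any of the patches actually use.

```php
<?php
// Sketch: refuse to fire cache-clearing HTTP requests when drush only has
// the placeholder base_url. Without --uri/-l or a site alias, drush falls
// back to 'http://sitename', which will never resolve for an xmlrpc request.
function apc_base_url_usable($base_url) {
    return $base_url !== 'http://sitename' && $base_url !== '';
}

$ok  = apc_base_url_usable('https://example.com');
$bad = apc_base_url_usable('http://sitename');

// In the real drush hook one would presumably call drush_set_error() here:
// drush_set_error('APC_NO_BASE_URL',
//   'base_url is not configured; drush cannot clear the APC cache.');
```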
Comment #117
torotil CreditAttribution: torotil commentedDuring further tests I've had one issue with the xmlrpc callback: if I disable apc while it is still configured in settings.php, the webnode has a cache that doesn't know about apc's xmlrpc callback. This means I'm unable to flush APC caches on the webnode.
So I've attempted a few things at once:
In my tests this seems to work just fine.
Comment #118
torotil CreditAttribution: torotil commentedHere is the patch with a separate flush.php file but without multi-site capabilities.
Comment #119
torotil CreditAttribution: torotil commented… and here is the XMLRPC based approach with an additional warning if the $base_url == 'http://sitename'.
Comment #120
Mixologic119 - was missing the apc.drush.inc file - attached one + the interdiff
Also, added some require_once into that drush_exit, as it would throw a Fatal error: Class 'DrupalAPCCache' not found if you were doing something other than cache operations.
118 - I'm pretty sure we wouldn't want to add additional PHP files that respond directly to requests. It increases the attack surface area of a Drupal install, so something like this should probably go through the security team. I'm not sure what we need a flush.php file for instead of just using xmlrpc.
117 - How would this mitigate race conditions, and how would you set up a test to show that they are mitigated?
Comment #121
torotil CreditAttribution: torotil commented119, 120: I've introduced an if (class_exists('DrupalAPCCache')) into the other patches for that reason. If the class was not loaded, chances are it wasn't used to flush any caches.
118: having to use a stale cache to be able to flush caches is a bad idea in general. As described above, this led to a race condition where I was not able to flush APC on the webnode at all (neither via drush nor directly). The extra bootstrap solves this by not using any caches (except variables for the cron-key and the bin-prefix). At least it reduces the number of caches that we use during a cache flush to a few known ones. We might still reduce that by using an HMAC for authentication instead of the form-key.
Regarding your security concerns:
117: is prone to the very same race conditions as 118 and still less than 119.
Comment #122
torotil CreditAttribution: torotil commentedHere is an HMAC-based version of 118 with the extra class_exists() check. This is what I'll most likely use. The only thing it relies upon is that the bin prefixes (if they are configured in the database) do not change.
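The HMAC handshake can be sketched in plain PHP. The payload layout, shared secret, and replay window below are illustrative assumptions, not necessarily what the #122 patch does.

```php
<?php
// drush signs the flush request with a shared secret (e.g. the site's
// private key); flush.php recomputes the signature and compares it in
// constant time. Function names are illustrative.
function sign_flush_request($secret, $bin, $cid, $timestamp) {
    return hash_hmac('sha256', $bin . '|' . $cid . '|' . $timestamp, $secret);
}

function verify_flush_request($secret, $bin, $cid, $timestamp, $hmac) {
    // Reject stale requests to limit replay, then check the signature.
    if (abs(time() - $timestamp) > 300) {
        return FALSE;
    }
    return hash_equals(sign_flush_request($secret, $bin, $cid, $timestamp), $hmac);
}

$secret = 'site-private-key';
$t = time();
$sig = sign_flush_request($secret, 'cache_menu', '*', $t);
$good = verify_flush_request($secret, 'cache_menu', '*', $t, $sig);
$forged = verify_flush_request($secret, 'cache_menu', '*', $t, 'bogus');
```

hash_equals() is the important detail: a plain `==` comparison of HMACs can leak timing information.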
Comment #123
Mixologic119, 120: I've introduced a if (class_exists('DrupalAPCCache')) into the other patches for that reason. If the class was not loaded, chances are it wasn't used to flush any caches.
- That's a much better idea. Attached is that update.
118 / 122: Unless I'm misunderstanding something, what you describe is not a race condition; it's a misconfiguration issue that only happens if you have an invalid cache backend specified in your settings.php. Unless we're trying to make it so that the cache class can be used *without* enabling the apc module (and still support drush), which, I don't think, is a requirement.
Security: $_SERVER['PHP_SELF'] is a field that contains user input. It would be pretty far-fetched for an attacker to A. be able to create a directory named flush.php somewhere under the document root, and B. upload a file named flush.php/dirname/includes/bootstrap.inc somewhere under that docroot. It's the kind of vector that on its own is pretty trivial, but could be combined with other vectors to allow a breach.
117: I don't think we're talking about the same thing when I say 'race condition'. The fundamental reason multiple APC cache nodes are a bad idea is that caches are effectively data: data that may get relied upon to create other data. If you are storing this data in two places, there is a very real possibility that separate requests will need to manipulate the same data in different ways. When that happens, you have a race condition where whichever request saves first will be overwritten by the second request with now-stale information.
Comment #124
torotil CreditAttribution: torotil commentedFor us it is: APC has to be configured in settings.php. The module state is managed in the database. Usually our deployment does first configure all files and then bootstrap the database. Having to change the settings.php after installing requires an extra step and also complicates the usual dev-staging-production setup.
$_SERVER['PHP_SELF'] is the path of the currently executed PHP script. If a user can manipulate that, you have a really serious problem with your server setup.
How would such a directory affect the script? The attacker would need to create an includes/bootstrap.inc somewhere between DRUPAL_ROOT/ and DRUPAL_ROOT/sites/modules/…/apc/. So if the attacker is able to hide arbitrary code somewhere in the docroot of your webserver (which means he has write access to your docroot outside of a files directory): why not just manipulate Drupal's index.php or create a separate PHP script and call it? I think in almost all cases this comes down to the statement: if someone is able to execute arbitrary PHP code on your server, he can do so by using a forged bootstrap.inc too.
Comment #125
torotil CreditAttribution: torotil commentedThat's the same with every cache. It's a tradeoff between speed and consistency. In theory we even have race conditions with drupals core cache and multiple simultaneous requests! (Request A doing several cache clears while B does something else that uses the partially flushed cache)
What exactly do you think is the essential difference between
?
Comment #126
Mixologicre: 124:
It's not clear to me if that is a problem with enabling the module in general, or if you have a non-standard deployment process that has issues with this.
Is it that if you enable this in settings.php, and you have an update_N hook to enable the module in production, that when deploying that code it can't actually run the update hook because it's already enabled in settings.php? Or does drush just fail to clear the cache that one time until the module is enabled?
$_SERVER['PHP_SELF'] is user-submitted data.
If you go to
http://myserver.com/sites/all/modules/contrib/apc/flush.php/user/submitted/data/here
it will execute flush.php, and $_SERVER['PHP_SELF'] will be set to '/sites/all/modules/contrib/apc/flush.php/user/submitted/data/here'. That's not a server configuration issue; that's by design and well-defined behavior.
This makes $_SERVER['PHP_SELF'] vulnerable to XSS attacks if it is ever printed without sanitation. But in this particular case I was imagining a scenario where it was vulnerable to a directory traversal attack, i.e.
https://myserver.com/sites/all/modules/contrib/apc/flush.php/../../../../default/files/somedir
I'm unable to pass the directory-traversal strings into $_SERVER['PHP_SELF'], as Apache interprets them when looking for the script to execute, and no amount of encoding seems to be able to pass the traversal paths to PHP_SELF. That doesn't mean it can't be done, just that I don't know how to do it.
In any case, flush.php may be secure and have no vulnerabilities whatsoever. But the only way to know that is to have many people look at it, so if that's avoidable, it's better.
I.e. I'm not trying to have a security discussion about a specific issue, just pointing out that if you add a new way to access the server, you need to have a security discussion.
It's a tradeoff between speed and *currency*.
The Drupal cache is architected in such a way as to maintain consistency. I'm not aware of too many issues in the queue where people have complained about strange data corruption and the answer has been 'yeah, caching will do that to you'.
They probably all suffer from some consistency issues. In some iterations there wasn't a network dependency; xmlrpc adds that back in, so all of them would be affected by a network outage. So if the network is down, drush can't flush the cache.
Probably the main difference I see is that drush activity is not likely to be happening all that frequently, so drush talking to its own node is probably not that big of a deal; but when you start synchronizing the caches during normal usage, that's when there is likely to be a collision. But again, without going through every possible scenario in core, I can't say for sure that any of this is an issue.
In any case, issues take so much longer to get resolved unless we minimize the issue scope. In this case it's still (I think) making drush work with one webnode. As we keep bolting on parts and pieces to support stuff that's out of scope for the issue, we're going to keep seeing more and more things that need to be added. There may be no issue whatsoever with running on multiple web heads, and my concerns may be moot, but that doesn't change the fact that there isn't an admin interface to manage the nodes in these patches.
Comment #127
torotil CreditAttribution: torotil commentedI'm pretty sure we have such race-conditions in drupal core. The reason it doesn't show is because the caches are inconsistent only for very short periods of time. And in almost all cases the results are harmless or not noticeable. Let's just take the menu cache and the bootstrap cache. During activation of a module cache_clear_all() is called. It first clears the bootstrap cache then the menu cache. If in the tiny fraction of a second another request is made to drupal it is possible that it will use the stale data of the menu cache while using the a current version of the bootstrap cache. In short: drupal cache clears are neither atomic nor transactional - so it can't be consistent in all circumstances.
Back to the actual problem:
$_SERVER['PHP_SELF'] being manipulable is due to a rather peculiar behavior of Apache (I'd expect my web server to return a 404 in such a case). So you're right that in that case the server might execute a bootstrap.inc outside of the Drupal root.
Using $_SERVER['SCRIPT_NAME'] should be safe though.
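The difference is easy to demonstrate with simulated $_SERVER values (the paths below are made up for illustration):

```php
<?php
// With Apache's PATH_INFO handling, PHP_SELF includes trailing
// user-controlled path segments, while SCRIPT_NAME stops at the script.
$_SERVER['SCRIPT_NAME'] = '/sites/all/modules/apc/flush.php';
$_SERVER['PHP_SELF'] = '/sites/all/modules/apc/flush.php/user/supplied/suffix';

// Deriving a filesystem location from PHP_SELF trusts the attacker's
// suffix; deriving it from SCRIPT_NAME does not.
$unsafe_base = dirname($_SERVER['PHP_SELF']);
$safe_base = dirname($_SERVER['SCRIPT_NAME']);
```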
Comment #128
MixologicI still do not see the need for flush.php other than the one time that you enable APC caching on your servers, which is non-standard from how I understand typical dev/stage/production deployments to work. Deploying code with the settings.php change in it, and then running update.php with an update hook that enables the module as part of the deployment process, should work fine. If that's what you're doing and it doesn't work, then that's another story, but flush.php seems unnecessary.
Comment #129
twistor CreditAttribution: twistor commentedI don't think any kind of web request synchronization will be remotely feasible.
@torotil, you seem to be confusing what consistency and race conditions we're talking about. Consistency is only per cache bin. The menu cache and the bootstrap cache are unrelated and know nothing about each other. Cache clears can absolutely be atomic and transactional; it depends on the cache backend. But that should definitely be a goal. And yes, while there are race conditions in Drupal core, they are considered extremely important and urgent issues, and they don't primarily involve the cache system.
Using a web request could mean that caches are out of sync for multiple seconds. flush.php is an optimization that is a non-starter. Any kind of synchronization that goes through Apache/HTTP is a non-starter, in my opinion.
I think the real way to solve this problem is to make the APC cache wrap a secondary cache backend. This could be Redis, memcache, the DB, whatever. In a similar fashion to #93, we could store the cache-clear timestamps in the wrapped backend. Any cache sets will get written through to the secondary backend.
APC would make a single get from the wrapped cache backend to check whether its data is expired; if it is, it would check the wrapped backend for a value. If it finds one, it stores it locally; if it doesn't, only then would it generate a cache entry. This would allow for a much higher level of consistency and have the added bonus of only making one server generate the values.
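A rough sketch of that write-through design against Drupal 7's DrupalCacheInterface (the class name, the `apc_clear_timestamp` cid, and the choice of DrupalDatabaseCache as the wrapped backend are my own illustration, not from any patch in this thread; some interface methods are omitted for brevity):

```php
<?php
class ApcWrappedCache implements DrupalCacheInterface {
  protected $bin;
  protected $backend;

  function __construct($bin) {
    $this->bin = $bin;
    // The shared backend could be the DB, memcache, Redis, etc.
    $this->backend = new DrupalDatabaseCache($bin);
  }

  function get($cid) {
    $key = $this->bin . '::' . $cid;
    $item = apc_fetch($key);
    // A shared clear timestamp decides whether the local copy is stale.
    $cleared = $this->backend->get('apc_clear_timestamp');
    if ($item !== FALSE && (!$cleared || $item->created >= $cleared->data)) {
      return $item;
    }
    // Stale or missing locally: fall back to the shared backend and
    // repopulate APC on a hit, so only one server regenerates values.
    if ($shared = $this->backend->get($cid)) {
      apc_store($key, $shared);
      return $shared;
    }
    return FALSE;
  }

  function set($cid, $data, $expire = CACHE_PERMANENT) {
    // Write through so every web head can see the value.
    $this->backend->set($cid, $data, $expire);
    apc_store($this->bin . '::' . $cid, $this->backend->get($cid));
  }

  function clear($cid = NULL, $wildcard = FALSE) {
    $this->backend->clear($cid, $wildcard);
    // Bump the shared timestamp; other nodes treat their local copies
    // as expired on the next get().
    $this->backend->set('apc_clear_timestamp', REQUEST_TIME);
  }
}
```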
I know the patch I wrote in #93 didn't account for all of the various ways of clearing a cache, I stated so at the time. It was just a POC. All of that logic can be added to this design. I believe memcache is already doing something very similar with how it stores cache clear times in variables. It's complex, and pounard is probably correct in that it would be acceptable to just rebuild the whole cache. This design would at least limit that to a single server.
I know we can't force users to only use APC for smaller read-heavy caches, but we should instruct them to do so. If they don't, then they own the broken pieces.
Comment #130
Mixologic@twistor: The architecture you're describing sounds similar to what Facebook implemented a few years ago.
http://www.slideshare.net/guoqing75/4069180-caching-performance-lessons-...
There are a lot of slides in the presentation that appear to be blanked out, but they are there if you download the slideshow. Pretty informative, even for slides - they are from 2008, but I think most of the architecture still applies.
Most important is Slide 11, what they store in the APC cache. They also indicate on slide 16 that they store a key in memcached to use for invalidation.
I still firmly believe that all this discussion of making APC work on multiple webheads should be its own issue. Most of the use cases for APC as a cache backend are *not* going to involve multiple webheads, and trying to solve that problem is blocking the immediate need, which is making drush, on a single server, clear the in-memory APC cache.
Comment #131
iamjon CreditAttribution: iamjon commentedApplied and used the patches from #123. Works as advertised. The only caveat is to add --uri at the end of the cache clear call
so instead of drush cc all -y you would do something like drush cc all -y --uri="yoursite.com"
Comment #132
btopro CreditAttribution: btopro commented+1 if this works against one server; that's at least a solution of some kind, instead of the current situation where there is no capability to do this at all. More complex scenarios can be accounted for in a separate issue.
Comment #133
Sylvain_G CreditAttribution: Sylvain_G commentedAnyway, PHP 5.5 will go with Zend Opcache: no more APC, and no more APC user-space stuff.
Comment #134
Mixologic@Sylvain_G - PHP 5.5 only implements Zend's Opcache. The APC user cache will still be around as APCu (http://pecl.php.net/package/APCu)
APCu is functionally compatible with APC, and as such can be used interchangeably. See the README.md here: https://github.com/krakjoe/apcu
Besides, Drupal 8 is going to be on 5.4, which will be supported for a very, very long time.
Comment #135
MixologicSpeaking of Drupal 8, just found out today that APCu is on its way to being part of core.
https://drupal.org/node/1807662
https://drupal.org/node/2231595
So, hey, there's that.
Comment #136
Sylvain_G CreditAttribution: Sylvain_G commented@Mixologic thanks for APCu, wasn't aware
PHP 5.4 will be end-of-life in March 2015, which is NOT a very, very long time frame
Comment #137
Mixologic@Sylvain_G - yes, PHP 5.4 will be EOL'ed, but Drupal 8 will be around for a long time.
Drupal 6 is still technically supported, and it runs on 5.2.x, which was EOL'ed in January 2011. There's always a time lag between when a version of PHP is EOL'ed and when we can actually stop supporting it (when newer versions of Drupal come out).
Hopefully this will be much less of an issue with semantic versioning and D8, and we'll be more likely to stay on top of updates. It already appears as though D8 is fully forward compatible with 5.6 (it doesn't use any functions deprecated in PHP 5.4 or 5.5).
Comment #138
MixologicAlso, another useful thing to look at: D8 has a fast, inconsistent cache. Beejeebus just backported it to D7. https://drupal.org/project/schrodicache
Comment #139
btopro CreditAttribution: btopro commentedIs the "inconsistent cache" part supposed to be a joke because people are anti-APC user bin? Not getting the "inconsistent cache" reference. Reviewed the code; it looks like a cache backend for the user bin, much like this module (just backported from D8, so much more object-oriented, with no UI or documentation). It also doesn't appear to solve the issue raised in this thread, since it just has a generic clear function for single vs. all bins flushed.
Comment #140
MixologicYeah, sorry, not a joke. I thought he was backporting the cache chaining features of here: https://drupal.org/node/1651206
and here: https://drupal.org/node/2231595
Looking at the code, it's, uh, yeah. Simple.
Comment #141
btopro CreditAttribution: btopro commentedOh, I see what their intent is: basically cache in multiple locations, and something like APCu, with cache bins that get wiped whenever RAM is wiped, would technically be inconsistent. Unfortunately that repo is currently very minimalist and not really that different from this project (yet).
Comment #142
Jibus CreditAttribution: Jibus commentedAny chance to get #123 committed ? :)
Comment #143
pimok3000 CreditAttribution: pimok3000 commentedAfter patching the beta4 module with #123, running drush cc all -y gives me:
xmlrpc() error: (-32700) Error. Not well formed!
The bunch of warnings about CLI are gone though!
Any ideas how to fix this?
Comment #144
Mixologictorstenzenk: Patches are always intended to apply to the dev branch only, not to beta4. I would back out that patch, upgrade to the dev version, then apply the patch.
Comment #145
btopro CreditAttribution: btopro commented#123 reviewed and tested. While there are times it can fall down, I think providing a baseline for what is currently effectively broken is better than providing no solution at all. Running this in production and very happy that it is playing nicely with my usual clean-up routines.
Comment #146
pimok3000 CreditAttribution: pimok3000 commented@Mixologic: Yeah, of course I did use the beta4-dev; I accidentally didn't mention that, I'm sorry. The error stays the same.
I cleanly uninstalled the whole module and downloaded the -dev version to be absolutely sure. Then I patched the module first with https://www.drupal.org/node/1984180 and then applied #123, with the result I wrote about in #143.
Comment #147
R.Muilwijk CreditAttribution: R.Muilwijk commentedThe patch from comment #123 https://www.drupal.org/node/1565716#comment-8669945 looks promising. However, shouldn't a check be done on whether the module exists? At the moment the setup is just to include the class and add it to your $conf array; the module is not required for the cache bin to work.
Comment #148
R.Muilwijk CreditAttribution: R.Muilwijk commentedAttached is #123 with a check for whether the module is enabled. The only thing missing is a configurable $conf['apc_webnodes'] array for when you are running multiple web nodes and need to clear the cache on all of them using drush.
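The guard presumably looks something like this (the function name follows Drush's hook naming pattern; the body is my paraphrase, not the actual patch):

```php
<?php
/**
 * Implements hook_drush_exit().
 */
function apc_drush_exit() {
  // hook_drush_exit() also fires for commands like `drush help`, where
  // the module may not be enabled (or Drupal not fully bootstrapped),
  // so bail out early in that case.
  if (!function_exists('module_exists') || !module_exists('apc')) {
    return;
  }
  // ... otherwise trigger the XML-RPC flush of the web server's APC cache.
}
```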
Comment #149
MrHaroldA CreditAttribution: MrHaroldA commentedIIRC, the consensus was that you shouldn't run APC on multiple webnodes to begin with. It's been a while, though ;)
Comment #150
MixologicInterdiff looks good.
Good catch on the module check.
I hadn't realized there were situations where hook_drush_exit could get called if the module were not enabled, but a drush help, for example, would trigger that call: http://drupal.stackexchange.com/questions/108038/does-hook-drush-exit-ge...
I would still call this patch RTBC, and create another issue to allow for the addition and configuration of multiple web nodes for those who *really* know what they're going to be caching and can avoid race conditions, which with Drupal 7 is currently not possible without a core patch: https://www.drupal.org/node/1679344
Comment #151
R.Muilwijk CreditAttribution: R.Muilwijk commentedCommitted to dev.
Comment #154
mr.andrey CreditAttribution: mr.andrey commentedJust adding a note here for reference.
The following error:
Is resolved by creating sites/default/drushrc.php with the following contents (with "mysite.com" being your actual site name):
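The snippet seems to have been stripped from this comment; presumably (going by the --uri workaround from #131) it pins the site URI, along these lines:

```php
<?php
// sites/default/drushrc.php
// Replace mysite.com with your actual site name.
$options['uri'] = 'http://mysite.com';
```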
Cheers,
Andrey.