When I run drush cc all I get more than 60 messages of the following kind:

cache_menu(*) was not cleared. APC cli uses a different memory storage then the webserver. For more info see: http://drupal.org/node/1278232

This is rather annoying... especially because there is a lot of commands which automatically execute a cache clear at some point. Would it be possible to reduce those ~60 messages to one for each invocation of drush?

Also, would it be possible to actually "fix" the cache clearing instead? I'm no expert, but a quick google search revealed a possible solution on stackoverflow. What do the experts think?

For reference, the message was introduced in #1278232: Provide documentation on clearing the cache when running cache clear in CLI (Drush) mode..

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

regilero’s picture

One problem, the trick used on Stack Overflow link means having a script available on localhost IP, it implies things which may be obvious (but they aren't) like an http server listening on localhost, a php process connected on that webserver, for this IP (and it could be an external PHP process if it's php-fpm). So the domain name & IP to use shoudl be settings (command line options?) and some documentation on webserver requirements (name of the script, access restriction on that script) should be well documented. Not a simple thing.

In case of multi-server installations Apc caches have one specificity, they 're server specific. So clearing the cache with drush would only clear the APC cache on server. We could use the fix (if any) for that very annoying message to fix this problem in the same time.

For example we could maybe set a cache validity timestamp when running the apc flush from php-cli (drush). And we could reuse that validity timestamp when reading from cache to return a cache miss for records older than this setting. This way every server retrieving outdated data would know it's outdated. The problem is that we cannot use a variable if cache_variable is stored on APC... This could only be stored on a shared storage (database? session!? other).

If there's any genius out there with a nice idea, please save us from this very annoying apc-cli limitation.

bforchhammer’s picture

Well, what I was thinking after reading the entry on stackoverflow, is the following: if cache clear works in the web-interface (on admin/config/performance), then it should also be possible to call that page via wget or curl from the command line and therefore have the right process take care of executing the cache clear.

For this to work, drush of course needs to know which URL to call (e.g. http://example.com/apc/cc?bin=...); and instead of submitting the form on admin/config/performance, I'd imagine that it would be better to add a rather simplistic custom page handler which avoids page rendering logic etc.

  1. Introduce a variable 'apc_cc_baseurl', which users can add to their APC configuration section in settings.php. Variable should be set to public-facing base-URL of site. (e.g. http://example.com).
    • If variable is not set, either try other base urls ($base_url, "localhost", ..?) or simply fall-back to the displaying an error message.
  2. Create an ajax handler function (e.g. on path apc/clear), which clears the cache for a specified cache bin; allow parameters for cache-bins, cids etc., make it inaccessible for regular users by checking for an apc_key parameter (re-use cron_key?).
  3. When we have a command-line cache clear, use wget/curl to call the respective url, e.g. $conf[apc_cc_baseurl]/apc/clear?bin=cache_menu&apc_key=12345
  4. Check response of apc/clear requests and display error messages to drush user, if applicable

I'm not sure what you mean by "multi-server installations", but if every server has a public facing URL we could also allow apc_cc_baseurl to be an array and issue multiple wget/curl requests...

R.Muilwijk’s picture

Something that might be possible is a solution like this:
- Introduce a $conf['apc_stores'] = array('webnode1' => 'http://webnode1.site.com', 'webnode2' => 'http://webnode2.site.com');
- check or apc module is enabled. If it's enabled generate a string for the cache clear command and crypt it somehow with drupal hash. Foreach apc_stores call a callback defined in apc module like http://webnode1.site.com/apc_clear?command=$encrypted_string.
- Module callback would decrypt using drupal hash and run the actual cache clear on webnode's APC user cache.

R.Muilwijk’s picture

@bforchhammer, yes the Cache clear all works. But this is only on the CURRENT webserver node. When using multiple webserver nodes behind a load balancer you would have to run a Cache clear all on ALL nodes!

regilero’s picture

I like the idea of having an array of front server to reach, requesting for apc cache flush. +1

MrHaroldA’s picture

Reverting a couple of features using drush spawns hundreds of APC warnings. Really annoying ...

regilero’s picture

@bforchhammer:

I'm not sure what you mean by "multi-server installations", but if every server has a public facing URL we could also allow apc_cc_baseurl to be an array and issue multiple wget/curl requests...

When you have such configuration:

Load Balancer ----->front server 1 serving www.domain.com
            \---> front server 2 serving www.domain.com
            \---> front server 3 serving www.domain.com

You have 3 APC caches, one on each front server. (More if each front server hanlde several domains, but that's another problem).

Using drush cc all on front server 1 would only clear the APC cache on front server 1, not on the 2 others. That's why comment #3 array solution would be nice also.

geerlingguy’s picture

I have numerous drush actions triggered via cron, and each one gets an email kicked back with these errors displayed in the email. My system had about 42000 emails when I just checked after about a week (there are a lot of cron jobs that interact with drush).

Maybe you could use a variable to determine whether or not the error messages should be displayed?

So, like:

  function clear($cid = NULL, $wildcard = FALSE) {
    if (drupal_is_cli() && function_exists('drush_log') && variable_get('apc_show_cli_error', 1)) {
      drush_log($this->bin . '(' . $cid . ') was not cleared. APC cli uses a different memory [...]', 'warning');
      return;
    }

(Note the addition of 'apc_show_cli_error' in there). If this approach is accepted, I'll gladly write a patch really quick. I just commented out the drush_log() call on my site.

MrHaroldA’s picture

@geerlingguy, I would rather have Drupal clearing the APC cache on the first webbased request than hiding the message, Clearing cache with APC enabled can't be done through the CLI now, which kinda sucks.

regilero’s picture

Priority: Normal » Major

Taking #8 in account I think this bug needs som higher priority.

Waiting for a working cache cjear we could at least try to avoid messages floods which can have some dangerous side effects on working installations. Maybe storing a timestamp when emitting this drush_log message and avoiding more than one message per minute at least.

pounard’s picture

My favorite solution is the #1 proposed validity timestamp, this is also appliable to all other backends and a ensures that it will work with the multiple frontends configuration. It is today the approach taken for checking tags validity in the Drupal 8 database cache backend, instead of using a timestamp they use a more complex tag checksum, but it's still a good approach IMO.

I can see only one problem with it, if some cache keys are never being read and has no expiration time they will stall indefinitely in the APC shared memory.

I can see an alternative based upon this solution, but that will work with APC only:
1) Set the lastest flush timestamp for each bin into a variable
2) When instanciating the APC backend:
2.1) Load a special APC user key such as "$host_latest_flush_$bin", if not present set it to now
2.2) If this key exists and is prior to the variable, flush APC locally normally and reset this user key to now

R.Muilwijk’s picture

One of the main reasons for using APC cache is because it's the closest cache to PHP and faster then for example Memcache. One of the downsides is that the CLI shared memory is not similar to the webservers shared memory. When using multiple webnodes the shared memory of the different webnodes are not even similar. When using APC it's important you understand this.

So there are two types of cache clearing, expiration and because of an action. APC already handles expiration, double implementing this is a waste of processing time so I'm not willing to make read's slower by implementing a custom system for this.

The second type, cache clearing because of an action. By hitting the 'Cache clear all' button, using Drush, saving a node or any other action is harder. This should invalidate the cache. The questions is what is a proper way without making read's slower to invalidate this for all the different types of shared memory instances. At the moment I still think #3 is our best bet.

pounard’s picture

Actually both #1 and #11 would work knowing all that. The only detail I forgot for #11 is that variables are cached too. Grr.
EDIT: Both are based upon the comparison of two timestamps, one in the local cache (weither is CLI or web doesn't matter) knowing that the shared variable (not in cache) would be the same for all contextes of the same site, while the local variable (in APC user space) would be unique for each context and would allow to know using a simple comparison when the local cache is outdated or not.

R.Muilwijk’s picture

Could you show me the concept in some dummy code (doesn't needs to actually work)?

pounard’s picture

Yes of course, but you'll have to wait until I get home I don't have the environment for this @work.

Regarding #2, it would work, but I'm afraid that on some environments it might fail for any reason (for example, php-fpm or CGI having different DNS resolution than the proxy in front, or any other possible reason): implementing it would be nice but accompagnied with a reasonable fallback.

Regarding #2, I think the same environmental restriction would apply.

I'm against just ignoring the drush warnings, they all reveal pragmatic errors and should not be hidden, but solved instead.

Further, I think the memcache module might have some pieces of the solution for us.

regilero’s picture

@pounard, one problem with #1 && #11 is the multiple server configuration as shown in #7.

Invalidaing apc cache content is a task that each server will have to do. So the instruction (wether it's a variable or something else) will have to be available for all server. Even if some of them have already performed the wipeout.

Another problem: If the variables are stored in apc you have no way to alter a variable from cli and have the web part knowing it.

You need something which is always checked when apc cache is used and which may return something until every backend is known to have performed the requested action. This will have som speed impact (maybe small, maybe not) and could be related to file or database access.

Solution #3 is not a bad one I think. Even with a php-fpm multi-front install we should be able to list all front server with a custom url, allowing direct drush HTTP requests on each server. it may have sone impact one firwalls configurations, module documentation and local name resolution but at least it would work.

pounard’s picture

Yes you're right, variables are being cached, and if they are in APC we're doomed. It still can work if the 'cache_bootstrap' table is not in APC, but I don't think this will happen a lot...

seandunaway’s picture

Yes this is a bit annoying and would love a workaround.

I've been hiding with this (the cli check) in settings.php:

/**
 * Add APC Caching.
 */
if (php_sapi_name() !== 'cli') {
  $conf['cache_backends'] = array('sites/all/modules/apc/drupal_apc_cache.inc');
  $conf['cache_default_class'] = 'DrupalAPCCache';
  $conf['page_cache_without_database'] = TRUE;
  $conf['page_cache_invoke_hooks'] = FALSE;
  //$conf['apc_show_debug'] = TRUE;  // Remove the slashes to use debug mode.
}

But of course there is still the the inability to clear the APC cache from the command line.

I never had this problem with CacheRouter in D6 with APC user cache. How did they manage it?

seandunaway’s picture

maybe we can write a drush command to clear the apc cache using curl or some way via the server to access the other memory segment?

sun’s picture

Issue tags: +APC

Tagging. (I realize this is strange issue tag for this project, but I'm trying to gather/collect interesting issues+approaches across all d.o projects)

Anonymous’s picture

From my reading of CacheRouter's code, it doesn't look like it would also work with the use case of multiple backend webservers.

I'll be glad to work on and test a fix for this. I'm leaning toward an array of IP's for the backend and using cURL. The idea is that we can leverage cURL to override the DNS resolution. (From what I can tell, that won't work with drupal_http_request, because it forces the 'host' header to come from the url given.)

I could probably have a patch implementing this method rolled Sunday. Users that only allow PHP to execute specific files would need to be sure to add the clearing PHP file to their filters as a file that is allowed to be run through PHP.

At least that's my thinking.

Anonymous’s picture

Status: Active » Needs review
FileSize
2.54 KB

What this does is check for an array of server addresses. These addresses can be set by simply adding $conf['apc_servers'][] = 'your.server.ip.address'; to your settings.php. It loops over these addresses and uses cURL to make a request to each of your defined backends for the clear.php file that is included. If you don't have a backend set, it defaults to 127.0.0.1. You can use hostnames if you wish.

Why did I not use drupal_http_request()? Simple. It forces the Host header to be the same as the site that you are connecting to, which would not work in this case. A simple patch to core can fix it, but that needs to be fixed in 8 before it can be backported to 7, and the current direction here seems somewhat unclear - #1664784: drupal_http_request() has problems, so allow it to be overridden. It's probably pretty likely anyway that if you're using APC, you probably have the access or a host that's willing to run cURL.

pounard’s picture

I have multiple questions:

  • Are you sure about drupal_http_request()?
  • Why didn't you wrote a menu router item instead?
  • How does access checks are done? Are you sure you're not opening a potential DDoS vector?
  • Your clear.php file actually wipe out the whole APC cache, quid for multisite? For a PHP install that hosts multiple PHP sites (not only Drupal)? It should trigger cache flush calls using the API so that each backend would clear only its own cache, hence the menu router item instead.
  • When this clear.php file or menu item router, it could attempt a lock/semaphore named upon the local host public IP (in order to avoid conflict) that ensures that multiple calls won't happen at the same time. Not sure if this has a real use?
MrHaroldA’s picture

It also looks like clear.php isn't protected by anything (like a cron key or something) so everybody can flush all APC caches for an entire server ...

Anonymous’s picture

drupal_http_request() would have been my first choice, but it resets the 'Host' header whether you set it or not. I personally patch core to fix that in some dev environments when I don't want to rely on /etc/hosts or can't rely on DNS when testing APIs. So until that is actually fixed in core, we are stuck using cURL, fsocketopen, or stream sockets. The main reasons I used cURL are that I'm more comfortable in cURL and find it more readable (personal preference).

Re: menu router item - We can do that instead. This was just a 20 minute quickie to prove that the concept would work.

Re: Access checks - We could have it check for a key that is set similar to the cron key. Or we could even just use the cron key. I wasn't sure what way others would want to go there. Once again, proof of concept that we can clear APC through Drush.

Re: Clearing all of APC - once again, just an attempt to prove the concept of clearing APC through cache would work. I admit I didn't look deep to see how the module normally cleared APC. I was hoping to spark some more serious discussion toward a resolution rather than just continuing in circles.

I can make some more time later this week (probably Wednesday or Thursday) to work on it if no one else has the time. I'm staying pretty swamped, and this doesn't really seem too difficult to resolve.

pounard’s picture

Oh OK, sure this is a nice PoC.

Whether we choose a router item or a PHP file, we must add access control: the cron key like security might be enough thought we must ensure it might not be caught by external eyes: trying to enforce HTTPS if present (disabled per default but configurable) could be a good idea.

Why I prefer the menu item router is because the PHP file can be moved out, using a router item will ensure that Drupal is properly bootstrapped instead of having us to bootstrap it in our own clear.php file: this will be less error prone.

sun’s picture

However, you cannot bootstrap to a level/phase in which the router would work, because that involves cache lookups on various layers already, and the entire purpose of this is to flush potentially stale caches.

Thus, if you want to make it bullet-proof, I'm fairly sure you'll need a custom bootstrap that circumvents all caches.

In light of that, going with a apc_clear.php, potentially even including the prefix to clear, as well as a public/private key hash/token validation sounds more reasonable.

Lastly, just a question, feel free to ignore me: If this is about Drush only, then why not use Drush's built-in remote site commands? (which operate over SSH, if I'm not mistaken -- didn't use them myself yet)

pounard’s picture

This is not only for Drush, the problem exists when you hit the "clear cache" button via UI too. This must be Drush agnostic.

The router item is prone to stalled cache, but this specific router item is not supposed to change, so even considering that we might bootstrap an outdated Drupal, it'd still work.

I agree that having a plain php file instead could be better, but this must be in the Drupal root, side by side with index.php, else the session handling (or potentially other stuff) won't work because some core internals are deeply mis-using PHP superglobals and rely on the SCRIPT_PATH to setup some various hashes at various places (session_id() and session_name() are two of them).

So, until we don't have that fixed, we cannot allow ourself to attempt a Drupal bootstrap using a custom PHP file.

MrHaroldA’s picture

Status: Needs review » Needs work

@Brian: I took some time to familiarize myself a bit with the APC-module and it's workings so if you need any assistance please let me know!

Sylvain_G’s picture

Just run into this issue with a

LB
- Web 1
- Web 2

Configuration

Anoying message and some side effects

AFIK APC is the closest to PHP but not a Service, what you are trying to acheave is a bypass a hard limitation of APC

I'm considering moving my cache to a memcache instance on the backend Mysql server

MrHaroldA’s picture

Title: Make drush "cache not cleared" error message less annoying » Make "drush cc" clear cache on the webnode itself
Category: bug » feature
Issue tags: +drush
FileSize
3.41 KB

Here's a proof of concept and work in progress patch to enable Drush to clear caches on your web node(s).

It's full of @todo's, missing error checks, plausible security holes(!) ... but it seems to work! ;)

settings.php example, note the 'apc_nodes' entry:

$conf['apc_nodes'] = array('http://node1.server', 'http://node2.server');
$conf['cache_backends'][] = 'sites/all/modules/contrib/apc/drupal_apc_cache.inc';
$conf['cache_class_cache'] = 'DrupalAPCCache';            // stuff in the 'cache'-bin goes to APC
$conf['cache_class_cache_bootstrap'] = 'DrupalAPCCache';  // bootstrap stuff from APC

Please have a look at it and tell me if I'm heading in the right direction.

MrHaroldA’s picture

One issue I already found: when the site is in 'maintenance mode', the site will report a "503 Service unavailable" when trying to clear the cache.

Jorrit’s picture

I think the functionality to loop over nodes should also be invoked when clearing the cache from the site itself. If I clear an APC cache on server 1, it would be great if the APC caches on the other servers would be cleared as well.

MrHaroldA’s picture

@Jorrit: good idea!

I will also make the setting configurable so that you can capture it in a feature.

MrHaroldA’s picture

One issue I already found: when the site is in 'maintenance mode', the site will report a "503 Service unavailable" when trying to clear the cache.

xmlrpc keeps working, even when the site is in maintenance mode... Shall I rewrite my patch to use that?

vinmassaro’s picture

Any progress on this? It is a blocker for us to use the APC module in our university installation when these messages are output by Drush. Thanks.

MrHaroldA’s picture

@vinmassaro: I'm waiting for the maintainer to chip in on the XMLRPC -vs- menu callback issue. Once that's settled, a patch can be made pretty quick, maybe even next week ...

vinmassaro’s picture

@MrHaroldA: Ok, hope R.Muilwijk can chime in so this can move forward and a patch can be tested. Thanks!

likewhoa’s picture

@MrHaroldA just do it and reviews/revisions/RTBC will come ;)

MrHaroldA’s picture

@likewhoa ;)

MrHaroldA’s picture

Here's a patch to flush APC caches with XMLRPC. Configuration is the same as in #31 and even with the same @todo's, warnings and errors!

Where Art Thou, R.Muilwijk?!? ;)

(keeping it at 'needs work')

Anonymous’s picture

Just looking at the code (I haven't actually run it yet... staying extremely busy recently), this requires Drupal to answer for each "APC node" by hostname, correct?

pounard’s picture

I'm a bit suprised to see code into the APC class destructor, could this be a Drupal shutdown handler instead leaving the cache backend class untouched/untainted?

MrHaroldA’s picture

@pounard: I've added the code in the APC class destructor to gather all bins that should be flushed and only perform one request to each web node to flush them.

pounard’s picture

I still think this is doable (and generalizable to other backends than APC) by doing it in a more generic fashion outside of the cache object. I'll try to do a PoC later.

MrHaroldA’s picture

@pounard: Thanks! Looking forward to it!

vinmassaro’s picture

@pounard: thanks, looking forward to testing a patch.

pounard’s picture

I'm working on it, coming soon! :)

If I understand well there is two main problems:
1) Clearing APC cache locally (APC only)
2) Sync. all web nodes for "local" caches (Potentially all backends

I did in a separate module the 2) using the XMLRPC patch above. 1) works with this patch too (I guess).

MrHaroldA’s picture

@pounard: the main issues was that you can't clear APC cache from the commandline, the webserver needs to clear it's own cache. If the site is served by multiple web nodes, the cache should also be cleared on those servers.

crea’s picture

Using APC on multiple web nodes is a very bad idea (because of consistency issues caused by high concurrency). Supporting that is supporting bad design. Don't do that.

pounard’s picture

While I agree considering we continue in remote hit path, it could be safe, nice and fast using a different solution for invalidating caches than remote hit, a simple solution such as storing a timestamp in a shared storage that serves the purpose of invalidating everything created before would be very efficient. Problem is that current Drupal doesn't really allows this, it would need to be implemented outside of the cache backend but in the cache API underneath.

Sylvain_G’s picture

#50 is right, clearing other web node cache can have some side effects, especialy regarding the Form Build Id

regilero’s picture

#50 & #51, of course using apc as a cache backend when using several servers cannot and musn't be done for all caches. But some very early drupal bootstrap caches can fit quite well in apc cache, cache_class_cache or cache_class_cache_bootstrap.

crea’s picture

#53
If you still need standalone cache storage there's no point in making the whole configuration more complex just for some marginal speed gains. Simplicity wins any time.

regilero’s picture

#54. I wish you were right about simplicity. But hey, this is Drupal. There is no point in making separate cache bins handling if it is not to aplly separate cache policies on theses bins. And of course having a multifront drupal installation means you are running a quite complex instance which requires some complex fine tunning. And in all the small gains you can get with Drupal fine tuning; targeting local files caches on APC doesn't seems like a bad idea.

Of course I would prefer a more simple CMS which doesn't try to replace APC job and does not make IO directly on the fs, and doesn't try to manage a fiesystem cache map, and use php autoloaders to find files, etc. But simplicity is not a Drupal word.

pounard’s picture

Still working on it, but I don't have much time @work to polish my work. I will post a patch as soon as I can, can be in a few days or maybe next week.

crea’s picture

There is no point in making separate cache bins handling if it is not to aplly separate cache policies on theses bins.

Separate cache bins feature is mostly useful for separating things like "form cache" which is not a cache, actually. Otherwise in my experience it was always better to have all the bins in the same place.

But simplicity is not a Drupal word.

But we don't have to make it more complex, just because it's already. Do you have a proof that keeping those 2 bins in APC, compared to, say, memcache brings significant performance gains ?

MrHaroldA’s picture

Do you have a proof that keeping those 2 bins in APC, compared to, say, memcache brings significant performance gains ?

I do!

Memcache didn't boost any of our website's performances (even slightly lowered the requests/sec). It did however reduced the load on the database, so I guess high-traffic sites with lots of logged in users will benefit most from memcache. Using APC as cache_class_cache_bootstrap and cache_class_cache boosted requests/sec with 8-12%, depending on the size of the site.

Unfortunately we can't use APC because we can't clear the cache from Drush ;)

Anonymous’s picture

Do you have a proof that keeping those 2 bins in APC, compared to, say, Memcache brings significant performance gains ?

I can confirm what MrHaroldA says about the performance increase for those two bins, although I can't offer benchmark data. We currently host 12 sites, most with single web servers and some with dual web servers and one with 3. We only saw a reduced database load from a Memcache-only setup, and even a minor reduction in performance that was more pronounced when using multiple servers. Remember, Memcache requires a TCP connection if you're going distributed, and TCP connections have additional overhead to process as well as latency. We did not see the minor performance reduction when using a local Unix socket, but that's only good if you've only got one web server or if you're going to run multiple Memcache daemons (one on a TCP port, and one on a local Unix socket).

When placing those two bins in APC, we got around a 10% boost. Those bins probably change the least often, which is why I like putting them in APC. For the other bins which change more often and need to be consistent from web server to web server, I prefer having those in Memcache.

TwoD’s picture

I gave the #41 patch a test run but had a few issues.

Only actually clearing the cache in the class destructor risks keeping old data cached too long. For example; I create a cache instance for some table, read from the cache and update a couple of rows in the table. I naturally make sure to invalidate/delete the cached version of the updated rows. But then I hold on to the cache instance while doing more stuff which needs to fetch more rows from the same table, potentially including the rows I updated earlier. To be efficient in case someone else already fetched some of the data sets I'm interested in, I ask the cache first...

That could be solved by perhaps making the $drush_flush array keyed by the bin and cid and always look in there before returning any cached rows. I'd love to get around to trying that, but I couldn't right now and needed a quicker solution. So, I simply moved everything from the destructor into the clear method and everything related to $drush_flush.
Also made a few smaller tweaks to make debugging a bit easier and make a checks less verbose.

+++ b/drupal_apc_cache.incundefined
@@ -101,6 +112,38 @@ class DrupalAPCCache implements DrupalCacheInterface {
+    if (!empty($this->drush_flush) && isset($GLOBALS['conf']['apc_nodes']) && !empty($GLOBALS['conf']['apc_nodes'])) {

!empty($GLOBALS['conf']['apc_nodes']) is enough, no need for isset() first.

I'm currently using the code provided in the patch here on a production server, but I would not recommend anyone else doing it before a more thorough review has been done.

Btw, for some reason, clearing some user caches, like the menu cache, in APC takes several minutes and almost always times out when done over XMLRPC. I can't use Admin menu's shortcut to clear all caches either, that always times out... Anyone else having this issue with APC?

EDIT: I forgot to mention that the #41 patch also logged the "@todo" warning message on every Drush command which accessed the cached because Drush was available but there was nothing to flush in the destructor.

chrisjlee’s picture

Status: Needs work » Needs review
arosboro’s picture

I applied the patch in 60, and it works on a single server with apc running locally.

However, when I disable/re-enable a module upon using drush pm-enable, I get:

xmlrpc() error: (-32700) Parse error. Not well formed [error]

It's appearing multiple times when I use drush up too.

chadsten’s picture

Subscribe.

helmo’s picture

One quick minor remark.

+++ b/drupal_apc_cache.inc
@@ -38,6 +38,11 @@ class DrupalAPCCache implements DrupalCacheInterface {
+   * @var boolean
+   */

This variable could use a better name.. e.g 'has_drush'

R.Muilwijk’s picture

Just reviewed the patch from #60 and this are my thoughts:
- The patch naming is based on using Drush. The clearing is also only limited to Drush 'if ($this->drush)'. Is this really neccessary? For example if we have 4 webnodes and on webnode 3 some cid is cleared. Wouldn't we want this to happen on all nodes also?
- The XMLRPC Callbacks are called apc_drush_ though it's possible to also clear non APC caches.
- Instead of using Drupal's API cache_clear_all we are directly retrieving the object with _cache_get_object($bin). Is there a specific reason to do so?
- For going through the nodes the $GLOBALS['conf']['apc_nodes'] is used. Is there a reason we cannot use variable_get('apc_nodes', array())?
- When a particular webnode is temporary not available what would the default timeout be for a XML Request?

MrHaroldA’s picture

Status: Needs review » Needs work

@R.Muilwijk, what about doing a code sprint somewhere in the (near) future? We meet somewhere and code for a couple of hours? This issue is taking far too long to complete this way.

R.Muilwijk’s picture

MrHaroldA, fine with me. Though if I have answers to my questions I'm willing to commit ;)

Mixologic’s picture

The problem Im having, is that Drush/CLI php cannot clear the apc cache thats tied to the apache process, as they are in separate memory spaces, and as a result, apc module is going crazy with error messages on all my drushing.

So, to re-state what I think the problem is:
In order for drush to clear any caches, it needs to reach out to the webserver via http as thats the only method to get access to APC's opcode/user variable memory space.

A method whereby one can remotely clear a cache remotely could potentially be useful for another use case - cache invalidation across multiple web servers.

Challenges

1. For the site Im building right now, a 'drush cc all' invokes the clear method 56 times. Every time a variable is set or deleted a variable, thats another cache clear. So thats a substantial amount of http requests, and thats only if we're trying to propagate all these granular cache clearing events across a single webserver from drush to http. If we extend that to multiple web heads, we'll have N extra http requests for N web heads. All of that extra network traffic starts to look like the same performance tradeoff you'd get with memcached vs APC.

2. If we want to selectively empty caches, we have to have a way to determine whether a cache clearing event should propagate to other servers, or whether we are responding to an event that was propagated *from* another server, and therefore should not be propagated.

If we're using any drupal bootstrap at all (and we should, in order to have authentication and security), then we run into other circular issues: Example:

  1. A cache clearing event happens on a webhead.
  2. It propagates it to another webhead via xmlrpc
  3. The other webhead begins bootstrapping, and before we get to the cache clearing event, another cache clearing event is triggered. Should this now propagate to other servers?

If we stick to only managing CLI->HTTP its fairly easy to tell which environment we're in.

The use case for clearing APC based caches cache on multiple web heads is arguably more edge case (only useful for certain infrequently changing caches like cache_class_cache_bootstrap), and is broken now for both front-end and CLI uses (any cache clear action that uses APC as its backend is only going to hit whichever web head responds to the request). And making this work for a multi head APC enviroment basically involves re-inventing what memcached already does.

Attached is a modification of #60, that

  1. Dumps the entire cache if you are using drush on the first cache clearing event. (reducing the sheer volume of xmlrpc calls).
  2. Still loops through the apc_nodes, so in theory you could still use this "drush cc all" to accomplish a total cache flush of all your webheads.

Still a little concerned that *any* drush command that happens to make a cache clearing event in any form would trash the cache on a production server (for example, a drush command on a cronjob running every five minutes)... so.. this might not be exactly what we want without *some* sort of caveat.

pounard’s picture

I'm still not confortable with hitting with HTTP requests all frontends, this forces us to maintain a list of frontends, while using a notification system in a common storage would avoid us needing to do that.

Mixologic’s picture

So rather than using xmlrpc to flush the drupal caches, we could instead use the DrupalQueue to store up any cache clearing events that are happening on the drush side, and then as part of hook_init in apc.module we could pop the cache clearing events off of the stack.

This only really works for one frontend, because 'common storage' that all frontends can access = memcached.

APC is not architected to be a user data object store in a multi-headed environment, and while certain caches that are less affected by staleness and syncing issues can be implemented on top of them, I think its out of scope for this issue.

I went ahead and re-rolled using this method, and, its a *lot* cleaner. Much less code, drush commands will only dump the caches they're supposed to.

MrHaroldA’s picture

@Mixologic, that actually looks very promising! If, and when APC should cover multiple web nodes, it could even be expanded to store and claim in node-specific queues.

I wonder what R.Muilwijk has to say about this approach ...

Mixologic’s picture

Status: Needs work » Needs review

Should be needs review

crea’s picture

Sounds like an async cache clear. Really bad idea for consistency reasons.

Jorrit’s picture

The memcache module uses a 'generation' integer that is increased every time a cache bin is cleared, stored in a variable. It is used as part of the cache key, so increasing the integer invalidates the cache immediately. The downside is that the cache is actually not cleared, but using APC's expiration system this may be not a problem. Has this been considered?

MrHaroldA’s picture

Sounds like an async cache clear. Really bad idea for consistency reasons.

It also adds a query to every page load to check the queue.

The memcache module uses a 'generation' integer

Where is that integer stored? Or where should APC store that integer?

Jorrit’s picture

stored in a variable.

By that I mean the variable_get kind of variable. So ultimately: the database.

crea’s picture

I would like to vote for #74 as the most promising solution

Mixologic’s picture

@crea Im not sure what you mean by

"Sounds like an async cache clear. Really bad idea for consistency reasons."

Drush makes changes to the underlying database, thus making the cache in APC stale, and inconsistent with the underlying data. So when drush does this, it needs to trigger APC to invalidate its caches. In all instances where a cache is cleared as a result of a drush command, *not* clearing the cache in APC means you are already inconsistent with whats in the cache vs. what *should* be in the cache.

Relying on a variable is problematic.
1. Variables themselves are cached, so drush could update that variable and it would not be reflected on the front end.
2. That variable cannot be stored in APC, because the problem we're trying to solve here is that drush cannot access whats in APC..
3. Memcache can use a variable because drush is able to communicate with memcache.
4. Memcache is using the generation integer to solve the multiple node issue, not to solve 'how drush can communicate with memcache'.

Also

It also adds a query to every page load to check the queue.

It only about .5 ms to all my page loads. There are only two fundamental ways that drush can signal to the webserver that its caches are stale - passively with webserver polling (a database query on every pageload) or Actively with httpd connections. XMLRPC/HTTP/SOAP what have you is so terribly non-performant as to make drush useless, which is why I threw up that earlier patch that just dumped everything all at once.

crea’s picture

Mixologic
async cache clear is bad, because u trigger it in one request, and process in another. And between the requests you leave the system is in unpredictable state - drush got report of cache clear, but continues to work with the old cache.

Mixologic’s picture

@crea - I think you're not understanding the problem here.
Drush does not use the cache. Because we've configured the cache to use APC, every drush request will create and destroy its own APC cache, on every invocation of drush.

Drush makes changes to the database, which means that the http front end is continuing to work with the 'old/stale' cache.

Asynchonous cache clearing is *not* bad and is not a problem whatsoever. Asynchrous updates to the underlying database, without clearing all caches built from that database *are* bad, and that is the problem we're trying to solve here.

Finally, this isn't even asynchronous. As soon as drush reports that a cache bin needs to be cleared, the very next http invocation will clear that cache bin before it gets a chance to use the stale data.

btopro’s picture

need to do some more testing on #70 tomorrow but this seems viable. Initial tests seemed promising, will need to fill cache bins then try to destroy them via drush w/ data that I know will be stale.

btopro’s picture

confirmed this morning that #70 works. +1 committal and if the number increment on bins ever gets a patch that seems like a separate issue. I could see either method being used though I'm not sure if the number incrementing would cause fragmentation issues since you wouldn't be releasing the cache bin data you'd just be creating new ones.

MrHaroldA’s picture

Isn't it true that the cache is only emptied when a user requests an uncached page (so that hook_init() is called)? Which in theory could take up to a couple of hours?

And what if you're using multiple servers that have their own APC cache? Only the first server will get it's cache purged, right?

Just picking my brain on this ...

btopro’s picture

yeah, but this provides a possible solution of:

Run drush cc all
go to site via browser, login and bins will be cleared

vs currently there is no solution.

No clue on multiple servers, would need someone to step up and confirm if that works or not. Guessing no since it's just using the built in php APC functions for clearing data but I don't understand multi-server APC setup.

Another thought is is drupal bootstrapped enough in hook_boot to be able to run the init portion? I don't think this is an end-all-be-all patch, I just think it actually provides a potential solution.

MrHaroldA’s picture

In #70 Mixologic already mentioned "This only really works for one frontend"

#74 sounded perfect, but #78 noted that variables are cached too, so that's also not viable ...

I frankly don't know what to do next.

btopro’s picture

Maybe allow for capabilities like this via sub-modules, this being one possible solution?

Replace the current part that queues items w/ a call to a new hook like module_invoke_all('apc_cache_clean_up', $this, $cid, $wildcard), then move the hook_init() part of the patch to a new sub-module called apc_drush_cc_webnode (better name needed). Then other approaches discussed in this thread could be added to other, small sub-modules as potential solutions present themselves.

Limitations of this approach being needs to be hit w/ an auth'ed call to trigger the clear, doesn't always hit all bins (I think it did in my testing though) and doesn't clear across environments. Other approaches may have additional limitations but I feel like this is so specific to the type of environment you are deploying Drupal / APC in that a hook based approach may be the most flexible solution.

pounard’s picture

I still think that using some kind of variable (or a single SQL query) to check for the last purge timestamp would be a better solution than sending HTTP requests to all nodes, and moreover will be really failsafe. This would cost an extra query to the SQL backend (because we actually cannot use the variable table) but considering the time we can save at bootstrap by using APC this could be a good compromise.

The patch in #70 is not good because you can store and read cache entries way before it's being called, i.e. the cache_bootstrap bin for example.

MrHaroldA’s picture

I still think that using some kind of variable (or a single SQL query) to check for the last purge timestamp would be a better solution ... This would cost an extra query to the SQL backend

That could actually be hidden behind a setting for those who don't need/want it!

pounard’s picture

Yes of course

twistor’s picture

Issue summary: View changes
FileSize
4.21 KB

I have to agree that using the DB is the only sane way to do this. It adds a single db query, but it's fully indexed, so it won't even hit the table on Innodb.

Here's a quick idea of how this could work. Rather than add another setting to disable it, we could just check whether the module is enabled or not.

I'm not sure if I followed all the logic for NULL, *, empty, or whatever that Drupal does.

twistor’s picture

Of course that was too simple, module_exists() isn't available yet.

pounard’s picture

We can reduce the number queries by SELECT * FROM {last_purge}'ing directly only once during bootstrap instead of querying for each bin. There is between 10 or 20 bins, which makes 20 SELECT'ed lines. [EDIT: I am stupid I misread your patch, sorry...]

We need an option to be able to deport this SQL query into another backend (cache or variable) depending on how is the system. For example, if the site admin does not use APC for cache_bootstrap, we can use variables and therefore elminate this query. This system must also be optional (case where there is only one frontend). [EDIT: I misread twice, I have to buy me new eyes!]

Patch looks good.

twistor’s picture

My patch has exposed another bug.

If you clear cache from drush, then refresh, you'll see a lot of messages like "Warning: Invalid argument supplied for foreach() in field_attach_load()" and "Notice: Trying to get property of non-object in field_attach_load()".

This is because getMultiple() is setting all of the cache keys, even when they're false. This is not correct, or at least what people are expecting.

pounard’s picture

Nice catch.

Jooblay.net’s picture

testing

Anonymous’s picture

Hello folks, glad to see this issue is progressing. Looking forward to it.

Fengtan recently published Drush Webservice and I realized there might be some interest in what options that module might provide here. I haven't tried it yet, but I plan to use webservice for one project soon and I'm thinking I might run some tests to see if it would accomplish the same thing.

RaulMuroc’s picture

After apply latest module (comment #93) the warning disappeared for me. I guess it is correctly cleaning cache.

Jibus’s picture

Same as #97

Patch from #93 applied and worked nicely ! Thanks !

(Tested on a PGSQL database)

Mixologic’s picture

  1. +++ b/drupal_apc_cache.inc
    @@ -147,6 +179,13 @@ class DrupalAPCCache implements DrupalCacheInterface {
    +    if ($cache->created < $this->lastPurged) {
    +      $this->flush();
    

    Using a timestamp to determine whether or not a cache should be cleared can be problematic. If your CLI timezone variable is PST, and your webserver is using GMT, you may run into an issue where caches stay stale for 8 hours, for example. In drush we're explicitly clearing a cache - why dont we carry that explicit directive over to the front end and not depend on something unstable like a timestamp. Edit - timestamps are not subject to timezone issues. They are, by default, in GMT.

@pounards insight in #87 was spot on:

The patch in #70 is not good because you can store and read cache entries way before it's being called, i.e. the cache_bootstrap bin for example.

Attached is a patch that still leverages the database to record necessary cache clearing events (using drupal queue), and moves the cache clearing check into the cache constructor, much like the patch in #93.

Note that this patch is specific to drush: APC should *not* be used on multiple web heads. I mentioned in #70 that "This only really works for one frontend" because APC really only works for one frontend. APC is not a distributed key value store. Memcached and Redis are both great examples of a distributed key value store, and if you *need* to run a key value on multiple web heads, then you should use them instead of APC. But APC's key/value is faster! Its faster precisely because it does not have all of the logic and network overhead required to make it distributed. So, by the time you bake in everything necessary to make APC work with multiple webheads, you'll likely end up with equivalent or worse performance than either memcached or Redis.

Mixologic’s picture

FileSize
1.49 KB

Bad comments.

Mixologic’s picture

Mixologic’s picture

pounard’s picture

Using a queue for this is a very nice way of achieving. I do like it because it looks like a nice journaling mecanism. Thought it could be optimized a bit by clearing the queue once you claimed at least one wildcard clear order.

This behavior should still be enabled or disabled by a variable as the #93 patch does, instead of relying on drush presence, this would avoid people not using drush to end up with useless queue queries.

Another note, it would be simpler to claim only one queue item, and drop the full cache bin when having one, I doubt that people would switch huge cache bins such as field or page on APC: they would consume their RAM so quickly that APC would spend its time purging least recently used entries; Considering this we'd better not even bother trying doing things right by reading queued orders and reading them, but just clear completly the bin each time you got any order. That'd probably need some profiling but I seriously doubt anyone sane would give GB's of their RAM to APC.

Even thought I agree APC should not be used when you have multiple frontends, I continue to think that people might use it anyway, case in which the queue method won't work anymore because you drop the queued items when consuming it making them disapear for other frontends. And we're back to #93 for such environments.

Mixologic’s picture

Thought it could be optimized a bit by clearing the queue once you claimed at least one wildcard clear order.

True. I was watching it do 130 queue pulls to flush all the caches. At the same time, this isn't something that happens often - it just makes that next page after a drush clearing event take a few more milliseconds.

This behavior should still be enabled or disabled by a variable as the #93 patch does, instead of relying on drush presence, this would avoid people not using drush to end up with useless queue queries.

If so, then this needs to be something on a settings page, not a hidden variable that you would only find by looking at the code. (apc_show_debug is a good example of a hidden feature baked in).

I doubt that people would switch huge cache bins such as field or page on APC

Thats not a decision we should make. I have a site with minimal content on a server with 32 GB of ram. I have plenty of ram to support putting the entire cache in APC.

I continue to think that people might use it anyway

- why should invalid configurations be supported? People can also try to run APC using fastcgi or fcgid instead of php-fpm, and they would be shooting themselves in the foot, with a user cache per process. The only situation I can see where you might get away with using APC usercache with multiple processes or frontends is where you don't care about cache consistency at all. In which case, how would any of these patches help?

pounard’s picture

As I recall FPM processes share the same OPCode and user cache; The same goes for PHPCGI I supposed since it's the ancestor (not that different) of FPM. So it's actually not an invalid configuration.

It's not invalid to use a cache at the frontend level even if you have multiple frontends, the only thing you need is to have a good sync mecanism, #93 was not that bad in that regard. In that use case with this mecanism you have 1 extra SQL query per page but a lot of other remote queries could go to APC instead of a remote server, I mean if the sysadmin decided it was worth the shot after doing some measures, then it does not sound illegal at all to me.

Mixologic’s picture

As I recall FPM processes share the same OPCode and user cache; The same goes for PHPCGI I supposed since it's the ancestor (not that different) of FPM. So it's actually not an invalid configuration.

Thats not true. PHP-FPM fixed the problem that both fastcgi and fcgid had with having an opcode cache and user cache per process. The bug report is here: https://bugs.php.net/bug.php?id=57825 and note the comments, particularly the one from rasmus@php.net on 2011-07-08. More info on this stackoverflow thread: http://stackoverflow.com/questions/598444/how-to-share-apc-cache-between...

If you are trying to use an APC user cache on an apache system with fastcgi, or with fcgid, you are going to have data consistency issues between processes with multiple opcode caches (bad for memory consumption) and multiple user caches (bad for consistency). This should not be something we should try and support.

Other than this thread, I can find no examples of people advocating the use of APC on multiple front ends. In fact, one of the ways people *do* ensure consistency in APC caches is to wrap it in another distributed cache (http://fourkitchens.com/blog/2009/11/30/intelligent-memcached-apc-intera...). Every example I see all say if you want to use a key/value datastore, you cannot use apc if you need more than one webserver.

We cannot prevent people from doing foolish things and misconfiguring their webservers, but we certainly shouldn't give people the impression that this type of configuration is reasonable by adding weak support for it.

#93 only causes the front end to reset it's cached data if the cid = '*'. There are many cases where a cache would get cleared out by drush, yet not hit the * cid. This would result in cache data inconsistency.

pounard’s picture

Ok so I didn't recall correctly, I'm used to Nginx when using FPM, so I don't have that kind of problem on my own.

#93 does not handle other than wildcard flush, that's true, but it could just send a full purge each time, doesn't sound that bad if you use it on small caches such as the bootstrap bin.

torotil’s picture

Here is a patch based on #60 and addressing the issues raised in #68:

  • The callback can handle non-wildcard cache clears.
  • Only one XMLRPC request is made per drush invocation (for each node): Clear-events are stored in a request-wide variable and sent via hook_drush_exit().
  • It implements a simple mechanism to stop remote-cache-clears from triggering other remote-cache-clears: It sets $conf['apc_nopropagate'].

This is still drush specific but it is rather easy to generalize. The following steps would be needed:

  1. Remove the if ($this->drush) conditional for logging cache clears.
  2. Use another (or an additional) trigger than hook_apc_exit() for sending the XMLRPC requests.
  3. (optional) Introduce a new config variable that tells APC which node we are currently on. This is needed to avoid triggering remote cache clears on the currently active node. Perhaps such extra flushes wouldn't do any harm though.
torotil’s picture

FileSize
4.41 KB

This one also applies to 7.x-1.x.

Mixologic’s picture

FileSize
4.32 KB

This is great, I it solves the performance crush of too many xmlrpc requests, eliminates the need for any excessive hits to the database, and keeps the clears tightly synced.

Couple of things:

+++ b/drupal_apc_cache.inc
@@ -288,4 +297,40 @@ class DrupalAPCCache implements DrupalCacheInterface {
+    if (empty($GLOBALS['conf']['apc_nopropagate']) && !empty($GLOBALS['conf']['apc_nodes'])) {

$GLOBALS['conf']['apc_nopropagate'] would only ever get set during an XMLRPC callback, and as such would never be true. So this setting doesnt actually do anything, and continues to promote the idea that you can use APC on multiple servers.

+++ b/drupal_apc_cache.inc
@@ -288,4 +297,40 @@ class DrupalAPCCache implements DrupalCacheInterface {
+      foreach ($GLOBALS['conf']['apc_nodes'] as $apc_node) {

This introduces that problem mentioned in #69 - namely that now we have to maintain a list of nodes somewhere, which could be different between development and production, and there's no way to manage that list with this patch. It's just a hidden setting one has to put in settings.php.

+++ b/apc.module
@@ -1,4 +1,56 @@
+/**
+ * Implements hook_drush_exit().
+ */
+function apc_drush_exit() {
+  DrupalAPCCache::remoteFlush();
+}
+

This has to go in apc.drush.inc - hook_drush_exit doesnt get called in .module files.

Mixologic’s picture

FileSize
4.57 KB

Whoops. Forgot to include the apc.drush.inc file in the patch.

Exploratus’s picture

Tried #111. Seems to be working, notices are not showing up.

Ravenight’s picture

#111 seems to work for me too.

jewseppi’s picture

Category: Feature request » Bug report
Status: Needs review » Needs work

#93 resolves the CLI warning without any issue.

#109 resolves CLI warning but causes a new error.

'all' cache was cleared in /var/www/drupal/htdocs#default [success]
xmlrpc() error: (-11001) php_network_getaddresses: getaddrinfo [error]
failed: Name or service not known

torotil’s picture

#111 relies on the base_url to be set correctly. (ie. to be the correct URL for calling the site). Without any additional configuration drush will generate a temporary base-url http://sitename that won't resolve. That's why you get the network error.

#109 solves this by requiring the user to actively set the correct base_url (or optionally more than one).

Configuring drush to know the correct base_url is not such a bad idea in the first place. At least your cron-jobs should have that anyway. To do so you can either use the --uri/-l option of drush or configure a site alias.

In general I'd prefer to have a configurable list of urls to send the xmlrpc-requests to and let it fall back to array($base_url). That would open the door again for a multi-server solution (which IMHO isn't even that far away).

Mixologic’s picture

Configuring drush to know the correct base_url is not such a bad idea in the first place.

Perhaps all that is needed is to add a check to see if $base_url is not set (returns http://sitename) and add back in a drush error message that says "base_url is not configured, drush cannot clear the cache"

Can we please keep this issue on track and not continue to derailing it with ideas of 'multi servers'. The scope of this issue is to bridge the gap between drush and the webserver. Lets get that working and committed first.

There are myriad issues with having multiple instances APC user caches, and these patches are miles away from addressing all the synchronization, race conditions, and data integrity issues that can result from having multiple copies of the cache. All of those issues exist whether you are clearing cache from drush or from the front end, so realistically they can/should be solved independently, so if somebody wants to tackle all of that, can we open a separate issue for it?

torotil’s picture

During further tests I've had one issue with the xmlrpc callback: If I disable apc while it is still configured in the settings.php the webnode has a cache that doesn't know about apc's xmlrpc-callback. This means I'm unable to flush APC-caches on the webnode.

So I've attempted a few things at once:

  1. Use a separate bootstrap to clear the caches.
  2. Allow multiple nodes again.
  3. Clear caches of all nodes not only when called from drush.
  4. Avoid double cache clears on the very same webnode.

In my tests this seems to work just fine.

torotil’s picture

Here is the patch with a separate flush.php file but without multi-site capabilities.

torotil’s picture

Status: Needs work » Needs review
FileSize
4.5 KB

… and here is the XMLRPC based approach with an additional warning if the $base_url == 'http://sitename'.

Mixologic’s picture

FileSize
4.87 KB
375 bytes

119 - was missing the apc.drush.inc file - attached one + the interdiff

Also, added some require_once into that drush_exit as it would throw a Fatal error: Class 'DrupalAPCCache' not found if you were doing something other than cache operations.

118 - Im pretty sure we wouldnt want to add additional php files that respond directly to requests. It increases the attack surface area of a drupal install, so something like this should probably go through the security team- so Im not sure what we need a flush.php file for instead of just using xmlrpc.

117 - How would this mitigate race conditions, and how would you set up a test to show that they are mitigated?

torotil’s picture

119, 120: I've introduced a if (class_exists('DrupalAPCCache')) into the other patches for that reason. If the class was not loaded, chances are it wasn't used to flush any caches.

118: having to use a stale cache to be able to flush caches is a bad idea in general. as described above this lead to a race condition where I was not able to flush APC on the webnode at all (neither via drush nor directly). The extra bootstrap solves this by not using any caches (except variables for the cron-key and the bin-prefix). At least it reduces the number of caches that we use during a cache flush to a few known. We might still reduce that by using a HMAC for authentication instead of the form-key.

Regarding your security concerns:

  1. The extra file does nothing except if the right cron-key is passed.
  2. It uses only two user provided values: the clears array and the nonce (in the multisite variant). In theory someone (knowing the cron-key) could pass a manipulated clears variable instead. Note that that's possible for the xmlrpc callback too! - the only difference is the unserializing step. A HMAC based authentication could mitigate that risk though.

117: is prone to the very same race conditions as 118 and still less than 119.

torotil’s picture

FileSize
4.67 KB

Here is a HMAC based version of 118 with the extra class_exists() check. This is what I'll most likely use. The only thing that it relies upon is that the bin prefixes (if they are configured in the database) do not change.

Mixologic’s picture

FileSize
4.8 KB
431 bytes

119, 120: I've introduced a if (class_exists('DrupalAPCCache')) into the other patches for that reason. If the class was not loaded, chances are it wasn't used to flush any caches. - Thats a much better idea. Attached is that update.

118: / 122 : Unless Im misunderstanding something, what you describe is not a race condition, it's a misconfiguration issue that only happens if you have an invalid cache backend specified in your settings.php. Unless we're trying to make it so that the cache class can be used *without* enabling the apc module (and still support drush), which, I don't think is a requirement.

Security: $_SERVER['PHP_SELF'] is a field that contains user input. It would be pretty farfetched that an attacker would A. be able to create a directory named flush.php somewhere under document root, and B. upload a file named flush.php/dirname/includes/bootstrap.inc somewhere under that docroot. Its the kind of vector that, on its own is pretty trivial, but could be combined with other vectors allow for a breach.

117: I don't think we're talking about the same thing when I say 'race condition'. The fundamental reason multiple APC cache nodes is a bad idea is that caches are effectively data. Data that may get relied upon to create other data. If you are storing this data in two places, there is a very real possibility that separate requests will need to manipulate the same data in different ways. When that happens, you have a race condition where whichever request saves first will either be overwritten by the second request with now stale information.

torotil’s picture

118: / 122 : Unless Im misunderstanding something, what you describe is not a race condition, it's a misconfiguration issue that only happens if you have an invalid cache backend specified in your settings.php. Unless we're trying to make it so that the cache class can be used *without* enabling the apc module (and still support drush), which, I don't think is a requirement.

For us it is: APC has to be configured in settings.php. The module state is managed in the database. Usually our deployment does first configure all files and then bootstrap the database. Having to change the settings.php after installing requires an extra step and also complicates the usual dev-staging-production setup.

Security: $_SERVER['PHP_SELF'] is a field that contains user input.

$_SERVER['PHP_SELF'] is the path to the currently executed PHP script. If a user can manipulate that you have a really serious problem with your server setup.

It would be pretty farfetched that an attacker would A. be able to create a directory named flush.php somewhere under document root, and B. upload a file named flush.php/dirname/includes/bootstrap.inc somewhere under that docroot. Its the kind of vector that, on its own is pretty trivial, but could be combined with other vectors allow for a breach.

How would such a directory affect the script? The attacker would need to create an includes/bootstrap.inc somewhere between DRUPAL_ROOT/ and DRUPAL_ROOT/sites/modules/…/apc/. So if the attacker is able to hide arbitrary code somewhere in the docroot of your webserver (which means he has write-access to your docroot outside of a files directory): Why not just manipulate drupals index.php or create a separate php-script and call it? I think in almost all cases this comes down to the statement: If someone is able to execute arbitrary PHP-code on your server he can do so using by using a forged bootstrap.inc too.

torotil’s picture

117: I don't think we're talking about the same thing when I say 'race condition'. The fundamental reason multiple APC cache nodes is a bad idea is that caches are effectively data. Data that may get relied upon to create other data. If you are storing this data in two places, there is a very real possibility that separate requests will need to manipulate the same data in different ways. When that happens, you have a race condition where whichever request saves first will either be overwritten by the second request with now stale information.

That's the same with every cache. It's a tradeoff between speed and consistency. In theory we even have race conditions with drupals core cache and multiple simultaneous requests! (Request A doing several cache clears while B does something else that uses the partially flushed cache)

What exactly do you think is the essential difference between

  • drush communicating a flush to 1 webnode
  • drush communicating a flush to N webnodes
  • 1 webnode communicating a flush to (N-1) other webnodes

?

Mixologic’s picture

re: 124:
It's not clear to me if that is a problem with enabling the module in general, or if you have a non-standard deployment process that has issues with this.
Is it that if you enable this in settings.php, and you have in update_N hook to enable the module in production, that when deploying that code, it can't actually run the update hook because it's already enabled in settings.php? Or does drush just fail to clear cache that one time until the module is enabled?

$_SERVER['PHP_SELF'] is user submitted data.

if you go to
http://myserver.com/sites/all/modules/contrib/apc/flush.php/user/submitted/data/here , it will execute flush.php, and $_SERVER['PHP_SELF'] will be set to '/sites/all/modules/contrib/apc/flush.php/user/submitted/data/here'. Thats not a server configuration issue, thats by design and well defined behavior.

This makes $_SERVER['PHP_SELF'] vulnerable to XSS attacks if it is ever printed without sanitation. But In this particular case I was imagining a scenario where it was vulnerable to a directory traversal attack. i.e.
https://myserver.com/sites/all/modules/contrib/apc/flush.php/../../../../default/files/somedir.

Im unable to pass the directory traversal strings into $_SERVER['PHP_SELF'] as apache interprets them when looking for the script to execute, and no amount of encoding seems to be able to pass the traversal paths to PHP_SELF. - That doesnt mean it cant be done, just that I don't know how to do it.

In any case flush.php may be secure and have no vulnerabilities whatsoever. But the only way to know that is to have many people look at it, so if thats avoidable, its better.
I.e. Im not trying to have a security discussion about a specific issue, just that if you add a new way to access the server, you need to have a security discussion.

That's the same with every cache. It's a tradeoff between speed and consistency.

Its a tradeoff between speed and *currency*.

The drupal cache is architected in such a way as to maintain consistency. Im not aware of too many issues in the queue where people have complained about strange data corruption and the answer having been 'yeah, caching will do that to you'.

What exactly do you think is the essential difference between

  • drush communicating a flush to 1 webnode
  • drush communicating a flush to N webnodes
  • 1 webnode communicating a flush to (N-1) other webnodes

They probably all suffer from some consistency issues. In some iterations there wasnt a network dependency. xmlrpc adds that back in, so that all would be affected by a network outage. So if the network is down, drush cant flush the cache.

Probably the main difference I see there is that drush is likely to not be happening all the frequently, and so drush to it's own node is probably not that big of a deal, but when you start synchronizing the caches on normal usage, thats when there is likely to be a collision. But again, without going through every possible scenario in core, I cant say for sure that any of this is an issue.

In any case, Issues take so much longer to get resolved unless we minimize the issue scope. In this case it's still (I think) making drush work with 1 webnode. As we keep bolting on parts and pieces to support stuff thats out of scope for the issue, we're going to keep seeing more and more things that need to be added. There may be no issue whatsoever with running on multiple web heads, and my concerns are moot, but that doesn't change the fact that there isn't an admin interface to manage the nodes in these patches.

torotil’s picture

The drupal cache is architected in such a way as to maintain consistency. Im not aware of too many issues in the queue where people have complained about strange data corruption and the answer having been 'yeah, caching will do that to you'.

I'm pretty sure we have such race-conditions in drupal core. The reason it doesn't show is because the caches are inconsistent only for very short periods of time. And in almost all cases the results are harmless or not noticeable. Let's just take the menu cache and the bootstrap cache. During activation of a module cache_clear_all() is called. It first clears the bootstrap cache then the menu cache. If in the tiny fraction of a second another request is made to drupal it is possible that it will use the stale data of the menu cache while using the a current version of the bootstrap cache. In short: drupal cache clears are neither atomic nor transactional - so it can't be consistent in all circumstances.

Back to the actual problem:

$_SERVER['PHP_SELF'] being manipulateable is due to a rather peculiar behavior of apache (I'd expect my web-server to return a 404 in such a case). So you're right that in that case the server might execute a bootstrab.inc outside of the drupal-root.

Using $_SERVER['SCRIPT_NAME'] should be safe though.

Mixologic’s picture

I still do not see the need for flush.php other than the one time that you enable apc caching on your servers, which is non-standard from how I understand typical dev/stage/production deployments to work. Deploying code with the settings.php change in it, and then running update.php with an update hook that enables the module as part of the deployment process should work fine. If thats what you're doing, and it doesnt, then thats another story, but flush.php seems unnecessary.

twistor’s picture

I don't think any kind of web request synchronization will be remotely feasible.

@torotil, you seem to be confusing what consistency and race-conditions we're talking about. Consistency is only per cache bin. The menu cache and the bootstap cache are unrelated and know nothing about each other. Cache clears can absolutely be atomic and transactional. It depends on the cache backend. But, that should definitely be a goal. While yes, there are race conditions in Drupal core, they are considered extremely important and urgent issues and not primarily involving the cache system.

Using a web request could mean that caches are out of sync for multiple seconds. flush.php is an optimization that is a non-starter. Any kind of synchronization that goes through Apache/HTTP is a non-start though in my opinion.

I think the real way to solve this problem is to make the APC cache wrap a secondary cache backend. This could be redis, memcache, the db, whatever. In a similar fashion to #93, we could store the cache_clear timestamps in the wrapped backend. Any cache sets, will get written though to the secondary backend.

APC would make a single get from the wrapped cache backend to check if its data is expired, and if it is expired it would check the wrapped cache backend for a value. If it finds a value it stores it locally. If it doesn't, only then would it generate a cache entry. This would allow for a much higher level of consistency and have the added bonus of only making one server generate the values.

I know the patch I wrote in #93 didn't account for all of the various ways of clearing a cache, I stated so at the time. It was just a POC. All of that logic can be added to this design. I believe memcache is already doing something very similar with how it stores cache clear times in variables. It's complex, and pounard is probably correct in that it would be acceptable to just rebuild the whole cache. This design would at least limit that to a single server.

I know we can't force users to only use APC for smaller read-heavy caches, but we should instruct them to do so. If they don't, then they own the broken pieces.

Mixologic’s picture

@twistor: The architecture you're describing sounds similar to what facebook implemented a few years ago.
http://www.slideshare.net/guoqing75/4069180-caching-performance-lessons-...

There are a lot of slides in the presentation that appear to be blanked out, but they are there if you download the slideshow. Pretty informative, even for slides - they are from 2008, but I think most of the architecture still applies.

Most important is Slide 11, what they store in the APC cache. They also indicate on slide 16 that they store a key in memcached to use for invalidation.

I still firmly believe that all this discussion of making apc work on multiple webheads should be its own issue. Most of the use cases for apc as a cache backend are *not* going to involve multiple webheads, and trying to solve that problem is blocking the immediate need, which is make drush, on a single server, clear the in memory apc cache.

iamjon’s picture

Status: Needs review » Reviewed & tested by the community

Applied and used the patches from #123. Works as advertised. The only caveat is to add --uri at the end of the cache clear call
so instead of drush cc all -y you would do soemthing like drush cc all -y --uri="yoursite.com"

btopro’s picture

+1 if this works against 1 server; that's at least a solution of some kind instead of right now there being no capability to do this. more complex scenarios can be accounted for as a separate issue.

Sylvain_G’s picture

Anyway php 5.5 will go Zend Opcache, no more APC and apc user space stuff.

Mixologic’s picture

@Sylvain_G - php 5.5 only implements Zend's Opcache. The APC user cache will still be around as APCu (http://pecl.php.net/package/APCu)

APCu is functionally compatible with APC, and as such can be used interchangeably. See the README.md here: https://github.com/krakjoe/apcu

Besides, Drupal 8 is going to be on 5.4, which will be supported for a very, very long time.

Mixologic’s picture

Speaking of Drupal 8, just found out today that APCu is on its way to being part of core.

https://drupal.org/node/1807662

https://drupal.org/node/2231595

So, hey, there's that.

Sylvain_G’s picture

@Mixologic thanks for APCu, wasn't aware

php 5.4 will be end of life march 2015 with is NOT a very very ling time frame

Mixologic’s picture

@Sylvain_G - yes, php 5.4 will be EOL'ed but Drupal 8 will be around for a long time.

Drupal 6 is still technically supported, and it runs on 5.2.x, which was EOL'ed Jan 2011. There's always a time lag between when a version of PHP is eol'ed and when we can actually stop supporting it (when newer versions of Drupal come out).

Hopefully this will be much less of an issue with semantic versioning and D8 and we'll more likely be able to stay on top of updates. It already appears as though d8 is fully forward compatible with 5.6 (doesnt use any deprecated functions from php5.4 or php5.5)

Mixologic’s picture

Also, another useful thing to look at: D8 has a fast, inconsistent cache. Beejeebus just backported it to D7. https://drupal.org/project/schrodicache

btopro’s picture

is the inconsistent cache part supposed to be a joke cause people are anti-APC User bin? Not getting the "inconsistent cache" reference. Reviewed the code, looks like it's a cache backend for the user bin much like this module is (just back ported from D8 so much more object oriented, no UI or documentation). It also doesn't appear to have a solution to the issue raised in this thread since it just has a generic clear function for single vs all bins flushed.

Mixologic’s picture

Yeah, sorry, not a joke. I thought he was backporting the cache chaining features of here: https://drupal.org/node/1651206
and here: https://drupal.org/node/2231595

Looking at the code its, uh, yeah. Simple.

btopro’s picture

Oh I see what their intent is; basically cache in multiple locations and something like APCu with cache bins that get wiped under ram wiping would technically be inconsistent; yeah unfortunately that repo is currently very minimalist / not really that different from this project (yet).

Jibus’s picture

Any chance to get #123 committed ? :)

pimok3000’s picture

after patching the beta4 module with #123 using drush cc all -y i get :
xmlrpc() error: (-32700) Error. Not well formed!
The bunch of warnings about CLI are gone though!
Any ideas how to fix this?

Mixologic’s picture

torstenzenk: Patches are always intended to apply to the dev branch only, not to beta-4. I would back out that patch, upgrade to the dev version, then apply the patch.

btopro’s picture

#123 reviewed and tested. While there are times it can fall down I think providing a baseline to what is currently effectively broken is a better solution then providing no solution at all. Running this in production and very happy that this is playing nicely with my usual clean up routines.

pimok3000’s picture

@Mixologic: yeah of course i did use the deta4-dev, i accidently didn´t mention that, i´m sorry. the error stays the same.
i cleanly uninstall the whole module and downloaded the -dev version to be absolutely sure. Then i patch the module with this one first https://www.drupal.org/node/1984180 and then use #123 with the result i wrote about in #143

R.Muilwijk’s picture

The patch from comment 123 https://www.drupal.org/node/1565716#comment-8669945 looks promising. However shouldn't a check be done whether the the module exists? At the moment the setup is just to include the class and add it to your $conf array. The module is not required for the cache bin to work.

R.Muilwijk’s picture

Attached is #123 with a check whether the module is enabled. The only thing missing is a configurable $conf['apc_webnodes'] array for when you are running multiple web nodes and need to clear the cache on all of them using drush.

MrHaroldA’s picture

IIRC, the consensis was that you shouldn't run APC on multiple webnodes to begin with. It's been a while, though ;)

Mixologic’s picture

FileSize
542 bytes

Interdiff looks good.

Good catch on the module check.

I hadn't realized there were situations where hook_drush_exit could get called if the module were not enabled, but a drush help, for example, would trigger that call: http://drupal.stackexchange.com/questions/108038/does-hook-drush-exit-ge...

I would still call this patch RTBC, and create another issue to allow for the addition and configuration of multiple web nodes if you *really* know what you're going to be caching and can avoid race conditions, which, using drupal 7 is currently not possible without a core patch: https://www.drupal.org/node/1679344

R.Muilwijk’s picture

Status: Reviewed & tested by the community » Fixed

Commited to dev.

  • R.Muilwijk committed 9c9fdf1 on 7.x-1.x
    Issue #1565716 by Mixologic, torotil, twistor, MrHaroldA, TwoD, R....

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

mr.andrey’s picture

Just adding a note here for reference.

The following error:

xmlrpc() error: (0) php_network_getaddresses: getaddrinfo failed: Name or service not known                                       [error]
The base_url might not be set correctly try using the -l/--uri option for drush.                                                  [warning]

Is resolved by creating sites/default/drushrc.php with the following contents (with "mysite.com" being your actual site name):

<?php
$options['l'] = 'http://mysite.com';

Cheers,
Andrey.