Testing memcache.inc, minus sessions support, and all getMulti operations are failing (see screenshot).

I'm running pecl.php.net/memcache-2.2.5, memcached 1.4.5, Drupal 6.19.

My $conf settings are minimal:

$conf['cache_inc'] = './sites/all/modules/memcache/memcache.inc';
$conf['memcache_key_prefix'] = 'drupal6';

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

robertDouglass’s picture

Status: Active » Fixed

This has been fixed in two unrelated issues, one where the logic was optimized, and another where the improper use of REQUEST_TIME was causing misses: #900436: Optimize memcache::valid(), #925096: Use of REQUEST_TIME improper, incomplete

robertDouglass’s picture

Status: Fixed » Active

Ugh, I can no longer recreate the performance gain I was seeing! Damn!!!

Back to 2% hit rates.

rfay’s picture

I confirm almost all cacheMulti's are misses. Is there a workaround for this?

Jeremy’s picture

#rfay, what version of the module are you using? If latest dev, what is the ID (top of the file) of the memcache.inc that you're using?

rfay’s picture

Hi @Jeremy, I don't have $Id$ because I'm using git, but

Last commit was

Author: robertDouglass <robertDouglass>
Date:   Tue Sep 28 12:44:41 2010 +0000

    Updating README requirements

(I just pulled and now have two new commits:)

commit 6bd76fa90904dba0cbe66c14fa423a3ab99edaab
Author: robertDouglass <robertDouglass>
Date:   Wed Sep 29 11:47:00 2010 +0000

    #828664 by catch | Jeremy: Fixed Blocked user session regeneration.

commit a937eea61b32fa920a0805a92e30f0924ac77964
Author: robertDouglass <robertDouglass>
Date:   Wed Sep 29 11:23:01 2010 +0000

    #856276 by nycnate, srjosh | whytewolf: Fixed Call to undefined method Memca

Just tested and nearly all getMulti's are still misses.

get	cache	wstest-cache-variables	1
get	cache	wstest-cache-locale%3Aen-working	1
getMulti	cache	wstest-cache-.wildcard-cached41d8cd98f00b204e9800998ecf8427e	0
getMulti	cache	wstest-cache-.wildcard-cache2db95e8e1a9267b7a1188556b2013b33	0
getMulti	cache	wstest-cache-.wildcard-cache7ce8636c076f5f42316676f7ca5ccfbe	0
getMulti	cache	wstest-cache-.wildcard-cachece04bd8105c135f89b0dfd5d9c1e8c5d	0
getMulti	cache	wstest-cache-.wildcard-cache7c9cab8054b37b29093b2bacb1d19d1a	0
getMulti	cache	wstest-cache-.wildcard-cachef5ddaf0ca7929578b408c909429f68f2	0
getMulti	cache	wstest-cache-.wildcard-cachefb216d9e8791e63c8d12bdc420956839	0

Environment: Authenticated user behind varnish. Behavior is the same if you go directly to the Apache port. Single server configuration:

# Memcached configuration
$conf['cache_inc'] = './sites/all/modules/contrib/memcache/memcache.inc';
$conf['memcache_key_prefix'] = 'wstest';

Thanks for your work on this!

robertDouglass’s picture

@rfay's results are consistent with my testing. I _think_ that it is behaving as anticipated, only the number of misses to get one hit is very high.

rfay’s picture

I downgraded to PHP 5.2 and the behavior is the same. Just for interest's sake.

rfay’s picture

I'm doing some performance testing using jmeter and am quite confused by the results. If you want me to take this to a support request, I will happily, but it may be related to this issue.

I'm testing a single content page, with an authenticated user. My test subject and the test driver are on different machines on the same network (no real network issues to deal with here; that's not what I'm interested in).

I'm using the "simple" setup:

# Memcached configuration
$conf['cache_inc'] = './sites/all/modules/contrib/memcache/memcache.inc';
$conf['memcache_key_prefix'] = 'wstest';

My test subject is doing nothing except what I do to it.

So I hit a page and check the stats before and after (using memcached's telnet interface, the stats command). This page has been hit before; anything cacheable should be cached.

I get an increment of:
cmd_get: 72
get_hits: 72
get_misses: 363

It's my assumption that the 363 is the getMulti operations that all fail.

The worst thing about this is that I can't detect any performance improvement between using memcache and not using it (it may be worse). Using jmeter to check the delivery of the page alone (without embedded assets) I see times like these, which seem to say that memcache is degrading performance. (Each of shows the average response time of 25 runs on a single content-type page. The run of 25 was run 3 times, so that's the 3 results quoted in each cell.)

User Type With memcache Without memcache
Unauthenticated 27/31/38 21/23/20
Authenticated 173/186/193 199/161/160

Can you give me a clue here? Are my expectations off, or my configuration, or my technique?

Thanks,
-Randy

longwave’s picture

@rfay: can you test again with the patch from #926994: Condition in wildcards() function will always evaluate to TRUE due to variable name mistake? There is a typo in the function that calls getMulti which may fix this.

mindrok’s picture

Issue tags: +memcache low hit / high miss

I'm also having the similar issues with memcache.

Here is memcache stats from admin:

get cache cache-variables 0
set cache cache-variables
get cache cache-mollom_form_cache 1
getMulti cache cache-.wildcard-cached41d8cd98f00b204e9800998ecf8427e 0
getMulti cache cache-.wildcard-cache6f8f57715090da2632453988d9a1501b 0
getMulti cache cache-.wildcard-cache27c9d5187cd283f8d160ec1ed2b5ac89 0
getMulti cache cache-.wildcard-cache4fcb130eb3b47f0d3916d1cd92a5b4be 0
getMulti cache cache-.wildcard-cache68d90eb013f51c689f9ebea83a920858 0
getMulti cache cache-.wildcard-cache5d8d54232f52ff0a3f902b201995afb4 0
getMulti cache cache-.wildcard-cache41f90e722366a67d9109e557c8d51d55 0
getMulti cache cache-.wildcard-cachea31decf40b525bb9a0a2b40f8233c485 0
getMulti cache cache-.wildcard-cache38e843cb28ab8f29f3e434f5cea71326 0
getMulti cache cache-.wildcard-cachedc98ddb11285878211e24ea95a2eec29 0
getMulti cache cache-.wildcard-cache93108d73a40aa95909051afe989b3015 0
getMulti cache cache-.wildcard-cachef124b494a0ab1bba3cb6c546df240c1e 0
getMulti cache cache-.wildcard-cachef845595613012b1fcb6a27336308d466 0
getMulti cache cache-.wildcard-cache1a0ff825aafdf0916c50c328f4202ff4 0
getMulti cache cache-.wildcard-cache0a4e70ce6dc979c53e33449503f549d5 0
getMulti cache cache-.wildcard-cache389357c3c00d03a300eb8eb47923ef48 0
getMulti cache cache-.wildcard-cacheaebc3dbf551756f4febd6b4a6a7eefd8 0
getMulti cache cache-.wildcard-cachec6aad2ca52bfc6e58cd6b7bc263a599c 0
get cache cache-theme_registry%3Azen 1
getMulti cache cache-.wildcard-cached41d8cd98f00b204e9800998ecf8427e 0
getMulti cache cache-.wildcard-cachee358efa489f58062f10dd7316b65649e 0
getMulti cache cache-.wildcard-cache1fdc0f893412ce55f0d2811821b84d3b 0
getMulti cache cache-.wildcard-cache8fc42c6ddf9966db3b09e84365034357 0
getMulti cache cache-.wildcard-cachee886b5e8a9e6a8b1a306ac1efb18ec84 0

... continues into the thousands.

I have a less than a 2% hit rate. I've used an application similar to apc.php but specifically for memcache and I get the same dismal results.

My Environment:

Centos 5.5
Php 5.2.10
Drupal 6.19

I read about the Zlib issue listed on the install page of memcache on Drupal. I do have zlib enabled but compression is off.

Thanks in Advance.

StevenWill’s picture

I am having the same issue.

rfay’s picture

Jeremy’s picture

Please grab the latest CVS code and try again.

rfay’s picture

I get the same results with the latest commit:

commit cfd589a4e6bd80e4e46a9a7a0e1e9de74e095b80
Author: Jeremy <Jeremy>
Date:   Fri Oct 1 14:24:36 2010 +0000

    Bug #926994: typo in wildcards, performance.
Jeremy’s picture

How is memcache configured in your settings.php? How are you using wildcards?

rfay’s picture

#8 has my configuration, as completely as I know to explain - let me know what more you'd want.

I don't even know what a wildcard is, so if you'll instruct me or point me to the information source, I'll try to find out and let you know.

jmiccolis’s picture

I'm experiencing the exact same thing as rfay describes in #8 - I'm also not sure if my expectations are off, nor do I understand the purpose of the wildcard code. However, I am sure that my performance is worse. The only additional info I can give is that on my system I'm giving memcached 512 megs and it's using less than 5.

Is there any where (ticket/code comment/etc..) that summarizes that purpose and design of the wildcard infrastructure? The code itself is rather opaque.

SimonVlc’s picture

Same problem here using 2 clusters and 4 different bins.

Version 6.1.6 (tried too with dev). memcache-2.2.5, memcached 1.4.5, Drupal 6.19.

Ubuntu Lucid, PHP 5.3.

greggles’s picture

subscribe.

longwave’s picture

Status: Active » Needs review
FileSize
7.63 KB

When memcache_wildcards() is called, it assumes that any length of wildcard prefix could exist for a cache id; this explains the large number of getMulti cache misses, but most wildcard cache clears are for prefixes with a small number of different lengths. It may also check the same prefixes repeatedly on the same page load, when this is not necessary.

This patch stores the length of the wildcard whenever a cache clear occurs so it only has to check wildcard lengths that are known to have been cleared in the past, and then statically caches this subset of possible wildcards, which drastically reduces the number of calls to memcached.

It also removes the assumption that a wildcard is invalid after 28 days, which I think could cause old permanent cache entries to be wrongly returned in some circumstances, and changes the caching of temporary items to the minimum cache lifetime instead of 30 days (or doesn't cache them at all if no lifetime is set), which I think is the correct behaviour.

Finally, it only hashes cache ids over 160 characters, which has the helpful side effect of making the wildcard lookups more understandable when cache statistics are enabled.

longwave’s picture

Fixed a bug in the static caching, use this patch instead.

jcmarco’s picture

Yes! a big change with this patch.
From a 5% to a 75% hit rate.

rfay’s picture

Using #21 my hitrate (reported by memcache_admin) goes up significantly, but there are still a shocking number of misses (I'm not over 50% hits). Just reloading the same page time after time has a number of standard misses. That doesn't seem right.

Here's the top of the report for a simple content page being reloaded after many reloads:

Memcache statistics

Operation Bin Key Hit
get cache wstest-cache-variables 1
get cache wstest-cache-locale%3Aen-working 1
getMulti cache wstest-cache-.wildcard-cachelocale%3A 0
getMulti cache wstest-cache-.wildcard-cachelocale%3Aen-worki 0
getMulti cache wstest-cache-.wildcard-cachelocale%3Aen-work 0
get cache wstest-cache-context 1
getMulti cache wstest-cache-.wildcard-cachecontext 1
get cache wstest-cache-plugins%3Acontext%3Aplugins 1
getMulti cache wstest-cache-.wildcard-cacheplugins 0
getMulti cache wstest-cache-.wildcard-cacheplugins%3Acontext 1
getMulti cache wstest-cache-.wildcard-cacheplugins%3Acontex 0
get cache wstest-cache-css_injector%3Arules 1
getMulti cache wstest-cache-.wildcard-cachecss_inj 0
getMulti cache wstest-cache-.wildcard-cachecss_injector%3Aru 0
getMulti cache wstest-cache-.wildcard-cachecss_injector%3Ar 0
get cache wstest-cache-schema 1
getMulti cache wstest-cache-.wildcard-cacheschema 0
get cache_content wstest-cache_content-content_type_info%3Aen-working 1
getMulti cache_content wstest-cache_content-.wildcard-cache_content 1
bibo’s picture

Subscribe

kbahey’s picture

I am seeing the same thing, on a high traffic live site, so not sure if I can test this again, but here goes.

Hit rates are dismal after upgrading from 6.x-1.5 to 6.x-1.6

Hits:     61711028 (4.2%)
Misses: 1413121043 (95.8%)
Used: 771.4 MBytes (37.7%)


Current Items(total)             356513 (11892520)
Hits                           61711028
Misses                       1413121043
Request Rate (hits, misses)     4767.81 cache requests/second
Hit Rate                         199.50 cache requests/second
Miss Rate                       4568.31 cache requests/second
Set Rate                          38.45 cache requests/second

Applying the patch in #21 makes things better, but too early to say, since it been only a few minutes after restarting the memcached daemon:

Hits:   93905    (71.6%)
Misses: 37188    (28.4%)
Used: 76.9 MBytes (3.8%)

Current Items(total)        36852 (36889)
Hits                        93905
Misses                      37188
Request Rate (hits, misses) 375.62 cache requests/second
Hit Rate                    269.07 cache requests/second
Miss Rate                   106.56 cache requests/second
Set Rate                    105.70 cache requests/second

Not back above the 90% hit rate, but an improvement.

Jeremy’s picture

Status: Needs review » Needs work

It looks like the recent statistics changes to dmemcache.inc broke the new wildcard logic -- I'd not synced that file since the changes were made, and once I did wildcard lookups started failing in my tests. Need to debug this regression further...

kbahey’s picture

Hits are up to 86% now after 40 minutes from restarting memcached.

So #21 seems to be a good solution.

kbahey’s picture

Getting there.

Hits are above 90% now.

Hits:   1101345 (91.2%)
Misses:  106590 ( 8.8%)
Used:     205.3 MBytes (10.0%)

Current Items(total)    103004 (287950)
Hits                    1101345
Misses                   106590
Request Rate (hits, misses) 245.02 cache requests/second
Hit Rate                    223.40 cache requests/second
Miss Rate                    21.62 cache requests/second
Set Rate                     58.42 cache requests/second
longwave’s picture

Status: Needs work » Needs review
FileSize
7.4 KB

This is an updated patch which simplifies the static wildcard caching from the previous version, and also removes the need for memcache.inc to do any hashing at all, as dmemcache_key() does this anyway. It also restores the assumption that wildcards are invalid after 28 days and the temporary cache lifetime as now I think I understand why these were added in the first place.

Uncommenting line 244 will store wildcard counts of zero, which means the same as a cache miss, but improves the hit ratio even further - but this is not an indicator of performance.

kbahey’s picture

The hit rate has improved up to 94.7% for 3 hours of operation, with patch #21.

Hits: 2275562 (94.7%)
Misses: 126466 (5.3%)
Used: 236.0 MBytes (11.5%) 	 

Current Items(total)	118627 (597908)
Hits	2275562
Misses	126466
Request Rate (hits, misses)	229.68 cache requests/second
Hit Rate	217.59 cache requests/second
Miss Rate	12.09 cache requests/second
Set Rate	57.20 cache requests/second

I now applied the new patch in #29 and it seems to work. I will report back after it is running for several hours.

catch’s picture

#29 looks good, I did a quick port to D7 and ran a test script (via Jeremy from a previous issue), however it fails most of those tests. Not yet sure if it's the original patch or my port that broke them.

Posting the port and the test script for reference.

jcmarco’s picture

Testing new #29 with the same testing than before in #21 and it gives and initial hit ratio of 81%.
So it looks better.

longwave’s picture

Updated version of #31 attached for D7, which passes the test script.

robertDouglass’s picture

I haven't reviewed the patches yet. I did, however, run wget -m on my local site before and after patching, and in each case I get a 70% hit rate (against 6.x-1.x-dev)

jcmarco’s picture

Since #29 I get this error:
Warning: Wrong parameter count for max() in cache_get() (line 42 of xxxxx/sites/all/modules/memcache/memcache.inc).

and when trying to clear a selective cache (drush cc menu) I get this error:
WD php: Warning: Cannot use a scalar value as an array in memcache_wildcards() (line 255 of [error]
/home/webapps/clon/releases/20100905172231/sites/all/modules/memcache/memcache.inc).

The stored array in memcache_wildcard_timestamps is wrong, it seems that there are some problems evaluating the condition in line 254,
with this extra parenthesis it works fine and store the variable memcache_wildcard_timestamps with an array with lenghts and timestamps:

    if (!isset($wildcard_timestamps[$table][$length]) ||
      ( ($_SERVER['REQUEST_TIME'] - $wildcard_timestamps[$table][$length]) > ($wildcard_invalidate / 4) ) ) {
catch’s picture

Was in the middle of fixing #31 when I saw #33 posted, confirmed the test script passed, initial profiling looks pretty good as well.

Jeremy’s picture

I am certain that currently pre-patch the wildcard logic is busted (which likely is the cause of the performance/high-miss rate). I've not yet reviewed this patch to confirm that it fixes performance or basic functionality (at first glance I like some of the ideas, but have concerns about other changes), as I first plan to debug where things broke and restore basic functionality. In any case, we should hold off on before/after performance comparisons until basic functionality is restored as until then they're not going to measure actual performance.

Tracking this regression here:
#931036: Wildcard flushing not working

I apologize for the delay on resolving this issue, but I just returned to the US this weekend after several weeks traveling abroad and I'm quite distracted by life. I consider this a very important memcache module priority, but I likely won't get to it until Tuesday or Wednesday of this week.

longwave’s picture

Updated the patches again for both D6 and D7. Testers beware: the bin flush codepath is different depending on whether cache_lifetime is zero or non-zero, and you should restart your memcached between tests to get true results; I also had to delete memcache_wildcard_timestamps earlier in testing but that no longer seems necessary.

longwave’s picture

@Jeremy: what are your concerns regarding this patch? It should reduce cache misses (wildcards that will never exist), network usage (no multiple trips to memcached for the same wildcards), memory usage (static wildcard caching per table only, instead of per table and cid) and CPU time (no need to hash keys twice). However, I have only done basic testing, so there could be some implication that I haven't thought of.

kbahey’s picture

As promised, things are much better with #29, hit rate is 97% after 8+ hours.

Hits: 4694934 (96.9%)
Misses: 149902 (3.1%)
Used: 268.9 MBytes (13.1%)

Cache Information
Current Items(total) 133683 (1073607)
Hits 4694934
Misses 149902
Request Rate (hits, misses) 155.43 cache requests/second
Hit Rate 150.62 cache requests/second
Miss Rate 4.81 cache requests/second
Set Rate 34.44 cache requests/second

I can't test the others for now.

kbahey’s picture

Status: Needs review » Needs work

I saw the same error as @jcmarco in #35

So I tried the D6 one in #38, and there is still a notice:

This line:

  if (!empty($wildcard_timestamps[$table]) &&
      max($wildcard_timestamps[$table]) >= ($_SERVER['REQUEST_TIME'] - $wildcard_invalidate)) {

triggers a:

max(): When only one parameter is given, it must be an array in custom_cache.inc on line 44

jcmarco’s picture

Status: Needs work » Needs review

@kbahey: Try to delete the variable memcache_wildcard_timestamps directly from the db before testing because once it is set it never gets clear.

That one should be clear when cache_clear_all()

jcmarco’s picture

Patch from #38 has worst performance than #35 (#29 + fix).

About 55% hit rate with my "testing".

q0rban’s picture

Subscribe

kbahey’s picture

@jcmarco

I deleted that variable and restarted memcached. There are no more notices for the max() call.

Too early to say what the hit rate is. Will post back in a few hours.

kbahey’s picture

Using #38 for D6, 3.5 hours, with memcached restart, here are the statistics:

94.9%, so pretty good.

Used: 239.2 MBytes (11.7%)
Hits: 2409149 (94.9%)
Misses: 129030 (5.1%)

Cache Information
Current Items(total) 119072 (525557)
Hits 2409149
Misses 129030
Request Rate (hits, misses) 201.35 cache requests/second
Hit Rate 191.11 cache requests/second
Miss Rate 10.24 cache requests/second
Set Rate 41.69 cache requests/second

stephanbauer’s picture

65% for me. Restarted memcached after applying the patch in #38.

Uptime: 16 hours and 55 minutes

Used: 55.0 MBytes (43.0%)
Hits: 585402 (65.0%)
Misses: 314992 (35.0%)

Cache Information
Current Items(total) 32434 (78591)
Hits 585402
Misses 314992
Request Rate (hits, misses) 14.78 cache requests/second
Hit Rate 9.61 cache requests/second
Miss Rate 5.17 cache requests/second
Set Rate 1.29 cache requests/second

kbahey’s picture

Yes, after some time, it went back to 70%.

Here it is after 23 hours.

Hits: 16729404 (70.8%)
Misses: 6905033 (29.2%)
Used: 396.9 MBytes (19.4%)

Current Items(total) 189186 (3328940)
Hits 16729404
Misses 6905033
Request Rate (hits, misses) 282.00 cache requests/second
Hit Rate 199.61 cache requests/second
Miss Rate 82.39 cache requests/second
Set Rate 39.72 cache requests/second

longwave’s picture

As more cache clears occur with different wildcards over time, cache misses will increase and the hit ratio will decrease; this is unavoidable due to the lack of wildcard support in memcached, so the code has to simulate them by storing and checking more keys for each cache get. Cache request rate and actual site performance (ab/JMeter testing) are much better indicators of performance than hit ratio when evaluating this patch.

I think there is another optimisation that can be done to my patch - the wildcard length variable can be cleared per bin when the whole bin is cleared - but I don't have time to implement and test it right now.

kbahey’s picture

Without benchmarks, and running this on a live site, I can say that this last patch is adequate. Server load is below 1 or a bit above it. Better have that than what is in 6.x-1.6.

kongoji’s picture

@kbahey in #41:
try to change the code from line 43 in file memcache.inc

<?php
  if (!empty($wildcard_timestamps[$table]) &&
      $wildcard_timestamps[$table] >= ($_SERVER['REQUEST_TIME'] - $wildcard_invalidate)) {
    // Previously cached content won't have ->flushes defined.  We could
    // force flush, but instead leave this up to the site admin.
    $flushes = isset($cache->flushes) ? (int)$cache->flushes : 0;
    if ($flushes < memcache_wildcard_flushes($cid, $table)) {
      return FALSE;
    }
  }
?>

with

<?php
  if (!empty($wildcard_timestamps[$table])){
	  if(is_array($wildcard_timestamps[$table])===TRUE){
		$max = max($wildcard_timestamps[$table]);
	  } else {
		$max = $wildcard_timestamps[$table];
	  }
	  if($max >= ($_SERVER['REQUEST_TIME'] - $wildcard_invalidate)) {
		// Previously cached content won't have ->flushes defined.  We could
		// force flush, but instead leave this up to the site admin.
		$flushes = isset($cache->flushes) ? (int)$cache->flushes : 0;
		if ($flushes < memcache_wildcard_flushes($cid, $table)) {
		  return FALSE;
		}
	  }
  }
?>
iamer’s picture

What's the status with this ? I have a high traffic site that feels like it is being hit with the same bug. I had tested single cache_get and cache_set with a custom script but forgot about wildcards.

What kind of information do you need to help diagnose this?

Thanks.

robertDouglass’s picture

FileSize
1.59 KB

I think I've identified a fundamental problem in the getMulti design which might be contributing to the low hit rates in at least some cases. Attached is a patch that you can apply to the current 6.x-1.x code to see the problem in action. There is currently only one place where the cache is ever set with the .wildcard keys. It's in memcache.inc, and the abridged version of the code looks like this:

if ($flush) {
  dmemcache_set('.wildcard-' . $table . $hash, 1, 0, $table);
}

Thus getting the cache set depends wholly on the $flush variable, which is a very active thing - caches have to be flushed by the application. The proper behaviour is for a cache to be set if a get() was issued and it comes up blank. So I think the dmemcache_get_multi() function has to report back a total failure if none of the keys hit, and if total failure happens, a set() has to follow. This would be consistent with the behaviour of the rest of the caching.

To use the patch to see what I mean, apply it and then restart your memcached daemon. It will fail to hit the wildcard caches and it will also not set them. Then go to your performance page and hit flush caches, and viola, caches get set.

robertDouglass’s picture

FileSize
8.45 KB

I tried the D6 patch from #38, incorporated the fix from #51 and tested it. With that patch, getMulti is never ever called. I'm attaching a patch that sheds some light onto why, but it comes down to this:

function memcache_wildcards($cid, $table, $flush = FALSE) {
  static $wildcards = array();
  $result = array();

  $wildcard_timestamps = variable_get('memcache_wildcard_timestamps', array());
  $wildcard_invalidate = variable_get('memcache_wildcard_invalidate', MEMCACHE_WILDCARD_INVALIDATE);
  
  // This condition never passes because  is_array($wildcard_timestamps[$table]) is always FALSE.
  if (isset($wildcard_timestamps[$table]) && is_array($wildcard_timestamps[$table])) {
robertDouglass’s picture

Priority: Normal » Critical
Status: Needs review » Needs work

@kbahey you need to look at more than hitrate to see if things are working properly. They are not.

Marking this as critical as this issue renders the whole project worse than useless.

Nick_vh’s picture

robertDouglass, just a quick intervention but does this also happen with the cacherouter module and their memcache implementation?
As far as I know they use something similar with memcache

robertDouglass’s picture

@Nick_vh: they're wholly unrelated projects, so no, I don't expect this to affect anything other than the Memcache project.

kbahey’s picture

@robertDouglass

Yes, something is not right.

This is before the #38 patch, with 6.x-1.6, a severely broken release.

Free: 107.2 MBytes (10.5%)
Used: 916.8 MBytes (89.5%)

Hits: 79731054 (5.1%)
Misses: 1487986362 (94.9%)

Current Items(total) 134942 (12462623)
Hits 79731054
Misses 1487986362
Request Rate (hits, misses) 3226.23 cache requests/second
Hit Rate 164.08 cache requests/second
Miss Rate 3062.15 cache requests/second
Set Rate 25.67 cache requests/second

This is after the patch in #38 is applied.

Free: 1002.8 MBytes (97.9%)
Used: 21.2 MBytes (2.1%)

Hits: 76775 (75.9%)
Misses: 24425 (24.1%)

Current Items(total) 11786 (16265)
Hits 76775
Misses 24425
Request Rate (hits, misses) 221.93 cache requests/second
Hit Rate 168.37 cache requests/second
Miss Rate 53.56 cache requests/second
Set Rate 35.67 cache requests/second

Ignore the hit/miss ratio. Look at the request rate. Before the patch, it is 14.5X after the patch, with the bulk in the misses.\

Two sites I manage have this patch now, and unfortunately, both are very high traffic and I can't put debug stuff on them.

I am wondering how far the issue you identified goes. Is it post 6.x-1.5 or goes far back more than that.

robertDouglass’s picture

@kbahey - I'm recommending people to revert to 1.4 and disable the memcache_admin module (which generated the security release). I think the getMulti wildcard clearing was released prematurely. It's possible the regression came in the 1.5-1.6 release, but this isn't confirmed.

q0rban’s picture

Hi Robert. Is there a good/consistent way of benchmarking this, so I can test the effects of reverting to 1.4? You mention to kbahey that "you need to look at more than hitrate to see if things are working properly." What do you look at to ensure that it is working properly?

kbahey’s picture

@robertDouglass

I have reverted one client back to 6.x-1.5. I like 1.5 over 1.4 because it has the single bin feature that makes configuration so much easier. The site seems to be doing well, with the load and CPU usage being less than with 1.6.

For another client, they use a custom modified memcache.inc that replaces certain strings in the cached pages (described elsewhere why that is), and hence we made changes to memcache.inc in 1.6 and stuck with that. It seems they are too well tuned to see too much ill effects. Perhaps I should use 1.5 for them too and report back.

Yes, in both cases, I disabled memcache_admin, since it not really in use. memcache.php does a similar job or the memcached-tool with the stat option, from the command line.

bedlam’s picture

subscribe.

Jeremy’s picture

Priority: Critical » Normal

The basic wildcard functionality was fixed in this ticket:
#931036: Wildcard flushing not working

So the remaining issue is a performance optimization. In 1.5, the wildcard support performed well in the core use case, but didn't scale when deployed on a large Drupal 7 website. The logic in 1.6 was still being optimized when the release was made before it was ready (to address a security issue). That optimization work is still ongoing... At this point, I intend to review the optimizations that longwave has posted with the goal of making less multi-gets when we're validating against wildcard flushes.

Jeremy’s picture

Title: All getMulti operations are failing to hit cache » A high percentage of getMulti operations are failing to hit cache

@longwave: Well done! Your patch indeed gets rid of a huge number of multiget requests (especially in the core usage), greatly optimizing the overhead in this wildcard logic. I renamed some of the variables and added more documentation to better explain what's going on, but your logic is very much there. Committed to the 7.x branch, still need to backport this to the 6.x branch with my changes to keep the two branches in sync.

Folks will of course still see multi-get failures, but this is perfectly normal. And the more wildcard is used, the more they will see -- but this still manages to avoid a huge percentage of them. Testers should be validating basic functionality (that caches clear when they're supposed to) more than anything, reporting hit rates doesn't really help us.

7.x commit message;
http://drupal.org/cvs?commit=435380

Jeremy’s picture

Status: Needs work » Fixed

Backported to the 6.x branch:
http://drupal.org/cvs?commit=435406

Note, I renamed memcache_wildcard_timestamps to memcache_wildcard_flushes as the structure of that array changes and breaks things. If memcache had an install file we could have called a variable_del, alternatively we could have added logic to the code to detect the old format array and change it on the fly but that would be kruft. Renaming the variable seemed the cleanest solution.

If you want to keep your variable table clean of kruft, delete the old memcache_wildcard_timestamps variable from your database when upgrading to this release.

Jeremy’s picture

I forward ported the rename from wildcard_timestamps to wildcard_flushes, to provide a clean upgrade path for websites running the old code:
http://drupal.org/cvs?commit=435410

q0rban’s picture

Status: Fixed » Needs work

I forward ported the rename from wildcard_timestamps to wildcard_flushes, to provide a clean upgrade path for websites running the old code

This is a tricky problem to solve, since there is no Memcache module, per se, so we can't write a hook_update_N() function to remove the old variable. I think Memcache_admin could house this though. Best to clean up after ourselves where we can, and get that cruft out of the database.

Jeremy’s picture

My intention is to simply document the command in the release notes. I had thought of putting it in the memcache_admin module, but that seems out of place at best.

q0rban’s picture

Status: Needs work » Needs review
FileSize
1.33 KB

Attached patch adds an update function to memcache_admin.install that deletes the memcache_wildcard_timestamps variable.

Jeremy’s picture

I'll let Robert and the other memcache module maintainers weigh in as to whether or not this is a direction they want to go with the memcache_admin module.

q0rban’s picture

Yeah, I'm not sure either. It's definitely not ideal, but the alternative is to add just a stub module for Memcache that doesn't do anything but maintain any required database changes that occur (which really only should be managing creation/deletion of variables). This has problems of it's own, though, since A.) It may never get enabled and B.) Turning on the module wouldn't do anything, so support requests may come in on how to use the module and what it does.

Maybe there just needs to be some clear documentation with each release on SysAdmin tasks for upgrading. I'd just fear that that would get lost alongside all the other documentation that this module provides. :(

longwave’s picture

This update hook doesn't *need* to be run for memcache to work; leaving the variable in place will only waste a small number of bytes, but it is good to clear out cruft wherever possible, so I think memcache_admin is the best place to do this - I can't see any better option that won't be more confusing or add more documentation. Perhaps the query message in $ret should explain to the administrator running the update why the variable was deleted, rather than just logging the variable_del() line.

Jeremy’s picture

Status: Needs review » Fixed

I documented the cleanup of this variable in the release notes:
http://drupal.org/node/945518

Putting this in memcache_admin won't work, as there's no requirement that websites install memcache_admin. And installing it for the first time wouldn't invoke _update scripts. Documentation really looks like the only option available to us at this time.

greggles’s picture

You could also delete the variable in the hook_install() of memcache_admin.

I think most folks using the module are comfortable with other methods of deleting variables, so I don't see that as a huge benefit.

SimonVlc’s picture

Version: 6.x-1.x-dev » 6.x-1.7
Status: Fixed » Active
FileSize
26.98 KB

I just updated to memcache 6.x.-1.7 this morning, and I´m getting the same problem (minorished). With 6.x-1.6 the hits were 8%, and with 6.x-1.7 went up to 45%, but still getting tons of errors from getMulti variables.

Jeremy’s picture

Status: Active » Fixed

Reviewing your provided screenshot, this is now working exactly as it's designed. (Specifically, it's getting a subset of all possible sub-cids for a given cid, which are the ones that could potentially match a previous wildcard flush.) Yes the hit ratio goes down with this logic, but it's doing exactly what it's supposed to do. The new logic requires more gets to validate wildcards -- fortunately memcache optimizes this as much as it can by using multiGet's instead of gets. If anything, we could clean up the statistics to compare individual get statistics with combined multiget statistics as that's really what you care about -- but that would be done in another issue as a feature. Another option would be to store '0' counters when there is no wildcard flush for a given cid (as is done in one of the earlier patches in this issue), but that would serve no purpose other than inflating statistics.

For a better understanding of how the new wildcard logic works, see #937744: Explain, thoroughly, at a high level, how wildcard flushing is supposed to work..

More long term, I believe the proper long-term plan is to improve the memcache statistics pages, which should be done in another issue.

SimonVlc’s picture

Understood, thanks for your help Jeremy ;).

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

timtrinidad’s picture

I'm not sure if this behavior is part of the same problem (so I'm leaving the ticket closed for now):

1) Start filling out a multistep form.
2) In another window, call cache_get('storage_form-[id]', 'cache_form'); - this prints out the expected $form_state['storage'].
3) Call system_cron(); (which flushes the core cache tables, including 'cache_form')
4) Call cache_get('storage_form-[id]', 'cache_form'); again - this time, it returns FALSE.

After 4), a call to dmemcache_get('storage_form-[id]', 'cache_form') still returns the expected cache.

This behavior is noticed with v1.7, but not v1.5.

Essentially, with v1.7, multistep forms break whenever cron is run. I have seen arguments for removing cache_form from memcache (http://drupal.org/node/959200), but I couldn't figure out what changes were made between 1.5 and 1.7 that would cause this differing behavior.

rfay’s picture

Just a note here: cache_form is not really a cache table; clearing it is wrong, and I don't think system_cron() does that. There's was an issue to rename and clarify in D7 but I think it got moved to D8.

timtrinidad’s picture

I think a problem I found was that cache_clear_all was ignoring the "expire" column, as cache_get only inspects the $cache->created field. This means that system_cron would also be "clearing" cache_form (along with cache, cache_block, cache_filter, cache_page, and cache_menu), regardless of the 'expire' column. Is this expected behavior?

(moving to a new issue - http://drupal.org/node/966114)