I was notified by my hosting company that the databases on 2 of my websites were too big. Over 600MB per database!
I checked and 99% of the records were in the cache_metatag table.
I truncated it to reduce the size of the database but why is this table growing to over 300MB?

Both websites are relatively small: about 15 pages and a newsletter node with an attachment once a week.
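For reference, here's a quick way to see which cache tables are bloated. This is a MySQL-specific sketch and needs a bootstrapped Drupal (e.g. run it via `drush php-eval`):

```php
<?php
// List all cache_* tables in the current database, largest first.
// Uses information_schema, so this only works on MySQL/MariaDB.
$result = db_query("
  SELECT table_name AS name,
         ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb
  FROM information_schema.tables
  WHERE table_schema = DATABASE()
    AND table_name LIKE 'cache%'
  ORDER BY (data_length + index_length) DESC");
foreach ($result as $row) {
  print $row->name . ': ' . $row->size_mb . " MB\n";
}
```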


Comments

DamienMcKenna’s picture

How many languages does the site have? How many records are in your node table? How many users and terms are there?

tchurch’s picture

An example of one site (both are similar in design):

1 language, 73 node records, 3 users, 3 vocabularies, 4 terms between them.

DamienMcKenna’s picture

Do you get a lot of 404 errors on your site?

tchurch’s picture

Not many, only 3-4 per day from people trying to hack the site, which seems normal.
Why does the number of 404 errors matter?

Firemyst’s picture

Priority: Normal » Major

I've just been informed that cache_metatag is 4.5 GB for a site less than four months old. Why is cache_metatag not being cleared properly?

cimo75’s picture

I can confirm this one; my cache_metatag table is 160MB right now.

tomogden’s picture

Category: support » bug

My cache_metatag table has grown from 0 to 267MB in 34 days. That's like the Blob horror movie. I noted that the expire field is set to zero (0) on all 25,172 records. I am not finding any way to adjust this, and conventional clearing of caches has no effect.

Unless someone can point out a simple way to resolve this issue, I would say this is an unexpected behavior (translated "bug").

Firemyst’s picture

I've checked other cache tables (cache_field) and expire is also 0 on all items.

Drupal should delete an entry when created + cache_lifetime < NOW() (unless cache_lifetime == 0), but that's not happening for the cache_metatag table. I don't know if there's a missing hook that's supposed to exist, or if the expiration isn't being assigned properly in cache_set(). I'm new to digging into the cache this deeply.
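For reference, the expiry behavior hangs off the fourth parameter of cache_set(); a minimal sketch (the cid and $data here are placeholders, not what Metatag actually stores):

```php
<?php
// The fourth argument to cache_set() controls expiry. CACHE_PERMANENT
// stores expire = 0, so cron's cache_clear_all(NULL, $bin) never touches
// the row; it is only removed by an explicit wildcard clear.
cache_set('output:example', $data, 'cache_metatag', CACHE_PERMANENT);

// An expirable entry: once REQUEST_TIME passes this timestamp, the
// cron sweep will delete the row.
cache_set('output:example', $data, 'cache_metatag', REQUEST_TIME + 86400);
```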

Dave Reid’s picture

Is anyone not running cron regularly on their site? The cache_metatag table will be cleared whenever drupal_flush_all_caches() is run, which happens on cron or update.php, in addition to many other events.

Firemyst’s picture

I run cron every 30 minutes for a series of RSS pulls. I can confirm that cron is working.

DamienMcKenna’s picture

cache_clear_all, as called by system_cron, says:

If called with the arguments $cid and $bin set to NULL or omitted, then expirable entries will be cleared from the cache_page and cache_block bins, and the $wildcard argument is ignored.

Clearly the cache handling needs to be improved.

DamienMcKenna’s picture

Just to clarify my last comment - cache_clear_all() only removes expirable entries, but if Metatag isn't setting an expiration on the records then nothing will ever be expired.

DamienMcKenna’s picture

Folks, in your cache_metatag table, how many records do you have with a name that starts "output:global"?
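For anyone checking, a count by cid prefix can be done like this (assuming the default database cache backend; run via `drush php-eval` or similar):

```php
<?php
// Count the cache_metatag entries whose cid starts with "output:global".
$count = db_query("SELECT COUNT(*) FROM {cache_metatag} WHERE cid LIKE 'output:global%'")
  ->fetchField();
print $count . "\n";
```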

cimo75’s picture

Probably most of them

HyperGlide’s picture

Checking one site, I see the opposite case, where the cache_metatag table has 0 entries.

DamienMcKenna’s picture

@cimo75: Those are all 404 errors. I think we need better handling for 404 pages.

Dave Reid’s picture

@DamienMcKenna: Maybe we should implement our own cache clear on cron then, with added expiration information.

DamienMcKenna’s picture

@dave: We've got a patch to handle 403 pages better (#2090557: Don't load/cache data on 403 & 404 error pages); I think something similar is warranted for 404 pages too, and that'd solve a good part of this problem.

tomogden’s picture

Thanks to everyone for these quick responses.

I've checked other cache tables (cache_field) and expire is also 0 on all items.

For me the cache_field table is the other table that is growing at a ridiculous rate. It's about the same size as my cache_metatag table.

Is anyone not running cron regularly on their site?

Yes, indeed. My cron is running regularly and clears some other tables that would otherwise cause issues.

Folks, in your cache_metatag table, how many records do you have with a name that starts "output:global"?

Not many. Most of mine start with "output:node", but there are 2 or 3 for each nid. Here are my counts:

  • Total records: 25,172
  • output:global: 2,204
  • output:node: 21,229
Dave Reid’s picture

Grr, I wrongly assumed that running cron ran drupal_flush_all_caches(), but it does not.

DamienMcKenna’s picture

@Dave: system_cron runs the following:

  $core = array(
    'cache',
    'cache_path',
    'cache_filter',
    'cache_page',
    'cache_form',
    'cache_menu',
  );
  $cache_tables = array_merge(module_invoke_all('flush_caches'), $core);
  foreach ($cache_tables as $table) {
    cache_clear_all(NULL, $table);
  }

Metatag implements hook_flush_caches(), so the cache_metatag table does get passed through cache_clear_all() to purge all expired records. However, as mentioned before, Metatag doesn't set an expiration date/time, so nothing is ever expired.
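For context, Metatag's hook implementation boils down to something like this (a simplified sketch, not the verbatim module code):

```php
<?php
/**
 * Implements hook_flush_caches().
 *
 * Returning the bin name here means system_cron() passes it to
 * cache_clear_all(NULL, 'cache_metatag'), which only deletes rows whose
 * expire timestamp is non-zero and already in the past.
 */
function metatag_flush_caches() {
  return array('cache_metatag');
}
```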

DamienMcKenna’s picture

@tomogden: Does your site have multiple languages?

Dave Reid’s picture

@DamienMcKenna: Yeah for anything with output:* we should be not using the permanent cache and setting an expiration. That's not the intended use of the permanent cache.

tomogden’s picture

@DamienMcKenna: No, we have not done any i18n, although there have been a temporary Mandarin page or two posted.

Most of the duplicate entries are identical except for the timestamp. A third of them (6,864) have an 'https' variation in the URL.

Firemyst’s picture

I don't have direct access to the live database, and my DBA already wipes it with a UNIX cron job so it stays a manageable size. So far on my local VM I see mostly output:global, but my VM is not the exact scenario that go.com is.

femtox77’s picture

Hi all!

Sorry for the question, but where can I set the expiration for the Metatag cache?

Thank you

reptilex’s picture

I can confirm this issue, too. I have a website with roughly 10,000 nodes in two languages (English and German), and my cache_metatag table is 470 MB with more than 81,000 entries, which is 8 times the number of nodes and users I have. Can I truncate the table? How can I set up the expiration for this cache? Does it have to be this big? This is a big problem, especially for backups. Something is not right. By the way, I'm using the Open Graph output too; that's what I needed Metatag for.

sahuni’s picture

Same for me: cache_metatag fills up extraordinarily quickly.
Perhaps a lead: I deleted hundreds of nodes, and my website gets a lot of 404 errors.

reptilex’s picture

Frankly, I checked what I was using this module for and created a preprocess function to replace it. I don't know how much I'm missing, but my articles now have almost the same meta tags with the following code in the template.php file:

function mytheme_preprocess_node(&$variables) {
  // Only act on the canonical node page (paths like node/123).
  if ($variables['node']->nid == substr($_GET['q'], 5)) {
    $path = isset($_GET['q']) ? $_GET['q'] : '<front>';
    $link = url($path, array('absolute' => TRUE));
    // OG type.
    $element = array(
      '#tag' => 'meta',
      '#attributes' => array(
        'property' => 'og:type',
        'content' => 'article',
      ),
    );
    drupal_add_html_head($element, 'mytheme_og_type');
    // OG URL.
    $element = array(
      '#tag' => 'meta',
      '#attributes' => array(
        'property' => 'og:url',
        'content' => $link,
      ),
    );
    drupal_add_html_head($element, 'mytheme_og_url');
    // OG description: the first sentence of the body, stripped of tags.
    $descr_text_wo_tags = strip_tags($variables['node']->body[LANGUAGE_NONE][0]['safe_value']);
    $descr_text = substr($descr_text_wo_tags, 0, strpos($descr_text_wo_tags, '.') + 1);
    $element = array(
      '#tag' => 'meta',
      '#attributes' => array(
        'property' => 'og:description',
        'content' => $descr_text,
      ),
    );
    drupal_add_html_head($element, 'mytheme_og_description');
    // Meta description. Note: each element needs a unique key, otherwise
    // it overwrites the previous one.
    $element = array(
      '#tag' => 'meta',
      '#attributes' => array(
        'name' => 'description',
        'content' => $descr_text,
      ),
    );
    drupal_add_html_head($element, 'mytheme_description');
    // OG site name.
    $element = array(
      '#tag' => 'meta',
      '#attributes' => array(
        'property' => 'og:site_name',
        'content' => variable_get('site_name', 'Default site name'),
      ),
    );
    drupal_add_html_head($element, 'mytheme_og_site_name');
    // OG title.
    $element = array(
      '#tag' => 'meta',
      '#attributes' => array(
        'property' => 'og:title',
        'content' => $variables['node']->title,
      ),
    );
    drupal_add_html_head($element, 'mytheme_og_title');
  }
}

Maybe someone finds this useful. I'm sorry for the meta tag team, but I don't need a table that is bigger than my content tables to add meta tags.

criscom’s picture

Same problem here:

A lot of slow queries in the data base caused by cache_metatag

# Time: 131116 11:38:41
# User@Host: abc001de2[abc001de2] @ [10.1.1.63]
# Query_time: 51 Lock_time: 0 Rows_sent: 0 Rows_examined: 0
use abc001de2_domain;
INSERT INTO cache_views (cid, serialized, created, expire, data) VALUES ('views_data:views_entity_file:en', '1', '1384598261', '0', 'a:1:{s:5:\"table\";a:3:{s:4:\"join\";a:3:{s:12:\"file_managed\";a:1:{s:10:\"left_table\";s:12:\"file_managed\";}s:11:\"entity_file\";a:1:{s:10:\"left_table\";s:11:\"entity_file\";}s:17:\"views_entity_file\";a:1:{s:10:\"left_table\";s:17:\"views_entity_file\";}}s:11:\"entity type\";s:4:\"file\";s:5:\"group\";s:4:\"File\";}}');

# Time: 131116 10:53:39
# User@Host: abc001de2[abc001de2] @ [10.1.1.63]
# Query_time: 362 Lock_time: 0 Rows_sent: 865786 Rows_examined: 865786
use abc001de2_domain;
SELECT * FROM `cache_metatag`;

# User@Host: abc001de2[abc001de2] @ [10.1.1.63]
# Query_time: 300 Lock_time: 0 Rows_sent: 865809 Rows_examined: 865809
SELECT * FROM `cache_metatag`;

j0rd’s picture

I cleared caches yesterday on my install after I moved to redis in memory caching.

As of today, I have:

cache_metatag:output:global* is 63k entries.
cache_metatag:output:node* is 15k entries.

45k nodes, but every node has probably around 12 unique urls which relate to it, like node/123/comment node/123/review ....

Do we know why this is happening? Like metatag cache getting created for each 404 request or unique URL?

--- UPDATE

I've gone ahead and dumped my redis using redis-audit which goes over all keys in cache and summarizes them.

yt:cache_metatag:* is taking 70% of my redis memory, which is now up to over 1GB. So after two days, I have 700 MB worth of this data. Seems like way too much.

j0rd’s picture

So looks like anytime a URL is hit, a new global cache entry gets made:

For my website
/retreat/some-node-46235
(node page, makes a cache_metatag:node entry)

/retreat/some-node-46235/reviews
(page related to node, makes cache_metatag:global entry)

/retreat/some-node-46235/reviews?morecache=1
(page related to node, makes new cache_metatag:global entry)

/retreat/some-node-46235/reviews?morecache=12
(page related to node, makes another new cache_metatag:global entry)

/retreat/some-node-46235/reviews?morecache=123
(page related to node, makes a third new cache_metatag:global entry)

Any random 404 page also makes a new global cache entry.

So pretty much every URL (valid or not) creates a new global cache entry.

I did a dump on all my entries, and I have ~80k now, and each one is around 8.5k each. So that's around 670 MB.

Solutions

This probably needs to get resolved in such a way that we minimize the number of new global entries created, especially with regards to page arguments and 404s. The problem is that some people will want to vary their cache keys based on page arguments, so this would need to be configurable.

Size savings, while waiting for a more permanent solution

From a memory-saving (or database-size) standpoint, we could store only the values that differ from the global defaults, then load the global entry and merge the changes in, instead of storing everything. When the global entry changes, though, all of those would need to be invalidated.
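That delta idea could be sketched roughly like this (the example_ functions, the 24-hour expiry, and the flat tag arrays are all hypothetical, not part of Metatag):

```php
<?php
// Store only the tags that differ from the global defaults.
function example_metatag_cache_set($cid, array $tags, array $global_tags) {
  $delta = array();
  foreach ($tags as $name => $value) {
    if (!isset($global_tags[$name]) || $global_tags[$name] !== $value) {
      $delta[$name] = $value;
    }
  }
  // Use an expirable entry so the cron sweep can remove it.
  cache_set($cid, $delta, 'cache_metatag', REQUEST_TIME + 86400);
}

// On load, merge the stored delta back over the global defaults.
function example_metatag_cache_get($cid, array $global_tags) {
  $cache = cache_get($cid, 'cache_metatag');
  return $cache ? array_merge($global_tags, $cache->data) : FALSE;
}
```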

XerraX’s picture

Priority: Major » Critical
Issue summary: View changes

I think this actually is a critical bug because it renders my system unusable (4.5 gb in db).

j0rd’s picture

Essentially you can DDoS any Drupal site which uses the Metatag module.

Just hit a bunch of random links and fill up their disk.

For any site which uses Redis as its cache backend this becomes even easier, since you'll fill up its memory instead of disk.

If they're using memcache, you can force the site into a bunch of cache misses by populating it with cache_metatag data, but there's no other exploit there.

Might take a couple hundred thousand requests, but would be easy enough. On my site, a cache_metatag entry is around 8k.

1000 requests = 8MB
10,000 requests = 80MB
100,000 requests = 800MB
1,000,000 requests = 8GB

Death by 1000 paper cuts.

I'd say this is pretty critical myself as well, especially since I'm using Redis.

My solution is to simply get rid of this module and replace it with the snippet in #29.

DamienMcKenna’s picture

I'll work on a patch for this tonight.

DamienMcKenna’s picture

Issue tags: +Performance
DamienMcKenna’s picture

I've added an updated patch on #2090557: Don't load/cache data on 403 & 404 error pages that will help with 404 pages, please help review it.

DamienMcKenna’s picture

Status: Active » Needs review
FileSize
1.18 KB

Please test this out, after applying #2090557: Don't load/cache data on 403 & 404 error pages. I've completely scrapped using the current_path() in the cache cid, and it doesn't use a crazy-long hash of the cid parts.

Status: Needs review » Needs work

The last submitted patch, 38: metatag-n2062379-38.patch, failed testing.

DamienMcKenna’s picture

Version: 7.x-1.0-beta7 » 7.x-1.x-dev
DamienMcKenna’s picture

Status: Needs work » Needs review

38: metatag-n2062379-38.patch queued for re-testing.

DamienMcKenna’s picture

FileSize
3.07 KB

I'd missed some other $cid_parts uses.

DamienMcKenna’s picture

FYI between the patch from #42 and the one from #2090557: Don't load/cache data on 403 & 404 error pages, it should greatly reduce the number of items cached, especially 403/404 errors.

DamienMcKenna’s picture

FileSize
4.64 KB

Alternatively, this separates the cid logic into a separate function. It also standardizes on some cid values.

j0rd’s picture

I'll test the patches on the production server (because I'm an idiot) when I get to the office on Monday. Thanks for your assistance with this. Because I can count the keys easily in Redis (and you can too with the database), I should be able to see how many entries these improvements remove.

Any plans on looking into query arguments? That could be tough. Those are not 404s, but I assume people rarely use them to produce different meta tags. An optional setting, perhaps, turned off by default?

j0rd’s picture

Status: Needs review » Needs work

I installed the latest dev and this patch, and it completely broke my metatag data. Seems to return bad cache values.

Which leads me to believe we're missing some simpletests to catch these things.

But overall, it completely broke everything. It seemed to return global caches for most pages, and in my cache_metatag table no new entries were being created for nodes.

I reverted the patch, and things went back to normal (showing appropriate metatag data)

This needs work.

PS. Here's what my keys looked like after a couple of minutes. The keys didn't seem to grow the longer the code was live.


 1) "yt:cache_metatag:config:node:event:withoutglobal"
 2) "yt:cache_metatag:config:node:training:withglobal"
 3) "yt:cache_metatag:config:node:studio:withglobal"
 // #4 looks bad
 4) "yt:cache_metatag:"
 5) "yt:cache_metatag:config:node:retreat:withglobal"
 6) "yt:cache_metatag:ctools_export:metatag_config:node"
 7) "yt:cache_metatag:ctools_export_index:metatag_config"
 8) "yt:cache_metatag:config:node:teacher:withglobal"
 9) "yt:cache_metatag:ctools_export:metatag_config:global"
10) "yt:cache_metatag:config::withglobal"
11) "yt:cache_metatag:config:node:studio:withoutglobal"
12) "yt:cache_metatag:ctools_export:metatag_config:user"
13) "yt:cache_metatag:config:global:withglobal"
14) "yt:cache_metatag:config:node:teacher:withoutglobal"
15) "yt:cache_metatag:ctools_export:metatag_config:global:frontpage"
16) "yt:cache_metatag:config:node:retreat:withoutglobal"
17) "yt:cache_metatag:config:node:training:withoutglobal"
18) "yt:cache_metatag:ctools_export:metatag_config:taxonomy_term"
19) "yt:cache_metatag:info:en"
20) "yt:cache_metatag:ctools_export:metatag_config:view"

Looking through the code as well, I'm noticing you're truncating cid keys to 128 characters, which will cause problems with longer keys. Previously it was a 256-bit hash, which is probably the best solution to avoid collisions, although you'll have to use a hash shorter than 256 bits to account for the cache prefix.

This is really a limitation imposed by the cache backends and, IMHO, something they should handle. Unfortunately cache_set() with DatabaseBackend doesn't gracefully handle keys longer than 255 characters, except by just truncating them. Other cache backends do handle this problem, though.

Memcache has a key length limit of 250, not 255, and handles this gracefully by hashing CIDs longer than 250. Many people also recompile memcache to get longer key lengths.

Redis allows keys up to 512 MB and, I think, just ignores the problem entirely.
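The graceful approach memcache takes can be sketched like this (a hypothetical helper, not actual backend code):

```php
<?php
// If a cid exceeds the backend's key length limit, substitute a
// fixed-length hash instead of silently truncating it; truncation can
// make two distinct long cids collide on their shared prefix.
function example_safe_cid($cid, $max_length = 250) {
  return strlen($cid) > $max_length ? md5($cid) : $cid;
}
```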

DamienMcKenna’s picture

The old cache values wouldn't work, but it should work after they're regenerated?

j0rd’s picture

After I installed the patch, I cleared my caches via `drush cc all`. I assume this would flush everything and start rebuilding from nothing.

Is there anything else, I needed to do to make sure this worked as expected?

DamienMcKenna’s picture

Status: Needs work » Needs review
FileSize
9.99 KB

I fixed the problem - bad arguments to substr(). #facepalm.

I've also merged much of the changes from #2090557: Don't load/cache data on 403 & 404 error pages, as they were more appropriate in this issue; i.e. there are now wrappers for cache_get() and cache_set() that can be expanded upon later as needed. I've also tweaked metatag_entity_view() so it catches the fact that Panels (i.e. CTools) uses the view mode "page_manager" rather than 'full' for the full entity display, given the cid_parts logic a better sort order to (hopefully) make it easier to clear items, and updated metatag_metatags_cache_clear() to match. Let's see if this works.

DamienMcKenna’s picture

FileSize
9.97 KB

A slight update that drops the 'language' cid_part and moves up the 'langcode' part.

jibran’s picture

+  if (empty($cid)) {
+    ddebug_backtrace();
+  }

debug code.

DamienMcKenna’s picture

FileSize
9.92 KB

Doh! Good catch - I removed the various kpr() calls but forgot this one.

j0rd’s picture

I've got some important stuff I've got to do early this week, but I should be able to test this on a large live install in a couple days.

Until then, if anyone else could test, please post your results.

Thanks again Damien for the patch.

HyperGlide’s picture

Is this patch required to release beta 8?

DamienMcKenna’s picture

@HyperGlide: Yes.

DamienMcKenna’s picture

FileSize
9.88 KB
DamienMcKenna’s picture

Status: Needs review » Fixed

I've tested this and it appears to be working just fine, so I'm committing it.

gonz’s picture

So is it safe to just truncate the cache_metatag table, install 7.x-1.x-dev, and let it run as normal?

Turns out my table bloated to 4.3 Gigs!

Sheesh

DamienMcKenna’s picture

Grabbing the -dev release and running the update scripts will clear out the cache table for you.
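If you'd rather empty the bin by hand first, a wildcard clear does the same thing (safe to run; the cache is rebuilt on demand):

```php
<?php
// Remove every entry in the Metatag cache bin.
cache_clear_all('*', 'cache_metatag', TRUE);
```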

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

j0rd’s picture

FileSize
19.18 KB

Attached is my stats from munin after applying this patch.

The last "bump" you see around "Week 5" is after applying this patch. Everything else is before, with me pruning stuff manually every couple of days.

We'll see if it eventually tapers out to flat...but so far looks like an improvement.

All the crazy spikes you see are because of metatag module. Nothing else but Drupal is used in redis.

j0rd’s picture

With regards to this change, once it's live those who upgrade are going to have to be careful.

Metatag will no longer differentiate pages with different arguments.

This nailed me upon upgrade. To resolve the problem, I had to re-add the specific arguments for which I want the meta tags to change. For me this was in regards to the Apache Solr & Facet API filter arguments: I have different titles for search pages with facets, which get indexed.

For 95% of my arguments, I don't really care.

I think the patch can go ahead with a warning about this issue. To resolve this "problem" you have to implement this alter hook:

function mymodule_metatag_page_cache_cid_parts_alter(&$cid_parts) {
  // Vary the cache entry on the Facet API 'style' filters in the URL.
  if (!empty($_GET['f']) && is_array($_GET['f'])) {
    $facets = array();
    foreach ($_GET['f'] as $facet) {
      if (strpos($facet, 'style:') === 0) {
        $facets[] = substr($facet, 6);
      }
    }
    if (!empty($facets)) {
      sort($facets);
      $cid_parts['facets'] = implode('-', $facets);
    }
  }
}

While I don't recommend changing how it currently works, I do believe the cache cid should be derived alongside the page's canonical URL.

DamienMcKenna’s picture

@j0rd: Please add a new issue to cover this, it might be worth adding a feature to control which URL arguments should be considered for the $cid_parts.

j0rd’s picture

Yes, I think if it was configurable to the end user, it would be easier for non-programmers to configure this.

Here's the new issue; I hijacked it from someone else:
#2187985: Provide UI to control which URL arguments create different cache records

Greggg’s picture

Is this fixed in the 7.x-1.0-beta9 version? I have the same problem and I'm afraid that upgrading will break all of the site's meta tags.

DamienMcKenna’s picture

@greggg: If you checked the release notes you'd see the fix for this issue is included. I.e. yes.

That said you should always run module updates on a copy of your site first, but beta9 is working well for most users.

j0rd’s picture

FileSize
18.79 KB

PS. Even after all these fixes, Metatag still stores too much data to be kept in memcache or Redis. I would highly recommend adding a warning to exclude this cache bin from in-memory storage and force it to stay in the database.

See attached JPEG. You can clearly see when I took metatag out of redis (hint: March, and Feb was testing these patches)

criscom’s picture

I can confirm the problem stated by j0rd. The cache_metatag table was still growing beyond 200 MB within a couple of hours. What I did was create a rule which clears the cache on every cron run, thus also clearing the cache_metatag table.
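A lighter-weight version of that workaround is to clear only the Metatag bin on cron, sketched as a tiny custom module (mymodule is a placeholder name):

```php
<?php
/**
 * Implements hook_cron().
 *
 * Empties only the Metatag cache bin on each cron run, instead of
 * flushing every cache on the site.
 */
function mymodule_cron() {
  cache_clear_all('*', 'cache_metatag', TRUE);
}
```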

funana’s picture

What is the status of this? Does anybody know how to fix it?

mibfire’s picture

This is still a very serious problem. Obviously, running cron every couple of hours is not the solution. My cache_metatag table is already 1 GB and still growing.

@DamienMcKenna, @Dave Reid could you check this issue again? How could we help you to figure out the problem? Thx

DamienMcKenna’s picture

StryKaizer’s picture

Just experienced a 7 GB cache_metatag table (other cache tables were fine).
Clearing the entire cache in the UI cleaned up the 7 GB, but it would be nice if cron (which runs frequently) would take care of this.

I'll post what the cache entries look like when I experience the same issue again.