I was notified by my hosting company that the databases on 2 of my websites were too big. Over 600MB per database!
I checked and 99% of the records were in the cache_metatag table.
I truncated it to reduce the size of the database, but why is this table growing to over 300 MB?
Both websites are relatively small. About 15 pages and a newsletter node with attachment once a week.
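For reference, the check and truncate can also be done through Drupal's own API rather than raw SQL (a sketch; the bin only holds regenerable cached output, so clearing it is safe):

```php
<?php
// Count the rows in the metatag cache bin, then wipe the whole bin.
$count = db_query('SELECT COUNT(*) FROM {cache_metatag}')->fetchField();
print $count . " rows\n";
// The '*' cid with $wildcard = TRUE clears everything in the bin.
cache_clear_all('*', 'cache_metatag', TRUE);
```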
| Comment | File | Size | Author |
|---|---|---|---|
| #67 | redis_used_memory-year.png | 18.79 KB | j0rd |
| #61 | redis_used_memory-month.png | 19.18 KB | j0rd |
| #56 | metatag-n2062379-53.patch | 9.88 KB | DamienMcKenna |
Comments
Comment #1
DamienMcKenna commented: How many languages does the site have? How many records are in your node table? How many users and terms are there?
Comment #2
tchurch commented: An example of one site (both are similar in design):
1 language, 73 node records, 3 users, 3 vocabularies, 4 terms between them.
Comment #3
DamienMcKenna commented: Do you get a lot of 404 errors on your site?
Comment #4
tchurch commented: Not many. Only 3-4 per day from people trying to hack it, but that should be normal.
Why does the number of 404 errors matter?
Comment #5
Firemyst commented: I've just been informed that cache_metatag is 4.5 GB for a site less than four months old. Why is cache_metatag not being cleared properly?
Comment #6
cimo75 commented: I can confirm this one; my cache_metatag table is 160 MB right now.
Comment #7
tomogden commented: My cache_metatag table has grown from 0 to 267 MB in 34 days. That's like the Blob horror movie. I noted that the expire field is set to zero (0) on all 25,172 records. I am not finding any way to adjust this, and conventional clearing of caches has no effect.
Unless someone can point out a simple way to resolve this, I would say this is unexpected behavior (translated: "bug").
Comment #8
Firemyst commented: I've checked other cache tables (cache_field) and expire is also 0 on all items.
Drupal should delete rows where created + cache_lifetime < NOW() (unless cache_lifetime == 0), but it's not happening for the cache_metatag table. I don't know if there's a missing hook that is supposed to exist, or if cache_set() isn't being called properly. I'm new to looking at the cache this deeply.
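For context, the expiry pass on a Drupal 7 database cache bin boils down to something like this (a simplified sketch of core's behavior, not the verbatim core code):

```php
<?php
// Simplified: what "clearing expired entries" means for a D7 cache bin.
// Rows saved with expire = CACHE_PERMANENT (0) are skipped entirely,
// which is why cache_metatag rows with expire = 0 never go away.
db_delete('cache_metatag')
  ->condition('expire', CACHE_PERMANENT, '<>')
  ->condition('expire', REQUEST_TIME, '<=')
  ->execute();
```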
Comment #9
Dave Reid commented: Is anyone not running cron regularly on their site? The cache_metatag table will be cleared whenever drupal_flush_all_caches() is run, which happens on cron or update.php, in addition to many other events.
Comment #10
Firemyst commented: I run cron every 30 minutes for a series of RSS pulls. I can confirm that cron is working.
Comment #11
DamienMcKenna commented: cache_clear_all(), as called by system_cron(), only removes entries that have already expired.
Clearly the cache handling needs to be improved.
Comment #12
DamienMcKenna commented: Just to clarify my last comment: cache_clear_all() only removes expirable entries, but if Metatag isn't setting an expiration on the records then nothing will ever be expired.
Comment #13
DamienMcKenna commented: Folks, in your cache_metatag table, how many records do you have with a name that starts with "output:global"?
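One quick way to count them, assuming the default database cache backend (run e.g. via `drush php-eval`):

```php
<?php
// Count cache_metatag records whose cid starts with "output:global".
$count = db_query("SELECT COUNT(*) FROM {cache_metatag} WHERE cid LIKE 'output:global%'")
  ->fetchField();
print $count . "\n";
```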
Comment #14
cimo75 commented: Probably most of them.
Comment #15
HyperGlide commented: Checking one site, I see the opposite case: the cache_metatag table has 0 entries.
Comment #16
DamienMcKenna commented: @cimo75: Those are all 404 errors. I think we need better handling for 404 pages.
Comment #17
Dave Reid commented: @DamienMcKenna: Maybe we should implement our own cache clear on cron then, with added expiration information.
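As a stopgap until the module does this itself, a small custom module could prune the bin on cron. A hypothetical sketch (the module name and the one-day threshold are made up):

```php
<?php
/**
 * Implements hook_cron().
 *
 * Hypothetical stopgap: since Metatag stores its entries as permanent
 * (expire = 0), delete anything older than a day based on 'created'.
 */
function mymodule_cron() {
  db_delete('cache_metatag')
    ->condition('created', REQUEST_TIME - 86400, '<')
    ->execute();
}
```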
Comment #18
DamienMcKenna commented: @Dave: We've got a patch to handle 403 pages better (#2090557: Don't load/cache data on 403 & 404 error pages); I think something similar is warranted for 404 pages too. That'd solve a good part of this problem.
Comment #19
tomogden commented: Thanks to everyone for these quick responses.
For me the cache_field table is the other table that is growing at a ridiculous rate. It's about the same size as my cache_metatag table.
Yes, indeed, cron is operating very well and clears some other tables that would otherwise cause issues.
Not many. Most of mine start with "output:node", but there are 2 or 3 for each nid. Here are my counts:
Comment #20
Dave Reid commented: Grr, I wrongly assumed that running cron ran drupal_flush_all_caches(), but it does not.
Comment #21
DamienMcKenna commented: @Dave: system_cron() runs the following:
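The relevant part of system_cron() looks roughly like this (paraphrased from memory; check core for the exact code):

```php
<?php
// Bins from hook_flush_caches() plus a hardcoded list of core bins each
// get an expiry pass -- which removes only rows whose expire time has
// passed, never permanent entries.
$core = array('cache', 'cache_path', 'cache_filter', 'cache_page', 'cache_form', 'cache_menu');
$cache_tables = array_merge(module_invoke_all('flush_caches'), $core);
foreach ($cache_tables as $table) {
  cache_clear_all(NULL, $table);
}
```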
Metatag implements hook_flush_caches(), so the cache_metatag table does get passed through cache_clear_all() to purge all expired records. However, as mentioned before, Metatag doesn't set an expiration date/time, so nothing is ever expired.
Comment #22
DamienMcKenna commented: @tomogden: Does your site have multiple languages?
Comment #23
Dave Reid commented: @DamienMcKenna: Yeah, for anything with output:* we should not be using the permanent cache and should be setting an expiration. That's not the intended use of the permanent cache.
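Concretely, that means passing an expire timestamp to cache_set() instead of letting it default to CACHE_PERMANENT (illustrative values; $cid and $metatags stand in for whatever the module computes):

```php
<?php
// Current behavior (implied): entries never expire.
cache_set($cid, $metatags, 'cache_metatag');  // $expire defaults to CACHE_PERMANENT.

// Proposed: make entries expirable so cron's cache_clear_all() can prune them.
cache_set($cid, $metatags, 'cache_metatag', REQUEST_TIME + 86400);
```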
Comment #24
tomogden commented: @DamienMcKenna: No, we have not done any i18n, although there have been a temporary Mandarin page or two posted.
Most of the duplicate entries are identical but for the timestamp. A third of them (6,864) have an 'https' variation in the URL.
Comment #25
Firemyst commented: I don't have direct access to the live database, and my DBA already wiped it with a UNIX cron job so it would stay a manageable size. So far on my local VM I see mostly output:global, but my VM is not the exact scenario that go.com is.
Comment #26
femtox77 commented: Hi all!
Sorry for the question, but where can I set the expire value for the metatag cache?
Thank you.
Comment #27
reptilex commented: I can confirm this issue, too. I have a website with roughly 10,000 nodes in two languages (English and German), and my cache_metatag table is 470 MB with more than 81,000 entries, which is 8 times the number of nodes and users I have. Can I truncate the table? How can I set up the expiration for this cache? Does it have to be this big? Especially for backups this is a big problem. Something is not right. By the way, I'm using the Open Graph module too; that's why I needed Metatag.
Comment #28
sahuni commented: Same for me: cache_metatag fills extraordinarily quickly.
Perhaps a lead: I deleted hundreds of nodes, and my website gets a lot of 404 errors.
Comment #29
reptilex commented: Frankly, I checked what I was using this module for and created a theme function for it. I don't know how much I'm missing from it, but my articles have almost the same meta tags now with the following code in the template.php file:
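It's along these lines (a hypothetical reconstruction of the approach with assumed field names, not the exact code):

```php
<?php
/**
 * Implements template_preprocess_html() in THEME's template.php.
 *
 * Sketch with an assumed field structure: adds a description meta tag
 * from the node body, bypassing the Metatag module and its cache.
 */
function THEME_preprocess_html(&$variables) {
  if ($node = menu_get_object()) {
    $body = field_get_items('node', $node, 'body');
    if (!empty($body[0]['value'])) {
      $description = truncate_utf8(strip_tags($body[0]['value']), 160, TRUE, TRUE);
      drupal_add_html_head(array(
        '#tag' => 'meta',
        '#attributes' => array('name' => 'description', 'content' => $description),
      ), 'custom_meta_description');
    }
  }
}
```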
Maybe someone finds this useful. I'm sorry for the Metatag team, but I don't need a cache table that is bigger than my content tables just to add meta tags.
Comment #30
criscom commented: Same problem here:
A lot of slow queries in the database caused by cache_metatag.
# Time: 131116 11:38:41
# User@Host: abc001de2[abc001de2] @ [10.1.1.63]
# Query_time: 51 Lock_time: 0 Rows_sent: 0 Rows_examined: 0
use abc001de2_domain;
INSERT INTO cache_views (cid, serialized, created, expire, data) VALUES ('views_data:views_entity_file:en', '1', '1384598261', '0', 'a:1:{s:5:\"table\";a:3:{s:4:\"join\";a:3:{s:12:\"file_managed\";a:1:{s:10:\"left_table\";s:12:\"file_managed\";}s:11:\"entity_file\";a:1:{s:10:\"left_table\";s:11:\"entity_file\";}s:17:\"views_entity_file\";a:1:{s:10:\"left_table\";s:17:\"views_entity_file\";}}s:11:\"entity type\";s:4:\"file\";s:5:\"group\";s:4:\"File\";}}');
# Time: 131116 10:53:39
# User@Host: abc001de2[abc001de2] @ [10.1.1.63]
# Query_time: 362 Lock_time: 0 Rows_sent: 865786 Rows_examined: 865786
use abc001de2_domain;
SELECT * FROM `cache_metatag`;
# User@Host: abc001de2[abc001de2] @ [10.1.1.63]
# Query_time: 300 Lock_time: 0 Rows_sent: 865809 Rows_examined: 865809
SELECT * FROM `cache_metatag`;
Comment #31
j0rd commented: I cleared caches yesterday on my install after I moved to Redis in-memory caching.
As of today, I have:
cache_metatag:output:global* is 63k entries.
cache_metatag:output:node* is 15k entries.
45k nodes, but every node probably has around 12 unique URLs which relate to it, like node/123/comment, node/123/review, and so on.
Do we know why this is happening? Like a metatag cache entry getting created for each 404 request or unique URL?
--- UPDATE
I've gone ahead and dumped my Redis using redis-audit, which goes over all keys in the cache and summarizes them.
yt:cache_metatag:* is taking 70% of my Redis memory, which is now up to over 1 GB. So after two days I have 700 MB worth of this data. Seems like way too much.
Comment #32
j0rd commented: So it looks like any time a URL is hit, a new global cache entry gets made:
For my website
/retreat/some-node-46235
(node page, makes a cache_metatag:node entry)
/retreat/some-node-46235/reviews
(page related to the node; makes a cache_metatag:global entry)
/retreat/some-node-46235/reviews?morecache=1
(same page with a query string; makes a new cache_metatag:global entry)
/retreat/some-node-46235/reviews?morecache=12
(makes another new cache_metatag:global entry)
/retreat/some-node-46235/reviews?morecache=123
(makes yet another new cache_metatag:global entry)
Any random 404 page also makes a new global cache entry.
So pretty much every URL (valid or not) creates a new global cache entry.
I did a dump of all my entries; I have ~80k now, at around 8.5 KB each. So that's around 670 MB.
Solutions
This probably needs to be resolved in a way that minimizes the number of new global entries created, especially with regard to page arguments and 404s. The problem is that some people will want to vary their keys based on page arguments, so this would need to be configurable.
Size savings while waiting for a more permanent solution
From a memory-saving (or database-size) standpoint, we could save only the values that differ from global, then load global and merge in the changes instead of storing everything. When global changes, though, these entries would need to be invalidated.
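In code, the delta idea might look like this (hypothetical sketch, assuming a flat array of tag values; $cid and $tags stand in for whatever Metatag computed):

```php
<?php
// Fetch the cached global defaults once.
$global = cache_get('output:global', 'cache_metatag');
$defaults = $global ? $global->data : array();

// On write: keep just the per-page differences from the defaults.
cache_set($cid, array_diff_assoc($tags, $defaults), 'cache_metatag');

// On read: rebuild the full tag set from defaults + delta.
if ($cached = cache_get($cid, 'cache_metatag')) {
  $tags = $cached->data + $defaults;
}
```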
Comment #33
XerraX commented: I think this actually is a critical bug because it renders my system unusable (4.5 GB in the db).
Comment #34
j0rd commented: Essentially you can DDoS any Drupal site which uses the Metatag module.
Just hit a bunch of random links and fill up their disk.
For any site which uses Redis as the cache backend this becomes even easier, since you'll fill up their memory instead of their disk.
If they're using Memcache, you can force the site to get a bunch of cache misses by populating it with cache_metatag data, but there's no other exploit there.
It might take a couple hundred thousand requests, but it would be easy enough. On my site, a cache_metatag entry is around 8 KB.
1000 requests = 8MB
10,000 requests = 80MB
100,000 requests = 800MB
1,000,000 requests = 8GB
Death by 1000 paper cuts.
I'd say this is pretty critical myself as well, especially since I'm using Redis.
My solution is to simply get rid of this module and replace it with the snippet in #29.
Comment #35
DamienMcKenna commented: I'll work on a patch for this tonight.
Comment #36
DamienMcKenna
Comment #37
DamienMcKenna commented: I've added an updated patch on #2090557: Don't load/cache data on 403 & 404 error pages that will help with 404 pages; please help review it.
Comment #38
DamienMcKenna commented: Please test this out after applying #2090557: Don't load/cache data on 403 & 404 error pages. I've completely scrapped using current_path() in the cache cid, and it no longer uses a crazy-long hash of the cid parts.
Comment #40
DamienMcKenna
Comment #41
DamienMcKenna commented: 38: metatag-n2062379-38.patch queued for re-testing.
Comment #42
DamienMcKenna commented: I'd missed some other $cid_parts uses.
Comment #43
DamienMcKenna commented: FYI, between the patch from #42 and the one from #2090557: Don't load/cache data on 403 & 404 error pages, it should greatly reduce the number of items cached, especially for 403/404 errors.
Comment #44
DamienMcKenna commented: Alternatively, this separates the cid logic into a separate function. It also standardizes some cid values.
Comment #45
j0rd commented: I'll test the patches on the production server (because I'm an idiot) when I get to the office on Monday. Thanks for your assistance with this. Because I can count the keys easily in Redis (and you can too with the database), I should be able to see how many entries these improvements remove.
Any plans on looking into query arguments? That could be tough. Those are not 404s, but I assume people rarely use them to create new metatag variations. An optional setting, perhaps, turned off by default?
Comment #46
j0rd commented: I installed the latest dev and this patch, and it completely broke my metatag data. It seems to return bad cache values.
Which leads me to believe we're missing some simpletests to catch these things.
But overall, it completely broke everything. It seemed to return global caches for most stuff, and in my cache_metatag table no new entries were getting created for nodes.
I reverted the patch, and things went back to normal (showing appropriate metatag data).
This needs work.
PS. Here's what my keys looked like after a couple minutes. The keys didn't seem to grow the longer the code was live
Looking through the code, I'm noticing you're truncating cid keys to 128 characters, which will cause problems for keys that are longer. Previously it was a 256 hash, which is probably the best solution to avoid conflicts, although you'll have to use a shorter hash than 256 to account for the cache prefix.
This is really a limitation imposed by the cache backends and, IMHO, something they should handle. Unfortunately cache_set() with DatabaseBackend doesn't gracefully handle keys longer than 255 characters, except by just truncating them. Other cache backends do handle this problem, though.
Memcache has a key length limit of 250, not 255, and handles this gracefully by hashing CIDs longer than 250. Many people also recompile Memcache to get longer key lengths.
Redis has a key length limit of 2 GB and, I think, just ignores the problem entirely.
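A sketch of the hash-the-overflow idea described above (the helper name is made up): keep a readable prefix and replace the tail with an md5 of the full cid, so the result always fits the database backend's column limit without losing uniqueness in practice.

```php
<?php
// Keep CIDs under the database backend's 255-character column limit by
// hashing long keys: readable prefix + ':' + md5 of the full cid.
function metatag_safe_cid($cid, $max = 255) {
  if (strlen($cid) > $max) {
    // 32 hex chars for md5 plus 1 for the separator.
    return substr($cid, 0, $max - 33) . ':' . md5($cid);
  }
  return $cid;
}
```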
Comment #47
DamienMcKenna commented: The old cache values wouldn't work, but it should work after they're rebuilt?
Comment #48
j0rd commented: After I installed the patch, I cleared my caches via `drush cc all`. I assume this would flush everything and start rebuilding from nothing.
Is there anything else I needed to do to make sure this worked as expected?
Comment #49
DamienMcKenna commented: I fixed the problem: bad arguments to substr(). #facepalm.
I've also merged many of the changes from #2090557: Don't load/cache data on 403 & 404 error pages, as they were more appropriate in this issue; i.e. there are now wrappers for cache_get() and cache_set() that can be expanded upon later as needed. I've also tweaked metatag_entity_view() so it catches the fact that Panels (i.e. CTools) uses the view mode "page_manager" for the full entity display rather than 'full'. The cid_parts logic now sorts the items better to (hopefully) make it easier to clear them, and metatag_metatags_cache_clear() has been updated to match. Let's see if this works.
Comment #50
DamienMcKenna commented: A slight update that drops the 'language' cid_part and moves up the 'langcode' part.
Comment #51
jibran commented: There is debug code in the patch.
Comment #52
DamienMcKenna commented: D'oh! Good catch. I removed the various kpr() calls but forgot this one.
Comment #53
j0rd commented: I've got some important stuff to do early this week, but I should be able to test this on a large live install in a couple of days.
Until then, if anyone else could test, please post your results.
Thanks again, Damien, for the patch.
Comment #54
HyperGlide commented: Is this patch required to release beta8?
Comment #55
DamienMcKenna commented: @HyperGlide: Yes.
Comment #56
DamienMcKenna commented: Rerolled to accommodate #2168343: Clear entity_cache after recent db updates.
Comment #57
DamienMcKenna commented: I've tested this and it appears to be working just fine, so I'm committing it.
Comment #58
gonz commented: So is it safe to just truncate the cache_metatag table, install 7.x-1.x-dev, and let it run as normal?
It turns out my table bloated to 4.3 GB!
Sheesh.
Comment #59
DamienMcKenna commented: Grabbing the -dev release and running the update scripts will clear out the cache table for you.
Comment #61
j0rd commented: Attached are my stats from Munin after applying this patch.
The last "bump" you see around "Week 5" is after applying this patch. Everything else is before, with me pruning stuff manually every couple of days.
We'll see if it eventually tapers off to flat, but so far it looks like an improvement.
All the crazy spikes you see are because of the Metatag module. Nothing else but Drupal uses this Redis instance.
Comment #62
j0rd commented: With regard to this change, once it's live, those who upgrade are going to have to be careful.
Metatag will no longer differentiate pages with different query arguments.
This nailed me upon upgrade. To resolve the problem, I had to re-add the specific arguments for which I want the meta tags to change. For me this concerned the apachesolr and facetapi filter arguments: I have different titles on search pages with facets, and those pages get indexed.
I don't really care about 95% of my arguments.
I think the patch can go ahead with a warning about this issue. To resolve this "problem" you have to implement the alter hook.
While I don't recommend changing how this currently works, I do believe the cache CID should be aligned with the "canonical URL" of the page.
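For reference, re-adding an argument looks roughly like this (the alter hook name is assumed from the committed patch; check the current Metatag code for the exact name, and 'f' is facetapi's filter parameter on this hypothetical site):

```php
<?php
/**
 * Implements hook_metatag_page_cache_cid_parts_alter().
 *
 * Hook name assumed. Re-adds a specific query argument to the cache CID
 * so facet pages with different arguments get separate metatag caches.
 */
function mymodule_metatag_page_cache_cid_parts_alter(&$cid_parts) {
  $params = drupal_get_query_parameters();
  if (isset($params['f'])) {
    // Hash the argument so the cid stays short and predictable.
    $cid_parts['facets'] = md5(serialize($params['f']));
  }
}
```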
Comment #63
DamienMcKenna commented: @j0rd: Please add a new issue to cover this; it might be worth adding a feature to control which URL arguments should be considered for the $cid_parts.
Comment #64
j0rd commented: Yes, I think if it were configurable by the end user, it would be easier for non-programmers to set this up.
Here's the new issue; I hijacked it from someone else:
#2187985: Provide UI to control which URL arguments create different cache records
Comment #65
Greggg commented: Is this fixed in the 7.x-1.0-beta9 version? I have the same problem, and I'm afraid that upgrading will spoil all the meta tags on the website.
Comment #66
DamienMcKenna commented: @Greggg: If you check the release notes you'll see the fix for this issue is included. I.e. yes.
That said, you should always run module updates on a copy of your site first, but beta9 is working well for most users.
Comment #67
j0rd commented: PS. Even after all these fixes, Metatag still generates too much data to be stored in Memcache or Redis. I would highly recommend adding a warning to exclude storing this cache bin in memory and force it to stay in the database.
See the attached image. You can clearly see when I took Metatag out of Redis (hint: March; February was testing these patches).
Comment #68
criscom commented: I can confirm the problem stated by j0rd. The cache_metatag table was still growing beyond 200 MB within a couple of hours. What I did was create a Rule which clears the cache on every cron run, thus also clearing the cache_metatag table.
Comment #69
funana commented: What is the status of this? Does anybody know how to fix it?
Comment #70
mibfire commented: This is still a very serious problem. Obviously, running cron every couple of hours is not the solution. My cache_metatag table is already 1 GB and still growing.
@DamienMcKenna, @Dave Reid, could you check this issue again? How can we help you figure out the problem? Thanks.
Comment #71
DamienMcKenna commented: Anyone who's interested in this, please keep an eye on #2474427: Drop the output cache and #2420489: Restructure entity caching so it can be cleared more easily.
Comment #72
StryKaizer commented: I just experienced a 7 GB cache_metatag table (other cache tables were fine).
Clearing the entire cache in the UI cleaned up the 7 GB, but it would be nice if cron (which runs frequently) would take care of this.
I'll post what the cache entries look like when I experience the same issue again.