I've been looking for the reason behind the slowness of my main website. We get thousands of hits every day, and that is part of it, since the site needs to regenerate all those pages...

Now I checked the database and noticed that quite a few tables were enormous. The sessions table, for one, which I have now fixed (the garbage collection was somehow turned off!). Then I moved on to the other tables and got to this one:

boost_cache

and before cleaning it, I checked it: 203,064 rows. On that website, I have fewer than 1,000 nodes, maybe 2,000 terms altogether, and some other automatically generated pages, say 1,000 of them (to be generous).

Before that, I had the aggregator generating pages too. That grew quite a bit, but I have now uninstalled the module, so all those pages are gone. I think I had some 20,000 such pages; even estimating on the high side, say 50,000.

The total I get is 1 + 2 + 1 + 50 = 54 thousand. Not 203k... So I looked a little closer: you also cache "weird" entries. I guess you're not entirely responsible for those, though.

Sample of "weird" filename (I know, it's code and they're trying to run an SQL command):

cache/normal/linux.m2osw.com/aggregator/sources/23_page=4 AnD 1=2
UnIoN SeLeCt CoNcAt(0x3a,0x6b6f70656c65,0x3a,VeRsIoN(),0x3a,UsEr(),0x3a,DaTaBaSe(),0x3a),
CoNcAt(0x3a,0x6b6f70656c65,0x3a,VeRsIoN(),0x3a,UsEr(),0x3a,DaTaBaSe(),0x3a),
CoNcAt(0x3a,0x6b6f70656c65,0x3a,VeRsIoN(),0x3a,UsEr(),0x3a,DaTaBaSe(),0x3a),
CoNcAt(0x3a,0x6b6f70656c65,0x3a,UsEr(),0x3a,DaTaBaSe(),0x3a),
CoNcAt(0x3a,0x6b6f70656c65,0x3a,VeRsIoN(),0x3a,UsEr(),0x3a,DaTaBaSe(),0x3a),
CoNcAt(0x3a,0x6b6f70656c65,0x3a,VeRsIoN(),0x3a,UsEr(),0x3a,DaTaBaSe(),0x3a),
CoNcAt(0x3a,0x6b6f70656c65,0x3a,VeRsIoN(),0x3a,UsEr(),0x3a,DaTaBaSe(),0x3a),
CoNcAt(0x3a,0x6b6f70656c65,0x3a,VeRsIoN(),0x3a,UsEr(),0x3a,DaTaBaSe(),0x3a),
CoNcAt(0x3a,0x6b6f70656c65,0x3a,VeRsIoN(),0x3a,UsEr(),0x3a,DaTaBaSe(),0x3a),
CoNcAt(0x3a,0x6b6f70656c65,0x3a,VeRsIoN(),0x3a,UsEr(),0x3a,DaTaBaSe(),0x3a),
CoNcAt(0x3a,0x6b6f70656c65,0x3a,VeRsIoN(),0x3a,UsEr(),0x3a,
....

Okay, so... what now? Why is boost_cache so big and so slow? I looked a little closer at the code. When you clear the cache, you don't delete anything; you only reset the expiration. This is a big problem with PostgreSQL.

/**
 * Resets all entries in database.
 */
function boost_cache_clear_all_db() {
  if (BOOST_FLUSH_ALL_MULTISITE) {
    db_query("UPDATE {boost_cache} SET expire = %d", 0);
  }
  else {
    db_query("UPDATE {boost_cache} SET expire = %d WHERE base_dir = '%s'", 0, BOOST_FILE_PATH);
  }
}

The problem is that with PostgreSQL, an UPDATE does not rewrite a row in place: it marks the old row version as dead and appends a new one at the end of the table. With that method, the table grows forever. There is a vacuum process that runs on each table and reclaims the dead rows, but it goes slower and slower as the number of rows grows, and it does not shrink the table on disk. Handling 203k rows this way is VERY slow.
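
For anyone curious, something like this should show the dead row versions piling up (just a sketch; PostgreSQL 8.3+ only, run from something like devel's "Execute PHP", and it assumes you have no table prefix so the real table is literally named boost_cache):

// Rough diagnostic (PostgreSQL 8.3+ only): count the dead row versions that
// all those UPDATEs leave behind in boost_cache until vacuum catches up.
// Assumes no table prefix, i.e. the real table name is 'boost_cache'.
$dead = db_result(db_query("SELECT n_dead_tup FROM pg_stat_user_tables WHERE relname = 'boost_cache'"));
drupal_set_message(t('boost_cache is carrying @count dead row versions.', array('@count' => (int) $dead)));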

The best way to keep the table small is to delete unused rows.
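
Something along these lines is what I have in mind (a sketch only, not a tested patch; the constants and the base_dir column are taken from the function quoted above):

function boost_cache_clear_all_db() {
  if (BOOST_FLUSH_ALL_MULTISITE) {
    // Remove every cached page, for all sites, instead of merely expiring them.
    db_query("DELETE FROM {boost_cache}");
  }
  else {
    // Remove only the cached pages belonging to this site.
    db_query("DELETE FROM {boost_cache} WHERE base_dir = '%s'", BOOST_FILE_PATH);
  }
}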

Also, in my case, somehow, there was a row with an expiration date of Feb 2010 (we're in Apr 2011!). Wouldn't that be considered expired? Why wouldn't that row be reused? I guess one reason is that I had many other rows with 0 in them.

Anyway, in the case of PostgreSQL, I think a DELETE instead of an UPDATE to expire = 0 would be a lot more effective. Also, it looks like removing a module will not remove any of the cache entries linked to that module, so when that module generated thousands of entries in the cache, the table never shrinks back to a reasonable size. I'd expect my table to never grow past some 50k rows (Google finds only about 13k pages on my main site. Yeah... I have no idea how many pages we really have on this website.)

I'd like to know what you think on your end before offering a patch. If keeping expired entries (expire = 0) is very beneficial for MySQL, then maybe we can have a trade-off and keep some of those: if the count grows larger than a certain number, delete any extras.
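
To make that trade-off concrete, here is roughly what I mean (a sketch only, PostgreSQL-flavoured; the 50,000 threshold is arbitrary and the hash column is a guess at the primary key, so adjust to the real schema):

// Tolerate up to 50,000 expired rows (arbitrary threshold), delete any extras.
$max_expired = 50000;
$count = db_result(db_query("SELECT COUNT(*) FROM {boost_cache} WHERE expire = 0"));
if ($count > $max_expired) {
  // 'hash' is assumed to be the primary key; replace with the actual column.
  db_query("DELETE FROM {boost_cache} WHERE hash IN (SELECT hash FROM {boost_cache} WHERE expire = 0 LIMIT %d)", $count - $max_expired);
}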

Also, I'm still wondering, why would some rows be that old? (expired over 1 year ago)

Thank you.
Alexis Wilke

Comments

rimu’s picture

Yeah, I have this problem also.

Until something better comes along, I'm just going to add a bit of code in hook_cron that executes a "DELETE FROM {boost_cache} WHERE expire = 0". I wish I knew how to clean out {boost_cache_relationships} as well, but the table structure is less obvious, so...

function MODULENAME_cron() {
  // Drop every expired Boost cache entry.
  db_query("DELETE FROM {boost_cache} WHERE expire = 0");
  // Reclaim the freed space (MySQL only).
  db_query("OPTIMIZE TABLE {boost_cache}, {boost_cache_relationships}");
}

Nasty, but it will do.

AlexisWilke’s picture

rimu,

My current fix is even harsher:

db_query("DELETE FROM {boost_cache}");

The other problem is that the table doesn't get cleaned up automatically, so if you never clear the cache it grows anyway. Right now my table is at 300,469 rows, so I guess there is still some sort of limit. But for a site with just a little over 10k pages, 30x that many rows in the cache still doesn't make sense.

Thank you.
Alexis

P.S. Your OPTIMIZE TABLE statement is MySQL-only, and as I clearly stated, I'm using PostgreSQL.
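
For what it's worth, a portable version of your cron snippet could look something like this (a sketch only, not tested; it relies on Drupal 6's global $db_type and the same MODULENAME placeholder):

function MODULENAME_cron() {
  // Drop expired Boost cache rows on every cron run.
  db_query("DELETE FROM {boost_cache} WHERE expire = 0");
  global $db_type;
  if ($db_type == 'pgsql') {
    // PostgreSQL: reclaim the dead rows (autovacuum usually handles this anyway).
    db_query("VACUUM {boost_cache}");
  }
  else {
    // MySQL / MySQLi: defragment the table.
    db_query("OPTIMIZE TABLE {boost_cache}");
  }
}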

xurizaemon’s picture

Title: 203,000 rows and counting... » {boost} table grows excessively
Status: Active » Needs work

When dealing with a similar issue (#934876: Avoid long {sessions} table locks) on an overgrown {sessions} table in the session_expire module (which at that point deleted from cron using a single DELETE ... WHERE), sites could become non-functional during cron runs because of locks against the sessions table.

In that case, DELETE ... WHERE aggravated the oversized-table issue because it ran a single long cleanup query that locked the table. The committed fix was to iterate through the table looking for entries suited for removal. Not as quick to run, but it caused less trouble.

Unsure if the same applies to PostgreSQL.
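
For boost_cache, the batched approach could look something like this (a rough sketch, not tested; the hash column is an assumption about the schema, and the subquery form is there because PostgreSQL has no DELETE ... LIMIT, while on MySQL a plain DELETE ... LIMIT would do):

// Delete expired rows in small batches so the table is never locked for long.
function MODULENAME_boost_cache_purge($batch_size = 500) {
  do {
    // 'hash' is assumed to be the primary key; adjust to the actual schema.
    db_query("DELETE FROM {boost_cache} WHERE hash IN (SELECT hash FROM {boost_cache} WHERE expire = 0 LIMIT %d)", $batch_size);
  } while (db_affected_rows() == $batch_size);
}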

AlexisWilke’s picture

Ah! I don't remember reading a comment about that... 9-)

That's an interesting workaround. However, with 300,000 entries in the table, such an iteration would always time out. That's way too much; even 10,000 would certainly take too long.

At this point I haven't done anything more about it; the table gets emptied once in a while when a cache clear happens. At that point users accessing the site see a big slowdown anyway.

xurizaemon’s picture

The Session Expire fix also provides a UI so you can clean up 100, 500, 1K, or 5K sessions at a time, to account for the timeout issue you mention. The settings just need to keep the cleanup rate higher, on average, than the rate at which new cache entries are added.

Another option for you may be to call drush boost-cache-clear-expired regularly (maybe as a weekly task). You'd need to account for filesystem permissions to clear files cached as the webserver user.

But ... if Boost isn't doing this as it ought, calling it from Drush may not work as expected either :)

rimu’s picture

A better way to optimize tables is by using http://drupal.org/project/db_maintenance

AlexisWilke’s picture

rimu,

Interesting. In most PostgreSQL installations there is a daemon (autovacuum) that runs all the time and reclaims dead rows automatically anyway. So I'm not so sure that db_maintenance would help much in our situation. Thank you for pointing it out, though.

Alexis

bgm’s picture

IIRC, entries are rarely deleted from boost_cache, since the URLs in that table can then be used by the crawler to refresh expired pages.

Ideally, in 6.x-2.x, we'd have the crawler in a sub-module (as in 7.x-1.x) and use the Queue API backport [1] to refresh expired pages, if necessary. (related #1785292: Cron Crawler Not Running)

[1] http://drupal.org/project/drupal_queue

Which means that bots can rather easily DoS a site by crawling bad URLs that still resolve to valid pages, such as /node?test={1...1000}.