In my quest in accelerating my main website, I found this ugly function.

First of all, my table is now 139,757 rows and growing. It looks like the function deletes only a very few (if any) rows from the table. This being said, I can, again, see extremely old entries in the table (timestamp of 1257142720 represents: Sun, 01 Nov 2009 22:18:40 -0800 -- why do I still have that in there?!)

/**
 * Given hash of url delete any old relationships.
 *
 * @param $hash_url
 */
function boost_cache_prune_node_relationship($hash_url) {
  // Grab all entires related to this URL; find ones that don't match the latest
  // timestamp and remove them.
  $records = 0;
  $result = db_query("SELECT hash, timestamp FROM {boost_cache_relationships} WHERE hash_url = '%s' ORDER BY timestamp DESC", $hash_url);
  while ($info = db_fetch_array($result)) {
    if ($info['timestamp'] < BOOST_TIME) {
      db_query("DELETE FROM {boost_cache_relationships} WHERE hash = '%s'", $info['hash']);
      $records++;
    }
  }
  return $records;
}

Although the logic is correct and you are likely to get the right count in records (which frankly doesn't matter unless you have a VERBOSE of 7 or more!) it is written in the slowest possible way. The following would be very much preferable:

function boost_cache_prune_node_relationship($hash_url) {
  db_query("DELETE FROM {boost_cache_relationships} WHERE hash_url = '%s' AND timestamp < %d", $info['hash'], BOOST_TIME);
  return db_affected_rows();
}

If MySQL returns the wrong number of affected rows, then well... too bad. Frankly, from what I've seen that works just fine. MySQL has a hard time with UPDATE, not DELETE.

Alright... So that's good, but like I mentioned in another post ([#]), when a URL disappear from your site, it sticks in your tables. That's annoying since that means the table continues to grow forever. I would suggest a delete from this table with the largest amount of time a valid entry can remain in the table.

Something more or less like this (example for keeping those relations for up to 1 week):

db_query("DELETE FROM {boost_cache_relationships} WHERE hash_url = '%s' AND timestamp < %d", $info['hash'], BOOST_TIME - 60 * 60 * 24 * 7);

I'm not too sure which delete relations make use of, but I'm sure you understand what I'm saying here.

Thank you.
Alexis Wilke

CommentFileSizeAuthor
#4 boost-6.x-fast_cache_pruning.patch924 bytesAlexisWilke
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

AlexisWilke’s picture

Btw, the bear minimum optimization would have been to break the loop once you reached an entry that was out of time since your sort the results...

/**
 * Given hash of url delete any old relationships.
 *
 * @param $hash_url
 */
function boost_cache_prune_node_relationship($hash_url) {
  // Grab all entires related to this URL; find ones that don't match the latest
  // timestamp and remove them.
  $records = 0;
  $result = db_query("SELECT hash, timestamp FROM {boost_cache_relationships} WHERE hash_url = '%s' ORDER BY timestamp DESC", $hash_url);
  while ($info = db_fetch_array($result)) {
    if ($info['timestamp'] < BOOST_TIME) {
      db_query("DELETE FROM {boost_cache_relationships} WHERE hash = '%s'", $info['hash']);
      $records++;
    }
    else {
      break;  // <- this way you avoid reading all the new relationships...
    }
  }
  return $records;
}

But of course, the SELECT should also check timestamp against BOOST_TIME. And then you can deduce that the DELETE would be enough.

Thank you.
Alexis Wilke

AlexisWilke’s picture

Status: Active » Needs review

Small mistake, it's $hash_url in the DELETE args:

function boost_cache_prune_node_relationship($hash_url) {
  db_query("DELETE FROM {boost_cache_relationships} WHERE hash_url = '%s' AND timestamp < %d", $hash_url, BOOST_TIME);
  return db_affected_rows();
}
wiifm’s picture

Cross linking the issue from drupal.stackexchange.com http://drupal.stackexchange.com/questions/8154/perfomance-problem-when-p...

AlexisWilke’s picture

Assigned: Unassigned » AlexisWilke
FileSize
924 bytes

To help others, there is a ready made patch. 8-)

This does not delete very old relationships (yet). I am still waiting for comments by a Boost developer to understand/know whether/how a relationship cache becomes out of date and whether it can be deleted whenever it is smaller than BOOST_TIME.

Thank you.
Alexis Wilke