Given I'm looking into optimising Drupal (as per http://buytaert.net/drupal-database-interaction).

Recently, my attention was drawn to the user.module (for unrelated reasons) and in function user_access() I noticed this:-

 $result = db_query("SELECT DISTINCT(p.perm) FROM {role} r INNER JOIN {permission} p ON p.rid = r.rid WHERE r.rid IN (%s)", implode(',', array_keys($account->roles)));

Ah, a DISTINCT possibility for optimisation :)

On closer inspection, the use of DISTINCT here seems inappropiate. Here's my reasoning:-

the "perm" column is of type LONGTEXT and, if you actually look at the data it contains hardly any rows are the same, making them pretty much DISTINCT anyway. They could be the same, and since DISTINCT is used, MySQL is forced to create a temporary table and then scan all rows we "weed out" any dups. Then throw away the temporary table. But what for? Read on in the PHP....

    while ($row = db_fetch_object($result)) {
      $perm[$account->uid] .= "$row->perm, ";
    }

All that is actually happening here is producing one large comma seperated string based on all possible permissions. Following that, we have

  if (isset($perm[$account->uid])) {
    return strpos($perm[$account->uid], "$string, ") !== FALSE;
  }

  return FALSE;

So, all we are doing is returning a boolean result based on whether a given permission string exists. Now, strpos() is a pretty fast function, it just needs to scan a one-dimesional string of characters and find the first match. If DISTINCT were removed from the SQL statement, then the search string may be bigger. But I think I know where I'd like to apply this filter (in this instance). I'd believe it better to offload the temp table comparison in the database and instead load up the strpos() function with more data. After all, it only has to find the first match (or end of string and false).

I believe I know which is faster. But, there's another issue. In fact, it's not really a performance enhancer. All we are doing is lowering the burden for MySQL internally but we are potentially increasing the burden on data retrieval (we may get more rows) and we may be asking PHP to scan a longer string. But, I think we should be looking to lower MySQL (or PG) internal overhead. Bringing back the data is pretty trival so that just leaves asking PHP to potentially do more work inside strpos(). I know where I'd like to place (move) the burden. You can scale a site by adding more webservers (therefore more distributed PHP power) but you're stuck with one database machine.

If you can potentially move load from the DB server to the Webserver, I think it should be worth exploring.

best regards,
--AjK

CommentFileSizeAuthor
user.module_r1.634_patch01903 bytesAjK
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Dries’s picture

I could benchmark this and/or I could let you know how much time is spent in this function. I should be able to do that tomorrow.

Maybe the best solution is to get rid of the comma-separated string and to normalize the database by creating another table.

mfb’s picture

The real issue I see is the non-ideal data model for permissions, as dries mentions. But yes, it does seem that DISTINCT is not needed here. Few sites would have a large number of different roles, all with the same permissions.

moshe weitzman’s picture

be careful about normalizing. one optimistic patch was submitted a year or so ago (maybe more) and it was found to perform worse. hopefully someone can find the url

drumm’s picture

I expect this will work, but since it is a tradeoff, it would be nice to see a benchmark.

Although it would be nice to see this table properly normalized, I think that is outside the scope of something which can be immediately useful.

killes@www.drop.org’s picture

Version: 4.7.2 » x.y.z
Category: task » feature

labelling as feature and moving to cvs.

AjK’s picture

Has this issue been superceeded by http://drupal.org/node/73874 ??

If so, I guess it can be closed.

regards,
--AjK

bdragon’s picture

Version: x.y.z » 6.x-dev
Status: Needs review » Closed (duplicate)

Superceded. If this is really a big performance issue, we can resurrect #73874.