The 3rd slowest query on drupal.org today is this:

SELECT name, filename, throttle FROM system WHERE type = 'XXX' AND status = XXX ORDER BY weight ASC, filename ASC;

This patch adds an index on type,status. Please review.

Comments

JirkaRybka’s picture

Seems easy, but needs review from someone more experienced. I have one question, though: Should there be also some upgrade path?

dmitrig01’s picture

yes - we need a hook_update_n

david strauss’s picture

Status: Needs review » Needs work

The index ought to be (type, status, weight, filename). With the given patch, the DB will still need to sort at query-time.

moshe weitzman’s picture

@David - Do we need to change or replace either of the existing indices on the table?

david strauss’s picture

Does anything actually use the weight index?

david strauss’s picture

EXPLAIN result before: no index used and "Using where; Using filesort"

EXPLAIN results with my suggested index: uses my suggested index and "Using where"

We should also add the index (type, status, bootstrap, weight, filename) if we want the bootstrap module listing to also be optimized.

david strauss’s picture

Assigned: moshe weitzman » david strauss
StatusFileSize
new508 bytes

Here's an updated patch.

david strauss’s picture

StatusFileSize
new794 bytes

Whoops, I forgot the diff parameters.

david strauss’s picture

I think this patch is a candidate for backporting to D5.

david strauss’s picture

One more thing. :-) This patch needs to implement an upgrade path from D5. What is the standard for writing hook_update_N() patches?

JirkaRybka’s picture

I think that function system_update_6025() in modules/system/system.install makes a good example. We probably need to add a new similar one at the end, with number higher by one than the last. (This is just a quick note from me, no docs consulted!)

bjaspan’s picture

StatusFileSize
new1.41 KB

David's patch creates an index that is too long for MySQL. Each character in a utf-8 field counts as three bytes (because that is the max it can be (or so says the mysql manual, I thought a utf-8 char could be four bytes)), and the type and filename fields are both varchar(255) or 765 bytes each as an index column. MyISAM only allows 1000 bytes and Innodb 767 bytes.

The type field can only ever be module, theme, or theme_engine, so I set the index to be on a 12-character prefix. This makes the index work with MyISAM but presumably not with Innodb (since the filename field by itself takes 765 index bytes).

The filename could in theory be the full 255 bytes but I set it to 200 bytes to make it work with Innodb. However, this changed the way mysql uses the index so "Using filesort" returns to the EXPLAIN. My guess is that this is because we are sorting by filename and mysql can't use a partial-prefix index for sorting (because it may not uniquely identify all the unique values of the column).

So, we have a choice: We either make the filename column shorter or we cannot index it in combination with any other columns for Innodb. Unless I'm missing something.

I'm attaching a patch that includes a system_update_N() function but it still needs work.

bjaspan’s picture

Another thought: We could make the filename column not be UTF-8; if it uses 1-byte chars (which I think all current Drupal code does, and we could insist on), we won't have the space problem.

This would require enhancing Schema API with a 'charset' element on fields. Probably not too hard, but maybe too late for D6.

Chris Johnson’s picture

For all enumerated types columns in the databases, we should be using sane maximum lenghts, e.g. char(12) instead of the less efficient varchar(255). That would help us to avoid problems like this one.

I also agree with making some selective columns not be UTF8. Any columns where the type is an enumerated string controlled by core code only is a candidate for being not UTF8.

Columns which may have additional type values added by contrib modules would be open for debate. The PHP code and SQL is already ASCII-only; requiring that contrib module developers stick to ASCII enum type strings is not particularly onerous.

@David Strauss: the only use of the weight column and its index that I can find are in includes/module.inc, function module_list() where there are 2 SELECT statements which do ORDER BY weight.

chx’s picture

Assigned: david strauss » chx
Status: Needs work » Needs review
StatusFileSize
new2.1 KB
bjaspan’s picture

chx's patch in #15 uses option 1 from my comment in #12: he shortens the filename column to 200 characters (and the type column to 12 characters). That will solve the MySQL maximum index length problem and quite probably a 200-character filename limit is fine. I have not yet tested the patch.

webchick’s picture

Status: Needs review » Needs work

Let's get some docs in there about why we're choosing such an odd maxlength size.

dries’s picture

If the type can only be 'module', 'theme' or 'engine', we could consider to use numeric IDs instead of strings?

chx’s picture

Status: Needs work » Needs review
StatusFileSize
new2.2 KB

Dries, I would rather limit the changes we make right now to such things... we are supposed to be frozen. D7, OK?

merlinofchaos’s picture

I would rather not bother changing the 'type' column; but instead, in D7 we should (IMO) remove it entirely and split the themes from the modules. It seems very silly to have them both together in a table whose name makes little sense. I would much rather have a 'modules' and a 'themes' table. Then modules wouldn't even need a type index. But that's the future, not today; the point being, any attempt to change it now would be a stopgap measure that would either prevent valuable future work or just get completely redone anyway.

david strauss’s picture

Can we use partial indexes in the Schema API? If so, we can just use those for the WHERE portions of the index. Status, for example, only needs one byte.

I'm also not sure what the problem is with the indexes I gave. I'm using them right now on InnoDB tables.

david strauss’s picture

It also seems really silly to sort on filename. I'm not sure we can make the change in D6, but it seems like the actual module name is a more apt choice for sorting, especially because you can install modules to more than one directory.

bjaspan’s picture

David's statement in #21 that the full index is working for him on Innodb lead me to a quick Google search that reveals that the MySQL manual is wrong. http://bugs.mysql.com/bug.php?id=13315 says that it is an index COLUMN whose maximum allowed length is 767 bytes but an InnoDB multi-column index is allowed to be 3500 bytes in length in total. The length limit for MyISAM tables still seems to be 1,000 bytes (as evidenced by the fact that when I tried to update my database to use David's original index, the ALTER TABLE statement failed due to the limit).

This means that my patch in #12 will work for MyISAM and InnoDB, and I also tested it with pgsql. Someone should review it and mark it RTBC.

chx's patch in #19 will presumably also work fine (I haven't tested it) but his patch changes the database schema (by changing max varchar widths) whereas mine only adds/removes indexes. I have no strong feelings but ISTM that adding/removing indices is a smaller overall change to make during code-freeze.

JirkaRybka’s picture

StatusFileSize
new1.25 KB

I tested the #12 patch:
- Patch was rolled with Windows-style line endings (CR+LF), so didn't apply on my Linux environment. I re-created the changes manually and proceeded to testing.
- The update function was doing too much: It dropped indexes added by your previous patches only, not existing on a normal install. So with real site's data, update.php said:

* user warning: Can't DROP 'modules'; check that column/key exists query: ALTER TABLE drup_system DROP INDEX modules in /home/jirux/test-webspace/nat5end2/includes/database.mysql-common.inc on line 447.
* user warning: Can't DROP 'bootstrap'; check that column/key exists query: ALTER TABLE drup_system DROP INDEX bootstrap in /home/jirux/test-webspace/nat5end2/includes/database.mysql-common.inc on line 447.

So attaching a new patch, with the above fixed.

JirkaRybka’s picture

StatusFileSize
new2.24 KB

This is supposed to be about performance, so I did also try to do some benchmarking. It was not exactly the proper benchmarking environment (I'm rather new to benchmarking, and I did it with real site's data too, being most interested about impact to THAT site - see attached file for details).

Full page requests didn't show any significant difference (well, I didn't really expect much there), but with normal page-caching on (which is the most common case on my site), there seems to be about 3.5% improvement. Worth to have, it all sums up in the end...

catch’s picture

Priority: Normal » Critical

This still applies cleanly, and system_update_6035 is still valid. Bumping to critical as a sister issue to: http://drupal.org/node/164532 from which I've just removed the system components which duplicated this one but without upgrade path.

The latest patch is a re-roll of #12 - which ought to be good to go once another set of eyes has looked at it:

Bjaspan: This means that my patch in #12 will work for MyISAM and InnoDB, and I also tested it with pgsql. Someone should review it and mark it RTBC.

Still needs review from someone more experienced.

david strauss’s picture

@JirkaRybka Was your testing done with MySQL's query cache off?

JirkaRybka’s picture

Status: Needs review » Reviewed & tested by the community
StatusFileSize
new1.97 KB

Thanks David, I overlooked that. Now I re-tested it with query cache OFF, seems to give 7% for cached pages on the same config now.

Still, I'm far from understanding what else might affect the benchmarking, or what my Linux desktop might be doing in the background, but I dare say that there IS a gain for cached pages, given the repeated positive results.

The patch applies cleanly, changes only just indexes, gives a bit of extra speed, and upgrade path works fine too, so let's try to RTBC this! :)

gábor hojtsy’s picture

Status: Reviewed & tested by the community » Fixed

Committed #24, thanks for the improvements. Larger schema changes and optimizations are for Drupal 7 for sure.

Anonymous’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.