As I interpret it, the way MySQL works in terms of index's is that it looks for the left most key in the index.

If this is the case, what is the point of nid and status having thier own index in the table when they're the left most keys of the node_status_type index and the primary key?

Should they just be dropped out the database?

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

sime’s picture

As I interpret it...

I've never heard of this. Can you provide a link?

.s

nicholasThompson’s picture

http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html

Specifially, this section:
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html#id2839936

It explains how the ordering of keys in indexes effects how they're used.

sime’s picture

This bit?

MySQL cannot use a partial index if the columns do not form a leftmost prefix of the index. Suppose that you have the SELECT statements shown here:

So sometimes the node_status_type index is not used (if the query is not using all columns in the index). But that doesn't make it obsolete: if all three columns (status, type, nid) are used in the query, then the index is used fine.

That's my reading of it.

nicholasThompson’s picture

If you group 3 columns together in an index as such (status, type, nid), then if you did:
SELECT * FROM node WHERE status = 1

then that index will be used as status is the first column. That index WONT work for:
SELECT * FROM node WHERE type = 1
or
SELECT * FROM node WHERE nid = 1

but it WILL work for

SELECT * FROM node WHERE status = 1 AND type = 1
SELECT * FROM node WHERE status = 1 AND type = 1 AND nid = 1

I dont think it will work for:
SELECT * FROM node WHERE status = 1 AND nid = 1
as type comes after status, not nid.

Therefore there seems to be no point having an index (status, type, nid) and also an index (status) as bother are the leftmost columns.

I'm not sure, however, what happens if you had:
SELECT * FROM node WHERE status = 1 AND vid = 1

as vid is not an key in that index... Does it look for an index with both status and vid in it or does it look for two seperate indexes?

sime’s picture

I don't think your interpretation is correct. I don't think indexes are automatically created with left-most columns. (Order obviously matters, sometimes a multi-key index is not used, but that is no reason to drop an index.)

But I'm not an expert. At least we have fleshed out the issue and maybe someone can cast the deciding vote.

nicholasThompson’s picture

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).

Thats from the mysql documentation link I gave above (comment #2).

According to that, the order of creation of the keys in an index does matter.

But I'm not an expert.

me neither - lets hope someone here is :-)

Thanks for your feedback Sime. Often argueing a point like this helps clarify it further.

James Harvard’s picture

njt1982 is correct – the 'nid' and 'status' indices are redundant.

If the nid or status column is the only column in a WHERE clause, then MySQL can use the primary key (nid,vid) or node_status_type (status, type, nid) index respectively, as per the quote from the manual above.

MySQL's EXPLAIN SELECT command shows which index MySQL is using. If you drop both the 'nid' and 'status' indices ...

EXPLAIN SELECT * FROM node WHERE nid = 1;
# MySQL uses the primary key (nid,vid)

EXPLAIN SELECT * FROM node WHERE status = 1;
# MySQL uses the node_status_type index (status, type, nid)

HTH,
James Harvard

nicholasThompson’s picture

So how do we go about getting this into 4.7.3?

As I understand it (this is more hear-say than solid, so please correct if wrong) having excess indexes is potentially a fairly big waste of resources as it can delay insert and update commands due to write's being much slower than reads.

Chris Johnson’s picture

I don't know how big a performance hit it is, but that's essentially correct. Normal inserts and updates will be delayed while MySQL rewrites the indexes affected. Getting rid of them will eliminate a bunch of disk writes in many cases.

pwolanin’s picture

I'm not a MySQL guru by any streatch, but even I noticed complaints by phpmydamin regarding these duplicate indices and there are a couple previous bug reports:

http://drupal.org/node/59843

James Harvard’s picture

So is it just a matter of someone producing a patch for the /database/database.4.x.mysql files? Also I presume that an update function would be needed to drop the indices on existing installations - is that just a matter of submitting a patch too, or does one of the admins have to do it?

killes@www.drop.org’s picture

Version: 4.7.2 » x.y.z

Right, a patch is all we need. Not sure we will apply it to 4.7, but certainly to CVS HEAD. Changing the version.

nicholasThompson’s picture

So 4.7.4 will be the next release to see this?

killes@www.drop.org’s picture

if we do not get a patch no release will ever see it. :p

We generally try to avoid database updates in the stable branch, so I am unlikely to apply a patch for 4.7.4 since it doesn't really break anything.

nicholasThompson’s picture

true - so should this be assigned to you now?

sime’s picture

Normal course of action - someone with the ability and motivation may provide a patch. When and if that will happen is not guaranteed.

pwolanin’s picture

If you can fully enumerate the pointless keys, I'll try to turn that into a patch.

pwolanin’s picture

Title: Pointless database keys » Pointless and missing(?) database keys

Ok, just playing with the Devel module.

On the /node/add page on of the slow queries was

12.67 0 list_themes SELECT * FROM system WHERE type = 'theme'

Looking at the schema, there is no index on the type column or status columns. Is there a downside to adding them? Is it too few rows to be helpful?

pwolanin’s picture

Ok, here's an initial list of indices that look pointless:

node: nid since PRIMARY is (nid,vid)

node: status since node_status_type is (status, type, nid)

term_hierarchy: tid since PRIMARY is (tid, parent)

term_node: tid since PRIMARY is (tid, nid)

phpmyadmin thows up error messages for all of these while browsing the tables.

nicholasThompson’s picture

Nice spot on the extra ones. Might be worth doing a check on all the indexes in the database. They're probably not killing the CMS on any great level, but they're certainly not helping.

AFAIK, if you have a column with only a few different datatypes (for example: yes, no, maybe) then its not worth having an index. However, I've been wrong before.
There were some usefull tips brought up at another forum I use...
http://forum.thegamecreators.com/?m=forum_view&t=85090&b=2

pwolanin’s picture

Status: Active » Needs review
FileSize
2.08 KB

patch attached that removes the 4 indicies listed above.

pwolanin’s picture

FileSize
2.13 KB

oops- stray comma in the above SQL- use this one instead

drumm’s picture

Status: Needs review » Needs work

How does this affect the results of EXPLAIN ... on some relevant queries? Can you paste before and after results?

pwolanin’s picture

I think a couple examples are layed out already above: http://drupal.org/node/76040#comment-119974

However, here are a couple more examples:

current schema:

mysql> EXPLAIN SELECT * FROM node WHERE status = 1
+----+-------------+-------+------+-------------------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys           | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+-------------------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | node  | ALL  | status,node_status_type | NULL |    NULL | NULL |  162 | Using where |
+----+-------------+-------+------+-------------------------+------+---------+------+------+-------------+

after dropping the status index:

EXPLAIN SELECT * FROM node WHERE status = 1;
+----+-------------+-------+------+------------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys    | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+------------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | node  | ALL  | node_status_type | NULL |    NULL | NULL |  162 | Using where |
+----+-------------+-------+------+------------------+------+---------+------+------+-------------+

current schema:

EXPLAIN SELECT * FROM node WHERE status = 1 AND nid > 50;
+----+-------------+-------+------+-------------------------------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys                       | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+-------------------------------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | node  | ALL  | PRIMARY,node_status_type,nid,status | NULL |    NULL | NULL |  162 | Using where |
+----+-------------+-------+------+-------------------------------------+------+---------+------+------+-------------+

after dropping nid and status indices:

EXPLAIN SELECT * FROM node WHERE status = 1 AND nid > 50;
+----+-------------+-------+------+--------------------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys            | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+--------------------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | node  | ALL  | PRIMARY,node_status_type | NULL |    NULL | NULL |  162 | Using where |
+----+-------------+-------+------+--------------------------+------+---------+------+------+-------------+

I think what this shows is that there is no point in having a status index at all, since it's never being used!

and finally, current schma:

EXPLAIN SELECT * FROM node n JOIN term_node tn ON tn.nid=n.nid WHERE n.status = 1 AND tn.tid = 3;
+----+-------------+-------+------+-------------------------------------+---------+---------+------------+------+--------------------------+
| id | select_type | table | type | possible_keys                       | key     | key_len | ref        | rows | Extra                    |
+----+-------------+-------+------+-------------------------------------+---------+---------+------------+------+--------------------------+
|  1 | SIMPLE      | tn    | ref  | PRIMARY,nid,tid                     | PRIMARY |       4 | const      |    7 | Using where; Using index |
|  1 | SIMPLE      | n     | ref  | PRIMARY,node_status_type,nid,status | nid     |       4 | dp1.tn.nid |    1 | Using where              |
+----+-------------+-------+------+-------------------------------------+---------+---------+------------+------+--------------------------+

deleting the 3 indices as above (n.ni, n.status, tn.tid)

EXPLAIN SELECT * FROM node n JOIN term_node tn ON tn.nid=n.nid WHERE n.status = 1 AND tn.tid = 3;
+----+-------------+-------+------+--------------------------+---------+---------+------------+------+--------------------------+
| id | select_type | table | type | possible_keys            | key     | key_len | ref        | rows | Extra                    |
+----+-------------+-------+------+--------------------------+---------+---------+------------+------+--------------------------+
|  1 | SIMPLE      | tn    | ref  | PRIMARY,nid              | PRIMARY |       4 | const      |    7 | Using where; Using index |
|  1 | SIMPLE      | n     | ref  | PRIMARY,node_status_type | PRIMARY |       4 | dp1.tn.nid |    2 | Using where              |
+----+-------------+-------+------+--------------------------+---------+---------+------------+------+--------------------------+

Here there is some difference in the outcome.

nicholasThompson’s picture

Nicely explained!

Another issue is that having the extra keys causes extra file system writes. Although these are small, if a system is under heavy load and you're trying to make the very most of what you have then you dont want to be wasting precious resources.

pwolanin’s picture

FileSize
3.33 KB

A couple more examples attached. I'm trying to decipher these results. It looks to me that mysql may have to do a little more work with these keys removed?

magico’s picture

Title: Pointless and missing(?) database keys » Optimize database tables keys and indexes
magico’s picture

Title: Optimize database tables keys and indexes » Optimize/fix database tables keys and indexes
Version: x.y.z » 5.x-dev

This is a priority!

More information about duplicate indexes:
* http://drupal.org/node/58942
* http://drupal.org/node/65456

mdupont’s picture

Status: Needs work » Closed (fixed)

Now that D5 has long been released and reached end-of-support, and that D8 is in development, I assume this issue can be safely closed.