Last updated August 26, 2009. Created by grub3 on August 21, 2009.
Edited by Damien Tournoud. Log in to edit this page.
Multi-column indexes can be rather tricky to understand. An index on several columns is not symmetric and doesn't work equally on every field. For example, an index on a dual field:
CREATE INDEX foo ON table_name (bar1, bar2);is not equal to two indexes on single field:
CREATE INDEX bar1 ON table_name (bar1);
CREATE INDEX bar2 ON table_name (bar2);A database server can only use this index to optimize conditions that are on the leftmost part of the index. For example, the following query will not use this index:
SELECT * FROM table_name WHERE bar2 = 'foo'But those two will:
SELECT * FROM table_name WHERE bar1 = 'foo' AND bar2 = 'bar'
SELECT * FROM table_name WHERE bar1 = 'foo' AND bar2 > 'bar'Some databases claim that a dual index works equally right and left. This is not portable and probably not true, as multi-column indexing may result in a sequential scan in complex queries:
Reference: http://www.postgresql.org/docs/8.4/static/indexes-multicolumn.html
A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns. The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned. Constraints on columns to the right of these columns are checked in the index, so they save visits to the table proper, but they do not reduce the portion of the index that has to be scanned. For example, given an index on (a, b, c) and a query condition WHERE a = 5 AND b >= 42 AND c < 77, the index would have to be scanned from the first entry with a = 5 and b = 42 up through the last entry with a = 5. Index entries with c >= 77 would be skipped, but they'd still have to be scanned through. This index could in principle be used for queries that have constraints on b and/or c with no constraint on a — but the entire index would have to be scanned, so in most cases the planner would prefer a sequential table scan over using the index.
According to http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html:
MySQL cannot use an index if the columns do not form a leftmost prefix of the index.
A good decision on wether to use multi-column indexes will take into account the cardinality of the data. Sometimes a multi-column index is simply the right answer, particularly if the values in the first column of the index do not limit the rows sufficiently.
Also in where exists clauses the rdbms will not even load the row if the primary key is the composite index, but it will have to.
Indexes are best thought of like caches, you use the right caching mechanism for the job, and you only do it when you can demonstrate performance improvement.
Therefore the proposed solution to enhance Drupal is to hunt for sequential scans and add single indexes when needed.
Comments
This seems wrong: But it does
This seems wrong:
According to http://www.postgresql.org/docs/8.3/static/indexes-multicolumn.html a multicolumn "index can be used with query conditions that involve any subset of the index's columns".
And this seems wrong:
According to http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html "MySQL cannot use an index if the columns do not form a leftmost prefix of the index."
There may be some edge cases but overall the MySQL and PostgreSQL multicolumn index implementations seem fairly compatible in my reading of the docs.
According to : A multicolumn
According to :
Thanks for pointing out this page, which I did not know.
My point of view comes from the repetitively checking query plans on a large 500.000 database and it seems that dual indexes are not used by the planner when searching on single columns. The database will prefer a sequential scan.
So in most cases it would be safer to set two seperate indexes instead of a dual index.
I am going to rewrite this
I am going to rewrite this page. I believe that we should make a minimal use of dual indexes and prefer single indexes.
You're remarkably wrong, but
You're remarkably wrong, but documentation comments are the wrong place for this debate.
multi-column indexing may
multi-column indexing may result in a sequential scan in complex queries with multiple JOINs. Thus removing the interest of indexing. I added the information on the page.
Please note
Please note that I rewrote all this page, so that it now kind of make sense.
Thanks Damien. I would also
Thanks Damien.
I would also notice that multiple indexes cannot be used in JOINs, when single indexes can. Therefore, to make sure that a table can be joined easily, it is better to have multiple indexes.