There is no reason for the thread_id column to be in pm_index. In there, it is duplicates *a lot* because it does not change for the same mid.

So we should move it to the pm_message column.

Follow-up of #1192250: Slow queries and other performance bugs

CommentFileSizeAuthor
#3 issue-1193232-move-thread-id.diff11.9 KBte-brian
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

te-brian’s picture

Started working on a patch for this. I wrote an update script to add the thread_id field and migrate the data. I haven't removed the thread_id from pm_index (and there may be reason to keep it in both tables).

Here is a quick preview of one query done against my development data-set.

explain SELECT pm.mid FROM gac_pm_message pm INNER JOIN gac_pm_index pmi ON pmi.mid = pm.mid WHERE pmi.thread_id = 24023 ORDER BY pm.timestamp DESC LIMIT 1;
+----+-------------+-------+--------+-------------------------------+--------------+---------+-------------+-------+---------------------------------+
| id | select_type | table | type   | possible_keys                 | key          | key_len | ref         | rows  | Extra                           |
+----+-------------+-------+--------+-------------------------------+--------------+---------+-------------+-------+---------------------------------+
|  1 | SIMPLE      | pmi   | ref    | PRIMARY,messages,participants | participants | 4       | const       | 40518 | Using temporary; Using filesort |
|  1 | SIMPLE      | pm    | eq_ref | PRIMARY                       | PRIMARY      | 4       | gac.pmi.mid |     1 |                                 |
+----+-------------+-------+--------+-------------------------------+--------------+---------+-------------+-------+---------------------------------+


mysql> explain SELECT pm.mid FROM gac_pm_message pm WHERE pm.thread_id = 24023 ORDER BY pm.timestamp DESC LIMIT 1;                                              +----+-------------+-------+------+---------------+-----------+---------+-------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key       | key_len | ref   | rows | Extra                       |
+----+-------------+-------+------+---------------+-----------+---------+-------+------+-----------------------------+
|  1 | SIMPLE      | pm    | ref  | thread_id     | thread_id | 4       | const |  185 | Using where; Using filesort |
+----+-------------+-------+------+---------------+-----------+---------+-------+------+-----------------------------+


mysql> explain SELECT pm.mid FROM gac_pm_message pm WHERE pm.thread_id = 24023 ORDER BY pm.mid DESC LIMIT 1;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | pm    | index | thread_id     | PRIMARY | 4       | NULL |  167 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+

As you can see dropping the join is an immense improvement.. and if we can order by mid instead of timestamp then its a complete win!

To give you an idea of how important improving even just this query is .. here is this query being run on my live data :(

# Time: 110627 23:20:23
# User@Host:
# Query_time: 7.806810  Lock_time: 0.000097 Rows_sent: 1  Rows_examined: 1385350
SET timestamp=1309231223;
SELECT pm.mid FROM gac_pm_message pm INNER JOIN gac_pm_index pmi ON pmi.mid = pm.mid WHERE pmi.thread_id = 24023 ORDER BY pm.timestamp DESC LIMIT 1;

Ouch!! 1,385,350 rows examined.

The reason we may still need thread_id on pm_index is queries like this that are using a very specific index:

"SELECT COUNT(*) FROM {pm_index} WHERE thread_id = %d AND is_new = 1 AND recipient = %d AND type IN ('user', 'hidden')"
te-brian’s picture

Still working on some stuff... however ... I discovered that the above query is not from privatemsg.module. It is something that is in one of my custom site modules that displays the latest replies to a user's pms. If I want to keep the functionality I have, I may still need to add thread_id to pm_message, but from my full comb-over of the code base, it seems that a majority of the queries that involved thread_id are only on the pm_index table, or join the pm_index table to other tables like the pm_tags_index table.

I think in only a few circumstances the pm_index table is joined to the pm_message table (where only the thread_id is needed, and not other fields).

This still warrants some investigation and will obviously be affected by any future implementation of pm_thread table.

te-brian’s picture

Here is the patch I am doing my testing with. Like I said .. there isn't much in the module to actually warrant the thread_id move. It may have been a false alarm raised by my custom code.

Berdir’s picture

I pretty much came to the same conclusion, the queries that profit from this are the message load query and a query in pm_attachments. Other queries would get worse.

However, this might change after we can get #744374: Introduce a pm_thread table and #647212: Load participants in thread list in a single but separate query in, so we probably want to postpone this until those two issues are resolved.

oadaeh’s picture

Issue summary: View changes
Status: Active » Closed (won't fix)

This issue is being closed because it is against a branch for a version of Drupal that is no longer supported.
If you feel that this issue is still valid, feel free to re-open and update it (and any possible patch) to work with the 7.x-1.x branch (bug fixes only) or the 7.x-2.x branch.
Thank you.