I am really tired banging my head at the absent pm_thread table. It makes Views integration big pain. I suggest to make following changes:
1) Create "pm_thread" table with columns containing thread properties, which currently are calculated by aggregate queries.
2) Move thread subject (which is really a thread property, not a message property) from "pm_message" to "pm_thread". Maybe move something else too.
3) Update "pm_thread" on every pm_message and pm_index change. Wrap the whole action (including hook invocation) in transaction. Do not run aggregated queries but instead query pm_thread directly.
Please let me know what do you think about it, cause that sounds like it's easier to rewrite it from the scratch :-/
| Comment | File | Size | Author |
|---|---|---|---|
| #51 | 744374.patch | 26.9 KB | berdir |
| #47 | pm_thread5.patch | 26.97 KB | berdir |
| #46 | pm_thread4.patch | 26.33 KB | berdir |
| #40 | pm_thread3.patch | 25.56 KB | berdir |
| #35 | pm_thread2.patch | 30.44 KB | berdir |
Comments
Comment #1
litwol commentedwhat features/queries depend on current schema and which will benefit from schema you are describing ? unless there is a clear and thorough pros/cons list this issue will be set to 'won't fix'
Comment #2
berdirI have thought about this too. Patches are welcome and I will maybe work on this, but other things are currently more important, it is surely a candiate for a 2.x though.
We need to test this but I agree that this *could* make things a lot easier. However, note that there are several issues that will probably make this more complicated than you might currently think:
- We definitely need an entry in that table per user. Not all users may have received all messages, due to blocking and more obvious, they might have deleted some of the messages.
- I don't think we can (easily) use transactions in Drupal 6.... ;)
- We obviously need an upgrade function to initially fill up the pm_thread table.
- #555016: Planning for sending messages to roles. will make this more complicated... (we should get away with only adding personal (user/hidden) recipients to the pm_thread table though.
- Some users don't want threads at all, and currently, this is possible with a few hooks, see #727588: disable threaded view for example and I think such a table would break that. But maybe they could use privatemsg_views then for such a use case (Note that it is more complicated than just removing the group by).
- What about the flushing feature, can we maybe remove the row in that table if all messages of a thread have been deleted...
Agree too, I think we initially didn't do this because migrated data from D5 versions had a different subject per message and the initial idea of this module was to be able to switch between threaded view and single message view. We haven't done anything like that though and only a single subject is ever displayed, so...
Overall, I think it would be awesome if we can get this right, since the thread view could be a rather fast, simple query on a single table (except things like tags and so on..). Just thinking out loud what we'd need in that table
- thread_id (I think we can stick with the current approach that is, thread_id = mid of first message of that thread)
- uid
- number of (undeleted) messages
- number of unread messages
- last updated timestamp
- thread started timestamp (We currently offer that as a possible field to display in the message view, not enabled by default, though)
- Subject (only here)
- Anything else? We need to think about the recipients, but we can't just place them into the table directly, obviously... I suppose #647212: Load participants in thread list in a single but separate query could help here too.
Comment #3
litwol commented"and the initial idea of this module was to be able to switch between threaded view and single message view. " << this is still very much a wanted feature by some. although this could be a sub-module which implements a limit to how many messages can be per thread (ie 1 msg per thread), wasn't this part of the scope of quotas module ?
Comment #4
berdir@litwol
The thread list could *really* benefit from such a table, no need to set this to won't fix :)
Comment #5
berdirWe're too fast ;)
Yes, quota is a possible solution, but that will "break" (by never displaying it) the reply feature. Another option could be privatemsg_views module, if it gets stable enough.
Comment #6
crea commented@Berdir
I like your positive feedback. Actually Views / VBO integration is so painful that I even had an idea of complete rewrite. I don't need many features of PM but I like simple code and tight integration, which means automatically that I should use Views and VBO. But I also don't like to duplicate efforts so that's why I didn't start my own module before submitting this issue.
Privatemsg duplicates that combo. For example, you have hook_privatemsg_thread_operation() which repeats hook_node_operation() pattern. Funny thing is, VBO supports hook_OBJECT_operation() too! That means, as soon as you described your object to VBO, it can automatically invoke your hook_OBJECT_operation()! The only thing it requires is that you work on "base table" and "primary key" is "object ID".
If you instead of reimplementing existing modules' functionality tried to integrate with Views+VBO, you would be surprized how EASY it is to have UI out of the box with very little amount of code. The only limitation that stops that magic to happen are "hidden" thread properties and lots of GROUP BY queries. Views needs threads table to use as "base table", and VBO needs primary key of base table to pass object ID to it's operations. Currently I can integrate VBO, but I still show message list, reduced to 1 msg per thread by tricky JOIN. Also there are 2 types of actions: thread-related and msg-related, and I can't tell VBO which ones to use, since I always work on single "base table", and VBO will probably always show all actions as available (small usability problem).
In fact, "refresh on update" pattern is a common one. In most cases it's more efficient to update counts and other aggregate values on INSERT and UPDATE, instead of using heavy SELECT queries. Flag module, for example, updates flag counts on every flag operation. The only problem could be inconsistency between tables, if an operation fails in the middle.
It is as easy as db_query('begin|commit|rollback'). There can be problems if you use nested transactions (hidden auto-commits) but this could be workarounded simply by providing transaction as an option for advanced users, disabled by default. Am I missing something ?
Not only that, but you could also throw away own query builder once full Views integration is available.
Comment #7
berdirI am fully aware that Views integration is very hard with the current database schema, that's why I "refused" to work on it ;)
I also agree with you that it's a common pattern, I've just outlined some things that we need to handle when doing this. As I wrote, I've thought about exactly the same things too.
Only if we replace every single query with a view and require Views. And I don't like either of this ;) Note that I'v already thrown away the query builder in D7 in favor of DBTNG.
What about the fact that MyISAM (Which is probably serves more than 90% of the sites with Privatemsg out there) doesn't support transactions? ;) Again, different story for D7 which provides db_transaction() and defaults to InnoDB.
Comment #8
berdirOne last thing for today...
Just wondering, can VBO handle multi-column primary keys? I wouldn't be surprised if not and even with the pm_thread table, you still need both thread_id and uid. That's one of the most problematic things in a private messaging system. Almost every action/display is user-specific, among other things, it makes caching pretty much useless.
Comment #9
crea commentedWell, MyISAM users are sort of "turn it on and hope that it works". There are many places in Drupal with inconsistency windows simply because Drupal itself doesn't use transactions or foreign key features. Privatemsg already has such window by having separate pm_message and pm_index tables which could end up inconsistent.
We could also have "self-healing" logics, such as peforming consistency checks on cron once per day.
UPD: We may not even need transactions, but rather locking framework to lock on thread_id
Comment #10
crea commentedI think in current state it can't. It simply uses Views' primary key of "base table", and Views needs real field, not compound key.
Yes, for simple actions you don't need uid: in most cases it's current user. But for advanced ones, such as when you want to perform action "on behalf", you would need uid too. So we could make a primary key not "tid" field, but "tuid" which would mean combination of "tid" and "uid". Same as Flag module which has "fcid" which is sort of abbreviation for "flag content id" which means "flag instance applied to single piece of content". I guess this all needs more thought.
VBO operations and actions also can have parameters, which could allow to provide additional info such as uid, I guess.
As for caching, Views has caching already. You could easily cache both message listings and thread listings by implementing own cache plugin which inserts $uid in the cache id and clears cache when needed.
Comment #11
crea commentedBtw one small issue which I discovered during Views integration: with current DB schema, because old messages in threads can be deleted, you can't find whether a user was an author (sent anything) in a thread or simply recipient. You can only know if he was a participant. That means we may need to store some history, even after message is deleted.
Comment #12
crea commentedWe could make 2 additional tables instead of 1:
1) "pm_thread" with thread_id as primary key which would hold thread properties same for all users (e.g. "subject", "updated").
2) "pm_thread_index" or something like that, with compound primary key ("thread_id" + "uid") which would hold thread properties per user.
I think this is even better approach than single table.
Comment #13
berdir- Missing transaction support is not an argument against a pm_thread table. I simply meant that we cannot use/rely on transactions. We already insert multiple rows as you've said. Locking might be possible/required, but that requires Drupal 6.16+
- Yes, privatemsg.module only supports actions for the current user as well, so that would be enough for now.
- Using two tables would be a more normalized approach, but the only thing you could store there is the subject (last updated or even started might not be the same for all participants). Not sure if that's the right approach, we need to test it.
- What I said in #5 was wrong, privatemsg_quota allows to use the reply form and then sets the thread_id so that a new thread is created.
Comment #14
crea commentedWhat's the problem with that ? Updating to the latest minor release is a common "best practice" since you also get critical bugfixes.
"Last updated" would be same for all users. As uid-specific values I suggest "last sent" and "last received" columns in the "pm_thread_index" table. These values are very useful for sorting: when user replies to a thread, you don't want the thread to go up in his "Inbox", but _do_ want it to go up in his "Sent". Thus in the end you want all 3 timestamps ("last updated", "last sent", and "last received") to sort differently.
Another candidate for "pm_thread" table: "number of participants" column. It is same for all users too and could be useful for sorting too.
Comment #15
berdir- No, last updated is not the same for all users. For example, imagine a thread between user A, B, and C. A sent a message to B and C. B replies, but C has blocked messages form B. So only A receives that reply and last updated is only updated for A and B, not C.
- I've just commited a tag based solution for inbox handling, which also allows to remove threads from the inbox
If we just add calls to lock_* functions, then it will die with an fatal error for users with older core versions (and that will result in support requests), so we need to either require them to upgrade (D6 doesn't allow to specific versions in info files) or simply only use the locking feature if the required functions exist.
Comment #16
berdirAgain, just to be clear. What I'm talking about are not reasons against a pm_thread table but things that we need to be aware of and handle correctly when implementing this.
Comment #17
crea commentedHmmm. ok. Then probably global "last updated" doesn't have any usage (unless you want privileged user who can't be blocked browsing messages). But we still need user-specific "last updated".
Comment #18
naheemsays commentedsubscribe.
Don't we need the UID for everything? in order to not "leak" messages or am I missing something?
EDIT -
1. Considering the limits mentioned of this approach, would it still help with views integration?
2. How about views 3? I hear that has "group by" support. does that make things easier?
3. Can Views be forced to abandon its data collection method and to call the privatemsg queries/functions instead?
Comment #19
crea commentedThat doesn't mean you don't need a user - you do, but you don't need to provide one to action as an argument. Actions simply use current user. There could be advanced actions such as "make message X unread for user Y" but there are none currently.
What limits exactly ?
Well, it could make things sort of easier, BUT it still leaves us with threads as hidden objects in the database. I think having separate table(s) for threads will also help in understanding this module by other developers. So this change could benefit not only to Privatemsg Views.
That simply doesn't make sense. Views is a very tightly integrated system, and it's query builder is the heart of the system.
EDIT: In Views 3.x you *can* provide your own query backend, but I think that makes things MUCH more complicated than having proper DB schema from the start ;)
Comment #20
naheemsays commentedJust ignore me there, I was trying to figure things out and get them to make sense in my head.
In the case of a pm_thread table, would there also be a pm_index?
if so, would there need to be a need for calling an additional table for things like showing threads (or listing them)?
How would the queries be altered?
Comment #21
berdirWell, we do have a read all private messages permission (which is one thing I forgot to mention in my first response and this will need to be resolved somehow). Not sure how we want to handle that, we'd have to fully duplicate your pm_thread and pm_thread_index tables except of uid (only index) and subject (only pm_thread). Or simply handle these the old way somehow.
Yes, pm_index stays as it is now. The only difference to the existing schema would be the removed subject in the pm_message table. pm_thread would just contain the aggregated data of pm_index/pm_message (except a few things like participants) so that we could query that information faster. That information is updated when a change in pm_index/pm_message happens. You can think of pm_thread as the result of the current list query (again, except some things like participants) for all users.
I can't really parse the first question :)
Regarding the second, the new list query would more or less be like that (given that we only have pm_thread and not pm_thread_index) "SELECT * FROM {pm_thread} WHERE uid = %d ORDER by last_updated". Of course, many other queries probably have to be updated too.
In general, this approach would make data changes a bit slower, because we have to keep pm_index up to date but data selection, especially the thread list *a lot* faster.
Comment #22
crea commentedAnother candidates to the "pm_thread_index" (per-uid) table: "fist message id", "last message id", "first unread message id".
Comment #23
naheemsays commentedhm... This could potentially allow people to rename the thread in their message view (and only there) if they wish... cool potential feature?
So what columns do we want?
Thread_id
uid
subject
started
last_updated
unread_count
Would that be enough? (would it make some queries possible, others faster?)
Just moving (the topic forward to actuals so that it eventually gets done... if we decide to.)
Comment #24
berdirPostponing this on #555016: Planning for sending messages to roles. and #502664: remove duplicate records from pm_index and add primary key, I will look into this after these two are in. If someone else wants to try it, go for it but I will require a huge re-roll (as in go to "Start" and try again) after 555016 is commited, so it might be better to help there first.
Comment #25
berdirAlso, updating the title to reflect the actually proposed change
Comment #26
berdirCross-linking: We probably also want a last_author column, especially if #771676: who was the last user responding to a private message? (message answered flag) gets in.
Comment #27
berdirThe two issues this depended on are now commited, so setting back to active.
However, I played around with it a bit and I'm not sure if we can make it work. The update costs might just be too big. For example, with the roles recipients, it is now possible to send a message to 100k or more users. And even if we don't handle the participants somehow in that table, the update costs are huge if someone answers to that thread unless we can come up with very intelligent ways to handle these cases.
Comment #28
iLLin commentedHey Berdir,
Is this still on the table? I am debating on doing this but don't want to waste effort if this won't work. Thoughts?
Comment #29
berdirIf we can get it working, it would be awesome, but I'm not sure about that, see #27.
I can upload the code I have if you're interested in working on it.
Comment #30
iLLin commentedYes, give me what you got and I will get this going.
Comment #31
iLLin commentedMaybe we can setup a branch? If not no worries, I'm digging into code now.
Comment #32
berdirCan't do that on drupal.org because branches/tags are limited to be used for releases.
If you're familiar with git, I'd suggest you use this git mirror: http://git.drupalfr.org/cgi-bin/gitweb.cgi?p=contributions-new-date/priv...
You can then push your changes to a github project, that's what I'm doing/have done when working on bigger things like porting modules to D7. That allows me and others to follow your work.
Attaching the patch of what I've done so far. I've made an initial definition of the pm_thread table and started to work on getting the information into that table when new messages are posted. You have to be aware that this is really non-trivial stuff because of all the fancy features we're having in the 2.x-dev branch (for example, sending a message to a role and inserting real recipients in a batch run or hook_cron().
I would follow the following while implementing this:
1. Define (and extend if necessary) the schema, should be mostly done
2. Extend the API to update the pm_thread table when new messages are sent
3. Update the load/list/count queries to use the pm_thread table
4. Write update function to fill pm_thread for existing messages
5. remove subject column from pm_message (including necessary changes).
We have a pretty good test coverage, make sure to run the tests frequently, especially on step 3. Good luck ;)
If you have questions, ask here or join #drupal-games, where I usually am.
Comment #33
BenK commentedSubscribing...
Comment #34
YK85 commentedsubscribe
Comment #35
berdirGave this another try.
I think I was able to get this working pretty nicely. Here is what this patch does:
- Adds two new tables, pm_thread and pm_thread user. pm_thread contains the subject and the thread_id only. pm_thread_user contains information for a specific user like last_sent, last_received, count, unread_count.
- The pm_thread_user table is updated every time something changes in a thread: a recipient added, messages marked as read/unread, messages deleted, new replies and so on. There are two ways to update a thread: _privatemsg_thread_refresh($thread_id, $uid = NULL) allows to create or re-create the information for a thread, optionally limit to a specific user. Then there is _privatemsg_thread_update(), which is an API that allows to change stuff. For example, deduct 'count' by one if a message is deleted for all users. Or update last_sent to a timestamp for the author of a message
- Then, the list and unread queries read directly from that table, which is a lot faster than the old queries.
- There is an update function that fills pm_thread and pm_thread_user
It is safe to test this patch because no information is deleted (yet, we want to remove the subject column in pm_message later on), so please do!
TODO:
- Right now, there is still a group_concat subquery in the list, I wasn't able to remove that yet, see also #647212: Load participants in thread list in a single but separate query.
- I think we need to add all recipients to pm_thread_user (and rename the table to pm_thread_recipient), this will also make the participants query from above faster and easier
- The update function currently consists of two INSERT INTO ... SELECT ... queries. They are quite fast but with a lot of data, they could maybe time out. On my laptop (pretty fast, with ssd and stuff), a database with 900k rows in pm_index, the pm_thread query took 2,2s and the pm_thread_user 8,8s. Please try this out on your data set.
All tests are passing (including sending to roles and so on!) but our tests don't cover everything, this needs *a lot* of testing.
Comment #36
BenK commentedHey Berdir,
Looking forward to testing, but quick question: How does the latest patch address the issues you raised in comment #27?
--Ben
Comment #37
berdirI haven't done an tests with 100k recipients yet, but it should work fine. The main reason is that the information in pm_thread_user doesn't depend on the number of recipients so each added recipient is handled on it's own and doesn't need any additional processing after all recipients were added.
Comment #39
te-brian commentedre #36, #37
Also, I believe role send-outs are done with a batch api implementation.. so a lengthy processing time can only be expected if you are sending a message to your whole site :)
Reads should be OK because only one user's view of the threads is ever rendered (afiak)
Might try this out now that I am finally getting back up to date with the latest commits.
Comment #40
berdirUpdated the patch
- Used git diff because I have multiple local commits
- Rewrote the _privatemsg_thread_update() to an OOP-API, similar to DBTNG. This *might* break PHP 4 compatibility but honestly, I don't core. If you still use PHP 4, you're really on your own nowadays :)
- Added an IF() to avoid the above test exceptions. Not sure why they happen though, everything passed locally. different mysql version, maybe..
Comment #41
BenK commentedI successfully applied the patch in #40, but when I ran update.php, I received the following notices at the top of the update results page:
warning: array_merge() [function.array-merge]: Argument #2 is not an array in /Users/benkaplan/git/drupal-6.20dev/update.php on line 174.
warning: Invalid argument supplied for foreach() in /Users/benkaplan/git/drupal-6.20dev/update.php on line 338.
--Ben
Comment #42
BenK commentedI'm not sure if this is an issue with this patch or just an existing bug in the module... But I noticed that even if a user has disabled sending/receiving private messages, he is still able to reply to existing private message (even though the message at the top of the screen says he can't).
--Ben
Comment #43
BenK commentedAlso, I noticed that if you are a participant in a thread with someone who has disabled messages, then the following message is displaying at the bottom of the thread as raw HTML code like this:
Again, I'm not sure if this is actually related to the patch or a pre-existing bug.
--Ben
Comment #44
BenK commentedI tested sending, receiving, and replying after applying the patch (including doing so with individual users and with roles) and everything seemed to work smoothly. Deleting messages and blocking/unblocking users worked well, too.
The only thing I noticed was a slight discrepancy in the count of new messages. When testing, I sent two new messages on the same thread. The recipient showed 1 new message (for the thread as a whole) in the navigation menu and at the top of /messages. But in the "Messages" column at /messages, it showed 2 new messages (one for each new message on the thread). I can't remember if this was an issue that pre-dated this patch, but it's slightly confusing, I think, to the end user. Not sure if the new pm_thread table can help fix/clarify this.
--Ben
Comment #45
berdir#41: Will update that, it's missing a return array(); call.
#42: The view thread page is mostly unaffected by this patch. Can you verify if it's the same without the patch? If so, I suggest you open a new issue.
#43: I think that's dealt with in #1109524: Message outputs escaped html when recipient has disabled messages
#44: That is the old thing we've discussed already. The number displayed in the menu is the number of new/updated threads while the other number is # of new messages. I know that can be confusing. The problem is, the other way round is just as confusing or even more so: When it would report two new messages, you go to /messages and only see a single updated thread.
Comment #46
berdirNew patch:
- added return array() to fix the update error
- Also added last_sent information during the upgrade
Comment #47
berdirYet another new patch.
Fixes the count column in the update and refresh query and also handles the last_sent column in the refresh query, makes it unecessary to run the last_sent update query on a new thread.
Comment #48
catchSubscribing.
Comment #49
berdir#47: pm_thread5.patch queued for re-testing.
Comment #51
berdirRe-rolled the patch.
Note, when you had this patch applied before, you will need to remove the pm_thread* tables or set the schema version manually to 2608 to avoid conficts.
Comment #53
berdir#51: 744374.patch queued for re-testing.
Comment #54
marinex commented#51: 744374.patch queued for re-testing.
Comment #55
nagiek commentedI think for best testing, we need to apply #502666: devel_generate functionality.
Comment #56
berdir#51: 744374.patch queued for re-testing.
Comment #58
berdirWorking on a re-roll.
I've tested the upgrade path with 15k threads, 70k messages and lots of recipients. Update took ~7min, most of it in a single query which wrote 57k rows into pm_thread_user.
This is obviously not an option as there are much bigger sites out there. Which means that it needs to be split up by threads and use batch api.
That said, the upgrade did work well.
Comment #59
robloachPushing to critical as Berdir mentioned this should get in before alpha1 comes out for both 6.x and 7.x.
Comment #60
igorik commentedHi Berdir,
can you please share with us if there any news about this issue (D7)?
It looks like that this is release blocker for 3 years,
can you please let us know about current status of this issue and what/how can help you to close this and move on?
Would be great to have fixed it and move on.
Thank you and have a nice day
Igor
Comment #61
duaelfrWe need a release for translations purpose.
Is this issue still so important ?
Comment #62
sol0matrix80 commentedis this patch in the new 7x2 dev or devs dont care about this anymore is ?
Comment #63
muschpusch commentedi think it's more a lack of time but what's with you? Wanna provide a patch for the new branch?
Comment #64
ptmkenny commentedThe patch is not in the latest 2.x-dev; it needs to be re-worked for 7.x, which is likely to be a lot of work. It would be great if someone from the community stepped forward to do this, as the maintainer is currently quite busy :)
Comment #65
opdaviesComment #66
oadaeh commentedI'm moving this to the 7.x-2.x branch, as 6.x is no longer supported and this feature is still desired.
Comment #67
ivnishComment #68
ivnishWe need to focus on the version for Drupal 9 (which I have 90% ready), and not the ancient versions for Drupal 7, which no one wants to support for more than 5 years.
Comment #69
andypostDrupal 7 is maintained, and the issue is 12 years old... it must have comment why it's outdated
@ivnish don't hijack issues if you have no clue in them, moreover respect other follower's time to prevent banning you
Comment #70
andypost@ivnish please focu on whatever you want to but respect other people
Comment #71
berdirYes, no reason to close all old D7 issues, just ignore them.
Comment #72
ptmkenny commentedYeah, Drupal 7 version is still in active use on many sites, including mine.
You can easily filter for issues of a certain branch, so just bookmark a page with the 7.x issues hidden if you don't don't want to see them.
Comment #73
ivnishAutomatically closed because Drupal 7 security and bugfix support has ended as of 5 January 2025. If the issue verifiably applies to later versions, please reopen with details and update the version.