Background
As an introduction, the comments module uses a thread column to properly sort threads not only in timestamp order but also based on what parents the thread has. When a comment has no parents (i.e. is a comment on the posting node) it has just the value, e.g. "148". These codes are generated using the int2vancode and vancode2int functions, and the goal is to have them sort in order as a string. Since strings sort differently than numbers, this works out rather ingeniously.
When the comment is about an existing comment, the thread value gets a dot and a second vancode is generated for what is now one level below the comment. So, lets say comment 148 gets a comment, its thread will be "148.00". If a second comment on 148 is added, it would be "148.01", and so on. Now if a comment on 148.01 is added, its thread would be "148.01.00".
The issue
I have a posting with a large number of comments (hence the vancode "148" in my example since they start at "00") . To be fair this came from my phpBB forum via the phpbb2drupal module I recently patched to fix how it assigned thread IDs.
I had a comment 148, a comment on 148 "148.00", and a comment on 148.00, which generated a "148.00.00".
The next comment made was on the original posting. What thread should it get? My guess is "149", but the software gives it "52mbrwh" instead.
The code should take the first value from before the first dot (if there are any dots) and then increment that, but instead it takes the whole value, dots and extra levels and so on, and increments it.
Am I missing something?
The fallout
This thread field is so large, and the number of threads is probably so few for a given topic, that this is never likely to be a problem. My work on the phpbb2drupal vancode generation problem just happened to have me wandering around the database long enough to notice this.
Should I make a patch and submit it?
Comment | File | Size | Author |
---|---|---|---|
#106 | 97327-d7.patch | 5.57 KB | xjm |
#103 | 97327-103.patch | 5.66 KB | xjm |
#102 | 97327-102.png | 7.95 KB | haydeniv |
#101 | 97327-99-2-postgres-test.png | 13.47 KB | haydeniv |
#100 | 97327-99-postgres-test.png | 9.7 KB | haydeniv |
Comments
Comment #1
PMunn CreditAttribution: PMunn commentedHere's a patch that takes care of the problem.
Comment #2
chx CreditAttribution: chx commentedI second the motion. I had similar problems with the new menu code which utilizes vancode, and in there I vancoded per level which is essentially the same as what this patch does. I won't RTBC any patch at 4am but I have a feeling this one is RTBC.
Comment #3
Gurpartap Singh CreditAttribution: Gurpartap Singh commentedThere seems no such issue with Drupal 5.x HEAD version. Attached is a screenshot of the thread row in comments table.
(Have not tested on Drupal 4.7.x.)
Comment #4
Gurpartap Singh CreditAttribution: Gurpartap Singh commentedHere's the screenshot.
Comment #5
Gurpartap Singh CreditAttribution: Gurpartap Singh commented$thread = int2vancode(vancode2int($max) + 1) .'/';
vancode2int() should take care taking out the part before the first dot. Have tried it on 3 digit thread values too.
Comment #6
Steven CreditAttribution: Steven commentedUm, is there an actual problem described here?
The conversion functions exist only as simple casts form and to integers. Properly using them to generate tree structures is the responsibility of the caller.
Comment #7
PMunn CreditAttribution: PMunn commentedWhile I'm happy that this doesn't occur in the 5.0 beta, the original problem in 4.7 remains. That's what the original patch is for.
Comment #8
PMunn CreditAttribution: PMunn commentedTagging as 4.7.4.
Comment #9
PMunn CreditAttribution: PMunn commentedAnybody want to vet/commit this for 4.7.4 so we can put this behind us?
Comment #10
Caleb G2 CreditAttribution: Caleb G2 commentedPatch seems to work on 4.7 head. Screenshot here:
http://highervisibilitywebsites.com/comment-ids.jpg
Will let someone else RTBC though...
Comment #11
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedI've done the following:
Created a node
created 500 comments on that node
added followups to one of the comments
added a new comment on the node.
That last comment's thread is 1dx/ and looks perfectly normal.
I think that the import script you mention messed something up.
Comment #12
PMunn CreditAttribution: PMunn commentedKilles, you tested with this patch on 4.7.4 or without it?
I tested the scenario described in the initial posting, so I did see a problem.
Comment #13
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedwithout the patch
Also, since the code is the same in 4.7 and 5RC the issue should pop up in both versions.
Comment #14
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedHere is a modified version of generate-content.php which does what I did for my test, e.g create a node with 500 comments.
Beware: It will remove all other content from your site.
Comment #15
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedconsidering my extensive testing without any hint towards a bug, I mark this "won't fix".
Comment #16
PMunn CreditAttribution: PMunn commentedFinally I've got a very clear description of the problem. This came to me while examining the comment.module in the new 4.7.5 release.
Here's the code that creates a thread ID for a new comment that does not have a parent comment. That is, it is a comment on the node itself.
On Line 583 we see:
This code is wrong in the following circumstance:
- The last comment in a node is a comment-on-a-comment.
- Someone attempts to add a comment on the node.
If the last comment in a node is a comment-on-a-comment, its vancode has a dot in it. For example, "0a" is a thread ID of a comment on a node, while "0a.01" is the thread ID of the second comment on the 0a comment.
The code above increments the whole thread ID, and not the first segment of the thread ID when adding a new comment to the node. This is the bug.
By getting the first segment of the maximum, incrementing it, and using that as the new comment's thread ID in this circumstance, we ensure the next thread ID is correct.
See this in action with this PHP code in a node:
which generates this result:
Incidentally, the comment module also starts comments on a node at vancode "01" instead of "00". The attached patch fixes that as well in the 4.7.5 comment.module.
Comment #17
Gerhard Killesreiter CreditAttribution: Gerhard Killesreiter commentedmoving to HEAD to get more reviews. The code is the same in 5 and 4.7.
Comment #18
Gurpartap Singh CreditAttribution: Gurpartap Singh commentedThere isn't any problem with this in 5.x as discussed above. Pointing, in case the code is same.
Comment #19
PMunn CreditAttribution: PMunn commented5.1 seems to still have this problem.
I added a new node.
First comment on the node has a thread of 01/, and that's ok.
First reply on the First comment has a thread of 01.00/, and that's ok.
Second comment on the node has a thread of 2101/. This is too high.
First reply on the Second comment has a thread of 2101.00/, and this seems ok.
Third comment on the node has a thread of 410101/, and this is getting too big.
The following patch fixes it, again.
Comment #20
Zen CreditAttribution: Zen commentedPaul,
Thanks for persevering with this issue :) However, trying to replicate this issue in 5-dev with your provided steps:
does not realise any errors.
A dump of my comments table:
Please provide concrete steps to manually reproduce this on a clean database.
-K
Comment #21
PMunn CreditAttribution: PMunn commentedI took a look at the code in the 5.x comment.module and it looks identical to my patch. Looks like someone integrated it in without noting it here.
Do I need to wait for whenever enough security holes are found to release a 5.2 to get this into production?
Comment #22
PMunn CreditAttribution: PMunn commentedThe fix must have been abandoned, because 5.2 is now broken the same old way. The patch is identical for this version, but I've regenerated it anyway.
Pretty please integrate my fix? Please?
Once again, the test case is:
I added a new node.
First comment on the node has a thread of 01/, and that's ok.
First reply on the First comment has a thread of 01.00/, and that's ok.
Second comment on the node has a thread of 2101/. This is too high.
Once I saw the comment ID in the URL for the first comment, I just did this to check it in the database and see what the thread value was being set as:
Broken way:
Fixed way:
Comment #23
gdavid CreditAttribution: gdavid commentedthe problem still in 5.6
I made another solution
Comment #24
PMunn CreditAttribution: PMunn commentedThis is still broken on 5.6 and 5.7. I don't know whether to scream or cry. Or both.
gdavid's patch for 5.6 works with 5.6 and 5.7, which is broken in the same way on the same lines. It's a simpler patch than my own previous patches, so maybe this one will make it in.
6.2 is also broken in the same way from the look of the comment module code, but its patch will need to cover different lines.
Who can commit this into the next release so we can move on with our lives?
Comment #25
catchThis is almost certainly broken in 7.x too, so I'm moving it there in order that it get committed first, backported to D6, then to D5.
Patch doesn't apply to D7 though so marking as Code needs work.
Comment #26
PMunn CreditAttribution: PMunn commentedThis bug is still present in 5.10.
Comment #27
PMunn CreditAttribution: PMunn commentedConfirmed as a bug in 7.x dev as of tonight 2008-08-19, here is the patch to fix it.
Comment #28
PMunn CreditAttribution: PMunn commentedAnd this patch is for 5.10 in case anyone else wants it.
Comment #29
kscheirerI just tried the test in @22 on a drupal HEAD install, and everything seemed to work fine without the patch in @27. Is this related to php or other configuration settings in some way then? I got the same results after applying the patch, and the code does look a little safer, and the query should perform better with large numbers of comments, so I'm still in favor of committing to HEAD.
I added a new node.
First comment on the node has a thread of 01/
First reply on the First comment has a thread of 01.00/
Second comment on the node has a thread of 02/
Comment #30
Damien Tournoud CreditAttribution: Damien Tournoud commentedI understand the bug, and agree with the solution (except that I don't get why the cast to int could ever be necessary). But I *can't* reproduce it.
PMunn could you give as much details about your setup as possible?
Comment #31
Damien Tournoud CreditAttribution: Damien Tournoud commentedThinking about this a little more, I'm not convinced there is a bug here.
'.' is supposed to be < '/', so '01.00' < '02/' because string comparisons are typically done in lexicographical order.
Maybe the string collation you use breaks one of these two assumptions (either '.' > '/' or the comparison is done differently). But this will break the whole thread sorting algorithm anyway, so you have a bigger problem on your hands.
Which database engine and collation do you use?
Comment #32
PMunn CreditAttribution: PMunn commentedThe full test case is first described above at #16.
The important thing is to do the test in this order:
Create a Node, let's call it NODE-1.
Comment on this node, call it COMMENT-1.
Reply on COMMENT-1, call this COM-1-REP-1.
Comment on the node, call it COMMENT-2.
Observe the Thread IDs in the comments table for the three comments on that one node. Most notably the COMMENT-2 thread ID will be wrong. It will still sort properly, yes, but the thread ID is still wrong.
The patch gets the highest thread ID of all comments without a parent comment, that is, it gets the highest thread ID that is directly pointing to the NODE. It then increments that thread ID. The original, flawed code increments the entire thread ID of the last comment.
I adopted gdavid's patch because it's smaller and seems to work just as well instead of my own patch before his and used it to make the 5p10 and 7dev patches. The tactics in either patch, mine or gdavid's, are the same across all versions of drupal right now. The comment module hasn't changed lately.
If my solution will break the sorting at some point let me know how and we can close this up for good.
Comment #33
Damien Tournoud CreditAttribution: Damien Tournoud commented@PMunn: can you give us as most specifics as you can about your environment? (that's what I called "setup", above), database engine, database configuration, PHP version, etc.
Comment #34
Damien Tournoud CreditAttribution: Damien Tournoud commentedSome more research:
Comment #35
PMunn CreditAttribution: PMunn commentedOkeydoke:
PostgreSQL 8.3.3
PHP 5.2.6
Database encoding is UTF8
Sitting on a Fedora Core 8 box running kernel 2.6.25.14-69.
Comment #36
Damien Tournoud CreditAttribution: Damien Tournoud commented@PMunn can you confirm that the comment ordering is wrong at display time with your setup (please try both ascending as descending ordering)?
Comment #37
PMunn CreditAttribution: PMunn commentedThe comment ordering is correct.
This is a bug that doesn't exhibit itself visually to the end user. It's just an observation that the thread IDs are being generated incorrectly internally.
Comment #38
kscheirerI still think this patch should go in, regardless of being able to reproduce the bug.
The code is clean and the sql query is much more correct with the patch, and will execute faster on sites with a lot of comment threads.
Comment #39
maartenvg CreditAttribution: maartenvg commentedPatch no longer applies.
I, too, can't reproduce this bug. I tried it on a clean HEAD install and got
MySQL 5.0.5, PHP 5.2.4, Ubuntu
I do think that
$max = db_result(db_query('SELECT MAX(thread) FROM {comments} WHERE nid = %d', $edit['nid']));
should become
$max = db_result(db_query('SELECT MAX(thread) FROM {comments} WHERE nid = %d AND pid = 0', $edit['nid']));
because that makes more sense (you only need the parentless nodes to determine the max) and is less work when looking at thready comments.
Comment #40
PMunn CreditAttribution: PMunn commentedmaartenvg, your code change to that line fixes the problem in 5.11.
Comment #41
catchComment #42
jonskulski CreditAttribution: jonskulski commentedNoted, moved to http://drupal.org/node/320784
Comment #43
Damien Tournoud CreditAttribution: Damien Tournoud commented@johnskulski: good catch, that's a regression introduced by DB:TNG... but this has nothing to do with the thread at hand. Please open a new one with your patch.
Comment #44
PMunn CreditAttribution: PMunn commentedHere is an updated patch for the 5.12 release. Same code change, but the row it's on is a different row number so some work must've been done on the comment module since the 5.11 release.
Comment #45
Anonymous (not verified) CreditAttribution: Anonymous commentedThe last submitted patch failed testing.
Comment #46
maartenvg CreditAttribution: maartenvg commentedFollowing HEAD.
Comment #48
lilou CreditAttribution: lilou commentedTest failure : #335122: Test clean HEAD after every commit
Comment #49
Gábor HojtsyI think we should do "AND" and not "and" (markind CNW for this) but otherwise looks like a simple code change. Should give the next comment thread the right incremented thread number instead of taking the whole thread structure from the last comment.
Comment #50
Gábor HojtsyBTW I've been pointed to here on the Hungarian Drupal conference this past weekend and got test results from gdavid there (on postgresql), which shows that the third comment on the same node does indeed get a wrong thread identifier.
Comment #51
maartenvg CreditAttribution: maartenvg commented#35 also refers to Postgresql, so I think we can safely assume that the problem lies there. And that is also the reason why I (and others) couldn't reproduce it, because most people use MySQL.
Attached patch with capitals for "and" per #49.
Comment #52
catchPatch looks great. Since this is environment dependent I don't think it's viable to write a test for it (and it's also not harmful, just a bit odd), so marking RTBC.
Comment #53
webchickI discussed this some with DamZ on IRC.
He makes the case that while the patch will fix the symptom of the problem (and is a performance improvement to boot), there is obviously still something funny going on with threading on some systems.
Drupal has 4 threading modes (threaded desc order, unthreaded desc order, threaded asc order, and unthreaded asc order), and while this messed up value in pgsql might not have user-facing consequences when using whichever mode phpbb2drupal uses (probably unthreaded asc), it likely would have user-facing consequences on one or more other modes.
We have a couple very simple tests for threading as part of testing comment replies, but we lack comprehensive tests for all threading modes for comments.
I'm therefore marking this CNW so that we can get those in place, which should hopefully help us figure out the "root" cause of the problem on pgsql at the same time.
Comment #54
PMunn CreditAttribution: PMunn commentedOn a side note, here's a patch that fixes this for 5.18, where it is still a problem. Patch does the same thing as previous one.
Comment #55
gdavid CreditAttribution: gdavid commentedthis is still problem in 6.x. It's not too good. This is 3year old problem. And only need " AND pid = 0" text to the module code. Only 12 characters (with spaces). It's ridiculous a little bit. What work need more to accept these changes?
Comment #56
wretched sinner - saved by grace CreditAttribution: wretched sinner - saved by grace commented@gdavid - see webchick's comments in #53. To commit this to 6.x, it needs to be committed to 7.x first. To commit it to 7.x, we need tests so that comment threading doesn't break in the future. If you can help write these tests, we can get this issue committed quickly
Comment #57
gdavid CreditAttribution: gdavid commentedthe root problem is it: (postgresql)
Why? look at this:
I hope this can help to show the ROOT problem, why working wrong the comment threading in PG.
Thanks.
Comment #58
PMunn CreditAttribution: PMunn commentedAre you trying to say that the forward slash is causing the sorting problem? Can we just change the terminator to another character? My results for running those hashchar calls, which doesn't look meaningful to me:
The code change is needed to create and properly increment the segments of the thread ID. This isn't disputed.
I get it that this seems to have exposed a sorting problem particular to PostgreSQL.
Comment #59
gdavid CreditAttribution: gdavid commentedI don't understand. Why dosen't enoght good this syntax?
$max = db_result(db_query('SELECT MAX(thread) FROM {comments} WHERE nid = %d AND pid = 0', $edit['nid']));
why whould be better solution to change the terminator character to another? " AND pid = 0" 12 characters (with spaces) no other script, no transformation. it will run on mysql, postgres, oracle, mssql and others too.
Comment #60
PMunn CreditAttribution: PMunn commentedgdavid, I agree this fix is simple and should be implemented.
However, it seems that some have found a sorting problem in PostgreSQL even when all is working properly.
I vote we split that into a separate bug, and fix and close this one.
This bug, again, is simply that the thread ID increases too quickly. The code fix described should iron that out and with more testing perhaps the fixed code can contribute meaningfully to the second bug discussion.
All in favor?
Comment #61
PMunn CreditAttribution: PMunn commentedHere's a patch for 5.21.
Comment #62
AlexisWilke CreditAttribution: AlexisWilke commentedDarn! This was posted in 2006 and no one in Core accepted the fix?!?!?
I cannot believe it!
The fix is given in #1 and has not been applied since then. Who should I strangle here?!
I posted my own report here http://drupal.org/node/122098#comment-3112214 before finding this entry.
The function base_convert() ignores the periods on some versions of PHP (at least the two I have):
Why was that not applied since 2006???? You're wasting everyone's time here. #62 comments and still bogus! Super very simple fix that won't hurt anyone!
Well... At least I'm glad I can fix my own version of Drupal. 8-)
Comment #63
Damien Tournoud CreditAttribution: Damien Tournoud commented@AlexisWilke: actually, you are wasting everyone's time by bumping an old issue without doing anything to make it go forward.
As I identified back in #34 (!!), the ordering is *not correct* when using C collation. That means that the core of the issue is that *you need to create your database with a decent collation*.
The identifiers growing too fast is nothing more then a symptom of a wrong database collation, that will likely create other ordering issues.
Comment #64
AlexisWilke CreditAttribution: AlexisWilke commented@Damien,
No. There are two problems. The collation, which does not affect me at all, and the totally improper use of base_convert() as shown here:
Since in that case $max may be equal to something like '01.00.00' and then the +1 is assigned to a number defined as 10000 and not just 1 as expected.
You do not have to believe post #0 and #1, of course. The other issues are a problem for some people, but if you had applied fix in #1 you'd already solve a lot of problem on a lot of websites. Note that it is very destructive since the threads are completely broken by this wrong math.
The collation would not break the math.
And I will continue to waste your time until you fix that very problem. The collation can wait. It is much more complicated.
Note that I'm using PostgreSQL and did not have any of the collation problems reported. However, the PHP code problem, #0 and #1, sure does break my website.
Thank you.
Alexis
Comment #65
Damien Tournoud CreditAttribution: Damien Tournoud commentedNo, by definition if the collation is right, $max will be a top-level thread. As a consequence, the rest of your analysis is totally wrong.
Comment #66
AlexisWilke CreditAttribution: AlexisWilke commentedDamien,
You may want to try the SQL and see that it works perfectly putting the last thread last as expected.
However, if what you are saying is true, would you be so kind as to explain this one sort:
What is the difference with the MAX used for the INSERT and that other sort?! (I show D6, although D7 does the same sorting for display, D6 is easier to read.)
The order is the same and you see the threads as expected.
If that is wrong, then the threading is not working at all. Although, strangely, applying #1 fixes all the problems on my system.
The collation would not have any effect on the MAX() of the INSERT. The thread is sorted in the same manner in both places. Really, frankly. So it works the way intended. It returns the "wrong" value in the INSERT. I repeat myself, I know. But hey... after 3 and 1/2 years, you must be quite tired of this bug.
#50 perfectly shows the result of the problem. And again, apply #1 and it goes away.
Comment #1 you get "01/" as expected.
Comment #2 is a reply to comment #1, you get "01.00/" as expected.
And at that point, if you run "SELECT * FROM comments ORDER BY thread", it works as expected.
Comment #3, this is a new comment, you get "2101/", not what you'd want to get.
Yet, sort that, and it still works! Yes, try "SELECT * FROM comments ORDER BY thread" again, and it works just fine. Except that the number grew really fast (as the title of this thread points out). The expected number was "02/".
Note that at all the other levels, the PHP code does it right. Why not fix this root too? The sort is working until you have an overflow (as pointed out in the title... grows way fast, and then overflow!)
Now, you may wish that the MAX() in the INSERT work "better", just fix it like in #1 and stop being stubborn. (3 and 1/2 years?!)
Thank you.
Alexis Wilke
Comment #67
Damien Tournoud CreditAttribution: Damien Tournoud commented@Alexis: see #34:
This is the order you are supposed to get:
The
MAX(thread)
is02/
.Could you import this dataset in a table and give us:
SELECT thread FROM comments WHERE nid = xxx ORDER BY thread
SELECT MAX(thread) FROM comments WHERE nid = xxx
Thanks for moving this issue forward, very appreciated.
Comment #68
AlexisWilke CreditAttribution: AlexisWilke commentedDamien,
A long answer testing everything I could think of...
UTF-8 sorting not respected by MySQL
Got it. MySQL and PostgreSQL do not work the same way. Actually, as usual, MySQL does not work right at all. 8-)
First of all, as specified in http://api.drupal.org/api/function/db_check_setup/6 and pointed out in issue #349508: Require UTF8 database encoding, the PostgreSQL installation is expected to be done in UTF-8. Therefore, I have it that way. UTF-8 sees the period after the slash:
This may sound strange to some people since the period (0x2E) has a code that is smaller than the slash (0x2F). That's what collation does for you.
Testing databases
To test I created a table, added some thread entries and then run a SELECT. The CREATE + INSERT both work the same in both database systems:
Notice how I insert in disorder which means that a SELECT without ORDER is expected to give you the INSERT order (as long as you don't update anything.)
So, let's see with PostgreSQL. To my point of view, that works as expected, that's how I want to see the sorted threads in my website (i.e. comment 1, reply 1, reply of reply 1, comment 2, reply 1 of comment 2.)
However, when I test with MySQL I get it all wrong (personal point of view, of course):
This will not be sorted properly. It'd show me the most recent post first when I want the oldest and the most recent reply...
I suppose that the code to display the comments takes that in account in some ways because I get the threads right on my website with PostgreSQL...
Is it? Well... no. So in other words, the MySQL users have not noticed that everything was semi-up-side-down. Answers before the first question... Weird. And if you ask from newest to oldest, you also get a problem.
Could it be that the inverted sort works then?
This is the result of an invested thread (i.e. newest first). Now you get the main question first, but the last answer appears before the first. This has to be overly confusing to MySQL users.
Just in case, what I would expect as the correct output is what PostgreSQL outputs:
Although this is not clear either since the question will appear after all the answers.
Can we agree on the correct/expected output?
The correct sorting would probably be on a per thread basis. So something like this:
In D7 you should consider having one more column to simplify the code tremendously and make sure it is compatible (using SUBSTRING is not a good idea.) So, there is what I'd propose in the comment render of D6... Although that only works for the first 36 entries:
The ASC can be changed to DESC and the order on pages will still look coherent. i.e. Main question (new comment) and replies stay in order as I would expect.
Now, this works until the first number goes over "0z". (obviously, since 1, 2 means first and second characters.)
The very first character is the length so I guess we can use that to properly compute the size. And from the comment I've seen, the first character can be ignored (most certainly true). So I think that this would work:
except that substring() returns a string, so it needs to be CASTed in some ways... (in PostgreSQL, it's CAST(substring... AS int), but that does not work in MySQL.) That's definitively very complicated. With two columns, we'd solve the problem since the first column can be just an integer sorted normally. This represents the thread number. And the next column represents the replies.
Two column solution
Assuming that threads are t and replies are r, we'd get something like this instead:
t int, r text (anything after the first period).
t could be set to cid (I think that would be a good idea to allow finding all the replies to a specific thread.)
Can we fix the collation in MySQL?
Thinking about it, I checked the MySQL setup and I get this for characters & collations:
latin1_swedish_ci?! That's MySQL magic I guess. Yet, the tables are created with UTF8 as seen here:
Just in case, I tried to change the column text encoding as described here: http://dev.mysql.com/doc/refman/5.1/en/charset-conversion.html and the SELECT still does the same thing (exact same sorting; collation not respected...)
Okay, a search with Google brought this one: http://drupal.org/node/772678 and as usual, big surprise, MySQL is dearly broken... I'm so glad I don't use it! 8-)
May I suggest that we change the scheme in order to avoid collation problems? We can have an upgrade path to fix existing databases. The one good thing with MySQL is that people won't have had the overflow error happen to them.
Proposed solution
The following is what I see as a workable solution that works for both databases. In D7, you'd have to test with the others too (Oracle, SQL Server, SQLite, etc.) For that, I'd strongly advice for a test in the requirements() so we know at installation time that a database won't work right.
Two columns
Create two columns:
1. The thread identifier, an integer
2. The thread replies, still a text field
Thread identifier
The thread identifier can be set to cid. So, whenever you create a new comment, you set ("thread = %d", cid)
Replies receive the same cid as the thread (i.e. "parent most comment", practical to hide a whole thread!)
Replies field
This can pretty much keep the same scheme, however, to avoid collation problems, we can force the width of each "column" to a specific size and thus drop the period. I would suggest 6 digits in decimal (i.e. up to 1 million comments) to make totally sure, although if we still use base 36, it is reduced to 4 digits and we'd have 36^4=1,679,616 comments before running in a problem. So the field will grow larger than now, but not by much unless you get many replies to replies (i.e. "000000000000" would represent 3 levels!) but the sort works in all cases, whatever the collation (assuming that 0-9, a-z are always sorted as expected?!).
Code changes
This means code changes to all the modules that directly peek and poke in the comments table (i.e. Discuss This! is one I have worked on. Also I think that one does not access the table directly.)
Otherwise, it is mainly in the SELECT where you'd just add a column when sorted by threads and remove the SUBSTRING().
In the comment_save() function, it is fixing the else part which assumes that the thread value is incorporated in the reply field. Otherwise, the new column is very easy to manage since the following SELECT works as is:
SELECT MAX(thread_id) + 1 FROM {comment}
Upgrade of existing installations
This is of course the hardest part, making sure we can upgrade existing installs. Since PostgreSQL users who have "too many" comments in a threaded comment list have had their database destroyed, we cannot really help them without having a module helping them reorganize their comments, if they want to spend the time to do so. (I've done mine by hand... I had only one such page with too many comments.)
For all others, we should be able to fix the data pretty easily. And this the thread identifier would be set to cid, the too large problem vaporize automatically.
Note: we can tell that a PostgreSQL install was "destroyed" by checking whether there are some comments with "00/" which should never happen (you start with "01/".) So we could at least let people know that Drupal Core messed up on that one...
Comment #69
Damien Tournoud CreditAttribution: Damien Tournoud commentedI'm semi-happy you confirm what I suspected all along: that threading is severely broken on PostgreSQL.
You should take a look at the documentation of comment_render() that describes the design choices and assumptions of the current code, which is *not* broken but assume a certain collation.
Sadly, I'm pretty sure your analysis is wrong: it's not MySQL collation that is broken, but PostgreSQL's one. MySQL implements the standard Unicode collation algorithm. You can check (using this test page, for example) that
'.'
is supposed to be before'/'
.PostgreSQL doesn't support collation itself, but rely on the implementation in the GLibc. Sadly, this implementation seems broken in most versions of Linux I know. It's not always the case, if I repeat the test from #34 on Mac OS X (which is known to have pretty solid Unicode support), I get the expected:
We already knew that Collation support is a very weak point of PostgreSQL. This only confirms it.
That said, I'm not sure what is the way forward. Could we find a set of characters that can work on both broken and non-broken collations?
Comment #70
AlexisWilke CreditAttribution: AlexisWilke commentedAnother important note:
While we work on this patch and have it tested, etc. I would suggest that we apply patch in #1, even if we remove it later. Until then, PostgreSQL users will lose data, which I think is really bad.
My own change is slightly faster.
As for the sorting problem, as mentioned by someone, removing the '/' altogether or replacing it with a period would be a good idea for MySQL users. But that one I can care less... 8-) Plus, the wrong sort does not destroy any data.
Thank you.
Alexis
Comment #71
Sanbka CreditAttribution: Sanbka commentedI think that the problem with tree of comments on PostgreSQL can be temporarily solved by using the code (line 987 of comment.module)
and (line 741 of comment.module)
Comment #72
PMunn CreditAttribution: PMunn commentedPlease enact the #1 patch to ensure the integrity of the thread ID the way it is designed. This is all anyone seems to be asking for.
Once the thread ID generation is fixed, it follows that a new issue would be opened to deal with any problem PostgreSQL systems have with it, with an eye to a separate idea on reorganizing storage of threads and what kinds of display modes they support. Philosophically, at least with this problem in PostgreSQL, it seems to me this should be solved in Drupal instead of waiting for an external DBMS to fix it. If that means allowing MySQL to do the collation heavy lifting and having code for other DBMSs that don't collate properly, that's what supporting multiple databases means. You are in control of the destiny of this project, not PostgreSQL.
Waiting for someone else to fix their system could be problematic. You could end up writing a bug report and waiting 3 years and 8 months for acceptance of 10 lines of code that could fix the problem. For example.
Comment #73
Damien Tournoud CreditAttribution: Damien Tournoud commentedThis is not how this thing works. *You*, the users of Drupal on PostgreSQL, are on the driving seat for Drupal on PostgreSQL. Nobody will do it for you. Waiting for someone to fix Drupal on PostgreSQL for you could be problematic :)
Now that we have (at last!) confirmed that comment threading is severely broken on PostgreSQL (it took 3 years and 8 months, I do not congratulate you, PostgreSQL community), what do you suggest to fix it?
Comment #74
AlexisWilke CreditAttribution: AlexisWilke commentedDamien,
#1.
But, pretty please, wait another 4 months, so we get at 4 years first.
Thank you.
Alexis
Comment #75
PMunn CreditAttribution: PMunn commentedOhhh, it's been me all along! Me, the person who wrote a patch #1? Me, who can't commit that patch? Me, who is in favor of not escalating this patch into a whole attack on the comment module? Me, who is in favor of walking before running?
Comment #76
PMunn CreditAttribution: PMunn commentedBug is present in 6.20. Patch attached.
Comment #77
alpapan CreditAttribution: alpapan commented+1 that bug is there at 6.20 and HEAD
Comment #78
AlexisWilke CreditAttribution: AlexisWilke commentedalpagan,
This bug has been there for ages, but the comments author does not use PostgreSQL and therefore he pretty much said himself that he did not want to "fix something that works for him."
Maybe if you can have some political pressure over him? 8-)
Thank you.
Alexis
Comment #79
catchPatches are applied to the latest development version (or right at this minute current release) first, then backported to Drupal 6. Changing the status takes this out of the queue of most people who can look at it.
Who is the "comments author" that "doesn't want to fix" it?
To get this patch committed, here's what needs to be done:
1. Review the current comment tests to see if if they now cover all comment threading permutations to cover webchick's question in #53 (now more than two years ago, no-one complaining about this patch not being committed appears to have even looked at the tests, if you're lucky someone else may have written them for you while you were bickering).
2. Re-roll the patch so that it applies against HEAD, adding the missing tests if necessary.
3. Make the cast to int meet the conventions at http://drupal.org/coding-standards
4. What would really help would be to add a test that fails in HEAD on postgresql, but works with the patch applied, this would demonstrate that it actually fixes the bug and prevent future changes to comment module from causing a regression.
Comment #81
Zed Pobre CreditAttribution: Zed Pobre commentedI have updated the patch in #1 to Drupal 7.x-dev, applied it to my local installation, and verified that it fixes the data corruption caused by multiple comments on a PostgreSQL backend, which in turn fixed the threading issue I was seeing that caused me to find this page.
For reference, I'm testing against a local tree with one additional patch applied: the comment authorship fix by Bleen18 at https://drupal.org/node/936844#comment-3892522.
There is nothing in that patch to require a special cast to int; I assume that is in reference to one of the other patches. I went with this one because it was minimally invasive, easy to understand, and first in the list.
I don't know enough about the testing structure to add tests. If that's absolutely mandatory (and I agree it would be nice if someone has the time and knowledge), I must ask that someone else please look into it. The basic test sequence would be to add two comments with no parent, two comments each selecting one of the first two as parents, and then verifying that the contents of the thread column are correct.
I can push my fork of Drupal to GitHub if it would be useful to others to be able to download both patches at once until they are accepted upstream.
I am also reflagging this bug as Priority Major due to the fact that it is a data corruption bug that can destroy installations. I just spent an hour manually correcting corrupt entries, and I'm not done yet, and it's only been so fast because I am just this week starting to migrate data from a relatively small Drupal 5 installation and I've only copied a few nodes over.
Comment #82
alpapan CreditAttribution: alpapan commentedSorry Catch, my bad for changing the version number. I didn't know how it worked (now I do)
a
Comment #83
catchThis should be "Get the value at the correct depth to ensure blah blah blah." - there is no explanation why we're doing this.
Still needs a test.
Comment #84
Zed Pobre CreditAttribution: Zed Pobre commentedWell, I re-rolled the patch in my repository to lengthen the comment, but while I was trying to figure out how to build a test for this (which if I understand how things work is completely impossible on the current technique without refactoring comment_save), I think I got a better understanding of what vancodes are.
If I'm reading this right, the vancode describes just one segment of the entire comment ID, not the entire comment ID, i.e. comment IDS are
<parentvancode>.<childvancode>.<grandchildvancode>
, and so on. If that is the case, the more important problem is invancode2int
, not incomment_save
, and the better description of the core problem is thatvancode2int
has no sanity checks whatsoever -- not for invalid characters and not for the first digit actually being numeric and not for exceeding the length specified in the first digit -- and test cases *can* be theoretically built for vancode2int by attempting to throw invalid values at it and making sure that an expected result is returned. I'm going to redo this patch by forcingvancode2int
to conform to the vancode specification as described in the comments ofint2vancode
and see if I can build unit tests to match.That said, I think it's probably appropriate to attempt to keep a fix in both functions -- it's technically still a bug (albeit one for which it is impossible to build a unit test) for comment_save to attempt to run vancode2int on a string containing more than one vancode section.
I'm a little busy this week, but I hope to have time by this weekend.
Comment #85
remidewitte CreditAttribution: remidewitte commentedJust hit the bug with a D7 install.
Thanks Zed Pobre, the patch from April 19, 2011 at 7:50pm just works. Why this simple patch was not applied for D7.10 for example ?
Comment #86
xjm@remidewitte: The patch was not in 7.10 because no one made the necessary changes nor contributed an automated test for it yet.
Remaining tasks:
Tagging novice for any of the tasks above.
Comment #87
NROTC_Webmaster CreditAttribution: NROTC_Webmaster commentedHere is the patch updated for the latest version. I'm not sure how to go about making the test but if anyone wants to provide guidance I can give it a shot.
Comment #88
NROTC_Webmaster CreditAttribution: NROTC_Webmaster commentedComment #89
xjmThanks for the reroll. Looks okay to me, pending testbot of course.
We can't create a functional test for the Postgres part of it since that's tied to a specific enviroment. However, what we can do is test that the comment ID is incremented properly. So, the test should instead test the comment IDs themselves.
Based on the steps to reproduce in the summary, here's what I'd suggest for a test:
The last assertion will be the one we expect to fail before the patch and pass with it. The test case should go in
comment.test
, probably in a new class extendingCommentHelperCase
(as none of the other child classes looked particularly applicable when I skimmed just now.) We could call itCommentThreadingTestCase
or something.Edit: You'll probably want to inspect what the actual values are before and after the patch to make sure you understand the pattern and that I'm not just making stuff up. ;)
Thanks @NROTC_Webmaster for looking into it.
Comment #90
NROTC_Webmaster CreditAttribution: NROTC_Webmaster commentedI'm attaching an attempt at a test for this.
I took the suggestions from xjm on how to make it and then copied some of the code from earlier in the test. The problem is that it passes with or without the patch and I'm not sure where to go from here.
Can anyone take a look at it and let me know what I need to do to make it work correctly.
Comment #91
NROTC_Webmaster CreditAttribution: NROTC_Webmaster commentedComment #92
AlexisWilke CreditAttribution: AlexisWilke commentedxjm,
The test will pass with or without the patch if you run MySQL.
The problem underlined here is that the old code makes use of a "special" feature of MySQL (I'd call it a bug...) which doesn't work with PostgreSQL.
Thank you.
Alexis
Comment #93
xjmThe test I outlined does not check for a database error. It's supposed to check the generated ID code, independent of the database, at least as far as I understand. The previously posted test does not follow the STR from the summary.
Are the values of the IDs themselves tied to the particular database?
Comment #94
NROTC_Webmaster CreditAttribution: NROTC_Webmaster commentedThe test should hopefully fail for anyone on PostgreSQL without the patch and then pass with it. I only have MySQL so I cannot really test to make sure of that but based on the description I think I have a good test.
Comment #95
NROTC_Webmaster CreditAttribution: NROTC_Webmaster commentedAnd then hopefully it will still work with this one.
Comment #96
xjmFWIW, it should more than merely fail on PostgreSQL--the comment 5 assertion in particular should fail. It'd be worth outputting the actual values from assertEqual() rather than messages, I think.
I'll try to test the next time I'm on a machine where I have postgres installed if no one gets to it before then.
Edit: Alright, I just spent some more time reading the issue, and it looks like it's indeed the case that the assertions won't fail on testbot. (In #69 DamZ's example makes this fairly clear.) I still think the test is worth adding, though.
Comment #98
NROTC_Webmaster CreditAttribution: NROTC_Webmaster commentedI was having a few problems with git so hopefully this patch will apply. Thanks @xjm for all of your help on this.
Comment #99
xjmTests-only version for tester convenience.
Comment #100
haydeniv CreditAttribution: haydeniv commentedRan test in #99 on Postgres with the following results:
Screenshot:
Comment #101
haydeniv CreditAttribution: haydeniv commentedSpecific test fails:
Comment #102
haydeniv CreditAttribution: haydeniv commentedHere is the results of the test in #98:
Comment #103
xjmAttached just removes a couple of out-of-scope whitespace changes.
The patch now includes tests for comment threading. These tests do not fail on testbot because this is an environment-specific bug (related to a particular database behavior). However, the screenshots above illustrate the test failures (before the patch) and passes (after the patch) on Postgres, which will be useful some day if/when we have multiple test environments (which I know is not going to happen in the short term, but is a planned feature). Meanwhile, the test coverage codifies proper comment threading so that we don't break it in any other way.
Thanks NROTC_Webmaster and haydeniv for the awesome work on this issue!
RTBC.
Comment #104
webchickYAY! SO nice to have comment threading tests! Thanks too for the really detailed manual tests that help show those of us PostgreSQL-less how things are working. :) So wonderful to have this 6 year old issue nailed at last!
Committed and pushed to 8.x. Thanks! This will need a re-roll for D7.
Comment #105
xjmI'll backport this since I am responsible for the API change that makes it more than a simple reroll. :)
Comment #106
xjmGo testbot!
Comment #107
aspilicious CreditAttribution: aspilicious commentedLooks a sane backport!
Comment #108
webchickExcellent, thanks!
Committed and pushed to 7.x. Woohoo! Another major bug bites the dust. :)