Background
As an introduction, the comments module uses a thread column to properly sort threads not only in timestamp order but also based on what parents the thread has. When a comment has no parents (i.e. is a comment on the posting node) it has just the value, e.g. "148". These codes are generated using the int2vancode and vancode2int functions, and the goal is to have them sort in order as a string. Since strings sort differently than numbers, this works out rather ingeniously.

When the comment is about an existing comment, the thread value gets a dot and a second vancode is generated for what is now one level below the comment. So, lets say comment 148 gets a comment, its thread will be "148.00". If a second comment on 148 is added, it would be "148.01", and so on. Now if a comment on 148.01 is added, its thread would be "148.01.00".

The issue
I have a posting with a large number of comments (hence the vancode "148" in my example since they start at "00") . To be fair this came from my phpBB forum via the phpbb2drupal module I recently patched to fix how it assigned thread IDs.

I had a comment 148, a comment on 148 "148.00", and a comment on 148.00, which generated a "148.00.00".

The next comment made was on the original posting. What thread should it get? My guess is "149", but the software gives it "52mbrwh" instead.

The code should take the first value from before the first dot (if there are any dots) and then increment that, but instead it takes the whole value, dots and extra levels and so on, and increments it.

Am I missing something?

The fallout
This thread field is so large, and the number of threads is probably so few for a given topic, that this is never likely to be a problem. My work on the phpbb2drupal vancode generation problem just happened to have me wandering around the database long enough to notice this.

Should I make a patch and submit it?

CommentFileSizeAuthor
#106 97327-d7.patch5.57 KBxjm
#103 97327-103.patch5.66 KBxjm
#102 97327-102.png7.95 KBhaydeniv
#101 97327-99-2-postgres-test.png13.47 KBhaydeniv
#100 97327-99-postgres-test.png9.7 KBhaydeniv
#99 97327-tests-only.patch5.48 KBxjm
#98 Fix-data-corruption-in-PostgreSQL-databases-97327-97.patch6.82 KBNROTC_Webmaster
#95 Fix-data-corruption-in-PostgreSQL-databases-97327-95.patch7.14 KBNROTC_Webmaster
#94 comment-test.patch6.11 KBNROTC_Webmaster
#90 comment-test.patch4.75 KBNROTC_Webmaster
#87 Fix-data-corruption-in-PostgreSQL-databases-97327-87.patch1.29 KBNROTC_Webmaster
#81 0001-Fix-data-corruption-in-PostgreSQL-databases-when-mul.patch1.58 KBZed Pobre
#76 comment_module_6p20_20110105_largethreadidfix.patch940 bytesPMunn
#61 comment_module_5p21_threadidsgrowtoofast-D5.patch656 bytesPMunn
#54 comment_module_5p18_threadidsgrowtoofast.patch656 bytesPMunn
#51 97327_comment_thread_parent.patch1003 bytesmaartenvg
#46 97327_comment_thread_parent.patch1003 bytesmaartenvg
#44 comment_module_5p12_threadidsgrowtoofast.patch656 bytesPMunn
#40 comment.module.threadidsgrowtoofast.5p11.patch689 bytesPMunn
#28 comment_module_5p10_threadidsgrowtoofast.patch944 bytesPMunn
#27 comment_module_7dev_20080819.patch949 bytesPMunn
#23 comment.module_largethreadfix.5p3.patch938 bytesgdavid
#22 comment.module.largethreadfix.5p2.patch693 bytesPMunn
#19 comment.module.largethreadfix.5p1.patch694 bytesPMunn
#16 comment.module.largethreadfix_0.patch1.13 KBPMunn
#14 generate-content.php_0.txt7.79 KBkilles@www.drop.org
#4 largethreadissue.jpg67.7 KBGurpartap Singh
#1 comment.module.largethreadfix.patch699 bytesPMunn
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

PMunn’s picture

Title: comment thread numbers or vancode IDs growing unnecessarily quickly » comment thread IDs grow too fast
Status: Active » Needs review
FileSize
699 bytes

Here's a patch that takes care of the problem.

chx’s picture

Version: 4.7.4 » 5.x-dev
Category: feature » bug

I second the motion. I had similar problems with the new menu code which utilizes vancode, and in there I vancoded per level which is essentially the same as what this patch does. I won't RTBC any patch at 4am but I have a feeling this one is RTBC.

Gurpartap Singh’s picture

There seems no such issue with Drupal 5.x HEAD version. Attached is a screenshot of the thread row in comments table.

(Have not tested on Drupal 4.7.x.)

Gurpartap Singh’s picture

FileSize
67.7 KB

Here's the screenshot.

Gurpartap Singh’s picture

Status: Needs review » Active

$thread = int2vancode(vancode2int($max) + 1) .'/';

vancode2int() should take care taking out the part before the first dot. Have tried it on 3 digit thread values too.

function vancode2int($c = '00') {
  return base_convert(substr($c, 1), 36, 10);
}
Steven’s picture

Status: Active » Closed (won't fix)

Um, is there an actual problem described here?

The conversion functions exist only as simple casts form and to integers. Properly using them to generate tree structures is the responsibility of the caller.

PMunn’s picture

Status: Closed (won't fix) » Needs review

While I'm happy that this doesn't occur in the 5.0 beta, the original problem in 4.7 remains. That's what the original patch is for.

PMunn’s picture

Version: 5.x-dev » 4.7.4

Tagging as 4.7.4.

PMunn’s picture

Anybody want to vet/commit this for 4.7.4 so we can put this behind us?

Caleb G2’s picture

Patch seems to work on 4.7 head. Screenshot here:

http://highervisibilitywebsites.com/comment-ids.jpg

Will let someone else RTBC though...

killes@www.drop.org’s picture

I've done the following:

Created a node
created 500 comments on that node
added followups to one of the comments
added a new comment on the node.

That last comment's thread is 1dx/ and looks perfectly normal.

I think that the import script you mention messed something up.

PMunn’s picture

Killes, you tested with this patch on 4.7.4 or without it?

I tested the scenario described in the initial posting, so I did see a problem.

killes@www.drop.org’s picture

without the patch

Also, since the code is the same in 4.7 and 5RC the issue should pop up in both versions.

killes@www.drop.org’s picture

FileSize
7.79 KB

Here is a modified version of generate-content.php which does what I did for my test, e.g create a node with 500 comments.

Beware: It will remove all other content from your site.

killes@www.drop.org’s picture

Status: Needs review » Closed (won't fix)

considering my extensive testing without any hint towards a bug, I mark this "won't fix".

PMunn’s picture

Status: Closed (won't fix) » Needs review
FileSize
1.13 KB

Finally I've got a very clear description of the problem. This came to me while examining the comment.module in the new 4.7.5 release.

Here's the code that creates a thread ID for a new comment that does not have a parent comment. That is, it is a comment on the node itself.
On Line 583 we see:

        if ($edit['pid'] == 0) {
          // This is a comment with no parent comment (depth 0): we start
          // by retrieving the maximum thread level.
          $max = db_result(db_query('SELECT MAX(thread) FROM {comments} WHERE nid = %d', $edit['nid']));

          // Strip the "/" from the end of the thread.
          $max = rtrim($max, '/');

          // Finally, build the thread field for this new comment.
          $thread = int2vancode(vancode2int($max) + 1) .'/';
        }

This code is wrong in the following circumstance:
- The last comment in a node is a comment-on-a-comment.
- Someone attempts to add a comment on the node.

If the last comment in a node is a comment-on-a-comment, its vancode has a dot in it. For example, "0a" is a thread ID of a comment on a node, while "0a.01" is the thread ID of the second comment on the 0a comment.

The code above increments the whole thread ID, and not the first segment of the thread ID when adding a new comment to the node. This is the bug.

By getting the first segment of the maximum, incrementing it, and using that as the new comment's thread ID in this circumstance, we ensure the next thread ID is correct.

See this in action with this PHP code in a node:

echo 'Current code: 0a<br/>';
echo 'Next: '.int2vancode(vancode2int('0a')+1).'<br/>';
echo 'This is correct!<p/>';

echo 'Current code: 0a.01<br/>';
echo 'Next: '.int2vancode(vancode2int('0a.01')+1).'<br/>';
echo 'This is wrong!<p/>';

which generates this result:

Current code: 0a
Next: 0b
This is correct!

Current code: 0a.01
Next: 2a02
This is wrong!

Incidentally, the comment module also starts comments on a node at vancode "01" instead of "00". The attached patch fixes that as well in the 4.7.5 comment.module.

Gerhard Killesreiter’s picture

Version: 4.7.4 » 5.x-dev

moving to HEAD to get more reviews. The code is the same in 5 and 4.7.

Gurpartap Singh’s picture

There isn't any problem with this in 5.x as discussed above. Pointing, in case the code is same.

PMunn’s picture

Version: 5.x-dev » 5.1
FileSize
694 bytes

5.1 seems to still have this problem.

I added a new node.
First comment on the node has a thread of 01/, and that's ok.
First reply on the First comment has a thread of 01.00/, and that's ok.
Second comment on the node has a thread of 2101/. This is too high.
First reply on the Second comment has a thread of 2101.00/, and this seems ok.
Third comment on the node has a thread of 410101/, and this is getting too big.

The following patch fixes it, again.

Zen’s picture

Status: Needs review » Active

Paul,
Thanks for persevering with this issue :) However, trying to replicate this issue in 5-dev with your provided steps:

This code is wrong in the following circumstance:
- The last comment in a node is a comment-on-a-comment.
- Someone attempts to add a comment on the node.

does not realise any errors.

A dump of my comments table:

mysql> select cid, subject, thread from comments;
+-----+----------+-----------+
| cid | subject  | thread    |
+-----+----------+-----------+
|   1 | c1       | 01/       |
|   2 | c1.1     | 01.00/    |
|   3 | c2       | 02/       |
|   4 | c2.1     | 02.00/    |
|   5 | c2.1.1   | 02.00.00/ |
|   6 | c3       | 03/       |
|   7 | c4       | 04/       |
|   8 | c5       | 05/       |
|   9 | c6       | 06/       |
|  10 | c7       | 07/       |
|  11 | c8       | 08/       |
|  12 | c9       | 09/       |
|  13 | c10      | 0a/       |
|  14 | c8.1     | 08.00/    |
|  15 | c11      | 0b/       |
|  16 | c10.1    | 0a.00/    |
|  17 | c12      | 0c/       |
|  18 | c10.1.1a | 0a.00.00/ |
|  19 | c13      | 0d/       |
|  20 | c2.1.2   | 02.00.01/ |
|  21 | c2.1.3   | 02.00.02/ |
|  22 | c2.2     | 02.01/    |
|  23 | c2.1.1.1 | 02.00.00.00/ |
|  24 | c14      | 0e/          |
|  25 | c13.1    | 0d.00/       |
|  26 | c15      | 0f/          |
+-----+----------+-----------+

Please provide concrete steps to manually reproduce this on a clean database.

-K

PMunn’s picture

I took a look at the code in the 5.x comment.module and it looks identical to my patch. Looks like someone integrated it in without noting it here.

Do I need to wait for whenever enough security holes are found to release a 5.2 to get this into production?

PMunn’s picture

Version: 5.1 » 5.2
FileSize
693 bytes

The fix must have been abandoned, because 5.2 is now broken the same old way. The patch is identical for this version, but I've regenerated it anyway.

Pretty please integrate my fix? Please?

Once again, the test case is:

I added a new node.
First comment on the node has a thread of 01/, and that's ok.
First reply on the First comment has a thread of 01.00/, and that's ok.
Second comment on the node has a thread of 2101/. This is too high.

Once I saw the comment ID in the URL for the first comment, I just did this to check it in the database and see what the thread value was being set as:

Broken way:

drupal_farbot=# select cid, subject, thread from comments where cid >= 9227;
 cid  |          subject           | thread
------+----------------------------+--------
 9227 | first comment              | 01/
 9228 | first comment reply        | 01.00/
 9229 | second comment on the node | 2101/
(3 rows)

Fixed way:

drupal_farbot=# select cid, subject, thread from comments where cid >= 9230;
 cid  |      subject       | thread
------+--------------------+--------
 9230 | comment 1          | 01/
 9231 | reply to comment 1 | 01.00/
 9232 | comment 2          | 02/
(3 rows)
gdavid’s picture

Version: 5.2 » 5.6
FileSize
938 bytes

the problem still in 5.6
I made another solution

PMunn’s picture

Version: 5.6 » 5.7

This is still broken on 5.6 and 5.7. I don't know whether to scream or cry. Or both.

gdavid's patch for 5.6 works with 5.6 and 5.7, which is broken in the same way on the same lines. It's a simpler patch than my own previous patches, so maybe this one will make it in.

6.2 is also broken in the same way from the look of the comment module code, but its patch will need to cover different lines.

Who can commit this into the next release so we can move on with our lives?

catch’s picture

Version: 5.7 » 7.x-dev
Status: Active » Needs work

This is almost certainly broken in 7.x too, so I'm moving it there in order that it get committed first, backported to D6, then to D5.

Patch doesn't apply to D7 though so marking as Code needs work.

PMunn’s picture

This bug is still present in 5.10.

PMunn’s picture

Status: Needs work » Needs review
FileSize
949 bytes

Confirmed as a bug in 7.x dev as of tonight 2008-08-19, here is the patch to fix it.

PMunn’s picture

And this patch is for 5.10 in case anyone else wants it.

kscheirer’s picture

I just tried the test in @22 on a drupal HEAD install, and everything seemed to work fine without the patch in @27. Is this related to php or other configuration settings in some way then? I got the same results after applying the patch, and the code does look a little safer, and the query should perform better with large numbers of comments, so I'm still in favor of committing to HEAD.

I added a new node.
First comment on the node has a thread of 01/
First reply on the First comment has a thread of 01.00/
Second comment on the node has a thread of 02/

mysql> SELECT cid, subject, thread FROM comments;
+-----+----------------+--------+
| cid | subject        | thread |
+-----+----------------+--------+
|   1 | first comment  | 01/    | 
|   2 | first reply    | 01.00/ | 
|   3 | second comment | 02/    | 
+-----+----------------+--------+
3 rows in set (0.00 sec)
Damien Tournoud’s picture

I understand the bug, and agree with the solution (except that I don't get why the cast to int could ever be necessary). But I *can't* reproduce it.

PMunn could you give as much details about your setup as possible?

Damien Tournoud’s picture

Status: Needs review » Postponed (maintainer needs more info)

Thinking about this a little more, I'm not convinced there is a bug here.

'.' is supposed to be < '/', so '01.00' < '02/' because string comparisons are typically done in lexicographical order.

Maybe the string collation you use breaks one of these two assumptions (either '.' > '/' or the comparison is done differently). But this will break the whole thread sorting algorithm anyway, so you have a bigger problem on your hands.

Which database engine and collation do you use?

PMunn’s picture

The full test case is first described above at #16.

The important thing is to do the test in this order:

Create a Node, let's call it NODE-1.
Comment on this node, call it COMMENT-1.
Reply on COMMENT-1, call this COM-1-REP-1.
Comment on the node, call it COMMENT-2.

Observe the Thread IDs in the comments table for the three comments on that one node. Most notably the COMMENT-2 thread ID will be wrong. It will still sort properly, yes, but the thread ID is still wrong.

The patch gets the highest thread ID of all comments without a parent comment, that is, it gets the highest thread ID that is directly pointing to the NODE. It then increments that thread ID. The original, flawed code increments the entire thread ID of the last comment.

I adopted gdavid's patch because it's smaller and seems to work just as well instead of my own patch before his and used it to make the 5p10 and 7dev patches. The tactics in either patch, mine or gdavid's, are the same across all versions of drupal right now. The comment module hasn't changed lately.

If my solution will break the sorting at some point let me know how and we can close this up for good.

Damien Tournoud’s picture

@PMunn: can you give us as most specifics as you can about your environment? (that's what I called "setup", above), database engine, database configuration, PHP version, etc.

Damien Tournoud’s picture

Some more research:

$ export TEXT="01/\n02/\n01.00/\n01.00.00/\n02.00/"

$ echo -e $TEXT | LC_COLLATE=C sort
01.00.00/
01.00/
01/
02.00/
02/

$ echo -e $TEXT | LC_COLLATE=en_US.ISO-8859-1 sort
01/
01.00/
01.00.00/
02/
02.00/

$ echo -e $TEXT | LC_COLLATE=en_US.UTF-8 sort
01/
01.00/
01.00.00/
02/
02.00/
PMunn’s picture

Okeydoke:

PostgreSQL 8.3.3
PHP 5.2.6
Database encoding is UTF8

Sitting on a Fedora Core 8 box running kernel 2.6.25.14-69.

Damien Tournoud’s picture

@PMunn can you confirm that the comment ordering is wrong at display time with your setup (please try both ascending as descending ordering)?

PMunn’s picture

The comment ordering is correct.

This is a bug that doesn't exhibit itself visually to the end user. It's just an observation that the thread IDs are being generated incorrectly internally.

kscheirer’s picture

Status: Postponed (maintainer needs more info) » Needs review

I still think this patch should go in, regardless of being able to reproduce the bug.

The code is clean and the sql query is much more correct with the patch, and will execute faster on sites with a lot of comment threads.

maartenvg’s picture

Status: Needs review » Needs work

Patch no longer applies.

I, too, can't reproduce this bug. I tried it on a clean HEAD install and got

ID  	Title  	  	Thread
1  	Comment 1  	01/
2 	Comment 1.1 	01.01/
3	Comment 2 	02/

MySQL 5.0.5, PHP 5.2.4, Ubuntu

I do think that
$max = db_result(db_query('SELECT MAX(thread) FROM {comments} WHERE nid = %d', $edit['nid']));
should become
$max = db_result(db_query('SELECT MAX(thread) FROM {comments} WHERE nid = %d AND pid = 0', $edit['nid']));
because that makes more sense (you only need the parentless nodes to determine the max) and is less work when looking at thready comments.

PMunn’s picture

maartenvg, your code change to that line fixes the problem in 5.11.

catch’s picture

Status: Needs work » Needs review
jonskulski’s picture

Damien Tournoud’s picture

@johnskulski: good catch, that's a regression introduced by DB:TNG... but this has nothing to do with the thread at hand. Please open a new one with your patch.

PMunn’s picture

Here is an updated patch for the 5.12 release. Same code change, but the row it's on is a different row number so some work must've been done on the comment module since the 5.11 release.

Anonymous’s picture

Status: Needs review » Needs work

The last submitted patch failed testing.

maartenvg’s picture

Status: Needs work » Needs review
FileSize
1003 bytes

Following HEAD.

Status: Needs review » Needs work

The last submitted patch failed testing.

lilou’s picture

Status: Needs work » Needs review
Gábor Hojtsy’s picture

Status: Needs review » Needs work

I think we should do "AND" and not "and" (markind CNW for this) but otherwise looks like a simple code change. Should give the next comment thread the right incremented thread number instead of taking the whole thread structure from the last comment.

Gábor Hojtsy’s picture

BTW I've been pointed to here on the Hungarian Drupal conference this past weekend and got test results from gdavid there (on postgresql), which shows that the third comment on the same node does indeed get a wrong thread identifier.

drupal7=# select * from comments;
 cid | pid | nid | uid | subject | comment | hostname  | timestamp  | status | format | thread |  name  | mail | homepage
-----+-----+-----+-----+---------+---------+-----------+------------+--------+--------+--------+--------+------+----------
  1 |   0 |   1 |   1 | egy     | egy     | 127.0.0.1 | 1226745858 |      1 |      1 | 01/    | gdavid |      |
  2 |   1 |   1 |   1 | ketto   | ketto   | 127.0.0.1 | 1226746009 |      1 |      1 | 01.00/ | gdavid |      |
  3 |   0 |   1 |   1 | harom   | harom   | 127.0.0.1 | 1226746032 |      1 |      1 | 2101/  | gdavid |      |
(3 rows)
maartenvg’s picture

Status: Needs work » Needs review
FileSize
1003 bytes

#35 also refers to Postgresql, so I think we can safely assume that the problem lies there. And that is also the reason why I (and others) couldn't reproduce it, because most people use MySQL.

Attached patch with capitals for "and" per #49.

catch’s picture

Status: Needs review » Reviewed & tested by the community

Patch looks great. Since this is environment dependent I don't think it's viable to write a test for it (and it's also not harmful, just a bit odd), so marking RTBC.

webchick’s picture

Status: Reviewed & tested by the community » Needs work

I discussed this some with DamZ on IRC.

He makes the case that while the patch will fix the symptom of the problem (and is a performance improvement to boot), there is obviously still something funny going on with threading on some systems.

Drupal has 4 threading modes (threaded desc order, unthreaded desc order, threaded asc order, and unthreaded asc order), and while this messed up value in pgsql might not have user-facing consequences when using whichever mode phpbb2drupal uses (probably unthreaded asc), it likely would have user-facing consequences on one or more other modes.

We have a couple very simple tests for threading as part of testing comment replies, but we lack comprehensive tests for all threading modes for comments.

I'm therefore marking this CNW so that we can get those in place, which should hopefully help us figure out the "root" cause of the problem on pgsql at the same time.

PMunn’s picture

On a side note, here's a patch that fixes this for 5.18, where it is still a problem. Patch does the same thing as previous one.

gdavid’s picture

this is still problem in 6.x. It's not too good. This is 3year old problem. And only need " AND pid = 0" text to the module code. Only 12 characters (with spaces). It's ridiculous a little bit. What work need more to accept these changes?

wretched sinner - saved by grace’s picture

@gdavid - see webchick's comments in #53. To commit this to 6.x, it needs to be committed to 7.x first. To commit it to 7.x, we need tests so that comment threading doesn't break in the future. If you can help write these tests, we can get this issue committed quickly

gdavid’s picture

the root problem is it: (postgresql)

# SELECT '.' > '/' ;
?column?
----------
t
(1 sor)

template1=# SELECT '.' < '/' ;
?column?
----------
f
(1 sor)

#SELECT '123456/' > '123456.01.01/' ;
?column?
----------
f
(1 sor)

# SELECT 'a' > '/';
?column?
----------
t
(1 sor)

# SELECT 'a' > '.' ;
?column?
----------
t
(1 sor)

# SELECT '0' > '.' ;
?column?
----------
t
(1 sor)

Why? look at this:

# SELECT hashchar('.');
hashchar
----------
-47
(1 sor)

# SELECT hashchar('/');
hashchar
----------
-48
(1 sor)

I hope this can help to show the ROOT problem, why working wrong the comment threading in PG.
Thanks.

PMunn’s picture

Are you trying to say that the forward slash is causing the sorting problem? Can we just change the terminator to another character? My results for running those hashchar calls, which doesn't look meaningful to me:

# select hashchar('.');
  hashchar
------------
 -163135266
(1 row)

# select hashchar('/');
  hashchar
------------
 1043037474
(1 row)

The code change is needed to create and properly increment the segments of the thread ID. This isn't disputed.

I get it that this seems to have exposed a sorting problem particular to PostgreSQL.

gdavid’s picture

I don't understand. Why dosen't enoght good this syntax?

$max = db_result(db_query('SELECT MAX(thread) FROM {comments} WHERE nid = %d AND pid = 0', $edit['nid']));

why whould be better solution to change the terminator character to another? " AND pid = 0" 12 characters (with spaces) no other script, no transformation. it will run on mysql, postgres, oracle, mssql and others too.

PMunn’s picture

gdavid, I agree this fix is simple and should be implemented.

However, it seems that some have found a sorting problem in PostgreSQL even when all is working properly.

I vote we split that into a separate bug, and fix and close this one.

This bug, again, is simply that the thread ID increases too quickly. The code fix described should iron that out and with more testing perhaps the fixed code can contribute meaningfully to the second bug discussion.

All in favor?

PMunn’s picture

Here's a patch for 5.21.

AlexisWilke’s picture

Darn! This was posted in 2006 and no one in Core accepted the fix?!?!?

I cannot believe it!

The fix is given in #1 and has not been applied since then. Who should I strangle here?!

I posted my own report here   http://drupal.org/node/122098#comment-3112214   before finding this entry.

The function base_convert() ignores the periods on some versions of PHP (at least the two I have):

PHP 5.2.4-2ubuntu5.10 with Suhosin-Patch 0.9.6.2 (cli) (built: Jan  6 2010 22:01:14) 
Copyright (c) 1997-2007 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies
PHP 5.2.6-3ubuntu4.5 with Suhosin-Patch 0.9.6.2 (cli) (built: Jan  6 2010 22:25:33) 
Copyright (c) 1997-2008 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2008 Zend Technologies
    with XCache v1.2.2, Copyright (c) 2005-2007, by mOo

Why was that not applied since 2006???? You're wasting everyone's time here. #62 comments and still bogus! Super very simple fix that won't hurt anyone!

Well... At least I'm glad I can fix my own version of Drupal. 8-)

Damien Tournoud’s picture

@AlexisWilke: actually, you are wasting everyone's time by bumping an old issue without doing anything to make it go forward.

As I identified back in #34 (!!), the ordering is *not correct* when using C collation. That means that the core of the issue is that *you need to create your database with a decent collation*.

The identifiers growing too fast is nothing more then a symptom of a wrong database collation, that will likely create other ordering issues.

AlexisWilke’s picture

@Damien,

No. There are two problems. The collation, which does not affect me at all, and the totally improper use of base_convert() as shown here:

      elseif ($comment->pid == 0) {
        // This is a comment with no parent comment (depth 0): we start
        // by retrieving the maximum thread level.
        $max = db_query('SELECT MAX(thread) FROM {comment} WHERE nid = :nid', array(':nid' => $comment->nid))->fetchField();
        // Strip the "/" from the end of the thread.
        $max = rtrim($max, '/');
        // Finally, build the thread field for this new comment.
        $thread = int2vancode(vancode2int($max) + 1) . '/';
      }

Since in that case $max may be equal to something like '01.00.00' and then the +1 is assigned to a number defined as 10000 and not just 1 as expected.

You do not have to believe post #0 and #1, of course. The other issues are a problem for some people, but if you had applied fix in #1 you'd already solve a lot of problem on a lot of websites. Note that it is very destructive since the threads are completely broken by this wrong math.

The collation would not break the math.

And I will continue to waste your time until you fix that very problem. The collation can wait. It is much more complicated.

Note that I'm using PostgreSQL and did not have any of the collation problems reported. However, the PHP code problem, #0 and #1, sure does break my website.

Thank you.
Alexis

Damien Tournoud’s picture

Since in that case $max may be equal to something like '01.00.00' and then the +1 is assigned to a number defined as 10000 and not just 1 as expected.

No, by definition if the collation is right, $max will be a top-level thread. As a consequence, the rest of your analysis is totally wrong.

AlexisWilke’s picture

Damien,

You may want to try the SQL and see that it works perfectly putting the last thread last as expected.

However, if what you are saying is true, would you be so kind as to explain this one sort:

      if ($order == COMMENT_ORDER_NEWEST_FIRST) {
        if ($mode == COMMENT_MODE_FLAT_COLLAPSED || $mode == COMMENT_MODE_FLAT_EXPANDED) {
          $query .= ' ORDER BY c.cid DESC';
        }
        else {
          $query .= ' ORDER BY c.thread DESC';
        }
      }
      else if ($order == COMMENT_ORDER_OLDEST_FIRST) {
        if ($mode == COMMENT_MODE_FLAT_COLLAPSED || $mode == COMMENT_MODE_FLAT_EXPANDED) {
          $query .= ' ORDER BY c.cid';
        }
        else {
          // See comment above. Analysis reveals that this doesn't cost too
          // much. It scales much much better than having the whole comment
          // structure.
          $query .= ' ORDER BY SUBSTRING(c.thread, 1, (LENGTH(c.thread) - 1))';
        }
      }

What is the difference with the MAX used for the INSERT and that other sort?! (I show D6, although D7 does the same sorting for display, D6 is easier to read.)

The order is the same and you see the threads as expected.

If that is wrong, then the threading is not working at all. Although, strangely, applying #1 fixes all the problems on my system.

The collation would not have any effect on the MAX() of the INSERT. The thread is sorted in the same manner in both places. Really, frankly. So it works the way intended. It returns the "wrong" value in the INSERT. I repeat myself, I know. But hey... after 3 and 1/2 years, you must be quite tired of this bug.

#50 perfectly shows the result of the problem. And again, apply #1 and it goes away.

Comment #1 you get "01/" as expected.

Comment #2 is a reply to comment #1, you get "01.00/" as expected.

And at that point, if you run "SELECT * FROM comments ORDER BY thread", it works as expected.

Comment #3, this is a new comment, you get "2101/", not what you'd want to get.

Yet, sort that, and it still works! Yes, try "SELECT * FROM comments ORDER BY thread" again, and it works just fine. Except that the number grew really fast (as the title of this thread points out). The expected number was "02/".

Note that at all the other levels, the PHP code does it right. Why not fix this root too? The sort is working until you have an overflow (as pointed out in the title... grows way fast, and then overflow!)

Now, you may wish that the MAX() in the INSERT work "better", just fix it like in #1 and stop being stubborn. (3 and 1/2 years?!)

Thank you.
Alexis Wilke

Damien Tournoud’s picture

@Alexis: see #34:

This is the order you are supposed to get:

$ echo -e $TEXT | LC_COLLATE=C sort
01.00.00/
01.00/
01/
02.00/
02/

The MAX(thread) is 02/.

Could you import this dataset in a table and give us:

  • the result of SELECT thread FROM comments WHERE nid = xxx ORDER BY thread
  • and the result of SELECT MAX(thread) FROM comments WHERE nid = xxx

Thanks for moving this issue forward, very appreciated.

AlexisWilke’s picture

Damien,

A long answer testing everything I could think of...

UTF-8 sorting not respected by MySQL

Got it. MySQL and PostgreSQL do not work the same way. Actually, as usual, MySQL does not work right at all. 8-)

First of all, as specified in http://api.drupal.org/api/function/db_check_setup/6 and pointed out in issue #349508: Require UTF8 database encoding, the PostgreSQL installation is expected to be done in UTF-8. Therefore, I have it that way. UTF-8 sees the period after the slash:

$ echo -e $TEXT | LC_COLLATE=en_US.UTF-8 sort
01/
01.00/
01.00.00/
02/
02.00/

This may sound strange to some people since the period (0x2E) has a code that is smaller than the slash (0x2F). That's what collation does for you.

Testing databases

To test I created a table, added some thread entries and then run a SELECT. The CREATE + INSERT both work the same in both database systems:

CREATE TABLE co (t TEXT);
INSERT INTO co VALUES ('01/'), ('02.01/'), ('01.00.00/'), ('02.00/'), ('01.00/'), ('02/');

Notice how I insert in disorder which means that a SELECT without ORDER is expected to give you the INSERT order (as long as you don't update anything.)

So, let's see with PostgreSQL. To my point of view, that works as expected, that's how I want to see the sorted threads in my website (i.e. comment 1, reply 1, reply of reply 1, comment 2, reply 1 of comment 2.)

     t     
-----------
 01/
 01.00/
 01.00.00/
 02/
 02.00/
 02.01/

However, when I test with MySQL I get it all wrong (personal point of view, of course):

select * from co order by t;
+-----------+
| t         |
+-----------+
| 01.00.00/ | 
| 01.00/    | 
| 01/       | 
| 02.00/    | 
| 02.01/    | 
| 02/       | 
+-----------+

This will not be sorted properly. It'd show me the most recent post first when I want the oldest and the most recent reply...

I suppose that the code to display the comments takes that in account in some ways because I get the threads right on my website with PostgreSQL...

Is it? Well... no. So in other words, the MySQL users have not noticed that everything was semi-up-side-down. Answers before the first question... Weird. And if you ask from newest to oldest, you also get a problem.

Could it be that the inverted sort works then?

This is the result of an invested thread (i.e. newest first). Now you get the main question first, but the last answer appears before the first. This has to be overly confusing to MySQL users.

select * from co order by t desc;
+-----------+
| t         |
+-----------+
| 02/       | 
| 02.01/    | 
| 02.00/    | 
| 01/       | 
| 01.00/    | 
| 01.00.00/ | 
+-----------+

Just in case, what I would expect as the correct output is what PostgreSQL outputs:

select * from co order by t desc;
     t     
-----------
 02.01/
 02.00/
 02/
 01.00.00/
 01.00/
 01/

Although this is not clear either since the question will appear after all the answers.

Can we agree on the correct/expected output?

The correct sorting would probably be on a per thread basis. So something like this:

02/
02.00/
02.01/
01/
01.00/
01.00.00/

In D7 you should consider having one more column to simplify the code tremendously and make sure it is compatible (using SUBSTRING is not a good idea.) So, there is what I'd propose in the comment render of D6... Although that only works for the first 36 entries:

select substring(t, 1, 2), substring(t, 3, length(t) - 3), t from co order by substring(t, 1, 2) ASC, substring(t, 3, length(t) - 3);

The ASC can be changed to DESC and the order on pages will still look coherent. i.e. Main question (new comment) and replies stay in order as I would expect.

Now, this works until the first number goes over "0z". (obviously, since 1, 2 means first and second characters.)

The very first character is the length so I guess we can use that to properly compute the size. And from the comment I've seen, the first character can be ignored (most certainly true). So I think that this would work:

substring(t, 2, substring(t, 1, 1) + 2)

except that substring() returns a string, so it needs to be CASTed in some ways... (in PostgreSQL, it's CAST(substring... AS int), but that does not work in MySQL.) That's definitively very complicated. With two columns, we'd solve the problem since the first column can be just an integer sorted normally. This represents the thread number. And the next column represents the replies.

Two column solution

Assuming that threads are t and replies are r, we'd get something like this instead:

select t, r from co order by t asc, r;

t int, r text (anything after the first period).

t could be set to cid (I think that would be a good idea to allow finding all the replies to a specific thread.)

Can we fix the collation in MySQL?

Thinking about it, I checked the MySQL setup and I get this for characters & collations:

SHOW VARIABLES;
[...]
| character_set_client            | latin1                          | 
| character_set_connection        | latin1                          | 
| character_set_database          | latin1                          | 
| character_set_filesystem        | binary                          | 
| character_set_results           | latin1                          | 
| character_set_server            | latin1                          | 
| character_set_system            | utf8                            | 
| character_sets_dir              | /usr/share/mysql/charsets/      | 
| collation_connection            | latin1_swedish_ci               | 
| collation_database              | latin1_swedish_ci               | 
| collation_server                | latin1_swedish_ci               | 
[...]

latin1_swedish_ci?! That's MySQL magic I guess. Yet, the tables are created with UTF8 as seen here:

  if (empty($table['mysql_suffix'])) {
    $table['mysql_suffix'] = "/*!40100 DEFAULT CHARACTER SET UTF8 */";
  }

Just in case, I tried to change the column text encoding as described here: http://dev.mysql.com/doc/refman/5.1/en/charset-conversion.html and the SELECT still does the same thing (exact same sorting; collation not respected...)

Okay, a search with Google brought this one: http://drupal.org/node/772678 and as usual, big surprise, MySQL is dearly broken... I'm so glad I don't use it! 8-)

May I suggest that we change the scheme in order to avoid collation problems? We can have an upgrade path to fix existing databases. The one good thing with MySQL is that people won't have had the overflow error happen to them.

Proposed solution

The following is what I see as a workable solution that works for both databases. In D7, you'd have to test with the others too (Oracle, SQL Server, SQLite, etc.) For that, I'd strongly advice for a test in the requirements() so we know at installation time that a database won't work right.

Two columns

Create two columns:

1. The thread identifier, an integer

2. The thread replies, still a text field

Thread identifier

The thread identifier can be set to cid. So, whenever you create a new comment, you set ("thread = %d", cid)

Replies receive the same cid as the thread (i.e. "parent most comment", practical to hide a whole thread!)

Replies field

This can pretty much keep the same scheme, however, to avoid collation problems, we can force the width of each "column" to a specific size and thus drop the period. I would suggest 6 digits in decimal (i.e. up to 1 million comments) to make totally sure, although if we still use base 36, it is reduced to 4 digits and we'd have 36^4=1,679,616 comments before running in a problem. So the field will grow larger than now, but not by much unless you get many replies to replies (i.e. "000000000000" would represent 3 levels!) but the sort works in all cases, whatever the collation (assuming that 0-9, a-z are always sorted as expected?!).

Code changes

This means code changes to all the modules that directly peek and poke in the comments table (i.e. Discuss This! is one I have worked on. Also I think that one does not access the table directly.)

Otherwise, it is mainly in the SELECT where you'd just add a column when sorted by threads and remove the SUBSTRING().

In the comment_save() function, it is fixing the else part which assumes that the thread value is incorporated in the reply field. Otherwise, the new column is very easy to manage since the following SELECT works as is:

SELECT MAX(thread_id) + 1 FROM {comment}

Upgrade of existing installations

This is of course the hardest part, making sure we can upgrade existing installs. Since PostgreSQL users who have "too many" comments in a threaded comment list have had their database destroyed, we cannot really help them without having a module helping them reorganize their comments, if they want to spend the time to do so. (I've done mine by hand... I had only one such page with too many comments.)

For all others, we should be able to fix the data pretty easily. And this the thread identifier would be set to cid, the too large problem vaporize automatically.

Note: we can tell that a PostgreSQL install was "destroyed" by checking whether there are some comments with "00/" which should never happen (you start with "01/".) So we could at least let people know that Drupal Core messed up on that one...

Damien Tournoud’s picture

I'm semi-happy you confirm what I suspected all along: that threading is severely broken on PostgreSQL.

You should take a look at the documentation of comment_render() that describes the design choices and assumptions of the current code, which is *not* broken but assume a certain collation.

Sadly, I'm pretty sure your analysis is wrong: it's not MySQL collation that is broken, but PostgreSQL's one. MySQL implements the standard Unicode collation algorithm. You can check (using this test page, for example) that '.' is supposed to be before '/'.

PostgreSQL doesn't support collation itself, but rely on the implementation in the GLibc. Sadly, this implementation seems broken in most versions of Linux I know. It's not always the case, if I repeat the test from #34 on Mac OS X (which is known to have pretty solid Unicode support), I get the expected:

$ echo -e $TEXT | LC_COLLATE=en_US.UTF-8 sort
01.00.00/
01.00/
01/
02.00/
02/

We already knew that Collation support is a very weak point of PostgreSQL. This only confirms it.

That said, I'm not sure what is the way forward. Could we find a set of characters that can work on both broken and non-broken collations?

AlexisWilke’s picture

Another important note:

While we work on this patch and have it tested, etc. I would suggest that we apply patch in #1, even if we remove it later. Until then, PostgreSQL users will lose data, which I think is really bad.

My own change is slightly faster.

$max = rtrim($max, '/');
$p = strpos($max, '.');
if ($p !== FALSE) {
  $max = substr($max, 0, $p);
}

As for the sorting problem, as mentioned by someone, removing the '/' altogether or replacing it with a period would be a good idea for MySQL users. But that one I can care less... 8-) Plus, the wrong sort does not destroy any data.

mysql> select * from co order by t desc;
+-----------+
| t         |
+-----------+
| 02.01.    | 
| 02.00.    | 
| 02.       | 
| 01.00.00. | 
| 01.00.    | 
| 01.       | 
+-----------+

mysql> select * from co order by t asc;
+-----------+
| t         |
+-----------+
| 01.       | 
| 01.00.    | 
| 01.00.00. | 
| 02.       | 
| 02.00.    | 
| 02.01.    | 
+-----------+

Thank you.
Alexis

Sanbka’s picture

I think that the problem with tree of comments on PostgreSQL can be temporarily solved by using the code (line 987 of comment.module)

$query .= " ORDER BY rpad(c.thread, 255, 'z') DESC";

and (line 741 of comment.module)

$max = db_result(db_query('SELECT MAX(thread) FROM {comments} WHERE nid = %d AND pid = 0', $edit['nid']));
PMunn’s picture

Please enact the #1 patch to ensure the integrity of the thread ID the way it is designed. This is all anyone seems to be asking for.

Once the thread ID generation is fixed, it follows that a new issue would be opened to deal with any problem PostgreSQL systems have with it, with an eye to a separate idea on reorganizing storage of threads and what kinds of display modes they support. Philosophically, at least with this problem in PostgreSQL, it seems to me this should be solved in Drupal instead of waiting for an external DBMS to fix it. If that means allowing MySQL to do the collation heavy lifting and having code for other DBMSs that don't collate properly, that's what supporting multiple databases means. You are in control of the destiny of this project, not PostgreSQL.

Waiting for someone else to fix their system could be problematic. You could end up writing a bug report and waiting 3 years and 8 months for acceptance of 10 lines of code that could fix the problem. For example.

Damien Tournoud’s picture

Title: comment thread IDs grow too fast » Comment threading broken on PostgreSQL

You are in control of the destiny of this project, not PostgreSQL.

Waiting for someone else to fix their system could be problematic. You could end up writing a bug report and waiting 3 years and 8 months for acceptance of 10 lines of code that could fix the problem. For example.

This is not how this thing works. *You*, the users of Drupal on PostgreSQL, are on the driving seat for Drupal on PostgreSQL. Nobody will do it for you. Waiting for someone to fix Drupal on PostgreSQL for you could be problematic :)

Now that we have (at last!) confirmed that comment threading is severely broken on PostgreSQL (it took 3 years and 8 months, I do not congratulate you, PostgreSQL community), what do you suggest to fix it?

AlexisWilke’s picture

Damien,

#1.

But, pretty please, wait another 4 months, so we get at 4 years first.

Thank you.
Alexis

PMunn’s picture

This is not how this thing works. *You*, the users of Drupal on PostgreSQL, are on the driving seat for Drupal on PostgreSQL. Nobody will do it for you. Waiting for someone to fix Drupal on PostgreSQL for you could be problematic :)

Ohhh, it's been me all along! Me, the person who wrote a patch #1? Me, who can't commit that patch? Me, who is in favor of not escalating this patch into a whole attack on the comment module? Me, who is in favor of walking before running?

PMunn’s picture

Bug is present in 6.20. Patch attached.

alpapan’s picture

Version: 7.x-dev » 6.x-dev

+1 that bug is there at 6.20 and HEAD

AlexisWilke’s picture

alpagan,

This bug has been there for ages, but the comments author does not use PostgreSQL and therefore he pretty much said himself that he did not want to "fix something that works for him."

Maybe if you can have some political pressure over him? 8-)

Thank you.
Alexis

catch’s picture

Version: 6.x-dev » 7.x-dev

Patches are applied to the latest development version (or right at this minute current release) first, then backported to Drupal 6. Changing the status takes this out of the queue of most people who can look at it.

@This bug has been there for ages, but the comments author does not use PostgreSQL and therefore he pretty much said himself that he did not want to "fix something that works for him."

Who is the "comments author" that "doesn't want to fix" it?

To get this patch committed, here's what needs to be done:

1. Review the current comment tests to see if if they now cover all comment threading permutations to cover webchick's question in #53 (now more than two years ago, no-one complaining about this patch not being committed appears to have even looked at the tests, if you're lucky someone else may have written them for you while you were bickering).

2. Re-roll the patch so that it applies against HEAD, adding the missing tests if necessary.

3. Make the cast to int meet the conventions at http://drupal.org/coding-standards

4. What would really help would be to add a test that fails in HEAD on postgresql, but works with the patch applied, this would demonstrate that it actually fixes the bug and prevent future changes to comment module from causing a regression.

Zed Pobre’s picture

Priority: Normal » Major
Status: Needs work » Needs review
FileSize
1.58 KB

I have updated the patch in #1 to Drupal 7.x-dev, applied it to my local installation, and verified that it fixes the data corruption caused by multiple comments on a PostgreSQL backend, which in turn fixed the threading issue I was seeing that caused me to find this page.

For reference, I'm testing against a local tree with one additional patch applied: the comment authorship fix by Bleen18 at https://drupal.org/node/936844#comment-3892522.

There is nothing in that patch to require a special cast to int; I assume that is in reference to one of the other patches. I went with this one because it was minimally invasive, easy to understand, and first in the list.

I don't know enough about the testing structure to add tests. If that's absolutely mandatory (and I agree it would be nice if someone has the time and knowledge), I must ask that someone else please look into it. The basic test sequence would be to add two comments with no parent, two comments each selecting one of the first two as parents, and then verifying that the contents of the thread column are correct.

I can push my fork of Drupal to GitHub if it would be useful to others to be able to download both patches at once until they are accepted upstream.

I am also reflagging this bug as Priority Major due to the fact that it is a data corruption bug that can destroy installations. I just spent an hour manually correcting corrupt entries, and I'm not done yet, and it's only been so fast because I am just this week starting to migrate data from a relatively small Drupal 5 installation and I've only copied a few nodes over.

alpapan’s picture

Sorry Catch, my bad for changing the version number. I didn't know how it worked (now I do)
a

catch’s picture

Version: 7.x-dev » 8.x-dev
Status: Needs review » Needs work
Issue tags: +PostgreSQL, +Needs tests, +Needs backport to D7
// We need to get the value at the correct depth.

This should be "Get the value at the correct depth to ensure blah blah blah." - there is no explanation why we're doing this.

Still needs a test.

Zed Pobre’s picture

Title: Comment threading broken on PostgreSQL » Data corruption in comment IDs (results in broken threading on PostgreSQL)

Well, I re-rolled the patch in my repository to lengthen the comment, but while I was trying to figure out how to build a test for this (which if I understand how things work is completely impossible on the current technique without refactoring comment_save), I think I got a better understanding of what vancodes are.

If I'm reading this right, the vancode describes just one segment of the entire comment ID, not the entire comment ID, i.e. comment IDS are <parentvancode>.<childvancode>.<grandchildvancode>, and so on. If that is the case, the more important problem is in vancode2int, not in comment_save, and the better description of the core problem is that vancode2int has no sanity checks whatsoever -- not for invalid characters and not for the first digit actually being numeric and not for exceeding the length specified in the first digit -- and test cases *can* be theoretically built for vancode2int by attempting to throw invalid values at it and making sure that an expected result is returned. I'm going to redo this patch by forcing vancode2int to conform to the vancode specification as described in the comments of int2vancode and see if I can build unit tests to match.

That said, I think it's probably appropriate to attempt to keep a fix in both functions -- it's technically still a bug (albeit one for which it is impossible to build a unit test) for comment_save to attempt to run vancode2int on a string containing more than one vancode section.

I'm a little busy this week, but I hope to have time by this weekend.

remidewitte’s picture

Just hit the bug with a D7 install.

Thanks Zed Pobre, the patch from April 19, 2011 at 7:50pm just works. Why this simple patch was not applied for D7.10 for example ?

xjm’s picture

@remidewitte: The patch was not in 7.10 because no one made the necessary changes nor contributed an automated test for it yet.

Remaining tasks:

  1. Reroll the patch with the following changes:
  2. Add an automated test that reproduces the bug. Upload the test in a patch by itself (which should fail tests) and in a patch combined with the fix described in #1 (which should pass).
  3. Review #84 and determine if any further changes need to be made to the patch.
  4. An issue summary would also be helpful.

Tagging novice for any of the tasks above.

NROTC_Webmaster’s picture

Here is the patch updated for the latest version. I'm not sure how to go about making the test but if anyone wants to provide guidance I can give it a shot.

NROTC_Webmaster’s picture

Status: Needs work » Needs review
xjm’s picture

Thanks for the reroll. Looks okay to me, pending testbot of course.

We can't create a functional test for the Postgres part of it since that's tied to a specific enviroment. However, what we can do is test that the comment ID is incremented properly. So, the test should instead test the comment IDs themselves.

Based on the steps to reproduce in the summary, here's what I'd suggest for a test:

  1. Create a node.
  2. Add a comment A to the node.
  3. Add a reply B to comment A.
  4. Assert that B's ID is incremented in the manner indicated.
  5. Add a reply C to comment B.
  6. Assert that C's ID is incremented in the manner indicated.
  7. Add a comment D to the main node.
  8. Assert that D's ID is incremented in the manner indicated.

The last assertion will be the one we expect to fail before the patch and pass with it. The test case should go in comment.test, probably in a new class extending CommentHelperCase (as none of the other child classes looked particularly applicable when I skimmed just now.) We could call it CommentThreadingTestCase or something.

Edit: You'll probably want to inspect what the actual values are before and after the patch to make sure you understand the pattern and that I'm not just making stuff up. ;)

Thanks @NROTC_Webmaster for looking into it.

NROTC_Webmaster’s picture

FileSize
4.75 KB

I'm attaching an attempt at a test for this.
I took the suggestions from xjm on how to make it and then copied some of the code from earlier in the test. The problem is that it passes with or without the patch and I'm not sure where to go from here.

Can anyone take a look at it and let me know what I need to do to make it work correctly.

NROTC_Webmaster’s picture

Status: Needs review » Needs work
AlexisWilke’s picture

xjm,

The test will pass with or without the patch if you run MySQL.

The problem underlined here is that the old code makes use of a "special" feature of MySQL (I'd call it a bug...) which doesn't work with PostgreSQL.

Thank you.
Alexis

xjm’s picture

The test I outlined does not check for a database error. It's supposed to check the generated ID code, independent of the database, at least as far as I understand. The previously posted test does not follow the STR from the summary.

Are the values of the IDs themselves tied to the particular database?

NROTC_Webmaster’s picture

Status: Needs work » Needs review
FileSize
6.11 KB

The test should hopefully fail for anyone on PostgreSQL without the patch and then pass with it. I only have MySQL so I cannot really test to make sure of that but based on the description I think I have a good test.

NROTC_Webmaster’s picture

And then hopefully it will still work with this one.

xjm’s picture

FWIW, it should more than merely fail on PostgreSQL--the comment 5 assertion in particular should fail. It'd be worth outputting the actual values from assertEqual() rather than messages, I think.

I'll try to test the next time I'm on a machine where I have postgres installed if no one gets to it before then.

Edit: Alright, I just spent some more time reading the issue, and it looks like it's indeed the case that the assertions won't fail on testbot. (In #69 DamZ's example makes this fairly clear.) I still think the test is worth adding, though.

NROTC_Webmaster’s picture

I was having a few problems with git so hopefully this patch will apply. Thanks @xjm for all of your help on this.

xjm’s picture

FileSize
5.48 KB

Tests-only version for tester convenience.

haydeniv’s picture

Ran test in #99 on Postgres with the following results:

Comment Threading
Test to make sure the comment number increments properly.
153 passes, 4 fails, 0 exceptions, and 46 debug messages

Screenshot:

haydeniv’s picture

FileSize
13.47 KB

Specific test fails:

haydeniv’s picture

FileSize
7.95 KB

Here is the results of the test in #98:

xjm’s picture

Status: Needs review » Reviewed & tested by the community
Issue tags: -Needs tests, -Novice
FileSize
5.66 KB

Attached just removes a couple of out-of-scope whitespace changes.

The patch now includes tests for comment threading. These tests do not fail on testbot because this is an environment-specific bug (related to a particular database behavior). However, the screenshots above illustrate the test failures (before the patch) and passes (after the patch) on Postgres, which will be useful some day if/when we have multiple test environments (which I know is not going to happen in the short term, but is a planned feature). Meanwhile, the test coverage codifies proper comment threading so that we don't break it in any other way.

Thanks NROTC_Webmaster and haydeniv for the awesome work on this issue!

RTBC.

webchick’s picture

Version: 8.x-dev » 7.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

YAY! SO nice to have comment threading tests! Thanks too for the really detailed manual tests that help show those of us PostgreSQL-less how things are working. :) So wonderful to have this 6 year old issue nailed at last!

Committed and pushed to 8.x. Thanks! This will need a re-roll for D7.

xjm’s picture

Assigned: Unassigned » xjm

I'll backport this since I am responsible for the API change that makes it more than a simple reroll. :)

xjm’s picture

Assigned: xjm » Unassigned
Status: Patch (to be ported) » Needs review
FileSize
5.57 KB

Go testbot!

aspilicious’s picture

Status: Needs review » Reviewed & tested by the community

Looks a sane backport!

webchick’s picture

Status: Reviewed & tested by the community » Fixed

Excellent, thanks!

Committed and pushed to 7.x. Woohoo! Another major bug bites the dust. :)

Status: Fixed » Closed (fixed)
Issue tags: -PostgreSQL, -Needs issue summary update, -Needs backport to D7

Automatically closed -- issue fixed for 2 weeks with no activity.