Now that issue followups are comments (what we're affectionately calling "IFAC"), one of the things this allows is deleting inappropriate followups.

However, issue followups are numbered sequentially in the queue. This is crucial functionality we have to maintain, e.g. so you can say "Dries was right in #39, #25 is a total hack and should be rolled back", or "#26 is RTBC, but #28 is insanity, don't touch", whatever. ;)

Currently (pre-IFAC), these followup sequence numbers are generated as the page is rendered. For each followup we display, we increment a counter, and print it out as we go. This works fine, so long as followups are never deleted. However, once we can delete followups via IFAC, if we ever delete one, all the numbering would be screwed up and people's existing comments would be referring to the wrong patches, etc. :(

Right now in IFAC, we save this followup sequence number directly in the comment (as the comment subject, in fact). That way, even if you delete one, later comments retain their numbering. The problem is that we've now got a race condition when trying to allocate the next sequence number when 2 people are replying to the same issue at the same time. :(

There are a few possible solutions:

A) Forget about the sequence numbers, expose the comment's real unique ID, and get everyone to start using those. This is a nightmare for legacy issue content and from a usability standpoint.

B) Forget about deleting followups. Make it impossible to do via the UI. If it doesn't belong, just unpublish it. Continue to just print the sequence numbers when we render the page. This would require a lot of bloat to hide the delete UI, and defeats some of the gains of using core comments in the first place.

C) Add a new column to the {project_issues} DB table called something like "last_comment_number", and LOCK/UNLOCK this table in the appropriate places when adding a new comment to make sure that 2 replies don't end up with the same value for their sequence number. This would work fine, but introduces potentially big performance bottlenecks, especially on d.o, where a sizable fraction of all traffic is in the issue queues. At least d.o is running InnoDB now, so it wouldn't have to be full table locks, but still. :(

D) Add the new column but forget about the LOCK and don't worry about the race condition. In a few rare cases, we get 2 comments with the same number. They'd still have unique comment ids, so you could still distinguish them if you had to.

E) Some better solution someone else can come up with...
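For concreteness, the counter-column approach in (C) amounts to something like the following sketch (Python with an in-memory SQLite table standing in for {project_issues}; SQLite's BEGIN IMMEDIATE plays the role of the MySQL lock -- the real code would be Drupal db_query() calls, so treat this as illustrative only):

```python
import sqlite3

# In-memory stand-in for the {project_issues} table.
conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; we manage transactions
conn.execute("CREATE TABLE project_issues (nid INTEGER PRIMARY KEY, last_comment_number INTEGER NOT NULL)")
conn.execute("INSERT INTO project_issues VALUES (42, 0)")

def allocate_sequence_number(nid):
    # BEGIN IMMEDIATE takes the write lock up front, playing the role of
    # MySQL's LOCK: no other writer can sneak in between our read and our
    # update, so two simultaneous replies can't get the same number.
    conn.execute("BEGIN IMMEDIATE")
    (current,) = conn.execute(
        "SELECT last_comment_number FROM project_issues WHERE nid = ?", (nid,)
    ).fetchone()
    conn.execute(
        "UPDATE project_issues SET last_comment_number = ? WHERE nid = ?",
        (current + 1, nid),
    )
    conn.execute("COMMIT")  # releases the lock
    return current + 1

print(allocate_sequence_number(42))  # 1
print(allocate_sequence_number(42))  # 2
```

The lock held across read-then-write is exactly what makes this safe, and exactly what makes it a bottleneck under load.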

Thoughts?

Comments

deekayen’s picture

I don't really like my idea because I like spam to vaporize, but I thought it might add to the brainstorm.

When I get spam on one of my YouTube videos and I go moderate it, I click a spam button and the comment collapses. I often find they don't delete the spam right then and when I come back, the spam is still there, but collapsed. Maybe there's a short-term CSS solution that way?

dww’s picture

@deekayen: that's basically just the "don't delete, unpublish" option (B) from the original post, but with some more UI sugar coating on top, no?

Anonymous’s picture

Doesn't db_next_id() take care of the worry about the race conditions?

Instead of #1, etc. for the comments, one could display the date and time of the comment, e.g. #2007-10-04 19:07:32. This would remove the concern about the sequence getting out of order. One could even be clever about the display and let legacy content keep using the sequential numbers, so comments like "Dries said at #2 ..." remain correct.

anders.fajerson’s picture

How about D) but adding some more logic to it: when the comments are rendered, some code checks for potential (rare) duplicate IDs and appends a suffix in that case, e.g. #2a, #2b (in the case of two comments numbered #2).

dww’s picture

@earnie: This isn't about unique ids for the comments -- D5 core does in fact handle that via db_next_id(). That's the "comment id" or "cid", as in the (currently) 6 digit number that uniquely identifies the comment in the {comments} table. I'm talking about the sequence number for each comment per issue node. So, the {sequences} table would need separate records for every issue node on your site if you wanted to use db_next_id() to solve this. And that table (and db_next_id() itself) is already gone in D6, anyway. db_next_id() was removed specifically because people hated the performance hit of the LOCK.

Unfortunately, your proposal to use date/time instead of comment id doesn't make #1 suck any less in terms of existing issue content, nor the lack of easy usability for the future. No one wants to cut/paste date stamps any more than they want to try to mentally make sense of things by dealing with 6 digit comment ids. Nor would anyone want to try to find the right patch by comparing minutes/seconds closely, etc. :(

greggles’s picture

It sounds like you want to calculate the next comment # when the form is rendered. Can't we solve the problem by calculating the next comment # right before the insert happens as part of the same query:

INSERT INTO {whatever_table} VALUES (form_stuff, form_stuff2, (SELECT COUNT(1) FROM {commenttable} WHERE stuff), etc.)

Is this still the race condition you're talking about?

dww’s picture

@greggles: Right, so long as you don't lock the DB, it's possible that 2 people are trying to create a comment in the same issue at the same time, and they get the same answer for the "SELECT whatever" query to find the next value, before they can update the value themselves (whether that update is implicit by adding another row or explicit by updating a counter field somewhere).

Also, COUNT(*) fails us in the case where a followup is deleted. Say you've got 5 comments, the last numbered #5. Turns out #4 is spam, so you delete it. When you go to add another followup (which should be #6), COUNT(*) gives you 4, you add 1, and you get #5 again. :( So, you really need either a) to select the last comment, parse the title, strip off the #, and increment the integer, or b) add a new column in {project_issues} as I propose in (C) in the original post...
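To make both failure modes concrete, here's a small Python sketch (plain data structures standing in for the {comments} table; illustrative only):

```python
# Each comment permanently stores its sequence number (the "subject").
comments = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}  # cid -> sequence number

del comments[4]  # followup #4 was spam, so it gets deleted

# COUNT(*)-based numbering: 4 comments remain, so the "next" number is 5,
# but a comment numbered #5 already exists -- a duplicate.
next_by_count = len(comments) + 1

# MAX()-based numbering survives the delete...
next_by_max = max(comments.values()) + 1

# ...but not the race: two replies that both read before either writes
# compute the same next number.
reader_a = max(comments.values()) + 1
reader_b = max(comments.values()) + 1
```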

bjaspan’s picture

fajerstarter: I really liked your idea about #2a, #2b, #2c until I realized it has exactly the same problem we are trying to solve. If in #4 I refer to #2c and later on #2b gets deleted, the old #2c will now be #2b so my reference will be wrong.

The MySQL manual (end of section 1.9.5.3) suggests a way to fake row-level locking, which would allow option (c) to work (create a next_comment_number column in the issue table) without locking or transactions:

With MyISAM tables, you can use a flag column in the table and do something like the following:

UPDATE tbl_name SET row_flag=1 WHERE id=ID;

MySQL returns 1 for the number of affected rows if the row was found and row_flag wasn't 1 in the original row. You can think of this as though MySQL Server changed the preceding statement to:

UPDATE tbl_name SET row_flag=1 WHERE id=ID AND row_flag <> 1;

This adds two extra update queries to every followup, though.

I'm sort-of inclined to ignore the problem and allow the rare set of comments with the same sequence number. It won't happen very often and, when it does, followups can refer to "#2-bjaspan" or "#2-dww" as that designation will be right even if "#2-spammer" gets deleted. If a single user intentionally hits the race condition to create multiple followups with the same number by the same author, they are certainly all spam and no one will want to refer to them anyway.
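A minimal sketch of that flag-column trick, using SQLite in Python as a stand-in for MySQL (the essential part is checking the affected-row count of the UPDATE, which is what db_affected_rows() would expose in Drupal; illustrative only):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE project_issues (nid INTEGER PRIMARY KEY, row_flag INTEGER NOT NULL DEFAULT 0)")
conn.execute("INSERT INTO project_issues (nid) VALUES (42)")

def try_lock(nid):
    # Succeeds (1 affected row) only if the flag was not already set:
    # the WHERE clause makes the test-and-set a single atomic statement.
    cur = conn.execute(
        "UPDATE project_issues SET row_flag = 1 WHERE nid = ? AND row_flag <> 1", (nid,)
    )
    return cur.rowcount == 1

def unlock(nid):
    conn.execute("UPDATE project_issues SET row_flag = 0 WHERE nid = ?", (nid,))
```

A second caller's UPDATE matches zero rows while the flag is set, so it knows it lost the race without any explicit LOCK.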

anders.fajerson’s picture

bjaspan: So true. Although it could still be an improvement, since rare (race) + rare (deletion) = rarer.

Maybe it's too costly but instead of just rendering the "2a" with some logic it could also be saved to the database, making it "sticky".

I agree that "#2-bjaspan" adoption might make this a non-issue.

dww’s picture

@fajerstarter: interesting approach. seems like a hack, but i'll grant these race conditions will be fairly rare. one big problem here is that we're just using core's comment_render() and that's printing out the titles directly. i don't know of any good way to hook into that to modify the output, short of parsing the resulting HTML ourselves (ugh). :( Furthermore, for these identifiers to be deterministic (the problem we're trying to solve), we'd definitely need to update the comment subject in the DB to reflect these collisions, but it's not clear when to do that. We could do it when we insert the comment, but then we're back in race condition land. ;)

Really, the consequences of being bitten by the race are fairly minor -- it's not like we're corrupting data here, it's just that the UI is a little strange with duplicate followup numbers (which can still be uniquely identified manually by folks posting in the issue with a little more work, either via the comment id, or the author/date, etc).

Another approach is basically (D), but with a little bit of extra work to make the window for the race even smaller. Something like putting this in a loop:

$next_id = db_result(db_query("SELECT last_comment_number FROM {project_issues} WHERE nid = %d", $nid)) + 1;
if (db_result(db_query("SELECT COUNT(*) FROM {comments} WHERE nid = %d AND subject = '#%d'", $nid, $next_id))) {
  // Collision: another reply already claimed this number. Try again.
}

But, that might also suck for performance (maybe even worse than the LOCK). ;) It's a shame to make performance bad in the common case to defend against a rare potential problem. :(

dww’s picture

Eek, sorry, lots of comments (yay for feedback!). My comment #10 was referring to comment #4. I didn't even see bjaspan's #8 when I was composing my reply, though it seems we reached very similar conclusions.

@bjaspan: Your poor-man's LOCK is a lot more elegant than my hack to check for collisions, thanks for sharing. ;) Although, it appears that's only for MyISAM tables, which won't do. :(

Crell’s picture

Just how often would a follow up be deleted, anyway? If it happens once a week or less (I see decidedly little true spam in follow-ups), then I think that's something we can live with. If it happens daily or more, then it's probably worth the effort to close that race condition hole.

bjaspan’s picture

I did not think the row_flag trick depended on MyISAM, that was just given as an example because MyISAM does not have row-level locking. I assume (!!!) that all databases can tell you how many rows an UPDATE query modified. No?

steven jones’s picture

@crell: The race condition doesn't depend on the number of deletions in an issue at all. If ANY two people submit a follow up to ANY issue at the same time, they're going to be fighting over the next number in the list, and could both end up with the same number.

Crell’s picture

OK, I misunderstood the problem here then. Two people replying to an issue at about the same time does happen often enough to worry about, and is likely to only happen more in the future. :-)

This sounds like a case where having a non-primary-key sequence system would help after all. ;-/

dww’s picture

@bjaspan: I get it now, thanks. db_affected_rows() should work for this and be portable. Perhaps this is our best bet for now...

@crell: Sorry if this issue is confusing. Deleting comments is relevant because it means the current system of just sequentially numbering the followups on page render will no longer work. The race condition is introduced by the fact that we want to store these per-issue sequence numbers in the DB (so they're resilient in the face of a delete). The race is there given any 2 followups, regardless of whether there have been any deletes, even though the reason we're storing these sequence numbers in the DB is to protect ourselves from the (hopefully rare) case of a deleted comment.

Anonymous’s picture

How about adding our own mutex lock controls? Something like drupal_mutex_acquire() and drupal_mutex_release(); then we could have more control over this issue. This issue is really bigger than just project, isn't it?

Anonymous’s picture

Another thought: Use a flag to mark as deleted leaving the data row but not displaying the comment if the flag is set.

chx’s picture

Emulate Drupal 6 and use auto_increment and update the comment once you have the id.

dries’s picture

One could argue that this issue is not specific to the project module, or the project issue module for that matter. If I have a discussion in a forum topic, I might also want to use similar comment numbers to refer to someone else's comment. If there is a simple way to implement this as part of the comment module in core, I might be willing to let this slip in for Drupal 6. I'm just floating the idea, I don't have definite answers yet, and I'd like to see this discussed first.

drewish’s picture

You might want to check with the DB powers-that-be on the future of db_affected_rows(). I was using it in my SoC project and someone on IRC mentioned that it was supposed to be on the way out since it's not supported by all databases...?

Crell’s picture

http://api.drupal.org/api/function/db_affected_rows/6

db_affected_rows() is still around in D6. db_num_rows() was removed. I expect that in D7 the db_affected_rows() function will go away, but the functionality will remain under some other name. It's safe to use if necessary.

dww’s picture

@earnie #17: Doesn't help, because the race comes from multiple instances of the Drupal php code in separate httpd threads. Drupal-specific mutex at the php layer won't be able to prevent this race. The only way to do IPC here is via a DB LOCK, since the DB is the only thing that all the Drupal php threads have to synchronize on.

@earnie #18: That's (B) from the original post.

@chx #19: I don't see how auto-increment helps us here. We don't have separate tables for each issue node, which appears to be what you'd need if you wanted to rely on auto-increment. Seems like what we need is to restore {sequences} and db_next_id() and do something like db_next_id("{project_issue_comments}_". $issue_nid); with separate rows in the {sequences} table for each issue we're keeping track of a sequence on. Either that, or the various poor-man's LOCK proposals to notice collisions and recover from them.

@Dries #20: On first thought, it's not clear how these comment sequence numbers would make sense in the UI except for sites that did flat comment rendering, not threaded. Seems like it'd be confusing to display these numbers on threaded comments. It'd probably be somewhat costly to compute them, and it'd be a shame to do that if we're not even going to display them on many sites. Plus, I really don't see any way to do this generally for all core comments without a schema change, and it does feel kind of late in D6 to be doing that, even though I totally appreciate your willingness to make this a special-case exception to make IFAC work right on d.o...

Crell’s picture

If we were to do it for comments generally, it could be an easy toggle in the config whether or not to display a "local id". Probably even per-node-type. dww is right, though, that it could be tricky to do. After all, we're again trying to generate a unique sequence number without relying on an auto-increment field, so we're right back where we started, just storing the value elsewhere.

Hm, although what about something like this (properly written, anyway):

INSERT INTO comments (nid, local_id, ...) VALUES ($nid, (SELECT MAX(local_id) FROM comments WHERE nid=$nid)+1, ...)

Would that be sufficiently atomic?

(And I just realized I wrote a query that my insert-builder for D7 wouldn't be able to handle yet. Bah!)

bjaspan’s picture

#24: Logically, that nested query should not be atomic; there is nothing that prevents two queries from issuing the sub-query, getting the same answer, and inserting the same value into separate rows. In practice, it of course depends on the dbms implementation. In a multi-master arrangement, I'm sure it wouldn't be. I certainly wouldn't want to depend on it.

dww’s picture

The more I think about it, the more i think db_next_id() is the right solution to this problem. It's a real shame y'all removed it from the API for 6.x. :( I'm highly tempted to just write the code for the 5.x version of IFAC that uses it, and then open a separate issue about restoring db_next_id() in 6.x. We could use bjaspan's poor-man's LOCK for the implementation instead of really forcing a LOCK, which should hopefully address some of the performance concerns that caused it to be removed in the first place, right?

drewish’s picture

dww, i don't think that anything's been lost by getting rid of db_next_id(). since the id would be specific to each node rather than each table, you'd have been mis-using the function to a certain extent. and it LOCKs, which you said you wanted to avoid. since you're locking a table anyway, you might as well lock the project comment table and compute MAX() + 1 for the new comment number.

dww’s picture

@drewish: db_next_id() just says "Return a new unique ID in the given sequence". That's what we're doing here -- we've got a bunch of sequences, and we need to be able to get the next unique ID in each one. The fact that our sequences happen to be named "'{project_issues}_'. $issue_nid" doesn't mean we're mis-using the function. Getting the next available number in the named sequence is exactly what db_next_id() is for. Granted, the LOCK sucks for performance, which is why it could be reimplemented with the fake-row-level-locking described in #8. However, if we did this ourselves in the {project_issues} table, we'd just be duplicating the logic of db_next_id() for our special case...

hunmonk’s picture

one thing we'll need to guard against w/ the poor man's LOCK is the (hopefully rare) situation where the db marks the row as locked, then fails for some reason to unmark it (unexpected crash, etc). in this case we're going to need a reasonable timeout for a followup trying to obtain the next comment ID.

Anonymous’s picture

dww@26: I'm doing development work now that really depends on this function, and I'll have to depend on the lock, so it doesn't suck that badly for me. Because of your earlier warning about its removal in D6, I copied the functionality into my module. Perhaps we could create a module just to resurrect db_next_id() for D6, for those who depend on this type of functionality?

hunmonk’s picture

as i understand it, the advantage of the poor man's lock is that it will be row level, as opposed to locking the whole table, which db_next_id() does.

Anonymous’s picture

I still need the value of the next id with the assurance that it will remain unique. This poor man's row locking may be implemented in the db_next_id() function, but the maintenance of the flag may be just as expensive as the table lock itself. If each module had its own sequences table, then the LOCK becomes less painful, as the mutex moves to per-module scope and I don't have to wait on the node module to release the LOCK for the taxonomy module to gain its LOCK.

bjaspan’s picture

I think we'd want to study the "poor-man's lock" suggested in the MySQL manual much more extensively before using it in a Drupal-wide function that has to work 100% of the time than before using it in a project-only function where it actually does not matter if it returns a duplicate value. Also, as others have suggested, the study would need to cover performance, too.

As for timing out the lock in case of failure, we could change it from a boolean flag to a Unix timestamp:

UPDATE tbl_name SET row_flag = UNIX_TIMESTAMP() WHERE id = ID AND row_flag < UNIX_TIMESTAMP() - 300

This updates the one row where id=ID if the flag has not been set in 5 minutes, otherwise it updates zero rows.

Of course, if for some reason a process sets row_flag, waits for 6 minutes, then continues operating assuming it has the lock, the lock has failed and we'll get erroneous results.
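A sketch of that timestamp scheme in Python (illustrative only; in the real thing the test-and-set must stay a single UPDATE so it remains atomic). Note that 0 doubles as "unlocked", since 0 is always older than now - 300:

```python
LOCK_TIMEOUT = 300  # seconds; a lock older than this is considered stale

row_flag = {42: 0}  # nid -> lock acquisition timestamp, 0 = unlocked

def try_lock(nid, now):
    # Emulates: UPDATE ... SET row_flag = UNIX_TIMESTAMP()
    #           WHERE id = ID AND row_flag < UNIX_TIMESTAMP() - 300
    # Acquisition succeeds whenever the lock is free (0) or stale.
    if row_flag[nid] < now - LOCK_TIMEOUT:
        row_flag[nid] = now
        return True   # "1 affected row"
    return False      # "0 affected rows": lock is held and fresh

def unlock(nid):
    row_flag[nid] = 0
```

A caller that gets False simply knows the lock is held; a crashed holder's lock gets stolen automatically after the timeout.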

hunmonk’s picture

i don't think that will work -- it doesn't seem to provide a way for a new incoming lock request to know what's going on. i think we would need a combination of a flag and a timestamp to make this kind of thing work.

Anonymous’s picture

@hunmonk: I agree, which is why I suggest that we create per-module sequence control tables. The module would lock its own table, so the number of requests for a lock is reduced. If the module only uses one row of sequences in its own {_sequences} table, it essentially becomes row locking, with the benefit of having the lock controls remain with the sql engine.

Anonymous’s picture

That {_sequences} should read {<module>_sequences}.

dww’s picture

@earnie: one problem is that we need a separate sequence for *every* issue node. So, the {project_issue_sequences} table won't just have a single row, it'll have 1000s of rows, and it'd be nice to avoid a table-wide LOCK, even if it's just for this.

Anonymous’s picture

Quoting dww from #37 (October 8, 2007 - 14:08):

@earnie: one problem is that we need a separate sequence for *every* issue node. So, the {project_issue_sequences} table won't just have a single row, it'll have 1000s of rows, and it'd be nice to avoid a table-wide LOCK, even if it's just for this.

I know of no better solution. If you update a lock row flag, then you need to unlock the row flag, which can happen in the REPLACE BY. However, you need a function to handle the overhead the LOCK takes care of for subsequent requesters while the flag is set. I don't think you can really win extra time by eliminating the LOCK in this fashion. Beyond this, a DB engine that supports row locking needs to be used.

hunmonk’s picture

well for the d.o use case, locking all issues to get a unique number would be a serious performance problem, i think. however, at least on d.o, the number of simultaneous requests to one issue would be low, and collisions would be relatively rare. in most cases it would require two pretty inexpensive queries per followup to do the row level locking -- i see that as the most livable option we have.

Crell’s picture

Didn't the remove-sequences patch have a viable single-query sequence generator? I could have sworn some iteration of that patch had a non-locking query that worked before we decided to just drop sequences all together.

hunmonk’s picture

Assigned: Unassigned » hunmonk
Status: Active » Needs review
Attachment: new file, 7.75 KB

we need to move this issue forward. in the "talk is cheap, code is gold" spirit, here's a first stab at the poor man's row lock. after having written it, i can't say that i'm crazy about the implementation, but it's at least an attempt that we can work from.

tested this on postgres, and it works, including previews and the upgrade path. haven't tested a clean install, or mysql at all.

bjaspan’s picture

I agree that the actual implementation ends up being clunky. Anyway, my first comment is that you need some kind of delay between your tries to get the lock. I suggest that each iteration wait a random number of milliseconds between A and B (so that multiple attempting lockers do not make their second attempt at the same time, too), where A and B also increase each iteration, until some maximum amount of time passes. I'm sure there is an optimal algorithm out there somewhere for choosing the delay ranges.
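One common shape for such a delay schedule is randomized exponential backoff with jitter; a Python sketch (the base and cap values here are made up for illustration, not taken from the patch):

```python
import random

def backoff_delays(max_tries=20, base_ms=20, cap_ms=1000):
    """Yield a jittered, growing delay (in ms) for each retry attempt."""
    for attempt in range(max_tries):
        # The upper bound doubles each iteration, capped at cap_ms.
        upper = min(cap_ms, base_ms * (2 ** attempt))
        # "Full jitter": pick anywhere in [0, upper] so contending
        # lockers don't retry in lockstep.
        yield random.uniform(0, upper)

delays = list(backoff_delays())
```

The widening, randomized window spreads the retries of simultaneous contenders apart, which is exactly the property wanted here.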

hunmonk’s picture

Attachment: new file, 7.98 KB

@bjaspan: how's this look?

bjaspan’s picture

That's pretty much what I had in mind.

My late-night-math calculation is that this will wait on average 11 seconds before giving up. I have no idea if that is in the right ballpark or not.

Anonymous’s picture

Status: Needs review » Needs work

The SET lock = 0 update after the while loop releases the lock in the wrong place. This gives a failed lock attempt the opportunity to unlock the row. The SET lock = 0 needs to happen at the time last_comment_id is updated.

bjaspan’s picture

@earnie: Good catch. Not sure how I missed that.

Anonymous’s picture

I was reviewing the descriptions of some of the other tables and had the following thought about this patch. The column name lock should be changed to db_lock to be more descriptive and not confused with meaning "row can't be deleted by the UI".

hunmonk’s picture

Status: Needs work » Needs review
Attachment: new file, 8.31 KB

well, i wanted a failing lock to unlock the row. the reasoning is that if a followup can't claim a lock after 11 seconds and 20 tries, then something is wrong with the lock, i.e. it's stuck. my hope was to eliminate support requests of "i can't submit a followup to issue X." however, given that this will be an even more rare occurrence than the race condition (the database failing at exactly the right spot), i suppose it makes more sense to put the unlock where you suggest. attached patch corrects.

i've also added cleanup code for the case of a failed lock. while it probably looks a bit harsh, i see no other reasonable way to do it -- the comment has to be deleted or it will appear in the issue. i tested the code pretty thoroughly, and it works as expected -- the aborted comment is fully cleaned up, and the redirect prevents any additional bogus data from creeping in.

hunmonk’s picture

Attachment: new file, 8.32 KB

@earnie: i'll buy that. attached patch uses the new column name.

Anonymous’s picture

Status: Needs review » Needs work
+          db_query("UPDATE {project_issues} SET last_comment_id = %d WHERE nid = %d", $id, $arg['nid']);
+          db_query("UPDATE {project_issues} SET db_lock = 0 WHERE nid = %d", $arg['nid']);

The above can be changed to:

+          db_query("UPDATE {project_issues} SET last_comment_id = %d, db_lock = 0 WHERE nid = %d", $id, $arg['nid']);
hunmonk’s picture

Status: Needs work » Needs review
Attachment: new file, 8.25 KB

whoop. good catch. lost the forest for the trees... ;)

dww’s picture

Attachment: new file, 8.28 KB

#51 no longer applied.

dww’s picture

Attachment: new file, 8.28 KB

Re-roll with some minor whitespace and comment formatting/phrasing cleanups. Otherwise, the code looks good to my eyes. I haven't tested yet.

dww’s picture

Attachment: new file, 8.28 KB

hunmonk just committed a fix for something broken in project_issue.install that i pointed out, which was unrelated to this patch. re-roll that still applies cleanly. ;)

dww’s picture

Status: Needs review » Reviewed & tested by the community

limited testing seems to work just fine. i've found some other problems with IFAC, but those don't belong in here. hunmonk, if you're happy with this, feel free to commit...

hunmonk’s picture

Status: Reviewed & tested by the community » Fixed

committed to HEAD. thanks everybody for the help and guidance!

Anonymous’s picture

Status: Fixed » Closed (fixed)