Closed (fixed)
Project:
Universally Unique IDentifier
Version:
6.x-1.x-dev
Component:
Code
Priority:
Normal
Category:
Bug report
Assigned:
Unassigned
Reporter:
Created:
27 Aug 2010 at 19:29 UTC
Updated:
26 Jan 2012 at 08:29 UTC
Jump to comment: Most recent file
Comments
Comment #1
recidive commentedIt looks like the UUID generation algorithm isn't clash proof.
Can you tell more about the platform you're running it?
Comment #2
markabur commentedThanks, Centos 5 Linux 2.6.9-023stab048.6-enterprise and MAMP too. I am seeing it about half the time when I do a replace with search/replace scanner.
Comment #3
recidive commentedYou using mysql right? Could you tell me which version?
This may be a bug in mysql, since this is being used to generate the UUID.
Comment #4
markabur commentedYes, MySQL. I'm seeing this with 5.0.45 and with 5.1.44.
Comment #5
markabur commentedPossibly related: the admin/settings/uuid page currently has everything unchecked... is the module supposed to even be doing anything in this case?
Comment #6
recidive commented@markabur, can you test the 6.1.x-dev version and see if you can still duplicate the problem?
Thanks.
Comment #7
markabur commentedIt's the same. I've done a little troubleshooting and learned some things.
- To test, I'm using search and replace scanner to replace a string with itself
- I've added dsm() at lines 63 and 67 to see what's going on.
- When I search for "manager" I get results from 18 nodes
- When I replace with "manager" I get warnings for 4 of those nodes
- When I do the exact same replacement, I get warnings for the other 14 nodes
- When I do the exact same replacement, I get warnings for the original 4 nodes
- When I do the exact same replacement, I get warnings for the other 14 nodes
- When I do the exact same replacement, I get warnings for the original 4 nodes (etc.)
- The warning always comes from the TRUE part of the test, when the existing $node->revision_uuid is getting added.
So, it looks like the $node->revision_uuid is set when the node is loaded, then the warning occurs when that uuid is attempted to be used for the new revision. Is there a reason for reusing the uuid here, instead of creating a fresh one for every revision?
Also, it seems like the uuid_node_revisions table shouldn't get updated at all if the node isn't in $automatic_types.
Thanks,
Comment #8
recidive commented"Is there a reason for reusing the uuid here, instead of creating a fresh one for every revision?"
Yes, the reason is that one would want to generate the uuid for the node/revision and pass it directly to the node_save() method.
Apart from that, there may have another check we could make to ensure such cases?
It's interesting, though, that no one else have complaint about such problems. Could search/replace scanner module be doing something not usual with the node api?
Thanks for investigating this.
Comment #9
markabur commentedHm, yes, if I edit a node normally with revisions turned on, then the warning never comes up, because the module always hits the "else" part of the test and creates a new uuid for the new revision. If I inspect the node object right at the beginning of the "update" block, I can see that:
- When I edit the node normally, the revision_uuid field never appears in the node object, so a new uuid is always created for the new revision.
- When I use search/replace, the revision_uuid field does appear in the node object, which triggers the "reuse the existing uuid" test and subsequently causes the warning that the uuid is not in fact unique (which is not surprising given that we're trying to use the same uuid for two different revisions).
I can see why this exists; if my module creates its own revision_uuid inside of $node, then it will get used:
But when a new revision is saved then this uuid will be non-unique since we just pulled it from the database.
Scanner module does a node_load() followed by a node_save(), so it's easy to see how the error would come up there. I don't know enough about nodeapi to understand how a regular manual update is different than that.
I guess since there's a chance that a pre-generated uuid is not actually unique, then uuid.module ought to test for uniqueness before inserting into the database anyway -- right now it is checking to see if it's in a valid format but not to see if it's actually unique.
Comment #10
recidive commentedRight, that's exactly due to the node_load() followed by the node_save() call.
In that case, I think it makes sense doing a check to see if the uuid is already in use. The questions that pop up are a) what should we do in the case the user supplied a uuid that's already in use? and b) What would be the performance hit by running one more query per insert in batch UUID generation processes?
For "a" I think we could either generate a new one or bypass the insert, since the developer can do the check inside a transaction to ensure data integrity, anyway. And for "b", what do you think about that? I would like to see other people opinion on this.
Thanks!
Comment #11
markabur commentedSo, I suppose that most any sort of programmatic node-updating script would have the same problem.
For (a) I suppose it depends on the use-case; is it more important to get a uuid inserted, period, or would it be better to give a friendly message saying what didn't happen? I think it makes more sense to go ahead and create a new if needed, but otoh if I was trying to insert my own uuids for whatever reason, I might not like that.
For (b), I dunno about the performance hit but can see why we'd want to avoid another query if possible.
One way around the immediate problem is to check if the uuid we're trying to insert is the same as the one we might have loaded:
This would solve the problem at hand but doesn't address the bigger question of how to deal with dupes that come from outside the module.
Comment #12
rj commentedI am receiving a similar error message every time I save a node:
user warning: Duplicate entry '688' for key 1 query: INSERT INTO uuid_node_revisions (vid, uuid) VALUES (688, 'c64a6ff6-c050-11df-ab8e-d1ba30c340f7') in /home.../uuid/uuid.module on line 65.Is this related, or should I post as another issue?
Comment #13
arski commentedHey guys,
this might not be directly a solution to the issue here, but more like a feature request which is directly related:
How about adding something like :
whenever a new uuid is generated (where uuid_unique_uuid() would be a simple validation function that checks in the DB if the new UUID is unique). I heard that this should never happen with MySQL-based generation, but would be extremely useful for the fallback random method.
PS. The implementation might not be the best idea as it would allow a small delay between the check and the insert.. so maybe some loop based on catching errors from the insert statement would be safer.. but either way for now I would be happy to hear your comments on this.
PPS. Would be willing to provide a patch if you think this is a good idea.
Thanks,
Martin
Comment #14
markabur commentedRe #12, it's a similar error but I have a feeling it's different enough to be something else. I was getting a warning that the UUID isn't unique, and it was happening with search and replace module, wheras you're getting a duplicate VID and it's happening on normal node saves.
Re #13, not my call but yeah that's the sort of function that could be added if we wanted a more robust "ensure uniqueness no matter what" feature in this module.
I'm done with the project where we were seeing the error, and the quick-fix in #11 worked fine for us, so this issue is pretty much off my radar at this point.
Comment #15
recidive commented@markabur can you please send a patch for your solution in #11? I think it's good enough and a complement for the external uuid feature. And I'm pretty sure this will also solve other cases when modules do a node_load() before node_save().
@arski your points are very sound as well, we can work this out on a separate issue, IMO a complete solution would be having a check for duplicates only when falling back to using the random generation (assuming MySQL will never generate duplicates unless buggy). Also as you outlined, factoring the check out as a separate function uuid_unique_uuid() give developers a way for checking if their uuids are unique before trying to set their own UUID. Could you please file another issue and attach your patch?
@rj yeah, this looks like the same issue, you may be using a module that does a node_load() followed by a node_save().
Thanks guys!
Comment #16
markabur commentedPatch attached. This won't fix #12 as we are only checking if the uuid is unique..
Comment #17
arski commentedHey, created a new issue for that in #915118: Make sure new UUIDs are unique - will try to provide a patch asap. Thanks.
Comment #18
apotek commentedThis is what I ended up doing. We were seeing cases both for situations where the vid would cause a conflict, as well as the uuid, as both columns are defined as unique.
I *highly* doubt that any messages caused by duplicate entries in the uuid column have anything to do with a uuid-generating logic happening to create an identical uuid. The issue is that we are doing an update, and if there already is a row in there, it needs to be updated, not inserted.
So to my mind, we know we can only have one row with our revision number, so let's just delete any pre-existing rows with that vid first.
Also, and this is important, in my opinion, the shipping logic inserts a new uuid into the table on update, but it doesn't assign it to the node, so the new uuid is not available as a $node property for any modules which might be looking for it after the update. So I assign a new uuid to the $node to make it available to Drupal before inserting.
Thus:
For reference, the dev version of the update sequence in nodeapi:
Anyone see any huge flaw in my reasoning here? Ready to roll a patch if not.
Thanks.
Comment #19
apotek commentedupdate/save/delete operations are rare, so performance in these areas is less of an issue than what happens in node load.
Accordingly, I would also like to patch that, from:
To:
That should present our single biggest performance improvement regarding nodes--especially in views of long node lists.
Comment #20
apotek commentedAny thoughts about whether using drupal_write_record() with the update flag passed for both fields would fix this? It's the recommended D6 API for inserting data.
Comment #21
recidive commentedI've commited the patch in #16. Thanks @markabur!
@klktrk, I think drupal_write_record(), don't buy us anything here. IMO uuid records should never be updated, since they are basically id, uuid pairs.
I think though it's worth getting node and node_revision uuids from a single query. Can you please file another issue for that, and attach a patch for the solution you've outlined in #19?
Thank you guys!
Comment #22
apotek commentedthat sounds right to me, and I note that there is no check for that in the update $op of the nodeapi.
Further, thinking about this more, it doesn't seem to make sense to me that one would *ever* update the UUID for a revision. If the revision id has not changed, why change the UUID for it?
I think what I'm proposing is that we should just *rip out* the code that's in the update $op. Why do we need it?
Once a revision has a UUID attached to it (which we see in the 'insert' op), there's no need to write to the uuid_node_revision table again, unless it is in order to delete the revision or node.
Also, there's no test in the update $op for whether the node type should even be considered.
Shouldn't we just be doing this, and be done with 'update'?:
Comment #23
apotek commentedComment #24
markabur commentedI still agree with this -- my site is in the odd state of having 920 rows in uuid_node_revisions but nothing at all in uuid_node.
Yeah, I was wondering that myself, actually... I mean, the whole point of a uuid is to uniquely identify something, so, once you have that why would you want to change it to a different identifier that's no more unique than the previous one?
Comment #25
apotek commentedmarkabur said
Please see #915254: revision uuids are not deleted upon deleting all revisions where I note that uuid_node_revisions cannot be deleted upon node deletion. That might be the source of your problem. There's a patch there, but it's extensive, since we need to add the nid column to the uuid_node_revisions table.
Comment #26
apotek commented@markabur
If you, like me, think there's no reason a UUID should ever be updated for a revision once one is generated, you might be interested in this patch.
1. It removes the 'loaded_revision_uuid' hack since we don't need it in this version.
2. It checks to see if there's a vid for the node in the uuid_node_revisions table. If there is, it doesn't do anything in the update $op.
I've been testing this for a bit now. the more I think about it, we *should* actually never do anything in $op. If we need to do anything there, it suggests that there's a bug in some other part of the process.
Comment #28
apotek commentedPosting a third version of my take on how to fix this issue, in case anyone is interested.
This is rolled against the latest version of uuid 6 in dev.
Comment #29
apotek commentedwith a small bug fix, latest version is uuid-895622-v4.patch.
maybe i should add that i still don't think it makes sense to update a revision's node uuid *ever*. otherwise we're versioning revisions. once a revision is created, it has a vid, and a uuid, and that should never change. Even if the contents of the revision changes, it is still the same revision, until a new one is generated.
If you really want to create some sort of identity for each version based on its contents, I think we'd want to serialize the node object and md5 it and use the hash as its unique id.
My patch implements the idea that uuids for node revisions should never change once created for a revision.
Comment #30
apmsooner commentedthe v4 patch fixes my problems with the duplicate errors. Thanks apotek
Comment #31
q0rban commentedThis is still causing problems. The logic isn't completely thought through. Here's an updated patch. I need to sleep on this and re-read it tomorrow though, it's a pretty complex problem to solve, (lots of possible scenarios to account for on update).
Comment #32
recidive commentedI'm willing to commit the patch in #29.
But want someone else to test this first.
@q0rban: can you explain what the patch #29 lacks and what your one does?
Comment #33
recidive commentedDoes the patch in #29 also fixes #934142: Node revision clash on node edit or new node save - not related to node deletion?
Comment #34
q0rban commentedI was hasty in posting #31, I just wanted to get some more eyes on it. A quick perusal of it now shows that it needs work.
The other patches above don't address a few situations:
1.) A node was saved (inserted) before UUID module was enabled. Node UUID should be generated on update.
2.) Node insert allows programmatic setting of node UUID for non-automatic types. Update should honor this functionality as well, or we should remove the support for it in insert.
3.) Node update should allow for a programmatic setting of revision UUID in the same manner as insert.
I'll put some more work on this tomorrow, but I feel pretty confident that this the patch in #31 is a more complete solution to the problem (once I fix a few issues with it). I'd like to get apotek to look at it first too, as he understands the problem very well.
Comment #35
recidive commentedOk, thanks for the explanations. In regards to the point #1, I think it doesn't matter much, as there's an option to batch create uuids for objects that don't have it yet. There's this issue to improve synchronization: #560896: Automatically synchronize when settings are updated.
Thanks for working on this.
Comment #36
q0rban commentedOk, here is a re-rolled and tested version of the patch.
> In regards to the point #1, I think it doesn't matter much, as there's an option to batch create uuids for objects that don't have it yet.
Yes, however scenario one is pretty trivial if you are supporting scenario 2, so might as well include it. :)
This patch takes care of the following scenarios:
1.) A new revision is being created on a node with a valid uuid. Generate and save a new rev UUID.
2.) A new revision is being created on a node with a valid uuid and a valid revision_uuid has been set programatically. Save the valid rev UUID.
3.) A new revision is not being created, but uuids and rev uuids do not exist in the database for this node and the node type is an automatic type. Generate both UUIDs and save.
4.) A new revision is not being created, uuids and rev uuids do not exist in the database for this node, but have valid ones have been programmatically set on the node. Don't generate new ones, just save the valid ones as is.
5.) A new revision is not being created, a valid uuid exists on the node and a valid rev uuid exists as well that needs to be saved to the database. (Basically, a programmatic update of the revision uuid).
Comment #37
ilo commentedThanks q0rban, I did test the patch and it gets rid of the Duplicate entry database error.
I do have just one little concern about this patch. Why should we consider that a node requires uuid and revision_uuid automatically generated only on insert? I mean..
Am I the only one thinking that if would require an else that should create, save and return uuid and revision_uuid automatically if it does not exist in database? should we only return the value when the content type has the automatic uuid option enabled? or should we only load when the node has uuid, and create if node has not uuid when the option is enabled? Am I wrong or something? At least is the way insert is going to work for now:
Right now the functionality testcase included in the patch at #927778: Complete simpletest testcases for taxonomy and comments (comment #1) fires the duplicate entry error. When appliying this patch, this error is gone, however still there are other errors to fix when loading nodes without uuid when automatic uuid option is set for that content type.
This patch only covers inserting uuid when the content type has automatic uuid enabled, even for nodes not having uuid or revision_uuid set. Should we cover the loading case also here?
Comment #38
q0rban commentedHey Ilo, thanks for testing the patch.
"Should we cover the loading case also here?"
In my opinion, the issue you raise is outside the scope of this issue. Perhaps we should create a separate issue for it?
Comment #39
ilo commentedThanks to you for the patch!
About opening a new issue, I'd say so, q0rban, actually there is an issue open as 'feature request' that perhaps deserves more attention and hold that: #822712: on-demand uuid generation instead of automatic
It is talking about having some nodes with and without uuid or revision_uuid even if they have the same content type, something that is not covered by the automatic generation of uuid/revision_uuid introduced by this patch in the insert hook.
Comment #40
recidive commentedI was going to commit the patch but I think there's still room for improvement.
1. The 'load' section of hook_nodeapi() will continue to 'insert' if there's no uuid for this node, since the return statement is inside an if, and there's no 'break' statement. Is that as designed? I think it's better having a 'return $record ? $record : NULL' construct there.
2. The patch introduces a lot of variables, can we get rid of some, e.g. the $uid_in_db variable is used only once, so no need for it at all. Also we should use API functions when possible like node_get_by_uuid() and the like.
Please let's work on a streamlined patch and I'll commit it fast.
Thanks!
Comment #41
q0rban commentedOh, very good catch on #1.
as far as #2, I was just trying to make it as readable as possible, as the logic is pretty complex without it. The variables explain what the logic is doing, so it's helpful in determining why it's there.
Comment #42
q0rban commentedComment #43
ilo commentedOk, uuid is now in the testbot queue, patches will be tested against the current testcase that recidive committed at #927778: Complete simpletest testcases for taxonomy and comments. I'd say that also new patches should include their own tests in some cases, or re-design the testcase to cover them.
Let make the issue queue roll a little bit!
Comment #45
ryumkin commentedsubscribing
Comment #46
gabilu commented#16: uuid-895622.patch queued for re-testing.
Comment #47
polPatch #29 rewrite for working with UUID 6.x-1.0-beta2
Comment #48
polUpdated version of the patch for UUID 6.x-1.0-beta2 and also adding some usefull API functions from http://drupal.org/node/858274
Comment #49
recidive commented@Pol, beta2 is outdated, patches need to made against CVS branch DRUPAL-6--1.
Comment #50
polWe needed it @ work, we can't use dev version for now :(
Comment #51
polUpdate patch path.
Comment #52
q0rban commentedThe patch in #42 is stable, the tests are failing because they need to be updated.
Comment #53
recidive commented@Pol, patches are only applied to the dev BRANCH, can't apply the patch to a TAG.
@q0rban, great! Could you point where you think the tests problems are? Or better, add a fix for that to your patch ;)
Comment #54
ilo commentedAlso, never put D6 in the patch name, this special keyword now instruct the testbot to ignore the patch. The patch should be rolled to the version the issue is tagged on for the testbot to apply, in this case it is 6.x-1.x-dev.
Comment #55
ezra-g commentedOne thing that hasn't been mentioned so far is that checking isset($node->revision) isn't really a correct check. $node->revision is set to 0 when there is no new revision being submitted, and the value of 0 will cause the isset() check to return TRUE regardless of whether we're updating a node with a new revision or not.
I wonder if we are overlooking a use case here. Wouldn't we want to create a new uuid for a new revision for a node that already has a uuid but is not of a type that should get uuids automatically generated?
This might be the case when a feature exports a node by uuid but the site doesn't have that content type configured to assign a uuid to *every* node of that type the site, which would make sense if you only wanted to assign uuids to content being imported or exported.
Indeed. Since, per my comment above, we need to check the current uuid for a node and we want to load this information consistently, I've created a new API function - uuid_node_load - that uses apotek's suggested single query from #19.
This re-roll also adds some code comments.
In my testing this patch alleviates the error and creates a new unique uuid for new node revisions, and not when a revision is not being created.
Comment #56
ezra-g commentedI neglected to use the new api function in hook_nodeapi() case 'load' ;)
Comment #58
q0rban commented@ezra-g, good catch. I would recommend starting from my patch in #42, and change the isset to a simple check on $node->revision. As for your other point, I believe that patch addresses that use case, where there already is a revision uuid, and generates accordingly.
The reason all these patches fail is because the tests have some bugs in them.
Comment #59
kndrThanks for patch #42 and notes in #56! I've changed isset() to !empty() since it makes significant difference inside code's logic. I've found one more bug inside this piece of code. We need to use array_filer() over $automatic_types array since variable $auto_create has incorrect value, when not all content types are selected. This causes empty rows inside UUID database tables. I am attaching the patch, which is modifed version od #42.
Comment #60
q0rban commented> We need to use array_filer() over $automatic_types array since variable $auto_create has incorrect value, when not all content types are selected.
Ah yes, another good catch. Actually we could just force in_array to be strict instead, but either way works fine.
Comment #62
ezra-g commentedI've changed array_filter to be strict in this patch, which is a re-roll of 56, that first checks for whether $node->revision_uuid is set programatically before fetching it from the database. I think this patch will achieve the same functionality in #56 and its predecessors, but it seems like the logic is simpler in the present patch. I'm obviously biased because I wrote it - happy to discuss in IRC to clarify!
Comment #63
ezra-g commentedJust noting that this is something we'd like to fix for COD alpha3.
Comment #64
recidive commentedHi guys, I'm on a trip right now. I'll review the issue queue when I get back next week.
Thank you for working on this.
Comment #65
kndr#62 looks simplier but I really don't know if functionality is the same :) There is one drawback of using strict in_array() - you should patch the line inside case 'insert':
from:
elseif (in_array($node->type, $automatic_types)) {to:
elseif (in_array($node->type, $automatic_types, TRUE)) {Comment #66
ryan_courtnage commentedsubscribe
Comment #67
ZoFreX commentedSubscribing
Comment #68
rmontero commentedComment #69
ezra-g commentedChanging title back.
Comment #70
mikeryanThere's another scenario for q0rban's list in http://drupal.org/node/895622#comment-3705340:
6. Regardless of the revision situation, there is a uuid value on $node that does not match that in uuid_node (i.e., the caller is trying to change the uuid). uuid_node needs to be updated to the new value.
You may ask, why would anyone want to change the node's uuid? Well, the context is that we're migrating nodes from a central site to one or more satellite sites, and we want the migrated copies of the node to have the same uuid in all environments. Our problem is that somehow discrepancies have occurred - target nodes don't have the same uuid as their source. We would expect remigrating the nodes (setting the source uuid in $node and calling node_save()) to fix things, but this doesn't work - the nodeapi hook does not update uuid_node.
I need to give this some more thought before submitting a patch, but wanted to throw the issue out there...
Comment #71
mikeryanOK, the attached patch handles uuid changes on the node, and also cleans up the simpletests and fixes the lack of {} in the uuid_node_load query.
Comment #72
q0rban commentedAll the latest patches are missing some of the checks that occur in #42, but adding the simpletests will make this a lot easier. I will try to update the patch today to incorporate the fixes and updates from mikeryan and ezra-g's patch with the extra checks in mine. :)
Comment #73
q0rban commentedOk, I've glued all the above patches into one crazy awesome patch that addresses all of these use cases:
1.) A new revision is being created on a node with a valid uuid. Generate and save a new rev UUID.
2.) A new revision is being created on a node with a valid uuid and a valid revision_uuid has been set programatically. Save the valid rev UUID.
3.) A new revision is not being created, but uuids and rev uuids do not exist in the database for this node and the node type is an automatic type. Generate both UUIDs and save.
4.) A new revision is not being created, uuids and rev uuids do not exist in the database for this node, but have valid ones have been programmatically set on the node. Don't generate new ones, just save the valid ones as is.
5.) A new revision is not being created, a valid uuid exists on the node and a valid rev uuid exists as well that needs to be saved to the database. (Basically, a programmatic update of the revision uuid).
6. Regardless of the revision situation, there is a uuid value on $node that does not match that in uuid_node (i.e., the caller is trying to change the uuid). uuid_node needs to be updated to the new value.
Tests are written for each of these as well. /me passes out
;)
Comment #74
q0rban commentedSee above ^^
Comment #75
ryan_courtnage commentedJust tried out this patch. With it, I can confirm that the "Duplicate entry" warnings are now gone ... everything seems to be working okay.
Comment #76
kndr#74 works good for me. Thanks!
Comment #77
apotek commented#74 is good for me to. Thanks also for working on the tests! We need this patch for a project we're working on so it would be great to get this committed asap.
Comment #78
q0rban commentedComment #79
btopro commented+1 for commit, going to merge this into a development stack right now since I get this about 200 times when I import a book structure with node_export... though I ran my import/export and I'm still getting the warnings after using the UUID dev as well as applying the monster patch from #74
Comment #80
btopro commentednever mind my previous comment. I did have to uninstall node export / UUID and then the import worked without all the error code
Comment #81
recidive commentedCrazy awesome patch committed!
Thank you guys!
Comment #82
ryan_courtnage commentedFYI - this patch had a bug. New issue & fix here: http://drupal.org/node/1065364
Comment #83
fxmasa commentedyes, thanks! it is ok!
Comment #85
roball commentedToo bad that this fix is still not available in an official release. I am using 6.x-1.0-beta2 and get hundreds of these errors in the dblog.