I've been doing more data imports recently, and core makes it painful to import when you want to control the uid and nid of the new records. Core automatically assumes that if you pass a nid or a uid to the save() function, then you performing update. The attached patch removes that assumption so that migation scripts can use node_save() and user_save(). All existing forms and scripts are unaffected.
Migration scripts get the new behavior by setting $node->is_new or $user->is_new before calling save(). This was formerly an internal property implemented just by node_save(). So I now expose that property and unify the logic between node_save() and user_save().
The change to drupal_write_record() is needed because currently that function refuses to accept a value for a serial field. See http://drupal.org/node/169982#comment-597094 for the justification. But we were too cautious there IMO. This is a special case and we should leave it up to the caller to handle the auto-increment if he performs special operations like these. Anyway, in my tests mysql actually increments the serial to one more than the nid/vid that was passed so this caution is not needed.
Comment | File | Size | Author |
---|---|---|---|
#45 | 303965-ignore-empty-serial-on-insert.patch | 846 bytes | Damien Tournoud |
#39 | node_save.patch | 7.92 KB | moshe weitzman |
#35 | import.patch | 8.8 KB | moshe weitzman |
#31 | 303965.patch | 8.85 KB | moshe weitzman |
#29 | mw.patch | 9.4 KB | moshe weitzman |
Comments
Comment #1
stewsnoozeI think this is a good idea. I've reviewed this patch and it looks fine.
Comment #2
moshe weitzman CreditAttribution: moshe weitzman commentedI tested this by pasting the following code into the Execute PHP block provided by devel.module. A small script would work just as well. The node was created just fine, with the specified nid.
Comment #3
bjaspan CreditAttribution: bjaspan commentedUnless I'm mistaken (and I haven't tested it), I think this will break on pgsql. pgsql serial columns are not really auto-increment; they are "use the next value from the attached sequence." If the sequence is at 10 and you insert 15, then 5 inserts later you will get a primary-key collision.
But I'd have to try it to be sure.
Comment #4
moshe weitzman CreditAttribution: moshe weitzman commentedI mentioned earlier that this is for developers who know what they are doing. They are responsible for setting the pgsql sequences after the import.
Comment #5
stewsnoozeI think in most import scenarios developers will be using sequence ranging techniques where the Drupal sequences are set higher than any possible value from the import. Otherwise I don't see an easy way how they could possibly avoid primary key collisions. I know that's how I do my imports. This patch is important for enterprise users as well as people who are migrating from other systems but need to retain IDs in some way. You can control most things in Drupal, not being able to control the IDs seems wrong.
Comment #6
alex_b CreditAttribution: alex_b commentedI like the approach. This will help us out a lot with migrations where node or user ids matter.
#3 - I'm not worried about collisions within node_save or user_save, they can be handled on a higher level IMO (see #4). I think this behavior and the possibility of collisions should be documented though (in comments? node_save() doesn't have any parameter comments at all).
Ran a quick test. Creating a new user fails with some notices: http://skitch.com/alexbarth/i15q/users-localhost
The culprit seems to be where
is substituted by
Comment #7
Dave ReidJust some small code reviews for simplicity...
$node->is_new = empty($node->nid) ? TRUE : FALSE;
should be$node->is_new = empty($node->nid);
$account->is_new = empty($account->uid) ? TRUE : FALSE;
should be$account->is_new = empty($account->uid);
Comment #8
moshe weitzman CreditAttribution: moshe weitzman commentedI have fixed the issues raised by Alex and Dave (thanks!). I also ran the node and user tests and identified a minor issue which is also fixed.
Comment #9
moshe weitzman CreditAttribution: moshe weitzman commentedI forgot to mention that I also added Doxygen for node_save().
Here is a fragment for importing a new user with a specific uid.
Comment #10
Aron NovakI tested the patch (on MySQL) and it seems to be fine.
#9 - the ease of import is nice
Comment #11
Aron Novakforgot to change the status
Comment #12
moshe weitzman CreditAttribution: moshe weitzman commentedReroll to remove fuzz
Comment #13
webchickOooo. I like this patch very much. Some nits:
* Since you've done an awesome job unifying these two functions, we should also make sure their PHPDoc reads symmetrically as well. One says something about is_new being FALSE and the other says something about is_new being TRUE.
* Please include POC code in the patch in the form of a test. One "is_new" for $node and one for $user should suffice. In node.test and user.test, respectively.
If by chance someone sees this and is around tonight, this can make it into UNSTABLE-2. If not, then UNSTABLE-3 for sure. :)
Comment #14
snufkin CreditAttribution: snufkin commentedadded two tests, for node and user module respectively to test if creating arbitrary content with is_new succeeds, and it seems so :)
also changed the documentation so it is worded the same for node_save and user_save when its talking about the new stuff being created.
Comment #15
webchickOk, since this is your first test I'm going to give you the Merciless Nitpicking Webchick Review (tm). ;) *cracks knuckles*
a) Please add a line of PHPDoc above there that summarizes what the test case does.
b) Rather than name this NodeImportTestCase, how about NodeSaveTestCase. What you're calling "Import" here is actually just one facet of testing the node_save() function, and we could potentially expand this test case out to cover other uses as well.
That means these lines need to be updated. Should be node_save() (with the parentheses, to signal that it's a function).
a) All inline comments should start with a capital letter, and end with a period.
b) Replace the comma with a semi-colon.
Normally, you only need to do this stuff in setUp() if it will be used in multiple testX functions, and this test case only has one. However, since we're planning on eventually using this test case for the rest of node_save(), this code is fine.
I would use $this->randomName() instead. That's more standard with what other parts of SimpleTest do. Check the function drupalCreateNode().
Same thing here, and also you have spacing issues (two spaces to the left of the = and none to the right)
I'd take out those comments.
a) Here as well.
b) $node->vid gives a notice if not provided? That's not right. Sounds like a bug. Can you see if node_save() generates the notice without this patch? This strikes me as something we ought to fix as well in this patch if we're trying to make things easier for import scripts; hard-coding the vid is not an option in the 'real world'.
c) You shouldn't hard-code this to a number that might already exist in the database. I have my worries about $test_nid in that respect as well.
Should be "Now, let's test this import." But how about just "Test the import." :)
a) $node_test is not very obvious. Let's call this $node_by_nid or something.
b) Can be shortened to node_load($test_nid);
While your assertion is fine by the traditional unit testing rules, you'll find that because of the way SimpleTest module displays messages, we tend to make the message reflective of what the test is testing, rather than the message it should display on failure. So change to something more neutral like "Node loaded based on node ID."
See above. All of these comments also hold true for user.test as well.
$this->randomName(). Probably for password too.
Aw. :( Sad panda. :( Let's think about a follow-up patch that would make this match node_save($node).
a) Can be shortened to user_load($test_uid);
b) Again, $test_user not very clear.
c) Do we need to load by name as well?
Comment #16
snufkin CreditAttribution: snufkin commentedThanks a lot for the thorough review! vid does not give a notice if omitted, i must have mistaken it with something.
Changes in this patch:
* added doc for the test cases and renamed them
* changed description and names
* comments corrected
* node and user object/array are now built based on drupalCreateNode() and drupalCreateUser()
* variable names are cleared up for the testing phase
* assert messages are fixed
* user is being tested three ways, as return from user_save, loading by uid and loading by name
Comment #17
snufkin CreditAttribution: snufkin commentedchanging to cnr
Comment #18
moshe weitzman CreditAttribution: moshe weitzman commentedI've code reviewed and ran the provided tests. No failures for either so we are RTBC.
I'd just like to address one of webchick's items:
In my testing, the caller just has to set $node->vid to something in order to avoid a notice. I know that because I first tried this without setting $node->vid. The actual value is not important, and node_save() will create a new vid and save it in node and node_revisions tables. So this is not a bother for import script writers. Further, fixing this is out of scope IMO (if it really is a problem, it is extremely minor).
Comment #19
webchickThat seems strange. I've programmatically created all kinds of nodes in my day, and have never had to set vid, which made me think that perhaps this notice was introduced in this patch. And setting it to a static value, in any case, looks buggish. But I'll have to take a closer look next week when I get back.
Comment #20
snufkin CreditAttribution: snufkin commentedA bit more on the notice. the exact error message is:
It only appears when one tries to send a node with is_new and nid set, without the patch. After applying the patch the notice does not appear anymore. I think this is related to the part in the patch where it does
unset($node->vid);
, i will try tracing the error when i have some more time.Comment #21
moshe weitzman CreditAttribution: moshe weitzman commentedThat NOTICE no longer happens. I retried the $node listed at the top of this message, without a vid and it imported without a notice. Here is a reroll.
Comment #23
lilou CreditAttribution: lilou commentedSee: #335122: Test clean HEAD after every commit and http://pastebin.ca/1258476
Comment #24
Kevin Hankens CreditAttribution: Kevin Hankens commentedsubscribing :)
Comment #25
moshe weitzman CreditAttribution: moshe weitzman commentedThis was RTBC before the great testbot debacle.
Comment #26
brainski CreditAttribution: brainski commentedCan this patch also be applied to Drupal 6? Or is there a way to use this genius functionality with drupal 6?
Comment #27
moshe weitzman CreditAttribution: moshe weitzman commentedWe are using it successfully on D6 on Economist. The patch is a slightly different for D6 - not sure I can extract the right bits in order to post here.
Comment #29
moshe weitzman CreditAttribution: moshe weitzman commentedFixed up a failing test due to node_load_multiple() patch. Back to RTBC.
Even with a cvs up, I am still seeing an *unrelated* test failure in testAliasTranslation(). The french language is not getting fully enabled.
Comment #30
catchI can't reproduce the path.test failure with the patch applied, so looks good since the test bot is also happy.
Comment #31
moshe weitzman CreditAttribution: moshe weitzman commentedReroll - no changes.
Comment #33
catchComment #34
catchThese changes look fine. Minor point - we no longer have phpdoc for getInfo/setUp, so CNW for that.
Comment #35
moshe weitzman CreditAttribution: moshe weitzman commentedReroll plus catch's suggested doxygen fix.
Comment #37
catchThese changes look fine, the new tests are good and it's hopefully a small step away from the ugliness in user/node_save()
Comment #39
moshe weitzman CreditAttribution: moshe weitzman commentedreroll. this was rtbc for 3 weeks and bit rotted. so i will rtbc this again once it passes testing. no changes.
Comment #40
moshe weitzman CreditAttribution: moshe weitzman commentedComment #41
webchickWow, I'm really sorry! I have no idea how this managed to fall off of my radar for this long. I guess my RTBC clean-up spurts and testing bot freak-outs were badly timed. :(
There was a couple minor formatting things I cleaned up and... committed to HEAD!
Marking "needs work" because this needs to be updated in the 6.x -> 7.x upgrade docs.
Comment #42
moshe weitzman CreditAttribution: moshe weitzman commentedNo problem webchick. It took me quite a while to reroll.
I added docs to the chronological page. I can't quite stomach adding to category page because it thats a new level of dumbness to post duplicate content when we have a whole CMS and categorization system at our disposal. We would never inflict such a system on our clients. Perhaps I misunderstood and there is no duplication but it looks like it at first glance.
Comment #43
webchickYeah, the second (categorized) page is only there for Coder module developers to coordinate on the 6.x => 7.x upgrade path checks. They're the ones keeping that page in sync. Once code freeze is done, we'll likely switch and make that the default page, but until then it's useful to have the items chronologically listed.
Comment #44
Damien Tournoud CreditAttribution: Damien Tournoud commentedThe change in drupal_write_record() broke some of the tests on (at least) SQLite. The reason is that some tests in Drupal pass drupal_write_record() a NULL value for a serial column (this is the case during the Text field test, for example).
This is unsupported on both PostgreSQL and SQLite, but MySQL will silently consider a NULL value as a synonym for a non present value.
Comment #45
Damien Tournoud CreditAttribution: Damien Tournoud commentedHere is a patch that fix the issue. I'm not completely convinced this is the way to go, and I'm hoping for some feedback.
Comment #46
yched CreditAttribution: yched commentedPatch in #445094: broken d_w_r() call on non-MySQL db in field_test.module should fix the problematic d_w_r calls in field_test.module
Comment #47
moshe weitzman CreditAttribution: moshe weitzman commented@Damien - does that just put back some code that I removed? I'm pretty sure that I had to remove that in order to fix some edge case. Lets fix the text field in yched's issue and leave this alone unless there are more problems.
Comment #48
moshe weitzman CreditAttribution: moshe weitzman commentedActually Damien's patch looks harmless if we want a quick fix for the test suite. It is debateable whether this should be fixed here or at the call sites.
Comment #49
Damien Tournoud CreditAttribution: Damien Tournoud commented@Moshe: I agree that this is not completely the fault of this particular patch (the bugs are in functions calling d_w_r() with invalid values). Let this issue die in peace. Josh opened #445214: dwr() fails when a NULL value is passed in a serial column (on PostgreSQL and SQLite), in which we will study how to correctly fix that.
Comment #50
yched CreditAttribution: yched commentedIndependantly of #445214: dwr() fails when a NULL value is passed in a serial column (on PostgreSQL and SQLite), I think the d_w_r() changes in this patch introduced a problem in the node save sequence.
Take the following code, handling node creation (roughly simplified from the actual code in node_save):
Incoming $node has no nid or vid property yet.
It comes out of the 1st d_w_r('node') with
- nid = [next nid] (because nid column is serial)
- vid = 0 (because vid column has default 0)
Then on SQLite, d_w_r('node_revision') writes vid = 0, instead of vid = [next vid] (previously, the incoming 0 value on the serial vid column would have been ignored). Thus, the {node}.vid value remains 0.
On a subsequent node creation, the first d_w_r('node') will fail because of duplicate vid = 0.
This is causing the failures webchick mentions in #445094-2: broken d_w_r() call on non-MySQL db in field_test.module
[edit: made the description above a little clearer]
Noe that MySQL properly inserts vid = [next vid].
Comment #51
Josh Waihi CreditAttribution: Josh Waihi commented#45++ I've tested and it fixes a large portion of what is broken on HEAD with PostgreSQL. Can we get this commited?
Comment #52
moshe weitzman CreditAttribution: moshe weitzman commentedwe already said that we would decide how to fix breakage at #445214: dwr() fails when a NULL value is passed in a serial column (on PostgreSQL and SQLite)
Comment #53
yched CreditAttribution: yched commentedNote that the behavior described in #50 is *not* related to NULL values. See comment #445214-6: dwr() fails when a NULL value is passed in a serial column (on PostgreSQL and SQLite).