Hi there,
I was wondering if anyone is seeing the same problem as me, and whether there's a fix for it.
When I get to the last stage of Node Import when importing products, the number of lines continues to grow past the end of the file, and the percentage complete goes up and down all the time.
When I stop the import, I get the first few products over and over again. It duplicates the entries.
Any ideas?
Comment | File | Size | Author |
---|---|---|---|
#23 | node_import-lock-765576-22.patch | 4.23 KB | Bastlynn |
#21 | node_import-lock-765576-21.patch | 4.2 KB | deekayen |
#19 | task_lock_fix-765576-2.patch | 4.05 KB | Bastlynn |
#14 | task_lock_fix-765576-4482348.patch | 1.14 KB | Bastlynn |
Comments
Comment #1
myregistration CreditAttribution: myregistration commented
I, too, would like to know how to avoid duplicate product entries. I am using node_import, and when I import an inventory containing products that already exist, I get duplicate products; it doesn't update the current products. I'm not sure why it wasn't written to use the SKU as the identifier, since it's a required field, correct? Or, even better, it would be nice to have a setting for choosing which field is the unique identifier. Regardless, right now it just creates a new product even though the SKUs are the same. I don't see an easy way to mass-delete products either, just one at a time, which is very time consuming when you are talking 15,000+ products. Please advise. Thanks! :)
Comment #2
ch_masson CreditAttribution: ch_masson commented
I cannot answer the duplicate entry issue, but I can certainly help you with deleting all instances of a specific content type. For that you need the Devel module at http://drupal.org/project/devel.
It has the feature that you need. Be careful though to select ONLY the content type that you want to delete or else it will wipe out every single piece of content (The "Delete all" box is checked by default)!
Go to admin/content/delete_content.
Uncheck the "Delete All" box and check only the type of content that you want to delete! :)
Then click "Delete" and your 15,000+ products will be gone in no time!
Christian
Comment #3
myregistration CreditAttribution: myregistration commented
Thank you for the info! By the way, I'm a total newbie at this point.
It would be nice if there was a checkbox beside listed products with a delete button or some alternative to the all-or-single option.
I found a module called node_import_update that is supposed to assist node_import by updating products that already exist, based on their SKU, instead of creating duplicates. Unfortunately, it's not working for me. It goes through all the steps of the import interface until step 8 (start install), but then it goes back to step 1. If I disable the module it works as before, adding duplicates, but at least it processes. If anyone could debug this module it would be greatly appreciated. Thanks! :)
Answer: Going back to step 1 is an issue with IE and has nothing to do with the node_import_update module; when I use Firefox it seems to work fine.
Comment #4
myregistration CreditAttribution: myregistration commented
The duplicate entries at the end of the import are still happening for me. The numerical status of rows imported goes past the actual number of lines, and it duplicates nodes.
Comment #5
cherukan CreditAttribution: cherukan commented
Seeing this in RC4. Any suggestions on how to get around it? The progress bar goes past the number of records. If I download all imported rows it shows the correct number of rows, but if I look at the data in Content Management/Content, I see duplicate nodes being created.
Comment #6
djevans CreditAttribution: djevans commented
@myregistration, @cherukan:
Just had this error myself after upgrading to 6.x-1.1. Running update.php solved the problem for me - I don't have time to recreate the error but could you check if this works for you?
Comment #7
sveldkamp CreditAttribution: sveldkamp commented
update.php didn't seem to do it for me, and I'm on 6.x-1.1. I'll post back if I find anything. -Steve
Comment #8
deekayen CreditAttribution: deekayen commented
subscribe - this just cost my company some money
Comment #9
cYu CreditAttribution: cYu commented
Using 1.1 as well and having this issue sporadically. I have a CSV of about 2,500 rows that imports fine most of the time, but in one instance it created two nodes for 50 of the lines in the CSV. The pairs of nodes were non-sequential and seemingly unrelated.
Comment #10
cYu CreditAttribution: cYu commented
In my case it looks like the duplicates may be occurring when the import overlaps with a cron run. In my watchdog log I start getting messages of '@type: added %title.' with a location of /cron.php, doubled up with the normal node creation messages.
Comment #11
deekayen CreditAttribution: deekayen commented
We're going to do the opposite of Semiclean and set the cron semaphore variable during our next round of imports to prevent cron from succeeding. It's a bit of a hack, but instead of duplicate entries, we'll get "Attempting to re-run cron while it is already running." in watchdog.
Comment #12
deekayen CreditAttribution: deekayen commented
something like this excerpt...
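The attached excerpt isn't reproduced in this thread, but a rough sketch of the semaphore hack described in #11 might look like the following, assuming Drupal 6's cron_semaphore behavior. The wrapper function and the node_import_do_all_tasks() call are illustrative, not actual module code:

```php
<?php
// Hypothetical wrapper around a long-running node_import run.
// Drupal 6's cron.inc refuses to start while the 'cron_semaphore'
// variable is set, logging "Attempting to re-run cron while it is
// already running." instead. Setting it ourselves keeps cron's
// hook_cron() pass (including node_import's) from re-entering the
// import while it runs.
function mymodule_import_without_cron($task) {
  // Pretend cron is already running for the duration of the import.
  variable_set('cron_semaphore', time());

  // Illustrative call standing in for the actual import processing.
  node_import_do_all_tasks($task);

  // Release the fake semaphore so real cron can run again.
  variable_del('cron_semaphore');
}
```

One caveat with this hack: if the import dies partway through, the stale semaphore has to be cleared by hand (Drupal 6 only times it out after a fixed interval), and legitimate cron work is suppressed for the whole import.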
Comment #13
Bastlynn CreditAttribution: Bastlynn commented
I worked out the root cause. node_import has a hook_cron implementation that kicks off pending tasks on each cron run. If a task is completed, cron won't attempt to execute it. But if a task is still open (such as a very long-running import of a very large CSV file), cron will attempt to kick it off a second time. At that point you get duplicated reading of the data and duplicated node creation.
The root cause is in the task locking mechanism.
The reason the system gets away with this is that the node_import_lock_acquire() function is not process-safe: static variables are not shared across processes, so when the cron kickoff starts, the file ends up being processed simultaneously by two processes.
Solution: Use variable_get and variable_set for locks.
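A minimal sketch of what that variable-based lock could look like; the function and variable names here are illustrative, not the actual patch. Unlike a static variable, Drupal 6's variable_get()/variable_set()/variable_del() read and write the shared variables table, so a concurrent cron process can see a lock taken by the import:

```php
<?php
// Illustrative variable-backed lock for a node_import task.
function node_import_lock_acquire($taskid) {
  $lock = variable_get('node_import_lock_' . $taskid, 0);
  if (!empty($lock)) {
    // Another process already holds the lock for this task.
    return FALSE;
  }
  // Record when the lock was taken, visible to all processes.
  variable_set('node_import_lock_' . $taskid, time());
  return TRUE;
}

function node_import_lock_release($taskid) {
  variable_del('node_import_lock_' . $taskid);
}
```

Note that the gap between variable_get() and variable_set() is not atomic, so two processes checking at the same instant could still both acquire the lock; it is a much smaller window than the static-variable version, but not zero.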
Comment #14
Bastlynn CreditAttribution: Bastlynn commented
Potential patch up for review. See attached.
Comment #15
deekayen CreditAttribution: deekayen commented
Why set the uniqid at all? Why not just set the boolean? For that matter, why even keep it around in the shutdown function? I'd think you could variable_del() it at that point.
Comment #16
Bastlynn CreditAttribution: Bastlynn commented
Agreed, the shutdown function isn't the best way, or at least not the clearest way, to clear the lock after the task is complete. Removing it would mean retooling the locking logic to acquire at the beginning of a task and clear at the end (not necessarily a bad idea), but that would require more meddling in other functions.
At that point, though, there also needs to be a way to clear or delete locked tasks in case the system crashes in the middle of an import, so we don't end up with incomplete tasks lingering all over the place and blocking new tasks from processing. It would then be worthwhile to use a lock per task, so the system could work on multiple imports without risking duplication or locking itself out of further runs.
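The crash-recovery concern could be handled by storing the acquisition timestamp and treating sufficiently old locks as abandoned. A sketch under that assumption; the one-hour threshold and the function name are invented for illustration:

```php
<?php
// Illustrative stale-lock handling: a crashed import leaves its
// lock variable behind, so treat any lock older than an hour as
// abandoned and steal it.
define('NODE_IMPORT_LOCK_TIMEOUT', 3600); // seconds; arbitrary

function node_import_lock_acquire_with_expiry($taskid) {
  $name = 'node_import_lock_' . $taskid;
  $taken = variable_get($name, 0);
  if ($taken && (time() - $taken) < NODE_IMPORT_LOCK_TIMEOUT) {
    // A live lock is held by another process.
    return FALSE;
  }
  // Either no lock exists, or the previous holder is presumed dead.
  variable_set($name, time());
  return TRUE;
}
```

Because the lock name embeds the task ID, separate imports hold separate locks, which is what allows multiple imports to run concurrently without blocking each other.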
Comment #17
deekayen CreditAttribution: deekayen commented
What's the point of the lock_id though?
Comment #18
Bastlynn CreditAttribution: Bastlynn commented
Right now there isn't one (I'm not convinced there was one in the original code either). Once I finish working up a patch to do individual task locking, as described in my late-night rambling, then there will be. ;)
Comment #19
Bastlynn CreditAttribution: Bastlynn commented
Updated patch - I saw the comment tracing the history of the locking mechanism as described here, but the local implementation of it for node_import missed a critical element by not pulling data in from the variables table when trying to gain a lock.
I'm debating updating the locking mechanism here to use the formal locking mechanisms in Drupal - thoughts, opinions, pros and cons of doing so?
Comment #20
Bastlynn CreditAttribution: Bastlynn commented
Going? Going? Gone. I'm pretty content with not using the features in lock.inc for the moment. So this is ready for review and testing - please let me know if you spot issues with the logic being used here.
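For comparison, the core lock API weighed in #19 (lock.inc, which landed in Drupal 7 core and is not available to this Drupal 6 module without a backport) would reduce the same pattern to something like the following sketch; $taskid is assumed to be the node_import task in scope:

```php
<?php
// Sketch of the lock.inc alternative. lock_acquire() claims the
// lock via the semaphore table rather than a read-then-write on
// the variables table, and locks expire automatically after the
// given timeout, covering the crashed-import case.
if (lock_acquire('node_import_task_' . $taskid, 3600.0)) {
  // ... run the import task ...
  lock_release('node_import_task_' . $taskid);
}
else {
  // Another process already holds the lock for this task.
  watchdog('node_import', 'Task @id is already running.',
    array('@id' => $taskid), WATCHDOG_NOTICE);
}
```

The main trade-off against the variable-based approach is the extra dependency: on Drupal 6 this would mean shipping or requiring a backport of the API, which is a reasonable argument for staying with the variables table here.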
Comment #21
deekayen CreditAttribution: deekayen commented
node_import_lock_acquire() in #19 looks like it would never return FALSE, signifying that a lock was already in place on the requested task. Here's an untested, maybe-full-of-parse-errors revision.
Comment #22
Bastlynn CreditAttribution: Bastlynn commented
re: #21 - The logic seems functionally the same as #19, but I like this better than trying to stick with the module's original course of logic. It reads clearer to the general coder and reduces the number of flags stored in the variables table. The patch doesn't have parse errors to correct; it looks good to me.
Comment #23
Bastlynn CreditAttribution: Bastlynn commented
I tweaked the logic on this patch one last time to make sure the lock releases correctly. See attached.