The site I'm currently working on uses feeds to import several nodes that use the same image. Feeds uses the file_save or file_copy functions, which by default rename files if they already exist in the database, which in this case leads to hundreds of duplicates.
It seems to be taking place in the getFile function in FeedsParser.inc. I could add FILE_EXISTS_REPLACE to the function arguments, but that will obviously be altering the way feeds handles files globally rather than in this one instance. Is there already a good way to handle this, or would something need to be hard coded?
It seems the FeedsParser.inc is used no matter which actual parser is selected, right? Is there a way to override that file for a single importer? This may be a support request instead of a feature request, and if so my apologies.
Comment | File | Size | Author |
---|---|---|---|
#124 | interdiff-1171114-123-124.txt | 1.99 KB | MegaChriz |
#124 | feeds-file-replace-rename-1171114-124.patch | 36.53 KB | MegaChriz |
| |||
#123 | interdiff-1171114-115-123.txt | 31.53 KB | MegaChriz |
#123 | feeds-file-replace-rename-1171114-123.patch | 36.33 KB | MegaChriz |
| |||
#115 | interdiff-1171114-114-115.txt | 7.81 KB | justanothermark |
Comments
Comment #1
rhouse CreditAttribution: rhouse commentedHere is a patch to do your suggestion. I suppose this should really be an option somewhere, but clearly replacement is in general the more useful action than generating buckets of identical files that are disconnected from any possible use on the site.
Comment #2
manu manuI totaly agree rhouse.
thanks for the patch, switching the issue to needs review, hope this will be commited.
Comment #3
paulgemini CreditAttribution: paulgemini commentedGood work. Me too!
Comment #4
pcambraWorks for me, but I'm not sure if this is de desired behavior by default, ideal thing would be to expose this as a setting.
Comment #5
rhouse CreditAttribution: rhouse commentedSure options are better than no options, but if we have to pick just one, what is the more useful? In fact, can anyone give me a scenario where the current behaviour does anything useful at all? IOW, under what situations could the current behaviour be desired? Since the old files aren't linked by anything, I can't imagine any. The typical use case is that a csv file is created describing lots of nodes, this is imported. Some bugs are discovered and fixed, and it is imported again. But some fields are files. Why would we want the old file, identical to the current, to stay around? How do we make any use of it?
I.e., even if there is an option setting created, we want "replace' to be the default, not 'duplicate'.
Comment #6
pcambraSetting to needs work as this is not really working when you import content that share fields.
Steps to test:
Import a node with image field and images A & B
Import another node with image field and images B & C
Comment #7
carn1x CreditAttribution: carn1x commentedsubscribe
Comment #8
rfayI believe there's an (awkward) workaround for this involving making sure the images are local to begin with, but I'm not remembering what it was.
Comment #9
Dryphtnet CreditAttribution: Dryphtnet commentedIn my case your work-around would be preferred. I have a huge catalog import that has thousands of images. Every entry has numerous calls to "shared" images like color swatches and brand logos. Uploading the images as a compressed file and expanding them on the server would be much faster than importing them individually.
That being said, I tried the patch posted by rhouse, which adds FILE_EXISTS_REPLACE to FeedsParser.inc, but it isn't working for me. The shared images still get re-copied and incremented.
Over at drupal.stackexchange.com/questions/17956/feeds-import-causing-duplicate-image-files, PhiloSurfer suggested using Template Suggestions and import flat links instead of images.
Since there has been little visible movement on this issue for a while now, I may have to learn Template Suggestions but I'd rather just have a working importer.
Comment #10
AJen CreditAttribution: AJen commentedI agree. It doesn't seem right to me that the default behaviour is to have duplicates. CMS = each data is only represented once
Comment #11
dman CreditAttribution: dman commentedI'm in both camps (not with Feeds, though I'm migrating towards it) but with similar-ish modules of my own, "import_html", "wrapper" & "mirror". Also with reference to the earlier feeds_image_grabber
I've worked with content that had an even split of unique images-per-item and re-used logos, incidental layout images, and PDF files referenced from several different sources.
The sure-fire way for everything to keep working is (the current) option A : Make a new version of the file for each import. That works, is not actually broken ... but internally can leave horribly redundant copies around the place.
OTOH, option B : there is no way blindly over-writing/re-using can be trusted. I had 300 (different) images called 'preview.jpg' and a number of 'newsletter.pdf' or 'update.pdf' etc.
My work-around is:
Look-ahead, if file doesn't exist, copy and import as normal.
If it Does exist, download the new one to a temp location, do a filesize & MD5 comparison to figure if it's a dupe.
Either relink the existing one and discard the temp if it's a dupe,
import using FILE_EXISTS_RENAME otherwise.
Now ... working around that IIRC required a little probing around and re-implementing the way the core file functions worked (I had to respect filefield_path settings etc to figure out where the file was going to land rather than trusting the system do just do the right thing) and, well it's an ungraceful chunk of code that made a simple task look much more complicated.
But , well, that's the way I've solved this before now.
Comment #12
amaree CreditAttribution: amaree commentedI see this as a pain in the butt issue, for example when I am importing 3000 spare parts to my site that share 200 referenced images I get duplication all over the place, and I am not prepared to hand something to my client that clutters there file system and also increases their storage use.
I am applying this patch but agree it should go into the feeds core as a selection, entity reference is good where you can use it however you can not use entity reference in every case.
I will try to post a blog post etc on how I eventually tackle the problem of 30,000+ products, 10,000+ spare parts and 6000 accessories that share images wish we luck ;))
Comment #13
amaree CreditAttribution: amaree commentedI am not sure how this patch really helps as it changes at the gui end ie shows the client the real name but still imports the images as image_1, image_2 etc :((( this is a pain of an issue is this going to get corrected or will we see this bad/poor logic copied into Drupal8??
Comment #14
goodeit CreditAttribution: goodeit commentedI think I have found the method @rfay was referring to in #8.
Image to use: Jellyfish.jpg in /sites/default/files/products/images.
Path:
public://products/images/Jellyfish.jpg
This uses the same file for both new nodes and creates no new files. The important part is to make sure that your file exists in the destination path that is set for that field (in my case, I have the Image field mapped to products/images/).
Comment #15
Masala CreditAttribution: Masala commentedPath work for me! Very thanks!
Comment #16
amaree CreditAttribution: amaree commentedThanks goodeit for that :D
Comment #17
PedroKTFC CreditAttribution: PedroKTFC commentedI tried #14 in my csv file but it didn't work! :(
I have a column for the images for my drupal commerce products, many of which share the same image. In that column I have entries of the type public://products/images/Jellyfish.jpg, repeated many times for each image shared by multiple products. This image is also present in that directory but the import insists in producing multiple versions of the image files in the /sites/default/files directory. Is there something I'm doing wrong?
Comment #18
dready2011 CreditAttribution: dready2011 commentedThe patch in #1 didnt work for me, but i am importing images from a URL on a differnt server.
By adding the same parameter to file_save_data on line 326 of FeedsParser.inc, changing it into
$file = file_save_data($this->getContent(), "$destination/$filename", FILE_EXISTS_REPLACE);
I solved my problems.
Sorry, i can't really make a patch from the machine i am on now - but thought this might still be helpful to some.
Comment #19
lwalley CreditAttribution: lwalley commentedI found it useful to patch FeedsEnclosure (see attached patch applied to Feeds 7.x-2.0-alpha6) adding a $replace parameter to getFile() method so that I could pass in FILE_EXISTS_REPLACE instead of FILE_EXISTS_RENAME when calling getFile() from custom hook_feeds_processor_target_alter() callbacks.
Comment #20
hepabolu CreditAttribution: hepabolu commentedI must say that I know how to write code, but I'm not proficient in PHP, so the results below are found during endless sessions with importing and sprinkling 'watchdog' messages in the code.
I've added the patch from #19, but I still end up with a duplication of the files at the filesystem.
For now this is what I found:
- using feeds with patch from #19, but the constant set to FILE_EXISTS_REPLACE
- CSV parsing configured
- image to 'import' exists in ../sites/default/images/orison/
- image location in CSV defined as 'public://images/orison/' -> considered to be a new file, doesn't even come close to the line patched in #19
- image location in CSV defined as '/absolute/path/on/local/filesystem' (= actual file location) -> considered to be a new file, doesn't even come close to the line patched in #19
- image location in CSV defined as 'http:///sites/default/images/orison/' is recognised as a local file and is subsequently passed onto the patched file_save_data function.
- the path in filefield_path settings needs to include sites/default/files (so sites/default/files/images/orison), otherwise (i.e. only images/orison) it is still considered a different file
Right now, I've fixed the problem, but when looking at the file_managed database I still see this:
- IMCE stores the same file as public://images/orison/filename
- multiple public://sites/default/files/images/orison/filename_1, ..../filename_2 etc.
I would expect the IMCE location to be the same to the 'feeds location' and that I would find multiple records in file_usage to point to a single record in file_managed.
Comment #21
hepabolu CreditAttribution: hepabolu commentedUpdate: WRONG! Now the duplicate files are stored in sites/default/files/sites/default/files/images/orison/filenames. :-(
Comment #22
hepabolu CreditAttribution: hepabolu commentedUpdate 2: on inspection of the file_save_data function, it looks like the file is passed as 'public://filename' rather than 'public://images/orison/filename'. After much 'debugging' I found the cause: http://drupal.org/node/1809698
It would be nice if someone could help out on how to write a module that modifies the file_field_widget_uri with a filefield_paths friendly version, but I guess that's an issue for the filefield_paths issue queue.
Comment #23
hepabolu CreditAttribution: hepabolu commentedUpdate 3: I replaced the call to the file_field_get_widget_uri in mappers/file.inc to
$destination = _get_widget_uri($info, $instance_info, $data);
and added a custom function
After this, I can revert back to use 'public://images/orison/filename' in my CSV file, rather than the full URL.
And finally, after much frustration, I realised I cannot have an underscore + number naming scheme in these filenames. :-(
I hope this is of help to anyone.
Comment #24
Summit CreditAttribution: Summit commentedHi,
Referring to this:
How is this possible? And is there a workaround?
greetings,
Martijn
Comment #25
patrik.hultgren CreditAttribution: patrik.hultgren commentedI've made a new patch that should be used with the patch in #19.
My patch adds FILE_EXISTS_RENAME when calling getFile() directly in mappers/file.inc.
It may be useful if you don't wan't to create a custom callback in hook_feeds_processor_target_alter().
Works with alpha7.
Comment #26
Alan D. CreditAttribution: Alan D. commentedStatus of this issue? We are finding that images with well over 500 copies are common from a Flickr feed that has "update" setting selected.
Switching to a major bug report as this quickly consumes disk space, potentially going over hosting limits, or in our case, failed backups on Amazon S3 (5 GB transfer limit).
Comment #27
vinmassaro CreditAttribution: vinmassaro commentedI ran into this today while testing an import on nodes referencing the same images as well. Here is a patch that combines #19 and #25. I tested this and it is preventing duplicate images on import. This is against 7.x-2.x HEAD.
Comment #28
vinmassaro CreditAttribution: vinmassaro commented@ Alan D.: can you test the patch in #27 and confirm if it fixes the problem for you?
Comment #29
Alan D. CreditAttribution: Alan D. commentedWe are running an older version of feeds, I ended up hard coding the replace hoping that we never import two images with the same name :?
Comment #30
mknet CreditAttribution: mknet commentedI am afraid I have to say that none of the patches solves my issue.
In FeedParser.inc, in the function getFile there is the if condition asking for the drupal_real_path. In my case this condition is true:
if (drupal_realpath($this->getValue())) {
In my case the value of getValue is 'public://uwe-nebel.jpg' and the real path is the absolute file path. So, I don't get into the else path.
Furthermore, the
$destination
(value: 'public://' - not trimmed by file_prepare_directory since it is a stream path) is compared withdirname($file->uri)
(value: 'public:'). Since the those strings are not equal the file is copied:So, as for me I'd say we have to make sure that in such as my cases the the two paths which gets compared should be the same. Is the usage of the file_prepare_directory really the right way to go? Not trimming of the $destination value is wrong - at least from my perspective.
I'd say I trim the path as well an I'm done.
What do you think?
Comment #31
seaneffel CreditAttribution: seaneffel commentedI have tested #27 the patch and it does not solve the original post's problem. Using Feeds to import image fields will create duplicate files when discovering the file exists in the database and/or target directory.
If my target image file directory is empty, then the image fields do not update with the image path on import. If the image file directory contains images that match the data in the CSV file, then every image is duplicated and appended "_0" in the file name and the appended file path is recorded in the database.
If I had 100 images, I would just chuck the originals and leave the dupes to rot. But I have thousands and could not do that kind of filtering.
Comment #32
mknet CreditAttribution: mknet commentedThat is exactly what I meant with my post.
I'v done a simple change at line 347
But I am pretty sure that this is just a workaround and a hack ...
Comment #33
seaneffel CreditAttribution: seaneffel commentedThing is, this is kind of a sharpshooter module. You drop it in, pick off your problem, and chuck it when you're done. Even a hack would get most users of this module to their intended goal.
I'll give your hacky hackness a whack and report back.
Comment #34
twistor CreditAttribution: twistor commentedAlright, I need this one.
The plan is to provide 2 options for the start.
We can use our fun new mapping config for it. Yay!
In the future, we can implement Dman's suggestion about filesize/md5 comparison.
Comment #35
twistor CreditAttribution: twistor commentedLet's see what the bot says.
I also fixed feeds_tests_files() to test locally. This should have tests.
There is some other whitespace fixes, but I'm OK with those.
Comment #36
twistor CreditAttribution: twistor commentedWe need a follow-up for providing defaults to mapping config. It gets too messy.
Comment #38
twistor CreditAttribution: twistor commentedHaha, so much for "fixing" that. Also, missed the file_copy().
Comment #39
Vincent Rommelaars CreditAttribution: Vincent Rommelaars commentedHi,
Anyone any idea on how to achieve this in D6.
I've tried several options, but I can't get it to replace the existing file.
Extra information:
I'm importing with Feeds where one field is mapped as image field.
Source and target are the same, so it should update altered png files, but now it creates a duplicate file and renames it with an underscore and a incremental number.
Thanks in advance!
Comment #40
seaneffel CreditAttribution: seaneffel commentedPlease open a new ticket for this regarding Drupal 6. Shouldn't change the focus of a thread in progress.
Comment #41
twistor CreditAttribution: twistor commentedRe-roll
Comment #42
MegaChriz CreditAttribution: MegaChriz commentedMarked #2011112: Same Image for Many Nodes - Bulk CSV Import as a duplicate.
Comment #43
Summit CreditAttribution: Summit commentedThis is working I think, setting it to RTBC, ok?
greetings, Martijn
Comment #44
stevenkkim CreditAttribution: stevenkkim commentedHi Twistor,
Thanks for all the hard work on this patch... it's exactly what I need. One issue came up for me though... if I try to import multiple images into a field using feeds tamper explode, I get this error:
PDOException: SQLSTATE[23000]: Integrity constraint violation: 1048 Column 'fid' cannot be null: INSERT INTO {file_usage} (fid, module, type, id, count) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4); Array ( [:db_insert_placeholder_0] => [:db_insert_placeholder_1] => file [:db_insert_placeholder_2] => node [:db_insert_placeholder_3] => 564 [:db_insert_placeholder_4] => 1 ) in file_usage_add() (line 662 of /home/steven/Local/em/includes/file.inc).
... any idea what's going on?
Thanks,
Steve
Comment #45
stevenkkim CreditAttribution: stevenkkim commentedOk, I've been trying to debug my issue and I think the problem occurs when when I use feed import + tamper explode, and the source is empty. I'm importing from a csv file, and if there is an image URI in the entry, then it imports the image and the entire node with no problem. But if the the image URI entry is blank (some of my nodes that I'm trying to import have no images), then the error occurs and the node is not imported. Still don't know the solution, but I'm going to keep trying to debug it.
Now that I think about it, I guess this is a new issue? Should I file a separate issue under feeds or feeds_tamper? Sorry, I'm kinda new to drupal and drupal.org, so I'm still trying to figure out the right way to do things. Thanks!
Comment #46
dman CreditAttribution: dman commentedThe error message certainly strongly implies that it's a null image. I'd have thought that somwhere in the process this would have been detected and further importing of that field there should have been aborted.
Can't tell if this is a new issue specifically related to this particular feature request or not, but yes, seeing as this thread has been finalized somewhat, this bug of yours is a follow-up, so best make a new issue I think.
Comment #47
drclaw CreditAttribution: drclaw commentedHello all,
I thought I would take a stab at adding the "Replace if different" replacement method. I also added a "Skip existing" option as well. Here's a patch. It's the same as #41 except with the extra options.
Let me know what you think!
drclaw
Comment #48
stewart.adam CreditAttribution: stewart.adam commentedLooks the function
file_feeds_file_compare
needs to be renamed tofeeds_file_compare()
:An AJAX HTTP error occurred. HTTP Result Code: 200 Debugging information follows. Path: /batch?id=608&op=do StatusText: parsererror ResponseText: Fatal error: Call to undefined function feeds_file_compare() in /Users/stewart.adam/Sites/d7-test/sites/all/modules/feeds/mappers/file.inc on line 145
Comment #49
drclaw CreditAttribution: drclaw commentedSilly Me. Re-roll.
Comment #50
stewart.adam CreditAttribution: stewart.adam commentedI'm nitpicking a bit here, but I think we need to rename it to
feeds_file_compare
to ensure we don't violate the file module's namespace (fwiw, the latest patch works).Comment #51
drclaw CreditAttribution: drclaw commentedHm... That's a good point, I suppose, but all the other functions in file.inc (other than
file_feeds_processor_targets_alter()
) are technically using the "file" namespace. I kind of considered the namespace to be "file_feeds", even though it isn't technically a module...Comment #52
stewart.adam CreditAttribution: stewart.adam commentedAh, I hadn't noticed that. In that case, I say best course of action is to RTBC this patch and let's submit another altogether but about the namespace used in the parser file.
Comment #53
kenpeter CreditAttribution: kenpeter commented* Add some code to #49
* "Column 'fid' cannot be null" issue happens when using feeds and feeds tamper together to sync file.
* If source file is empty and you set feeds tamper to use ", " as delimiter.
* Feeds tamper pass $value as $value[0] => '' to feeds module.
* Attachment works for me, but sure it is not a good fix.
Comment #54
itamair CreditAttribution: itamair commented#49: feeds-file-replace-rename-1171114-49.patch queued for re-testing.
Comment #55
twistor CreditAttribution: twistor commented#53, your patch is in the wrong format, please use git, or diff -up.
Review for #49:
This is going to cause the file to be downloaded twice in the event that they are different.
Also, this will need more tests.
Comment #56
richsky CreditAttribution: richsky commentedHey, I think this is a very helpfull addition (specially when doing migration with feeds), but doing some migration it is failing with file field path module using pathauto cleaning (either file path or name), please see https://drupal.org/node/1945148, thanks.
Comment #57
izus CreditAttribution: izus commentedHi,
Just suggested a patch to #2093829: Allow the reuse of existing file entities instead of creating new ones. that is kind of related issue that may fit the needs, but its not exactly what this issue is trying to solve.
++
Comment #58
ressa CreditAttribution: ressa commentedI want to replace images when I import a feed and avoid creating duplicates -- which patch should I test to achieve this?
Comment #59
drclaw CreditAttribution: drclaw commentedThe Last Useable patch was from comment #49, but there may still be some issues that need to be ironed out as mentioned in comment #53.
Comment #60
drclaw CreditAttribution: drclaw commentedOkay, here's a first attempt at some tests for the new file handling options.
Secondly, to address Comment #53 by @kenpeter, the errors that occur with feeds tamper explode and empty fields is universal across all field types (i think). I believe you need to use the "Filter empty items" tamper plugin after "Explode" to remove those empty ones.
Comment #61
runeasgar CreditAttribution: runeasgar commentedI am unable to get this patch to apply to the version of feeds that is presently downloaded by drush:
(Updated to reflect most recent patch)
Comment #62
drclaw CreditAttribution: drclaw commentedThe patch is for the development release (7.x-2.x-dev) of the module. You'll need to specify that when you dl with drush.
drush dl feeds-7.x-2.x-dev
Or you can clone the 7.x-2.x branch with git and patch it with "git apply". Something like this:
Comment #63
twistor CreditAttribution: twistor commentedPatch no longer applies.
Comment #64
drclaw CreditAttribution: drclaw commentedDang. I'll see what I can do.
Comment #65
chilligroup CreditAttribution: chilligroup commentedsubscribe
Comment #66
chilligroup CreditAttribution: chilligroup commented60: feeds-file-replace-rename-1171114-60.patch queued for re-testing.
Comment #68
drclaw CreditAttribution: drclaw commentedRe-roll of #60 to apply cleanly. Also fixed a bug with the "replace if different" logic. It didn't account for when the destination file didn't exist at all.
Comment #69
drclaw CreditAttribution: drclaw commentedHm... Patch didn't upload... or something...
Comment #70
st8ofmindz79 CreditAttribution: st8ofmindz79 commented69: feeds-file-replace-rename-1171114-68.patch queued for re-testing.
Comment #72
Khumbu CreditAttribution: Khumbu commentedSo just to be clear, when i have a import of an image and i choose replace, it replaces the url and doesn't rename the file to image_0.jpg...correct?
Comment #73
mahmost CreditAttribution: mahmost commentedRerolled #69 to current dev
Comment #74
huijing CreditAttribution: huijing commentedI applied patch #73 and get this error when I import:
The specified file public:///audio.mp3 was not copied because it would overwrite itself.
The file was already uploaded in the files folder and I selected the "Replace" option.
Comment #75
zmove CreditAttribution: zmove commentedDoesn't work if you use filefield_path with that patch.
when you have several entity that share the same image, the first one will have it's image, but all the following ones not.
If you use devel to see what happens, you will see that the entity image value is not saved correctly, you have $entity->field['images']['und'][0]['value'] = "" (empty).
Don't know if it's related to filefield path or feeds, the only solution I found to make it works is the following :
Comment #76
mslywka CreditAttribution: mslywka commentedDoes anyone find that patch #73 fails when applied to the security update in feeds 7.x-2.0-alpha9?
Comment #77
twistor CreditAttribution: twistor commentedIt probably does, since that changed the file handling code.
Comment #78
mslywka CreditAttribution: mslywka commentedI've tried it with the 7.x-2.x version which is now using git (https://www.drupal.org/project/feeds/git-instructions) and it doesn't apply to that version cleanly either.
Comment #79
ufku CreditAttribution: ufku commentedRerolled for latest dev (30-Jun-2015)
Comment #80
mslywka CreditAttribution: mslywka as a volunteer commentedUsing a fresh copy of the 7.x-2.x branch from
git clone --branch 7.x-2.x http://git.drupal.org/project/feeds.git
And applying the patch: (also used patch -p1 -u < feeds-file-replace-rename-1171114-79.patch fails)
git apply -v feeds-file-replace-rename-1171114-79.patch
Checking patch mappers/file.inc...
Checking patch plugins/FeedsParser.inc...
Hunk #1 succeeded at 411 (offset -5 lines).
Hunk #2 succeeded at 442 (offset -5 lines).
error: while searching for:
$destination = trim($destination, '/') . '/';
}
try {
$file = file_save_data($this->getContent(), $destination . $this->getLocalValue());
}
catch (Exception $e) {
watchdog_exception('Feeds', $e, nl2br(check_plain($e)));
error: patch failed: plugins/FeedsParser.inc:463
error: plugins/FeedsParser.inc: patch does not apply
Checking patch tests/feeds_mapper_file.test...
I can get the patch to apply with a little manual intervention, but am I working with the wrong release? I've tried it with a fresh download of the dev version as per the last comment but the 2015-Jul-02 dev release.
Comment #83
vinmassaro CreditAttribution: vinmassaro commentedLooks like the latest patch no longer applies.
Comment #84
impara CreditAttribution: impara commentedFixed patch using diff -Naurp, to apply type: patch -p1 < feeds-file-replace-rename-1171114-84.patch
Comment #85
bneil CreditAttribution: bneil at The University of Iowa commentedComment #86
bneil CreditAttribution: bneil at The University of Iowa commentedComment #89
impara CreditAttribution: impara commentedGit apply feeds-file-replace-rename-1171114.patch
Comment #90
impara CreditAttribution: impara commentedComment #91
impara CreditAttribution: impara commentedComment #94
Anas_maw CreditAttribution: Anas_maw at Vardot commentedThere was a small syntax error in the last patch.
I just fixed it.
Comment #95
Anas_maw CreditAttribution: Anas_maw at Vardot commentedComment #98
Anas_maw CreditAttribution: Anas_maw at Vardot commentedRetest
Comment #101
cyclone321 CreditAttribution: cyclone321 commentedAny News on this ?
Comment #102
firewaller CreditAttribution: firewaller commented+1
Comment #103
millionleaves CreditAttribution: millionleaves as a volunteer and commentedThe patch in #98 works for me- thanks.
I'm running a Feeds import using Feeds SQL to migrate blog posts from a Joomla database, including an image URL, into a node type in Drupal. The import grabs the image from the live Joomla site and copies it to into the Drupal files directory.
I've found that re-running the import results in the image files being duplicated when updating existing nodes. However, after installing the patch in #98 and setting the image import in the mapping to only replace the file if different, the image duplication on the second and subsequent imports stopped happening.
I have another site (Drupal Commerce). I'm importing product images where the same images are repeated across multiple product variations, and a duplicate of the image is created each time it's linked to a product variation. I'll test this patch on that installation in the next few days and report back.
Comment #104
cgmonroe CreditAttribution: cgmonroe as a volunteer commentedThe current patch did not work with the changed made for 7.x-2.0-beta2. Mostly because of the language handling changes.
I've re-rolled the patch to integrate it with these changes. Main difference is that the new way of dealing with language preference is integrated with the original patch code.
Comment #105
MegaChriz CreditAttribution: MegaChriz at WebCoo commentedSetting status to "Needs review" so the patch will be picked up by the testbot.
Quick review on just two aspects from the patch:
For a target, summary and form callbacks can now accept multiple callbacks. They keys should be renamed to "summary_callbacks" and "form_callbacks" and the values should be an array.
(This change should have been documented in the change records, but I forgot to do so.)
Example from feeds.api.php:
It looks like multiple things are tested here. It would be better if these tests are defined as separate test methods. This would also make it easier to port this feature to Drupal 8 later. I know that in the existing test different cases are also tested in a single test method (they are labeled with
1)
,2)
and3)
), but I would judge that as a bad example. Ideally, each test method should test a single thing.Comment #107
cgmonroe CreditAttribution: cgmonroe as a volunteer commentedHere's a new patch that should pass the tests. Changes are:
Fixed a manual merge problem in the initial patch.
Now uses the array versions for callback definitions.
Test are now individual methods, e.g. testFileExists[condition]..
Comment #108
cgmonroe CreditAttribution: cgmonroe as a volunteer commentedSetting to needs review so new patch will be tested.
Comment #109
azinck CreditAttribution: azinck commentedI'm testing #107.
Why is it that we still call file_save() even if an existing file is found, and even if we have chosen to "skip existing files"? The consequence of calling file_save() is that many of the existing file's properties are wiped out or reverted to defaults, including the filename and uid.
In file_feeds_set_target() we need to either avoid calling FeedsEnclosure::getFile() on such files or we need to change FeedsEnclosure::getFile() to not call file_save() if we're intending to skip the file.
Comment #110
azinck CreditAttribution: azinck commentedLet's try this to address #109.
Comment #111
nelslynn CreditAttribution: nelslynn commentedPatch works great except for one minor thing.... Using the 'Skip when file Exists", and looking at the file_managed database table, the uid is not updated on the 'skipped' images. See screenshots.
https://www.dropbox.com/s/4nxggs4gdxpgni8/SAVE_feeds_skipped_images.png?...
https://www.dropbox.com/s/td95wo8guaheb9a/SAVE_feeds_1171114.png?dl=0
Nice work, thanks!
Comment #112
helmo CreditAttribution: helmo at Initfour websolutions for Aegir Cooperative commentedWorks nice... using the 'Replace if different' option. Errors went away :)
no comment on #111
Comment #113
Collins405 CreditAttribution: Collins405 commentedYep, the patch in #110 applied cleanly, and I just successfully imported 10,000 nodes.
Where there were 350,000 images before including duplicates, there are now 22,000 !!! Much better! Lets get this committed!
Comment #114
MegaChriz CreditAttribution: MegaChriz as a volunteer commentedMy review after going through the full patch for the first time:
Functional
At first, the patch didn't seem to have effect until I disabled the Filefield Paths module (also noted in comment #75).
With that module disabled, the patch is working great! I've tested each of the four options.
I'm not sure yet if we should hold this patch off because it isn't compatible with the Filefield Paths module. It seems like it is kinda hard to make the feature work in combination with that module. Maybe we should "fix" this with documentation.
I had some trouble initially to understand the differences between the options 'Replace' and 'Replace if different'. It appears to be that with the option 'Replace if different' there is an additional check for if the incoming file is the same and the only gain is that the file won't be resaved if this is the case.
I also noticed that the option 'Replace if different' only works when importing files that already exist locally. When trying to import a file via HTTP I got the following error message:
I think it is enough to fix this with documentation. Also because the option 'Replace if different' only does something when the file to import is different, I suggest to rename the option to 'Replace only if different'.
To clarify the options further, my suggestion is to add the following description to the target configuration field 'Replacement method':
I've done so in the attached patch. Because that didn't line out well in the theme 'Seven' I've made some additions in feeds_ui.css as well.
Code
I found some minor issues in the code style. Most of these code style issues were in the tests. Sometimes an indentation of four spaces was used, sentences didn't end with a dot and a newline was missing between two methods.
Other things I noticed in the tests:
testFilesNameMap()
. Callbacks or helper functions in a test should not start withtest
, cause then SimpleTest thinks it is a test. This will result into SimpleTest building the test environment, installing Drupal, execute test setup code, and then execute the 'test' which in this case does nothing but returning an array. I renamed this method tolistTestFilesNameMap()
and moved it to the bottom of the file as well.I cleaned up the tests and also fixed code style issues found in the other files.
Feedback on my additions are welcome. I would especially like feedback on the description I added for the four options.
Comment #115
justanothermark CreditAttribution: justanothermark at CTI Digital commentedI agree with the changes in #114 but would like to add another option. As well as "Replace if different", we had a requirement for "Rename if different".
The additional code in the patch is all very similar to the Replace if different except for setting $mapping['file_exists'] to FILE_EXISTS_RENAME instead of FILE_EXISTS_REPLACE.
I also slightly changed how file_feeds_file_compare() works. When used with a remote URL filesize() might return FALSE instead of the filesize so it was always returning FALSE for some images. It now checks that the values are not empty before returning FALSE.
Note: If swapping from previous patch then be aware that FEEDS_FILE_EXISTS_RENAME_DIFFERENT = 4 and FEEDS_FILE_EXISTS_SKIP = 5. While I could have made FEEDS_FILE_EXISTS_RENAME_DIFFERENT = 5, it seemed more ordered to have FEEDS_FILE_EXISTS_REPLACE_DIFFERENT & FEEDS_FILE_EXISTS_RENAME_DIFFERENT together and then FEEDS_FILE_EXISTS_SKIP. Therefore, mappings in existing feeds (saved with previous patch) should be reset to the expected value and resaved. As it is only in the patch state, I thought it was acceptable to make this change.
Comment #116
internal CreditAttribution: internal as a volunteer commentedI think the support for File (Field) Paths module is a must. I encounter the file renaming headache now. I searched long way but still not sure which module of these two should be patched...
Comment #117
haver CreditAttribution: haver commentedIs it anogth to disable Filefield Paths checkbox in field settings or required to disable entire Filefield Paths module?
Comment #118
Christopher Riley CreditAttribution: Christopher Riley commentedPatched applied clean on the latest dev. I would like to see this committed.
Comment #119
Helice CreditAttribution: Helice as a volunteer commentedThe patch in #115 works for me. Thank you very much. It needs a commit on next release.
Comment #120
MegaChriz CreditAttribution: MegaChriz as a volunteer commentedI would love to commit this, however I noticed that the newly added option in #115 'Rename only if different' is not clearly documented yet. I'm fine with renumbering the options (thus that FEEDS_FILE_EXISTS_SKIP is now 5 instead of 4).
The option 'Rename only if different' is not documented. What does it exactly do?
Minor: line should be 80 chars long and in the comments 'filesize' should be 'file size'.
In the test FeedsMapperFileTestCase::testFileExistsRenameIfDifferent() the comments and messages talk about *replacing* files, while the test should be about *renaming* files instead?
Comment #121
tyler-durden CreditAttribution: tyler-durden commentedIt looks like there was a recent commit, did this get in there by chance? If not, how can we proceed?
Comment #122
MegaChriz CreditAttribution: MegaChriz as a volunteer commented@tyler-durden
No, this didn't got in yet. The reason is that I found some issues with the option that was added in #115, see #120. I would love to get this in before the next beta release, which I plan to release this month. Would you like to address the issues noted in #120?
@internal
I agree that support for File (Field) Paths would be great, though not sure how to do that right now. Maybe it's better to handle that in a followup.
Comment #123
MegaChriz CreditAttribution: MegaChriz as a volunteer commentedI've addressed my points from #120.
I also refactored the test cases as I found them a bit hard to read. Test failures possible.
There are no functional changes compared to the previous patch, only documentation and automated test changes.
Comment #124
MegaChriz CreditAttribution: MegaChriz as a volunteer commentedSmall change in code comments and added a note that the setting doesn't work with the File (Field) Paths module.
I think this is ready.
Comment #128
MegaChriz CreditAttribution: MegaChriz as a volunteer commentedThe failing test was an issue with the testbot (that suddenly didn't checkout a dependency of the i18n module anymore). Back to "Needs review".
Comment #129
MegaChriz CreditAttribution: MegaChriz as a volunteer commentedThis is a target for the Feeds 7.x-2.0-beta3 release, which is planned to be released on November 24.
Comment #131
MegaChriz CreditAttribution: MegaChriz as a volunteer commentedCommitted #124. Thanks all!
Comment #132
5n00py CreditAttribution: 5n00py commentedWow, cool! Thanks for this!!!