The site I'm currently working on uses feeds to import several nodes that use the same image. Feeds uses the file_save or file_copy functions, which by default rename files if they already exist in the database, which in this case leads to hundreds of duplicates.

It seems to be taking place in the getFile function in FeedsParser.inc. I could add FILE_EXISTS_REPLACE to the function arguments, but that will obviously be altering the way feeds handles files globally rather than in this one instance. Is there already a good way to handle this, or would something need to be hard coded?

It seems the FeedsParser.inc is used no matter which actual parser is selected, right? Is there a way to override that file for a single importer? This may be a support request instead of a feature request, and if so my apologies.

CommentFileSizeAuthor
#124 interdiff-1171114-123-124.txt1.99 KBMegaChriz
#124 feeds-file-replace-rename-1171114-124.patch36.53 KBMegaChriz
#123 interdiff-1171114-115-123.txt31.53 KBMegaChriz
#123 feeds-file-replace-rename-1171114-123.patch36.33 KBMegaChriz
#115 interdiff-1171114-114-115.txt7.81 KBjustanothermark
#115 feeds-file-replace-rename-1171114-115.patch33.33 KBjustanothermark
#114 interdiff-1171114-110-114.txt30.97 KBMegaChriz
#114 feeds-file-replace-rename-1171114-114.patch27.91 KBMegaChriz
#110 interdiff.txt1.04 KBazinck
#110 feeds-file-replace-rename-1171114-110.patch29 KBazinck
#107 feeds-file-replace-rename-1171114-106.patch28.5 KBcgmonroe
#104 feeds-file-replace-rename-1171114-104.patch21.57 KBcgmonroe
#98 feeds-file-replace-rename-1171114-97.patch22.31 KBAnas_maw
#94 feeds-file-replace-rename-1171114-94.patch22.31 KBAnas_maw
#89 feeds-file-replace-rename-1171114.patch22.31 KBimpara
#84 feeds-file-replace-rename-1171114-84.patch21.84 KBimpara
#79 feeds-file-replace-rename-1171114-79.patch21.93 KBufku
#73 feeds-file-replace-rename-1171114-73.patch22.49 KBmahmost
#69 feeds-file-replace-rename-1171114-68.patch22.38 KBdrclaw
#60 feeds-file-replace-rename-1171114-60.patch22.2 KBdrclaw
#53 feeds-tamper-fix.patch3.91 KBkenpeter
#49 feeds-file-replace-rename-1171114-49.patch7.21 KBdrclaw
#47 feeds-file-replace-rename-1171114-47.patch7.2 KBdrclaw
#41 feeds-file-replace-rename-1171114-41.patch4.78 KBtwistor
#38 feeds-file-replace-rename-1171114-38.patch5.61 KBtwistor
#35 feeds-file-replace-rename-1171114-35.patch6.55 KBtwistor
#27 feeds-replace_existing_file_instead_of_renaming-1171114-27.patch2.2 KBvinmassaro
#25 file-mapper-replace-existing-1171114-25.patch520 bytespatrik.hultgren
#19 feeds-getfile_replace_behaviour_alter-1171114-19.patch1.33 KBlwalley
#1 feeds_1171114_1_replace_instead_of_renaming.patch635 bytesrhouse
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

rhouse’s picture

Here is a patch to do your suggestion. I suppose this should really be an option somewhere, but clearly replacement is in general the more useful action than generating buckets of identical files that are disconnected from any possible use on the site.

manu manu’s picture

Status: Active » Needs review

I totaly agree rhouse.
thanks for the patch, switching the issue to needs review, hope this will be commited.

paulgemini’s picture

Good work. Me too!

pcambra’s picture

Works for me, but I'm not sure if this is de desired behavior by default, ideal thing would be to expose this as a setting.

rhouse’s picture

Sure options are better than no options, but if we have to pick just one, what is the more useful? In fact, can anyone give me a scenario where the current behaviour does anything useful at all? IOW, under what situations could the current behaviour be desired? Since the old files aren't linked by anything, I can't imagine any. The typical use case is that a csv file is created describing lots of nodes, this is imported. Some bugs are discovered and fixed, and it is imported again. But some fields are files. Why would we want the old file, identical to the current, to stay around? How do we make any use of it?

I.e., even if there is an option setting created, we want "replace' to be the default, not 'duplicate'.

pcambra’s picture

Status: Needs review » Needs work

Setting to needs work as this is not really working when you import content that share fields.

Steps to test:
Import a node with image field and images A & B
Import another node with image field and images B & C

carn1x’s picture

subscribe

rfay’s picture

I believe there's an (awkward) workaround for this involving making sure the images are local to begin with, but I'm not remembering what it was.

Dryphtnet’s picture

In my case your work-around would be preferred. I have a huge catalog import that has thousands of images. Every entry has numerous calls to "shared" images like color swatches and brand logos. Uploading the images as a compressed file and expanding them on the server would be much faster than importing them individually.

That being said, I tried the patch posted by rhouse, which adds FILE_EXISTS_REPLACE to FeedsParser.inc, but it isn't working for me. The shared images still get re-copied and incremented.

Over at drupal.stackexchange.com/questions/17956/feeds-import-causing-duplicate-image-files, PhiloSurfer suggested using Template Suggestions and import flat links instead of images.

Since there has been little visible movement on this issue for a while now, I may have to learn Template Suggestions but I'd rather just have a working importer.

AJen’s picture

I agree. It doesn't seem right to me that the default behaviour is to have duplicates. CMS = each data is only represented once

dman’s picture

I'm in both camps (not with Feeds, though I'm migrating towards it) but with similar-ish modules of my own, "import_html", "wrapper" & "mirror". Also with reference to the earlier feeds_image_grabber

I've worked with content that had an even split of unique images-per-item and re-used logos, incidental layout images, and PDF files referenced from several different sources.

The sure-fire way for everything to keep working is (the current) option A : Make a new version of the file for each import. That works, is not actually broken ... but internally can leave horribly redundant copies around the place.

OTOH, option B : there is no way blindly over-writing/re-using can be trusted. I had 300 (different) images called 'preview.jpg' and a number of 'newsletter.pdf' or 'update.pdf' etc.

My work-around is:
Look-ahead, if file doesn't exist, copy and import as normal.
If it Does exist, download the new one to a temp location, do a filesize & MD5 comparison to figure if it's a dupe.
Either relink the existing one and discard the temp if it's a dupe,
import using FILE_EXISTS_RENAME otherwise.

Now ... working around that IIRC required a little probing around and re-implementing the way the core file functions worked (I had to respect filefield_path settings etc to figure out where the file was going to land rather than trusting the system do just do the right thing) and, well it's an ungraceful chunk of code that made a simple task look much more complicated.
But , well, that's the way I've solved this before now.

amaree’s picture

I see this as a pain in the butt issue, for example when I am importing 3000 spare parts to my site that share 200 referenced images I get duplication all over the place, and I am not prepared to hand something to my client that clutters there file system and also increases their storage use.

I am applying this patch but agree it should go into the feeds core as a selection, entity reference is good where you can use it however you can not use entity reference in every case.

I will try to post a blog post etc on how I eventually tackle the problem of 30,000+ products, 10,000+ spare parts and 6000 accessories that share images wish we luck ;))

amaree’s picture

I am not sure how this patch really helps as it changes at the gui end ie shows the client the real name but still imports the images as image_1, image_2 etc :((( this is a pain of an issue is this going to get corrected or will we see this bad/poor logic copied into Drupal8??

goodeit’s picture

I think I have found the method @rfay was referring to in #8.

Image to use: Jellyfish.jpg in /sites/default/files/products/images.
Path: public://products/images/Jellyfish.jpg

This uses the same file for both new nodes and creates no new files. The important part is to make sure that your file exists in the destination path that is set for that field (in my case, I have the Image field mapped to products/images/).

Masala’s picture

Path work for me! Very thanks!

amaree’s picture

Thanks goodeit for that :D

PedroKTFC’s picture

I tried #14 in my csv file but it didn't work! :(

I have a column for the images for my drupal commerce products, many of which share the same image. In that column I have entries of the type public://products/images/Jellyfish.jpg, repeated many times for each image shared by multiple products. This image is also present in that directory but the import insists in producing multiple versions of the image files in the /sites/default/files directory. Is there something I'm doing wrong?

dready2011’s picture

The patch in #1 didnt work for me, but i am importing images from a URL on a differnt server.
By adding the same parameter to file_save_data on line 326 of FeedsParser.inc, changing it into
$file = file_save_data($this->getContent(), "$destination/$filename", FILE_EXISTS_REPLACE);
I solved my problems.

Sorry, i can't really make a patch from the machine i am on now - but thought this might still be helpful to some.

lwalley’s picture

I found it useful to patch FeedsEnclosure (see attached patch applied to Feeds 7.x-2.0-alpha6) adding a $replace parameter to getFile() method so that I could pass in FILE_EXISTS_REPLACE instead of FILE_EXISTS_RENAME when calling getFile() from custom hook_feeds_processor_target_alter() callbacks.

hepabolu’s picture

I must say that I know how to write code, but I'm not proficient in PHP, so the results below are found during endless sessions with importing and sprinkling 'watchdog' messages in the code.

I've added the patch from #19, but I still end up with a duplication of the files at the filesystem.

For now this is what I found:
- using feeds with patch from #19, but the constant set to FILE_EXISTS_REPLACE
- CSV parsing configured
- image to 'import' exists in ../sites/default/images/orison/
- image location in CSV defined as 'public://images/orison/' -> considered to be a new file, doesn't even come close to the line patched in #19
- image location in CSV defined as '/absolute/path/on/local/filesystem' (= actual file location) -> considered to be a new file, doesn't even come close to the line patched in #19
- image location in CSV defined as 'http:///sites/default/images/orison/' is recognised as a local file and is subsequently passed onto the patched file_save_data function.
- the path in filefield_path settings needs to include sites/default/files (so sites/default/files/images/orison), otherwise (i.e. only images/orison) it is still considered a different file

Right now, I've fixed the problem, but when looking at the file_managed database I still see this:
- IMCE stores the same file as public://images/orison/filename
- multiple public://sites/default/files/images/orison/filename_1, ..../filename_2 etc.

I would expect the IMCE location to be the same to the 'feeds location' and that I would find multiple records in file_usage to point to a single record in file_managed.

hepabolu’s picture

Update: WRONG! Now the duplicate files are stored in sites/default/files/sites/default/files/images/orison/filenames. :-(

hepabolu’s picture

Update 2: on inspection of the file_save_data function, it looks like the file is passed as 'public://filename' rather than 'public://images/orison/filename'. After much 'debugging' I found the cause: http://drupal.org/node/1809698

It would be nice if someone could help out on how to write a module that modifies the file_field_widget_uri with a filefield_paths friendly version, but I guess that's an issue for the filefield_paths issue queue.

hepabolu’s picture

Update 3: I replaced the call to the file_field_get_widget_uri in mappers/file.inc to
$destination = _get_widget_uri($info, $instance_info, $data);

and added a custom function

function _get_widget_uri($info, $instance_info, $data) {
  $destination = file_field_widget_uri($info, $instance_info, $data);
  $uri_scheme = $info['settings']['uri_scheme'] . '://';
  if ((empty($destination) || $destination == $uri_scheme) && module_exists('filefield_paths')) {
    $ffsettings = $instance_info['settings']['filefield_paths']['file_path'];
    $tmpdest = ltrim(filefield_paths_process_string($ffsettings['value'], $data, $ffsettings['options']));

    $destination = $uri_scheme . $tmpdest;
  }

  return $destination;
}

After this, I can revert back to use 'public://images/orison/filename' in my CSV file, rather than the full URL.

And finally, after much frustration, I realised I cannot have an underscore + number naming scheme in these filenames. :-(

I hope this is of help to anyone.

Summit’s picture

Hi,
Referring to this:

And finally, after much frustration, I realised I cannot have an underscore + number naming scheme in these filenames. :-(

How is this possible? And is there a workaround?

greetings,
Martijn

patrik.hultgren’s picture

Version: 7.x-2.0-alpha3 » 7.x-2.0-alpha7
FileSize
520 bytes

I've made a new patch that should be used with the patch in #19.
My patch adds FILE_EXISTS_RENAME when calling getFile() directly in mappers/file.inc.
It may be useful if you don't wan't to create a custom callback in hook_feeds_processor_target_alter().
Works with alpha7.

Alan D.’s picture

Version: 7.x-2.0-alpha7 » 7.x-2.x-dev
Category: feature » bug
Priority: Normal » Major

Status of this issue? We are finding that images with well over 500 copies are common from a Flickr feed that has "update" setting selected.

Switching to a major bug report as this quickly consumes disk space, potentially going over hosting limits, or in our case, failed backups on Amazon S3 (5 GB transfer limit).

vinmassaro’s picture

Status: Needs work » Needs review
FileSize
2.2 KB

I ran into this today while testing an import on nodes referencing the same images as well. Here is a patch that combines #19 and #25. I tested this and it is preventing duplicate images on import. This is against 7.x-2.x HEAD.

vinmassaro’s picture

@ Alan D.: can you test the patch in #27 and confirm if it fixes the problem for you?

Alan D.’s picture

We are running an older version of feeds, I ended up hard coding the replace hoping that we never import two images with the same name :?

mknet’s picture

Assigned: Unassigned » mknet

I am afraid I have to say that none of the patches solves my issue.

In FeedParser.inc, in the function getFile there is the if condition asking for the drupal_real_path. In my case this condition is true:

if (drupal_realpath($this->getValue())) {

In my case the value of getValue is 'public://uwe-nebel.jpg' and the real path is the absolute file path. So, I don't get into the else path.

Furthermore, the $destination (value: 'public://' - not trimmed by file_prepare_directory since it is a stream path) is compared with dirname($file->uri) (value: 'public:'). Since the those strings are not equal the file is copied:

--->file_prepare_directory</strong>($destination, FILE_MODIFY_PERMISSIONS | FILE_CREATE_DIRECTORY);
      // Copy or save file depending on whether it is remote or local.
--->if (drupal_realpath($this->getValue())) {
        $file           = new stdClass();
        $file->uid      = 0;
        $file->uri      = $this->getValue();
        $file->filemime = $this->mime_type;
        $file->filename = basename($file->uri);
--->if (dirname($file->uri) != $destination)</strong> {
--->   $file = <strong>file_copy($file, $destination);</strong>
        }

So, as for me I'd say we have to make sure that in such as my cases the the two paths which gets compared should be the same. Is the usage of the file_prepare_directory really the right way to go? Not trimming of the $destination value is wrong - at least from my perspective.
I'd say I trim the path as well an I'm done.

What do you think?

seaneffel’s picture

Status: Needs review » Needs work

I have tested #27 the patch and it does not solve the original post's problem. Using Feeds to import image fields will create duplicate files when discovering the file exists in the database and/or target directory.

If my target image file directory is empty, then the image fields do not update with the image path on import. If the image file directory contains images that match the data in the CSV file, then every image is duplicated and appended "_0" in the file name and the appended file path is recorded in the database.

If I had 100 images, I would just chuck the originals and leave the dupes to rot. But I have thousands and could not do that kind of filtering.

mknet’s picture

That is exactly what I meant with my post.

I'v done a simple change at line 347

if ((dirname($file->uri) != 'public:') && (dirname($file->uri) != $destination)) {

But I am pretty sure that this is just a workaround and a hack ...

seaneffel’s picture

Thing is, this is kind of a sharpshooter module. You drop it in, pick off your problem, and chuck it when you're done. Even a hack would get most users of this module to their intended goal.

I'll give your hacky hackness a whack and report back.

twistor’s picture

Title: Replace existing file/image instead of renaming. » Allow user to choose the method of file handling.
Assigned: mknet » twistor

Alright, I need this one.

The plan is to provide 2 options for the start.

  1. Rename existing files.
  2. Replace existing files.

We can use our fun new mapping config for it. Yay!

In the future, we can implement Dman's suggestion about filesize/md5 comparison.

twistor’s picture

Status: Needs work » Needs review
Issue tags: +Needs tests
FileSize
6.55 KB

Let's see what the bot says.

I also fixed feeds_tests_files() to test locally. This should have tests.

There is some other whitespace fixes, but I'm OK with those.

twistor’s picture

We need a follow-up for providing defaults to mapping config. It gets too messy.

Status: Needs review » Needs work

The last submitted patch, feeds-file-replace-rename-1171114-35.patch, failed testing.

twistor’s picture

Status: Needs work » Needs review
FileSize
5.61 KB

Haha, so much for "fixing" that. Also, missed the file_copy().

Vincent Rommelaars’s picture

Version: 7.x-2.x-dev » 6.x-1.x-dev
Assigned: twistor » Unassigned
Category: bug » support

Hi,

Anyone any idea on how to achieve this in D6.
I've tried several options, but I can't get it to replace the existing file.

Extra information:
I'm importing with Feeds where one field is mapped as image field.
Source and target are the same, so it should update altered png files, but now it creates a duplicate file and renames it with an underscore and a incremental number.

Thanks in advance!

seaneffel’s picture

Version: 6.x-1.x-dev » 7.x-2.x-dev

Please open a new ticket for this regarding Drupal 6. Shouldn't change the focus of a thread in progress.

twistor’s picture

MegaChriz’s picture

Summit’s picture

Status: Needs review » Reviewed & tested by the community

This is working I think, setting it to RTBC, ok?
greetings, Martijn

stevenkkim’s picture

Hi Twistor,

Thanks for all the hard work on this patch... it's exactly what I need. One issue came up for me though... if I try to import multiple images into a field using feeds tamper explode, I get this error:

PDOException: SQLSTATE[23000]: Integrity constraint violation: 1048 Column 'fid' cannot be null: INSERT INTO {file_usage} (fid, module, type, id, count) VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3, :db_insert_placeholder_4); Array ( [:db_insert_placeholder_0] => [:db_insert_placeholder_1] => file [:db_insert_placeholder_2] => node [:db_insert_placeholder_3] => 564 [:db_insert_placeholder_4] => 1 ) in file_usage_add() (line 662 of /home/steven/Local/em/includes/file.inc).

... any idea what's going on?

Thanks,
Steve

stevenkkim’s picture

Ok, I've been trying to debug my issue and I think the problem occurs when when I use feed import + tamper explode, and the source is empty. I'm importing from a csv file, and if there is an image URI in the entry, then it imports the image and the entire node with no problem. But if the the image URI entry is blank (some of my nodes that I'm trying to import have no images), then the error occurs and the node is not imported. Still don't know the solution, but I'm going to keep trying to debug it.

Now that I think about it, I guess this is a new issue? Should I file a separate issue under feeds or feeds_tamper? Sorry, I'm kinda new to drupal and drupal.org, so I'm still trying to figure out the right way to do things. Thanks!

dman’s picture

The error message certainly strongly implies that it's a null image. I'd have thought that somwhere in the process this would have been detected and further importing of that field there should have been aborted.
Can't tell if this is a new issue specifically related to this particular feature request or not, but yes, seeing as this thread has been finalized somewhat, this bug of yours is a follow-up, so best make a new issue I think.

drclaw’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
7.2 KB

Hello all,

I thought I would take a stab at adding the "Replace if different" replacement method. I also added a "Skip existing" option as well. Here's a patch. It's the same as #41 except with the extra options.

Let me know what you think!

drclaw

stewart.adam’s picture

Status: Needs review » Needs work

Looks the function file_feeds_file_compare needs to be renamed to feeds_file_compare():

An AJAX HTTP error occurred. HTTP Result Code: 200 Debugging information follows. Path: /batch?id=608&op=do StatusText: parsererror ResponseText: Fatal error: Call to undefined function feeds_file_compare() in /Users/stewart.adam/Sites/d7-test/sites/all/modules/feeds/mappers/file.inc on line 145

drclaw’s picture

Status: Needs work » Needs review
FileSize
7.21 KB

Silly Me. Re-roll.

stewart.adam’s picture

Status: Needs review » Needs work

I'm nitpicking a bit here, but I think we need to rename it to feeds_file_compare to ensure we don't violate the file module's namespace (fwiw, the latest patch works).

drclaw’s picture

Hm... That's a good point, I suppose, but all the other functions in file.inc (other than file_feeds_processor_targets_alter()) are technically using the "file" namespace. I kind of considered the namespace to be "file_feeds", even though it isn't technically a module...

stewart.adam’s picture

Status: Needs work » Reviewed & tested by the community

Ah, I hadn't noticed that. In that case, I say best course of action is to RTBC this patch and let's submit another altogether but about the namespace used in the parser file.

kenpeter’s picture

FileSize
3.91 KB

* Add some code to #49
* "Column 'fid' cannot be null" issue happens when using feeds and feeds tamper together to sync file.
* If source file is empty and you set feeds tamper to use ", " as delimiter.
* Feeds tamper pass $value as $value[0] => '' to feeds module.
* Attachment works for me, but sure it is not a good fix.

itamair’s picture

twistor’s picture

Status: Reviewed & tested by the community » Needs work

#53, your patch is in the wrong format, please use git, or diff -up.

Review for #49:

+++ b/mappers/file.inc
@@ -114,8 +129,34 @@ function file_feeds_set_target($source, $entity, $target, $value) {
+        }
+        if ($mapping['file_exists'] == FEEDS_FILE_EXISTS_REPLACE_DIFFERENT && file_exists($destination . '/' . basename($v->getValue()))) {
+          if (file_feeds_file_compare($v->getValue(), $destination . '/' . basename($v->getValue()))) {
+            $skip = TRUE;
+          }
+          else {

This is going to cause the file to be downloaded twice in the event that they are different.

Also, this will need more tests.

richsky’s picture

Hey, I think this is a very helpfull addition (specially when doing migration with feeds), but doing some migration it is failing with file field path module using pathauto cleaning (either file path or name), please see https://drupal.org/node/1945148, thanks.

izus’s picture

Hi,
Just suggested a patch to #2093829: Allow the reuse of existing file entities instead of creating new ones. that is kind of related issue that may fit the needs, but its not exactly what this issue is trying to solve.
++

ressa’s picture

I want to replace images when I import a feed and avoid creating duplicates -- which patch should I test to achieve this?

drclaw’s picture

The Last Useable patch was from comment #49, but there may still be some issues that need to be ironed out as mentioned in comment #53.

drclaw’s picture

Assigned: Unassigned » drclaw
Status: Needs work » Needs review
FileSize
22.2 KB

Okay, here's a first attempt at some tests for the new file handling options.

Secondly, to address Comment #53 by @kenpeter, the errors that occur with feeds tamper explode and empty fields is universal across all field types (i think). I believe you need to use the "Filter empty items" tamper plugin after "Explode" to remove those empty ones.

runeasgar’s picture

I am unable to get this patch to apply to the version of feeds that is presently downloaded by drush:

$ patch -p1 < ~/Downloads/feeds-file-replace-rename-1171114-60.patch.txt 
patching file mappers/file.inc
Hunk #1 succeeded at 11 with fuzz 1 (offset 4 lines).
Hunk #2 FAILED at 35.
Hunk #3 succeeded at 48 (offset -12 lines).
Hunk #4 FAILED at 66.
Hunk #5 FAILED at 102.
Hunk #6 FAILED at 114.
Hunk #7 succeeded at 135 with fuzz 2 (offset -38 lines).
4 out of 7 hunks FAILED -- saving rejects to file mappers/file.inc.rej
patching file plugins/FeedsParser.inc
patching file tests/feeds_mapper_file.test
Hunk #1 succeeded at 32 with fuzz 2 (offset 4 lines).
Hunk #2 succeeded at 65 with fuzz 1 (offset 4 lines).
Hunk #3 FAILED at 103.
Hunk #4 succeeded at 127 (offset 3 lines).
Hunk #5 succeeded at 178 with fuzz 2 (offset 22 lines).
Hunk #6 FAILED at 554.
2 out of 6 hunks FAILED -- saving rejects to file tests/feeds_mapper_file.test.rej

(Updated to reflect most recent patch)

drclaw’s picture

The patch is for the development release (7.x-2.x-dev) of the module. You'll need to specify that when you dl with drush.

drush dl feeds-7.x-2.x-dev

Or you can clone the 7.x-2.x branch with git and patch it with "git apply". Something like this:

git clone --branch 7.x-2.x http://git.drupal.org/project/feeds.git
cd feeds
curl https://drupal.org/files/feeds-file-replace-rename-1171114-60.patch | git apply
twistor’s picture

Issue summary: View changes
Status: Needs review » Needs work

Patch no longer applies.

drclaw’s picture

Dang. I'll see what I can do.

chilligroup’s picture

subscribe

chilligroup’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 60: feeds-file-replace-rename-1171114-60.patch, failed testing.

drclaw’s picture

Status: Needs work » Needs review

Re-roll of #60 to apply cleanly. Also fixed a bug with the "replace if different" logic. It didn't account for when the destination file didn't exist at all.

drclaw’s picture

Hm... Patch didn't upload... or something...

st8ofmindz79’s picture

Khumbu’s picture

So just to be clear, when i have a import of an image and i choose replace, it replaces the url and doesn't rename the file to image_0.jpg...correct?

mahmost’s picture

Rerolled #69 to current dev

huijing’s picture

I applied patch #73 and get this error when I import:
The specified file public:///audio.mp3 was not copied because it would overwrite itself.

The file was already uploaded in the files folder and I selected the "Replace" option.

zmove’s picture

Status: Needs review » Needs work

Doesn't work if you use filefield_path with that patch.

when you have several entity that share the same image, the first one will have it's image, but all the following ones not.

If you use devel to see what happens, you will see that the entity image value is not saved correctly, you have $entity->field['images']['und'][0]['value'] = "" (empty).

Don't know if it's related to filefield path or feeds, the only solution I found to make it works is the following :

  • Disable filefield path for the entity you will import
  • Make you import by checking "replace" for the images files thanks to this patch.
  • Enable filefield path again
  • Parameter filefield path as you want and check "retroactive update". It will move all the images to the right folders defined by your filefield path tokens, and it will not duplicate files thanks to the filefield path patch.
mslywka’s picture

Does anyone find that patch #73 fails when applied to the security update in feeds 7.x-2.0-alpha9?

twistor’s picture

It probably does, since that changed the file handling code.

mslywka’s picture

I've tried it with the 7.x-2.x version which is now using git (https://www.drupal.org/project/feeds/git-instructions) and it doesn't apply to that version cleanly either.

ufku’s picture

Category: Support request » Feature request
Priority: Major » Normal
Status: Needs work » Needs review
FileSize
21.93 KB

Rerolled for latest dev (30-Jun-2015)

mslywka’s picture

Using a fresh copy of the 7.x-2.x branch from

git clone --branch 7.x-2.x http://git.drupal.org/project/feeds.git

And applying the patch: (also used patch -p1 -u < feeds-file-replace-rename-1171114-79.patch fails)

git apply -v feeds-file-replace-rename-1171114-79.patch
Checking patch mappers/file.inc...
Checking patch plugins/FeedsParser.inc...
Hunk #1 succeeded at 411 (offset -5 lines).
Hunk #2 succeeded at 442 (offset -5 lines).
error: while searching for:
$destination = trim($destination, '/') . '/';
}
try {
$file = file_save_data($this->getContent(), $destination . $this->getLocalValue());
}
catch (Exception $e) {
watchdog_exception('Feeds', $e, nl2br(check_plain($e)));

error: patch failed: plugins/FeedsParser.inc:463
error: plugins/FeedsParser.inc: patch does not apply
Checking patch tests/feeds_mapper_file.test...

I can get the patch to apply with a little manual intervention, but am I working with the wrong release? I've tried it with a fresh download of the dev version as per the last comment but the 2015-Jul-02 dev release.

Status: Needs review » Needs work

The last submitted patch, 79: feeds-file-replace-rename-1171114-79.patch, failed testing.

vinmassaro’s picture

Issue tags: +Needs reroll

Looks like the latest patch no longer applies.

impara’s picture

Fixed patch using diff -Naurp, to apply type: patch -p1 < feeds-file-replace-rename-1171114-84.patch

bneil’s picture

Status: Needs work » Needs review
bneil’s picture

Issue tags: -Needs reroll

Status: Needs review » Needs work

The last submitted patch, 84: feeds-file-replace-rename-1171114-84.patch, failed testing.

The last submitted patch, 84: feeds-file-replace-rename-1171114-84.patch, failed testing.

impara’s picture

Git apply feeds-file-replace-rename-1171114.patch

impara’s picture

impara’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 89: feeds-file-replace-rename-1171114.patch, failed testing.

The last submitted patch, 89: feeds-file-replace-rename-1171114.patch, failed testing.

Anas_maw’s picture

There was a small syntax error in the last patch.
I just fixed it.

Anas_maw’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 94: feeds-file-replace-rename-1171114-94.patch, failed testing.

The last submitted patch, 94: feeds-file-replace-rename-1171114-94.patch, failed testing.

Anas_maw’s picture

Status: Needs work » Needs review
FileSize
22.31 KB

Retest

Status: Needs review » Needs work

The last submitted patch, 98: feeds-file-replace-rename-1171114-97.patch, failed testing.

The last submitted patch, 98: feeds-file-replace-rename-1171114-97.patch, failed testing.

cyclone321’s picture

Any News on this ?

firewaller’s picture

+1

millionleaves’s picture

The patch in #98 works for me- thanks.

I'm running a Feeds import using Feeds SQL to migrate blog posts from a Joomla database, including an image URL, into a node type in Drupal. The import grabs the image from the live Joomla site and copies it to into the Drupal files directory.

I've found that re-running the import results in the image files being duplicated when updating existing nodes. However, after installing the patch in #98 and setting the image import in the mapping to only replace the file if different, the image duplication on the second and subsequent imports stopped happening.

I have another site (Drupal Commerce). I'm importing product images where the same images are repeated across multiple product variations, and a duplicate of the image is created each time it's linked to a product variation. I'll test this patch on that installation in the next few days and report back.

cgmonroe’s picture

The current patch did not work with the changed made for 7.x-2.0-beta2. Mostly because of the language handling changes.

I've re-rolled the patch to integrate it with these changes. Main difference is that the new way of dealing with language preference is integrated with the original patch code.

MegaChriz’s picture

Status: Needs work » Needs review

Setting status to "Needs review" so the patch will be picked up by the testbot.

Quick review on just two aspects from the patch:

  1. +++ b/mappers/file.inc
    @@ -21,6 +30,8 @@ function file_feeds_processor_targets($entity_type, $bundle_name) {
    +        'summary_callback' => 'file_feeds_summary_callback',
    +        'form_callback' => 'file_feeds_form_callback',
    

    For a target, summary and form callbacks can now accept multiple callbacks. They keys should be renamed to "summary_callbacks" and "form_callbacks" and the values should be an array.
    (This change should have been documented in the change records, but I forgot to do so.)

    Example from feeds.api.php:

        // Example 4: use the form and summary callbacks to add additional
        // configuration options for your target. Use the form callbacks to provide
        // a form to set the target configuration. Use the summary callbacks to
        // display the target configuration.
        // @see my_module_form_callback()
        // @see my_module_summary_callback()
        $targets['my_node_field4'] = array(
          'name' => t('My fourth custom node field'),
          'description' => t('A field with additional configuration.'),
          'callback' => 'my_module_set_target4',
          'form_callbacks' => array('my_module_form_callback'),
          'summary_callbacks' => array('my_module_summary_callback'),
        );
    
  2. +++ b/tests/feeds_mapper_file.test
    @@ -160,6 +164,318 @@ class FeedsMapperFileTestCase extends FeedsMapperTestCase {
    +    // 4) Test mapping of local resources with the file exists "Rename" setting
    ...
    +    // 5) Test mapping of local resources with the file exists "Replace" setting
    ...
    +    // 6) Test mapping of local resources with the file exists
    +    //    "Replace if different" setting
    ...
    +    // 7) Test mapping of local resources with the file exists
    +    //    "Skip existing" setting
    

    It looks like multiple things are tested here. It would be better if these tests are defined as separate test methods. This would also make it easier to port this feature to Drupal 8 later. I know that in the existing test different cases are also tested in a single test method (they are labeled with 1), 2) and 3)), but I would judge that as a bad example. Ideally, each test method should test a single thing.

Status: Needs review » Needs work

The last submitted patch, 104: feeds-file-replace-rename-1171114-104.patch, failed testing.

cgmonroe’s picture

Here's a new patch that should pass the tests. Changes are:

Fixed a manual merge problem in the initial patch.
Now uses the array versions for callback definitions.
Test are now individual methods, e.g. testFileExists[condition]..

cgmonroe’s picture

Status: Needs work » Needs review

Setting to needs review so new patch will be tested.

azinck’s picture

Status: Needs review » Needs work

I'm testing #107.

Why is it that we still call file_save() even if an existing file is found, and even if we have chosen to "skip existing files"? The consequence of calling file_save() is that many of the existing file's properties are wiped out or reverted to defaults, including the filename and uid.

In file_feeds_set_target() we need to either avoid calling FeedsEnclosure::getFile() on such files or we need to change FeedsEnclosure::getFile() to not call file_save() if we're intending to skip the file.

azinck’s picture

Status: Needs work » Needs review
FileSize
29 KB
1.04 KB

Let's try this to address #109.

nelslynn’s picture

Patch works great except for one minor thing.... Using the 'Skip when file Exists", and looking at the file_managed database table, the uid is not updated on the 'skipped' images. See screenshots.

https://www.dropbox.com/s/4nxggs4gdxpgni8/SAVE_feeds_skipped_images.png?...
https://www.dropbox.com/s/td95wo8guaheb9a/SAVE_feeds_1171114.png?dl=0

Nice work, thanks!

helmo’s picture

Works nice... using the 'Replace if different' option. Errors went away :)

no comment on #111

Collins405’s picture

Yep, the patch in #110 applied cleanly, and I just successfully imported 10,000 nodes.

Where there were 350,000 images before including duplicates, there are now 22,000 !!! Much better! Lets get this committed!

MegaChriz’s picture

My review after going through the full patch for the first time:

Functional

At first, the patch didn't seem to have effect until I disabled the Filefield Paths module (also noted in comment #75).
With that module disabled, the patch is working great! I've tested each of the four options.

I'm not sure yet if we should hold this patch off because it isn't compatible with the Filefield Paths module. It seems like it is kinda hard to make the feature work in combination with that module. Maybe we should "fix" this with documentation.

I had some trouble initially to understand the differences between the options 'Replace' and 'Replace if different'. It appears to be that with the option 'Replace if different' there is an additional check for if the incoming file is the same and the only gain is that the file won't be resaved if this is the case.
I also noticed that the option 'Replace if different' only works when importing files that already exist locally. When trying to import a file via HTTP I got the following error message:

Warning: filesize(): stat failed for http://www.example.com/feeds/images/image1.jpg in file_feeds_file_compare() (line 251 of /Websites/drupal/drupalsites/drupal7/sites/all/modules/workinprogress/feeds_dev/feeds/mappers/file.inc).

I think it is enough to fix this with documentation. Also because the option 'Replace if different' only does something when the file to import is different, I suggest to rename the option to 'Replace only if different'.

To clarify the options further, my suggestion is to add the following description to the target configuration field 'Replacement method':

New files are always copied. Files that have a name that is already in use on the site are handled based on this setting.

  • Rename: files whose name is already in use are renamed.
  • Replace: files on the site with the same name are replaced.
  • Replace only if different: files on the site with the same name are replaced only if the file to import is different, in other cases the file will not be imported. Works only if the file to import is locally accessible.
  • Skip existing: files whose name is already in use are not imported.

I've done so in the attached patch. Because that didn't line out well in the theme 'Seven' I've made some additions in feeds_ui.css as well.

Code

I found some minor issues in the code style. Most of these code style issues were in the tests. Sometimes an indentation of four spaces was used, sentences didn't end with a dot and a newline was missing between two methods.
Other things I noticed in the tests:

  • Some duplicated code: I moved these to the setUp() method.
  • A non-test method called testFilesNameMap(). Callbacks or helper functions in a test should not start with test, cause then SimpleTest thinks it is a test. This will result into SimpleTest building the test environment, installing Drupal, execute test setup code, and then execute the 'test' which in this case does nothing but returning an array. I renamed this method to listTestFilesNameMap() and moved it to the bottom of the file as well.
  • A few unnessacery setup code lines in tests. SimplePie isn't used in most of the tests, so I removed the setup code where it wasn't needed.

I cleaned up the tests and also fixed code style issues found in the other files.

Feedback on my additions are welcome. I would especially like feedback on the description I added for the four options.

justanothermark’s picture

I agree with the changes in #114 but would like to add another option. As well as "Replace if different", we had a requirement for "Rename if different".

The additional code in the patch is all very similar to the Replace if different except for setting $mapping['file_exists'] to FILE_EXISTS_RENAME instead of FILE_EXISTS_REPLACE.

I also slightly changed how file_feeds_file_compare() works. When used with a remote URL filesize() might return FALSE instead of the filesize so it was always returning FALSE for some images. It now checks that the values are not empty before returning FALSE.

Note: If swapping from previous patch then be aware that FEEDS_FILE_EXISTS_RENAME_DIFFERENT = 4 and FEEDS_FILE_EXISTS_SKIP = 5. While I could have made FEEDS_FILE_EXISTS_RENAME_DIFFERENT = 5, it seemed more ordered to have FEEDS_FILE_EXISTS_REPLACE_DIFFERENT & FEEDS_FILE_EXISTS_RENAME_DIFFERENT together and then FEEDS_FILE_EXISTS_SKIP. Therefore, mappings in existing feeds (saved with previous patch) should be reset to the expected value and resaved. As it is only in the patch state, I thought it was acceptable to make this change.

internal’s picture

I think the support for File (Field) Paths module is a must. I encounter the file renaming headache now. I searched long way but still not sure which module of these two should be patched...

haver’s picture

Is it anogth to disable Filefield Paths checkbox in field settings or required to disable entire Filefield Paths module?

Christopher Riley’s picture

Patched applied clean on the latest dev. I would like to see this committed.

Helice’s picture

The patch in #115 works for me. Thank you very much. It needs a commit on next release.

MegaChriz’s picture

Assigned: drclaw » Unassigned
Status: Needs review » Needs work

I would love to commit this, however I noticed that the newly added option in #115 'Rename only if different' is not clearly documented yet. I'm fine with renumbering the options (thus that FEEDS_FILE_EXISTS_SKIP is now 5 instead of 4).

  1. +++ b/mappers/file.inc
    @@ -140,3 +196,88 @@ function file_feeds_set_target(FeedsSource $source, $entity, $target, array $val
    +    '#items' => array(
    +      t('Rename: files whose name is already in use are renamed.'),
    +      t('Replace: files on the site with the same name are replaced.'),
    +      t('Replace only if different: files on the site with the same name are replaced only if the file to import is different, in other cases the file will not be imported. Works only if the file to import is locally accessible.'),
    +      t('Skip existing: files whose name is already in use are not imported.'),
    +    ),
    

    The option 'Rename only if different' is not documented. What does it exactly do?

  2. +++ b/mappers/file.inc
    @@ -140,3 +196,88 @@ function file_feeds_set_target(FeedsSource $source, $entity, $target, array $val
    +  // If the filesize is different then assume they are different files.
    +  // However, remote files may return FALSE from filesize() so only compare filesizes
    +  // if both values are not empty.
    ...
    +  // Filesizes are the same so check md5 hash of files.
    

    Minor: line should be 80 chars long and in the comments 'filesize' should be 'file size'.

  3. +++ b/tests/feeds_mapper_file.test
    @@ -163,6 +163,490 @@ class FeedsMapperFileTestCase extends FeedsMapperTestCase {
    +    // Assert: Files were not replaced.
    +    foreach ($same as $file) {
    +      if (is_file("$source/$file")) {
    +        $message = "$source/$file is STILL the same as $dest/$file";
    +        $this->assertTrue(file_feeds_file_compare("$source/$file", "$dest/$file"), $message);
    +        $message = "$dest/$file was not replaced (modification time is the same as before import)";
    +        $this->assertEqual(filemtime("$dest/$file"), $file_times[$file], $message);
    +      }
    +    }
    +
    +    // Assert: Files were replaced.
    +    foreach ($different as $file) {
    +      if (is_file("$source/$file")) {
    +        $message = "$source/$file and $dest/$file both saved and different";
    +        $this->assertFalse(file_feeds_file_compare("$source/$file", "$dest/$file"), $message);
    +      }
    +    }
    

    In the test FeedsMapperFileTestCase::testFileExistsRenameIfDifferent() the comments and messages talk about *replacing* files, while the test should be about *renaming* files instead?

tyler-durden’s picture

It looks like there was a recent commit, did this get in there by chance? If not, how can we proceed?

MegaChriz’s picture

@tyler-durden
No, this didn't got in yet. The reason is that I found some issues with the option that was added in #115, see #120. I would love to get this in before the next beta release, which I plan to release this month. Would you like to address the issues noted in #120?

@internal
I agree that support for File (Field) Paths would be great, though not sure how to do that right now. Maybe it's better to handle that in a followup.

MegaChriz’s picture

Status: Needs work » Needs review
FileSize
36.33 KB
31.53 KB

I've addressed my points from #120.

I also refactored the test cases as I found them a bit hard to read. Test failures possible.

There are no functional changes compared to the previous patch, only documentation and automated test changes.

MegaChriz’s picture

Small change in code comments and added a note that the setting doesn't work with the File (Field) Paths module.

I think this is ready.

Status: Needs review » Needs work

The last submitted patch, 124: feeds-file-replace-rename-1171114-124.patch, failed testing.

The last submitted patch, 124: feeds-file-replace-rename-1171114-124.patch, failed testing.

The last submitted patch, 124: feeds-file-replace-rename-1171114-124.patch, failed testing.

MegaChriz’s picture

Status: Needs work » Needs review

The failing test was an issue with the testbot (that suddenly didn't checkout a dependency of the i18n module anymore). Back to "Needs review".

MegaChriz’s picture

This is a target for the Feeds 7.x-2.0-beta3 release, which is planned to be released on November 24.

  • MegaChriz committed 08985df on 7.x-2.x
    Issue #1171114 by drclaw, MegaChriz, twistor, impara, cgmonroe, Anas_maw...
MegaChriz’s picture

Status: Needs review » Fixed

Committed #124. Thanks all!

5n00py’s picture

Wow, cool! Thanks for this!!!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.