Hello. I'm using the Filefield mapper as of the latest HEAD. I am trying to map a URL that has a space in it. The actual URL is:

http://www.lpcollection.com/images/D/bronze%20skull front.jpg

As you can see, one space is properly represented in my source feed with %20, but the other space is represented by an actual space. The end result is that the photo is not downloaded or mapped properly. That is a live URL in the event anyone else wants to verify the behavior.

I am thinking in this case we would want to do two things -

1) Use something like urlencode() or htmlentities() to modify the URL to change the spaces to %20 so the file downloads properly, and

2) Modify the resulting filename in such a way that it no longer has a space. Perhaps replacing spaces with underscores or something like that.

Thoughts?

Comments

rjbrown99’s picture

Interesting, I had it wrong but there is an issue. I checked the values inside of filefield_feeds_set_target, and inside of the first foreach for $v it is coming back with a correct/changed URL of http://www.lpcollection.com/images/D/bronze%20skull%20front.jpg. So it looks like it handles the spaces correctly on the front end download at least.

The file is then saved to my path sites/default/files/members/FeedsEnclosure-bronze%20skull%20front_2.jpg. The file is there, but when loading that URL in the browser Drupal comes back with a 404. Renaming the file to foo.jpg allows the browser to read it. So the file is downloaded correctly.

I'm not quite sure if this is a mapper issue or a Drupal issue. If possible, maybe this is just a matter of modifying getFile() to convert spaces to underscores before saving.

rjbrown99’s picture

My extremely unscientific fix is this:

  public function getFile() {
    if(!isset($this->file)) {
      $dest = str_replace("%20", "_", file_destination(file_directory_temp() .'/'. get_class($this) .'-'. basename($this->getValue
      if (ini_get('allow_url_fopen')) {
        $this->file = copy($this->getValue(), $dest) ? $dest : 0;
      }
      else {
        $this->file = file_save_data($this->getContent(), $dest);
      }
      if ($this->file === 0) {
        throw new Exception(t('Cannot write content to %dest', array('%dest' => $dest)));
      }
    }
    return $this->file;
  }
}

.. whereby I wrapped $dest inside of str_replace. This works now and the images download and display properly.

If Alex or someone can chime in and suggest a better/different function to use than a simple str_replace of a single character, I'm happy to roll a patch for this. I can write the code, just not sure of the best approach for this.

rjbrown99’s picture

While we are on the subject of getFile, specifically this:

if (ini_get('allow_url_fopen')) {
  $this->file = copy($this->getValue(), $dest) ? $dest : 0;
}
else {
  $this->file = file_save_data($this->getContent(), $dest);
}
if ($this->file === 0) {
  throw new Exception(t('Cannot write content to %dest', array('%dest' => $dest)));
}

.. in the case of a 404 it falls through to the exception. This stops the processing of the feed in my case because it couldn't write the destination file (because it never got one due to the 404.)

rjbrown99’s picture

Next potential issue: getFile leaves stuff hanging around in /tmp after it is done. With tons of items/imports, this gets out of hand quickly.

It also seems that when any field of an import file changes, it does a re-download of the image. For example, if I have an XML node called price and it changes by $1, getFile is called and the image is re-downloaded. Not sure why yet.

rjbrown99’s picture

The mapper also does not decode HTML entities. I have a feed coming in with image URLs that have entities such as &. The processor fails to grab those because the entities are not translated back to & or whatever the source character is.

rjbrown99’s picture

OK, here is my latest getFile function. I was running in to a problem whereby filefield/imagefield hung up on files with & in them.

  public function getFile() {
    if(!isset($this->file)) {
      $replace = array("%20", "&", "&");
      $dest = str_replace($replace, "_", file_destination(file_directory_temp() .'/'. get_class($this) .'-'. basename($this->getValue()), FILE_EXISTS_RENAME));
      if (ini_get('allow_url_fopen')) {
        $this->file = copy($this->getValue(), $dest) ? $dest : 0;
      }
      else {
        $this->file = file_save_data($this->getContent(), $dest);
      }
      if ($this->file === 0) {
        //throw new Exception(t('Cannot write content to %dest', array('%dest' => $dest)));
      }
    }
    return $this->file;
  }

Now I can correctly handle files with spaces and files with &.

johnv’s picture

Version: 6.x-1.x-dev » 7.x-2.x-dev
Status: Active » Needs review
StatusFileSize
new968 bytes

Thank you rjbrown99, for your patch!
I think I have the same problem with spaces and other strange characters, but first we need to split the issue in two:
1- I cannot get files from the original URL, because of spaces, etc. (BTW, " ' " is working fine already)
2- Once stored the file, the resulting filename is ugly/illegal.

ad 2. I solved it using the following 2 modules:
- http://drupal.org/project/transliteration changes spaces etc.
- http://drupal.org/project/filefield_paths lets you set a specific target directory/filename, and supports Transliteration.
When importing from a csv file, make sure the file is saved as UTF-8, e.g. with MS-Notepad, save as..

ad 1. I have created and attached a patch for D7 in Feedsparser.inc. I thought about moving it from getFile() to getContent(), but that only works for local files.

   public function getFile($destination) {
 
     if ($this->getValue()) {
+      $url_clean = str_replace(" ", "%20", $this->getValue());
+      $this->__construct($url_clean); // This sets $this->$value, to get it from $getValue() again
       // Prepare destination directory.

[Edit:] Attached patchfile is incorrect

manu manu’s picture

Status: Needs review » Needs work

Hello johnv;

When using 7.x-2.x-dev from git, FeedsEnclosure::__construct() takes 2 arguments, attached patch in #7 is producing a warning.

It should be:

<?php
public function getFile($destination) {

    if ($this->getValue()) {
      $url_clean = str_replace(" ", "%20", $this->getValue());
      $this->__construct($url_clean, $this->mime_type); // This sets $this->$value, to get it from $getValue() again
      // Prepare destination directory.
?>

Manu

wojtha’s picture

IMO better approach is not harm the original value (e.g. for local copying) and encode the value only on demand when fetching the file from the URL.

  /**
   * Use instead getValue() when fetching the file from the URL.
   *
   * @return
   *   Value with encoded space characters to safely fetch the file from the URL.
   */
  public function getUrlEncodedValue() {
    return str_replace(' ', '%20', $this->getValue());
  }

  //...

  public function getContent() {
    feeds_include_library('http_request.inc', 'http_request');
    $result = http_request_get($this->getUrlEncodedValue());
    if ($result->code != 200) {
      throw new Exception(t('Download of @url failed with code !code.', array('@url' => $this->getUrlEncodedValue(), '!code' => $result->code)));
    }
    return $result->data;
  }

  public function getFile() {
    // ...
    if (ini_get('allow_url_fopen')) {
      $this->file = copy($this->getUrlEncodedValue(), $dest) ? $dest : 0;
    }
   // ...

What do you think?

johnbarclay’s picture

wojtha’s picture

Status: Needs work » Needs review
StatusFileSize
new1.64 KB

Patch for 6.x-1.x

febbraro’s picture

StatusFileSize
new8.64 KB

Here is a rework for D7 with unit tests. Can someone review this and let me know if this works for you? It encodes the spaces in the URL to %20 and changes the local file to use an underscore.

johnv’s picture

We started by replacing spaces. ITMT I have known urldecode()/urlencode(), as OP already mentioned.
I am not sure why we cannot use those?

   public function getContent() {
     feeds_include_library('http_request.inc', 'http_request');
-    $result = http_request_get($this->getValue());
A+    $result = http_request_get(urlencode($this->getValue()));
B+    $result = http_request_get(urldecode($this->getValue()));
     if ($result->code != 200) {
      throw new Exception(t('Download of @url failed with code !code.', array('@url' => $this->getValue(), '!code' => $result->code)));
     }
     return $result->data;
   }

imclean’s picture

@febbraro, #12 works for me. Thanks!

emackn’s picture

JohnV, I'm not sure what you're getting at.

johnv’s picture

Well, the patch does its job, but I think 20 lines of custom code seems an awful lot just to remove spaces. There should be some standard php-fucntion to do that. And if we have this problem with spaces, I can imagine we have this problem for accented-characters, too - we just didn't test it.

emackn’s picture

I tested the patch which passed all the tests. I also went in and tried to add in the use of urlencode and the tests started failing. If you look at the patch it's not really that much custom code, just a little refactoring and new tests. If someone else can also verify, I'd like to update this to RTBC.

emackn’s picture

Status: Needs review » Reviewed & tested by the community

#12 looks good to me for now.

cerkio’s picture

#12 work for me.

emackn’s picture

Status: Reviewed & tested by the community » Patch (to be ported)

committed to 7.x-2.x

febbraro’s picture

Version: 7.x-2.x-dev » 6.x-1.x-dev
rogarbe’s picture

#7 work good, thank you johnv.

I have solved the problem to get files with whitespaces in URL adding %20, but i have several sizes of image and i needed my uploaded files were accesibles by drupal. With the modules http://drupal.org/project/filefield_paths , http://drupal.org/project/transliteration this works perfectly.

arski’s picture

Status: Patch (to be ported) » Needs work

hmm, first of all, when could one expect this in the 6.x branch?

But at the same time - I don't think a special fix for just the %20 is really good. I have a client who has lots of %2C and other url-encoded characters in their image links.. after an import, these suddenly become %252C - so the percentage sign itself gets re-encoded..

I think a more generic fix would be a much better solution, that's why I'm posting it here right away.

Cheers

twistor’s picture

Status: Needs work » Patch (to be ported)

@arski, there's still no proposed patch for 6.x so it's still needs backport.

If you have a specific issue that this patch created, please open a new issue.

jomarocas’s picture

Issue summary: View changes

im still have the issue, i import a csv with src="/sites/default/files/images/stories/2012/College school/DSC_3490(1).jpg" but in the body image, appears of this mode src="/sites/default/files/images/stories/2012/College" i dont know if the problem is for the module feeds or for the module ckeditor

picacsso’s picture

@jomarocas

Did you ever find a way around this?
I'm currently running into the same issue

I have both url encoded and non encoded source paths to the file download location. Both are failing.

twistor’s picture

Status: Patch (to be ported) » Closed (outdated)