Hello. I am using the filefield mapper to download images from remote URLs. This mostly works, but the mapper relies on the FeedsEnclosure class and its getFile function to perform the download, and a few bugs there keep it from working reliably.
1) Special characters in filenames. The filefield mapper fails if the name of the file it tries to download contains a space. The same is true if the filename contains & or &amp;. The node is still created correctly, but the file is not attached to it.
2) If the remote file does not exist, the if statement at the bottom of the getFile function will throw an exception and stop the import. I think this is undesirable behavior, since I can't assume that my 10,000 imported items always have a correct image. I'm not in control of the remote end, and I'd rather the import continue on and just tell me in the logs about the failure.
3) The mapper does not clean up /tmp, so you wind up with tens of thousands of downloaded images.
I have rolled a patch for the first two, since they break my process. To fix #1, I use str_replace to re-map those characters to underscores. To fix #2, I replaced the thrown exception with a watchdog call, so the failure is logged and the import continues. I'm not sure yet how or where to tackle #3.
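Roughly, the two changes follow the pattern sketched below. This is a standalone illustration, not the patch itself: the function names `example_sanitize_filename` and `example_fetch_or_log` are hypothetical, the download step is passed in as a callable, and a plain array stands in for the watchdog log so the snippet runs outside Drupal.

```php
<?php
// Hypothetical helper: re-map characters that break the download to underscores.
// '&amp;' is listed before '&' so the encoded form collapses to a single underscore.
function example_sanitize_filename($filename) {
  return str_replace(array(' ', '&amp;', '&'), '_', $filename);
}

// Hypothetical helper: record a failed download and carry on instead of throwing.
// $download is any callable that returns the file contents, or FALSE on failure.
// Inside Drupal the log line would be a watchdog() call; here it is appended to $log.
function example_fetch_or_log($url, $download, array &$log) {
  $data = call_user_func($download, $url);
  if ($data === FALSE) {
    $log[] = 'Download of ' . $url . ' failed.';
    return NULL;
  }
  return $data;
}
```

With this shape, the caller can skip the file for an item whose download returned NULL and keep importing the rest.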
Patch attached. Thanks for considering it and I'm more than open to different approaches to solving this issue.
| Comment | File | Size | Author |
|---|---|---|---|
| | FeedsParserPatch.patch | 1.08 KB | rjbrown99 |
Comments
Comment #1
rjbrown99 commented
This is now related to #706908: Enhance filefield mapper to perform more validation on remote URLs, so I'm marking this as a duplicate.