Problem/Motivation
The weakest area of the current Migrate API is file handling, both as destination entities and as file/image fields. We need to do significant refactoring to make the code more robust, more flexible, and more consistent between the destination plugin and the file field handler. The plan is to address this for Migrate 2.4.
Proposed resolution
It will most likely be very difficult to meet our goals by changing the existing MigrateFileFieldHandler and MigrateDestinationFile, without breaking existing migrations. The current plan is to replace them, and deal with the breakage.
Remaining tasks
Internal refactoring - make clear separation between the stages of file-handling: obtaining source file/data, writing to destination, constructing file entity.
This is not specifically a file issue, but the file handling is more dependent than any other component on being able to take multiple options and arguments, so this is an important pre-requisite:
#1279778: Improve argument passing to fields
File-specific issues:
#1447166: Make file_functions pluggable
#1120136: Option to flatten or preserve file hierarchies on import
#1308102: Support specifying the destination file name/path
#1205278: Prevent file deletion on rollback
#1358128: Problem detect the mime type in file.inc
#1400284: Offer options on how to handle uri collisions
#1430782: Is it possible to import into multi-value file fields from BLOBs?
User interface changes
None anticipated.
API changes
TBD.
Original report by mikeryan
- For media destinations, if the core file handler accepts a bundle in the constructor, it should work just fine - no MigrateDestinationMedia necessary.
- For media fields, at least for the most basic case all that's needed is an alternative initialization of $destination_dir (file_field_widget_uri won't work). Is it really as simple as file_default_scheme()? Anyway, if the file handler get this from a method, then the media file handler can just override that to pick up the right scheme.
This is based on a little light experimentation, there may be more involved, but these two items should help significantly.
| Comment | File | Size | Author |
|---|---|---|---|
| #14 | file_refactor-1240928-14.patch | 74.06 KB | mikeryan |
| #12 | file_refactor-1240928-12.patch | 49.96 KB | mikeryan |
| #5 | 1240928-5-migrate_file_copy.patch | 1.52 KB | ekes |
| #4 | 1240928-migrate_file_copy.patch | 1.67 KB | amateescu |
Comments
Comment #1
mikeryanI did commit a couple of tweaks that were helpful - however, as I dig deeper (e.g., to try to support media_youtube), I'm starting to think we may need to do some significant refactoring. For one thing, adding hard-coded file_functions complicates the code while limiting flexibility - the file_functions should be pluggable, so derived migration integrations for media, media_youtube, etc. can add their own. That's a bit much to do right away - my thinking at the moment is to wrap up a 2.2 release with a little further cleanup of the file support, and address refactoring for 2.3.
Comment #2
mikeryanSee also #1340404: Renaming destination file during migration - the field handlers should be flexible enough to permit renaming during migration.
Comment #3
darksnowI had a discussion regarding renaming files when moving them. For those looking for a work around for this issue, there is food for thought here: http://drupal.org/node/1364198
Far from a work around, just an idea, but in the absence of this ability in migrate it might help.
Comment #4
amateescu commentedHere's a start for adding more flexibility to the file copy behavior.
Comment #5
ekes commentedI'm using at the moment. Will add further feedback, with experience. Now just attaching exactly the same patch but based on the standard project root rather than a webroot for easier application.
Comment #6
mikeryanRewrote the original issue, pointing at the specific issues we want to address. I want to get Migrate 2.3 out the door - I'll release 2.3-rc1 today, hopefully have a final 2.3 release within a week or two, then this will be the first priority after that.
Comment #6.0
mikeryanMake into a meta-issue
Comment #7
mikeryanComment #8
dixon_So this issue will be worked on for the Migrate 2.4 release? The OP still says 2.3...
Comment #8.0
dixon_Add 1348322
Comment #9
dixon_Since this issue was tagged with "Migrate 2.4" I updated the Issue summary to reflect that as well.
Comment #9.0
dixon_Updated issue summary.
Comment #10
mikeryanI fixed it back - as it says, this work is to be done after the 2.3 release - i.e., for 2.4.
Comment #10.0
mikeryanFix version
Comment #10.1
mikeryanIssue wasn't really about files specifically
Comment #11
mikeryanJust a note that I am actively working on a complete rewrite of the file handlers. I believe at this point, rather than break code using the current file handler implementations, or seriously complicate the rewrite trying to provide backwards compatibility, it's best to implement brand-new handlers and deprecate the old ones.
My approach is to offload most of the real work to a class derived from the new MigrateFileSource, specifiable in the field mappings, e.g.:
You'll thus be able to implement your own source classes for special cases. Also, this will enable sharing most of the logic between the destination and field handlers, which right now are entirely separate.
Thoughts?
Comment #12
mikeryanAt long last, a first stab. My next step will be working on #1535786: Update media support for Migrate 2.4, which I'm sure will suggest some tweaking of the approach, but the curious may peruse where this is going. The key is the file class (implementation of MigrateFileInterface) - there are three such classes to choose from, FileMigrateUri (the default - migrating from a local file or a URI), FileMigrateBlob (migrating from blob data to a real file), and FileMigrateFid (attached a file field to a previously-migrated file entity).
Some examples of how to use this version of the file handling (from migrate_example):
Now, the addition of this shouldn't affect existing migrations using the old classes. It does need to be noted, though, that on admin/content/migrate/configure one should have only one of MigrateFileFieldHandler and MigrateFileField2Handler enabled. Running update.php/dbupdate will disable the new field handler - you'll need to go to the configure screen to switch to it when you're ready to start using it.
Comment #13
mikeryanComment #14
mikeryanOK, I think this is ready now. At this point, my inclination is to bite the bullet and replace the former file handlers entirely with the new ones - there will be some upgrade pain for 2.4, but I think it's worth it.
The beer.inc example of using the image field handler is now:
and the wine.inc example of using a file destination to migrate a blog looks like:
There's no way I can test every possible scenario myself - my inclination is to be aggressive, commit this soon and get a beta release with it out as soon as possible.
Comment #14.0
mikeryanUpdates
Comment #15
jelle_sI was directed here from #1539156: 'skip_empty' argument for MigrateFileFieldHandler
I would also like to mention that file source paths don't get urlencoded when it's a remote path. This resulted in some warnings saying files were not found because spaces didn't get urlencoded. I had to manually add a callback to urlencode the source values, this resulted in 'ugly' file names (%20 vs a space, ...) but for the latter a solution was already proposed here (renaming files). Since I was redirected here from that other issue, I didn't want to make an extra issue for this and thought this would be a better place to mention it.
Comment #16
mikeryanDocumentation at http://drupal.org/node/1540106.
Comment #17
mikeryanCommitted. I'm sure it's not perfect, but I think it will be best to get people banging on this and opening new issues for individual issues that are identified.
See http://drupal.org/node/1540106 for details on how to implement file migrations in the new world.
Comment #19
dgtlmoon commentedSo how do i rename files on the fly then? Is there something I can specify in the addFieldMapping(..)->arguments(...) ?
thanks
Comment #19.0
dgtlmoon commentedNew plan