Improve support for Filefield_paths to tokenise image filename and path

sp3boy - August 31, 2009 - 20:04
Project:Image
Version:6.x-1.x-dev
Component:image.module
Category:feature request
Priority:normal
Assigned:Unassigned
Status:needs review
Description

The attached patch should hopefully resolve a couple of problems I had getting Filefield_paths to tokenise image filenames and paths:

  • by calling node_save() in image_operations_rebuild() instead of image_update(), the node api hooks are run, allowing Filefield_paths to override the image's file / path name.
  • by not clearing $node->rebuild_images in image_update(), the object property is still set when Filefield_paths goes to work on it (see related patch in #564680: Add support for bulk rebuilds of derivative files in Image module).
AttachmentSize
image-support-for-filename_fields.patch875 bytes

#1

joachim - August 31, 2009 - 20:52

> by calling node_save() in image_operations_rebuild() instead of image_update()

Thing is, we call image_update in TONS of places to force a rebuild of derivatives.
It's pretty ugly -- precisely because of this sort of thing. Hook implementations should only be hook implementations.
There's a case for refactoring and having a nice _image_rebuild_derivs() function that's safe to call whenever -- but that's a big job and it's spaghetti in there, and we're near to a release.

#2

sp3boy - September 1, 2009 - 13:18

I'm not sure whether I should be looking at other modules as well (suggestions?) but in the current Image HEAD, image_update() is only called twice: once in the image_operations_rebuild() and also in image_load() (the call to which will move to another function if #226121: don't manipulate images on hook_load is adopted).

The setting of $node->rebuild_images does happen in quite a few places within image.module, sure.

Doing the search for image_update() has got me thinking that if it were to be replaced by node_save() it should be done consistently, hence the replacement should also be in image_load() (as-is). I will look into this.

#3

sp3boy - September 1, 2009 - 13:19
Status:needs review» needs work

Volunteering status change :)

#4

joachim - September 1, 2009 - 13:48

Check all the submodules -- attach, gallery, etc.

#5

joachim - September 1, 2009 - 13:51

Check all the submodules -- attach, gallery, etc.

#6

sp3boy - September 1, 2009 - 15:07

FWIW I did search all the contrib submodules and for example, image_create_node_from() in image_attach.module calls node_save().

BTW is anyone else experiencing serious timeout problems with D.O in recent days? I couldn't help notice the double posting above...

#7

joachim - September 2, 2009 - 08:10

Yup, pretty much everything broke on teh internets yesterday for people at Drupalcon.

#8

sp3boy - September 4, 2009 - 18:16

Just to confirm that I am still on the case and it has become apparent that calling node_save() instead of image_update() in the recently-introduced _image_build_derivatives_if_needed() is not a good idea. I had not forseen the implications of calling node_save() within a hook, itself run within node_load(). In practice I think filefield_paths attempts to substitute token values which are derived from node properties at a point when they have not yet have been populated on the node object.

So it may be workable to call node_save() in image_operations_rebuild() for the benefit of filefield_paths but to continue calling image_update() when missing derivatives need to be reconstituted (which as we know from #411568: derivative paths should be taken from the corresponding original file, and not from the default should be happening in the same path as the "original" image file).

I will continue investigating and BTW it's great to see beta1 released!

#9

sp3boy - September 16, 2009 - 18:14
Status:needs work» needs review

Here is a revised patch. The main change is that in order for Filefield_paths to work effectively with Image, I believe it is essential that for Image files, the {files}.filename column contains the name of the file, not the derivative size label as Image currently stores. Hence there is:

  • a change to _image_insert() to store the filename not the size in {files}.filename
  • a change to one line in image_gallery.pages.inc to use {image}.imagesize instead of {files}.filename
  • an image_update_6104() function which updates any existing image files in the database.

(This main change effectively takes over from the similar discussion in #424386: images saved to files table with bad data in filename column I think.)

There is still a change to image_operations_rebuild() so that bulk updates are processed with node_save() not image_update(), as mentioned previously.

In a couple of places I have had to add some code that is conditional on Filefield_paths being installed, so that the {files}.origname is retrieved to ensure that derivatives are created with the file's original name, to allow Filefield_paths to use that when tokenising the derivatives.

In image_create_node_from() I have added a call to _image_build_derivatives() to ensure that the derivative files physically exist before the node_save() is performed.

I have tested all the above (in conjunction with #564680: Add support for bulk rebuilds of derivative files in Image module) exhaustively, but of course it would be silly to claim it is water-tight and I plan to test some more. However I do think that this patch - or the features in it - are essential to complete (hopefully) the integration of Image and Filefield_paths.

AttachmentSize
image-564690-9.patch 5.41 KB

#10

scroogie - October 2, 2009 - 06:57

This seems to make more sense than storing the size in the filename column. Hopefully I'll have time to test it. Any corner cases I should try out?

#11

sp3boy - October 2, 2009 - 12:04

I've tested with basic image upload, image attach and image import. I haven't tested with imagecache and/or other modules that depend on Image so if anyone is able to it might be worth it.

Having said that, I am already by necessity running this patch together with #564680: Add support for bulk rebuilds of derivative files in Image module on two live sites (due to lack of any other option functionality-wise).

#12

joachim - October 3, 2009 - 11:59

> In image_create_node_from() I have added a call to _image_build_derivatives() to ensure that the derivative files physically exist before the node_save() is performed.

I'm increasingly confused about all the different places _image_build_derivatives() is called from and the flow around node creation. Doesn't this mean that a flow coming in from image_create_node_from() will get derivatives built twice?

#13

sp3boy - October 5, 2009 - 17:31

As I understand it, in order of appearance in image.module:

  1. When the "rebuild derivative" functionality is used from node edit or admin bulk updates, _image_build_derivatives() is called during image_update() which is itself called from node_save(). My patch introduced node_save() in image_operations_rebuild() instead of image_update() to try and run the same functions for a bulk operation as for a node-edit-and-save sequence.
  2. When a new image is uploaded during image node creation, _image_build_derivatives() is called during image_validate()
  3. When an image node is loaded for display, _image_build_derivatives() is called via _image_build_derivatives_if_needed() during image_load()

The problem I was having is that Image Import and Image Attach use image_create_node_from() but rely on the derivatives being created on the fly when the node is loaded again.

I thought the solution was to invoke _image_build_derivatives() in image_crate_node_from() so that reliance is eliminated.

However in the course of refreshing my memory to give you an answer, I think I have spotted a little problem.

Yes, we can ensure that when Image Import and Image Attach use image_create_node_from() it creates derivatives before the node_save(), however I think there is still an issue with re-creating missing derivatives. This is done with the build-on-the-fly approach too, which uses image_update() but does not perform a node_save(), hence may not invoke Filefield_paths.

I will re-test that side of things tomorrow and report back as soon as I can.

#14

sp3boy - October 6, 2009 - 15:08

There was a problem in that missing derivatives were created during node load with their original filename if Filefield_paths was installed and configured to rename image files. I'm not sure how I missed that in all the testing.

The approach of the attached patch in respect of this is now to:

  1. take out my previous change to call _image_build_derivatives() in image_create_node_from(). This should avoid performance issues in the event of a large image import.
  2. add a line to _image_build_derivatives_if_needed() so that, if it decides that some derivatives are missing (which is the case during the first node load directly after an image attach or image import) set a property on the node object to indicate that a node_load() is in progress. This property is tested by _image_build_derivatives() and _image_insert() and in each case, if it is set, the function does not attempt to look up the original name of the image file but creates derivatives based on the current name of the "original" image. Hence the derivatives match the original without having to invoke the Filefield_paths hooks.
AttachmentSize
image-564690-14.patch 5.82 KB

#15

sp3boy - October 7, 2009 - 14:54

I found another little problem with the previous patch for the following case:

  • an image's filename has been renamed using Filefield_paths
  • the image's derivatives files have been manually deleted
  • the image node is viewed, prompting the _image_build_derivatives_if_needed() to cause their re-creation
  • although my previous patch would ensure that the derivatives' physical filenames are based on the current physical filename of the ORIGINAL file, the origname column in {files} was not being populated by on-the-fly rebuilds
  • this did not matter much until the Filefield_paths retro-active rebuild was used. This simply finds all the files associated with a node and puts them through the renaming process. For those derivatives with empty origname columns, Filefield_paths falls back on the current physical filename so the the end result for the derivatives was a doubling-up of any tokenised name changes.

The attached patch takes a more sophisticated approach in _image_insert():

  • the original name is retrieved from the ORIGINAL image's {file} entry even if a node load is in progress (change from previous patch)
  • if a node load is in progress the derivative images are created using the physical filename of the ORIGINAL (same functionality as previous patch, test for new node property had to be moved down)
  • if a node load is in progress and the original name is known, for non-ORIGINAL images the derivative's "original derivative filename" is constructed by passing the original name to _image_filename() and that is assigned to the origname column when populating {files} (addition to previous patch)

Apologies to anyone who has spent time testing the previous patches (?anyone...), the goal is still a bug-free integration of Filefield_paths and Image.

AttachmentSize
image-564690-15.patch 6.33 KB

#16

sp3boy - November 4, 2009 - 16:39

#564680: Add support for bulk rebuilds of derivative files in Image module has been commited to Filefield_paths, as has #373094: Retroactive like behavior when changing token source which allows path changes to be applied automatically when a node is edited.

I have done more testing and have no outstanding problems with the last patch submitted for this issue.

 
 

Drupal is a registered trademark of Dries Buytaert.