We've been experiencing a bizarre, hard-to-diagnose issue: images that are pushed to S3 do not have their database records properly updated to reflect that they have been propagated. This leads to random broken images on the site (because it tries to use the local filesystem path), even though the file exists on S3.

The problem seems to involve files with similar names, for example names that differ only in letter case.

For instance, right now I have two files on S3 with the "same" name: Entrance.jpg and entrance.JPG. They are different images on S3, and downloading them from S3 gives me two different images.

The storage_instance table says entrance.JPG is on S3, but the storage table's serving_container field still refers to the filesystem.

Here's where it gets weird: in the file_managed table, the file has been renamed to entrance_1.JPG. I assume this is because it checked the database for files of the same name. The storage_core_bridge table also refers to this file as entrance_1.JPG.

However, there is no entrance_1.JPG file on S3, only an entrance.JPG file as it was originally uploaded. I'm thinking this might be a problem with S3 accepting Entrance.jpg and entrance.JPG as distinct files, which is confusing the propagator.
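To make the suspected mismatch concrete, here is a minimal sketch using only the two filenames from this report. It assumes nothing about the module itself: it only shows that a case-sensitive comparison (S3 object keys are case-sensitive) sees two names, while a case-insensitive comparison (the default behaviour of many database collations) sees one.

```python
# The two filenames from this report.
names = ["Entrance.jpg", "entrance.JPG"]

# Case-sensitive comparison, as with S3 object keys: two distinct names.
distinct_keys = set(names)

# Case-insensitive comparison, as with a case-insensitive database
# collation: both names fold to the same value.
folded = {name.casefold() for name in names}

print(len(distinct_keys))  # 2
print(len(folded))         # 1
```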

Comments

jmking commented:

I've tracked this bug down to the serving_container cache not being cleared or updated after an image has been moved off the filesystem to S3. Setting the serving_container field to NULL for the affected images causes this cache to be rebuilt, and the images start appearing again.

jbrown commented:

Version: 7.x-1.2 » 7.x-1.x-dev
Status: Active » Postponed (maintainer needs more info)

Sorry - I can't replicate this. Can you provide steps to reproduce on 7.x-1.x-dev?

jmking commented:

  1. Create a class with one S3 container and Filesystem as the initial container
  2. Create a content type with an image field
  3. Create a new node, upload a photo with filename "entrance.JPG"
  4. Run cron to propagate images to S3
  5. Create another node, upload a photo with a filename "Entrance.jpg"
  6. Run cron to propagate images to S3
  7. Review both nodes (with all image variations shown - thumbnail, medium, large) and see if any images appear broken

My suspicion is that when the file is pushed to S3, some code tries to find the appropriate row in the storage table to update or clear the serving_container cache by searching by filename, which isn't reliable at this point. It's probably clearing the serving_container field for the wrong row, since the two files technically have the same file name.

We noticed this bug because the site is a real estate site, so photos with common file names (entrance.jpg, foyer.jpg, livingroom.jpg, etc.) would regularly appear broken.
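The suspicion above can be sketched in isolation. This is not the module's code: `storage_demo` is a hypothetical table, and SQLite's `NOCASE` collation stands in for a case-insensitive database collation. It only shows why a lookup by filename can touch more rows than a lookup by file ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table; NOCASE stands in for a case-insensitive collation.
conn.execute(
    "CREATE TABLE storage_demo ("
    " file_id INTEGER PRIMARY KEY,"
    " filename TEXT COLLATE NOCASE,"
    " serving_container TEXT)"
)
conn.executemany(
    "INSERT INTO storage_demo VALUES (?, ?, ?)",
    [(1, "entrance.JPG", "StorageFS"), (2, "Entrance.jpg", "StorageFS")],
)

# Clearing the cache by filename matches both rows.
rows_by_name = conn.execute(
    "UPDATE storage_demo SET serving_container = NULL WHERE filename = ?",
    ("entrance.JPG",),
).rowcount

# Restore the rows, then clear by file ID: exactly one row is matched.
conn.execute("UPDATE storage_demo SET serving_container = 'StorageFS'")
rows_by_id = conn.execute(
    "UPDATE storage_demo SET serving_container = NULL WHERE file_id = ?",
    (1,),
).rowcount

print(rows_by_name, rows_by_id)  # 2 1
```

If the real code does key on the filename, switching the lookup to the file ID would avoid the ambiguity, but that remains an assumption until the actual code path is found.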

jbrown commented:

Can you specify the exact version of Drupal core that you are using?

jbrown commented:

Drupal 7.14 fixed #966210: DB Case Sensitivity: system_update_7061() fails on inserting files with same name but different case - do you still get the problem with that version?

Also which operating system is it running on?

jmking commented:

It's Drupal 7.14, and CentOS 5.8.

Drupal's subsystems are all working correctly. The problem is specifically that the serving_container field on the storage table is not being correctly updated after a successful upload.

My current workaround is an hourly cleanup cron task that checks for mismatches between the storage_instance table and the serving_container field on the storage table. If the storage_instance table says the file is on the S3 container, but the serving_container field contains the text "StorageFS", then I clear the serving_container field.

Then the Storage module correctly populates serving_container the next time the file is requested.

The query below is what I have to run hourly to keep images from appearing broken:

UPDATE d_storage s LEFT JOIN d_storage_instance i ON i.file_id = s.file_id SET s.serving_container = NULL WHERE i.container_id = 3 AND s.serving_container LIKE '%StorageFS%';

...where 3 is the container_id for our S3 bucket. Anyone reading this with the same issue should change that to the appropriate ID for their own S3 container.
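For anyone who wants to try the cleanup logic before running it against a live site, here is a rough sketch against an in-memory SQLite database. The table names come from the query above, but the reduced column set, the sample rows, the placeholder serving_container strings, and the function name `clear_stale_serving_containers` are all made up for illustration. SQLite has no multi-table UPDATE, so the join is rewritten as a subquery.

```python
import sqlite3

S3_CONTAINER_ID = 3  # from the query above; substitute your own container ID

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical versions of the two tables with sample rows.
conn.executescript("""
CREATE TABLE d_storage (file_id INTEGER, serving_container TEXT);
CREATE TABLE d_storage_instance (file_id INTEGER, container_id INTEGER);
INSERT INTO d_storage VALUES
  (1, 'placeholder StorageFS value'),  -- on S3, cache still points at FS
  (2, 'placeholder StorageFS value'),  -- still only on the filesystem
  (3, 'placeholder StorageS3 value');  -- on S3, cache already correct
INSERT INTO d_storage_instance VALUES (1, 3), (2, 1), (3, 3);
""")


def clear_stale_serving_containers(conn, container_id):
    """Null out cached FS entries for files that have an instance in container_id."""
    cur = conn.execute(
        """
        UPDATE d_storage
           SET serving_container = NULL
         WHERE serving_container LIKE '%StorageFS%'
           AND file_id IN (SELECT file_id
                             FROM d_storage_instance
                            WHERE container_id = ?)
        """,
        (container_id,),
    )
    conn.commit()
    return cur.rowcount  # number of rows whose cache was cleared


cleared = clear_stale_serving_containers(conn, S3_CONTAINER_ID)
print(cleared)  # 1
```

Only file 1 is cleared here: file 2 has no instance in the S3 container, and file 3's cached value does not mention StorageFS.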

This is doing the trick in the meantime, but I intend to come back to this and track down the actual issue when I have some free time and hopefully offer a patch.

Perignon commented:

There have been some updates to the S3 service, and this issue is 2 years old.

If this is still a problem, let us know and re-open the issue here. I'm trying to get to a state where we can move this module forward.

Perignon commented:

Status: Postponed (maintainer needs more info) » Fixed

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.