(line 825 of modules/image/image.module)

Looks like a race condition. Other reports of this same error:
http://drupal.org/node/1565980
http://drupal.org/node/1251492

Digging into the code image_load() hits image_get_toolkit() which looks like it returns imagemagick. Looking deeper image_get_info() appears to be the call that is failing.

Looking into the logs more closely and I see that I'm getting this error

Notice: getimagesize(): Read error! in image_imagemagick_get_info() 
(line 350 of sites/all/modules/imagemagick/imagemagick.module).

Trying to repo this is very hard as it is a race condition from what I can tell.

Comments

mikeytown2’s picture

Version: 7.x-dev » 8.x-dev
Status: Active » Needs review
StatusFileSize
new1.21 KB
new1.26 KB
mikeytown2’s picture

Status: Needs review » Needs work

Surprised above patch didn't fail ASAP as i forgot a semicolon in it.

mikeytown2’s picture

Status: Needs work » Needs review
StatusFileSize
new1.21 KB
new1.26 KB

Round 2

marcingy’s picture

Status: Needs review » Needs work
+    while (empty($image) && $counter < 10) {
+      $image = image_load($derivative_uri);
+    }

The counter is never incremented.

Also would make more sense as we always want to run the code at least once?

do {
  $image = image_load($derivative_uri);
  $counter++;
} while(empty($image) && $counter < 10);

Also

+      $uri = $image->source;
+      $headers = array(
+        'Content-Type' => $image->info['mime_type'],
+        'Content-Length' => $image->info['file_size'],
+      );

This assignment are pointless why not just

   if (!empty($image)) {
     file_transfer($image->source, array('Content-Type' => $image->info['mime_type'], 'Content-Length' => $image->info['file_size']));
   }
marcingy’s picture

Actually why not something like this

+  $image = FALSE;
   if ($success) {
     if (!empty($lock_acquired)) {
       lock_release($lock_name);
     }

-    $image = image_load($derivative_uri);
-    file_transfer($image->source, array('Content-Type' => $image->info['mime_type'], 'Content-Length' => $image->info['file_size']));
+    $counter = 0;
+    do {
+      $image = image_load($derivative_uri);
+      if (!empty($image)) {
+        file_transfer($image->source, array('Content-Type' => $image->info['mime_type'], 'Content-Length' => $image->info['file_size']));
+        break;
+      }
+      else {
+        $counter++;
+      }
+    } while($counter < 10);
   }
-  else {
+  if (empty($image)) {
marcingy’s picture

Status: Needs work » Needs review
StatusFileSize
new1.25 KB
new1.16 KB

Re-roll based on the do while approach.

mikeytown2’s picture

Works for me :)
Add in some code comments explaining why this is needed. Also should add in a 100ms usleep. With those 2 in place I see this going RTBC.

marcingy’s picture

StatusFileSize
new1.62 KB
new1.54 KB

New versions with comments and usleep.

Status: Needs review » Needs work

The last submitted patch, 1762772-d8.patch, failed testing.

mikeytown2’s picture

Sorry that wasn't clear. I mean Sleep for 100ms (milliseconds) usleep(100000);. This means that it will wait up to 1 second max 10*100ms = 1 second.

marcingy’s picture

Status: Needs work » Needs review
StatusFileSize
new1.62 KB
new1.53 KB

Doh - take 2 :)

Status: Needs review » Needs work

The last submitted patch, 1762772-d8.patch, failed testing.

mikeytown2’s picture

Status: Needs work » Needs review

#11: 1762772-d8.patch queued for re-testing.

Failure looked bogus. Retesting

Status: Needs review » Needs work

The last submitted patch, 1762772-d8.patch, failed testing.

marcingy’s picture

Status: Needs work » Needs review

#11: 1762772-d8.patch queued for re-testing.

kumkum29’s picture

Hello,

do you have a valid patch for Drupal 7?

Thanks.

mikeytown2’s picture

@kumkum29
#11 has 2 files attached to it. One has "d8" in it, the other has "d7" in it. The d7 one is for Drupal 7 :)

j0rd’s picture

Correct me if I'm wrote but I believe this patch has a race condition.

Shouldn't we be doing the usleeps() waiting for the to file_exist() before we release the lock.

Otherwise the file will not exist, and another request can come through and see that the file does not exist, successfully acquire the lock again and attempt to regenerate the file.

It would also simply the code in this patch as I believe you could just loop over file_exists before lock_release().

Additionally the D7 patch no longer applies due to the ?itok changes in latest 2 releases.

Feedback is required, as I believe this lock business could be bringing down my server after I installed imagecache_token's and started seeing a lot of missing image style requests via watchdog. Server was brought down yesterday with out this patch, looking in to it is how I found this issue and patch.

This seems very similar to an issue I had on a D6 project in rebuild_menu with poorly written lock code, although this doesn't seem as bad as we check for the return value of lock_acquire()
#246653: Duplicate entry in menu_router: Different symptoms, same issue comment #228

j0rd’s picture

StatusFileSize
new749 bytes

Here's some code to review to further understand my thinking

Code is for D7.

Status: Needs review » Needs work

The last submitted patch, 1762772-d7-do-not-test_18.patch, failed testing.

mikeytown2’s picture

Either way will work. Race condition will not happen.

Path A
image_load() -> image_get_info() -> is_file() -> Returns FALSE.

Path B
image_load() -> image_get_info() -> image_toolkit_invoke() -> image_gd_get_info()/image_imagemagick_get_info() -> getimagesize() -> warning or notice generated because file can not be read (file 1/2 done being generated) -> Returns FALSE.

Path B protects us from really slow write speeds. So the file does exist but the file is not fully written thus getimagesize will fail and the loop will try again 100ms later.

j0rd’s picture

The case I'm suggesting will cause multiple calls to image_style_create_derivative() to occur when only one is required.

This is the condition I'm talking about.

Because right now we have an issue where image_style_create_derivative() is finished by first thread, file does not exist yet, but lock is released. This allows the second thread to go through, file potentially won't exist and image_style_create_derivative() is called again.

This will overwrite the previous file (i believe) and create potential havoc no?

mikeytown2’s picture

Without this patch #1377740: file_unmanaged_move() should issue rename() where possible instead of copy() & unlink() there is potential for havoc.

The issue with your patch is the file will be available on the webhead that created it but it may not be available on other web heads; due to the local file cache (depending on the shared FS).

Have you ever seen this message in your logs in regards to a file being created? 'file', 'The file permissions could not be set on %uri.' The reason I ask is if you have then you patch is valid, if you have not then it won't help. I'm would bet that chmod() would return false if file_exists also returns false.

image_style_create_derivative() -> image_save() -> image_gd_save() -> file_unmanaged_move() -> file_unmanaged_copy() -> drupal_chmod()

In reality we most likely need both patches. Yours doesn't release the lock until the file exists on the webhead that saved it. Mine makes sure the file is readable if the file exists. Both cover different code paths in image_style_deliver()

j0rd’s picture

It looks like you have a good grip on this issue. It's probably best that you create the patches to resolve it.

file_unmanaged_move() copy & unlink patch needs to get into core. Seems like this could be a likely cause of this happening more often than it should.

And yes, the reason I found this issue was because of $file->uri not being set, after I installed the imagecache_token's module.

Here's the issue for imagecache_token's with some hacky fixes.
#1600266: no more uri property returned

After applying my patch, and some hacky fixes to imagecache_token module, I'm not longer seeing the empty($file->uri) issue though.

mikeytown2’s picture

Merged #19 & #11, D8; D7 once tests pass.

mikeytown2’s picture

Status: Needs work » Needs review
j0rd’s picture

I can test a D7 patch for you (on a live server with traffic), but not a D8 as i don't have it installed.

marcingy’s picture

Haven't tired the patch in #25 yet but the patch in #11 has been running on our live site since sometime at the end of January and we have had no issues. I will try update the patch tomorrow but it may be a couple of weeks before it actually hits the live server. The site in question has a decent amount of traffic and a good amount of nodes with images being created on a daily basis.

mikeytown2’s picture

Version: 8.x-dev » 7.x-dev
StatusFileSize
new2.02 KB

#25 but for D7

j0rd’s picture

Does it make sense to return anything other than a 500 error? 503 with a retry-after header (choosing a time is impossible to predict). 500 errors for me mean, something went wrong, while this error is most likely "resource is not yet available."

I'll put the patch on my live server today, and check the logs later and let you know how it works.

marcingy’s picture

Discussing changing the error status seems like a different and follow up issue to me, lets keep this focused on fixing the race condition.

j0rd’s picture

Alright, I've made this D7 patch live on my servers. Going to have a wait a couple days to see if I get any error messages. Although since they were removed with only my 1/2 fix in #19 and this patch also includes that, I doubt I will have any problems.

PS. The D7 patch could use a little formatting & logic clean up before it's RTBC. Difference in comment new lines bugs me at least.

mikeytown2’s picture

@j0rd
Upload a patch with the clean up in place so we can RTBC this :)

j0rd’s picture

I believe my "cleanup" was referring to checking to see if the lock was released succesfully and performing additional logic. In all honesty I have no idea.

With that said, I've had this patch live since may 9th, and have yet to run into this issue again.

RTBC for me.

mikeytown2’s picture

Status: Needs review » Reviewed & tested by the community
David_Rothstein’s picture

Version: 7.x-dev » 8.x-dev
Issue summary: View changes
Status: Reviewed & tested by the community » Needs work
Issue tags: +Needs backport to D7

This was never committed to Drupal 8.

Also not clear this is Drupal's bug to fix, or if it is, that it's being fixed in the right place. When the image is saved, the image toolkit's save function is called, followed by clearstatcache() (see code in image_save()). If the file doesn't reliably exist after that point, is that really Drupal's bug to fix? And if it is (i.e. if we want to work around the multiple webhead issue), shouldn't it be fixed somewhere in that code instead?

David_Rothstein’s picture

Also, if the lock is being released too early, why not just move the lock release code later (rather than having two independent do-while loops)? There should really only be a need for a single do-while loop.

mikeytown2’s picture

Looks like D8 has dramatically changed since the patch in #25 was created and thus needs to be re-rolled. The D7 code doesn't have any existing loops from what I can see. Is this the location of the image style generation code in D8 ImageStyleDownloadController::deliver? Also appears that D8 doesn't have any loops currently ImageStyle::createDerivative. All I see is simple locking logic in D7 & D8.

If the file doesn't reliably exist after that point, is that really Drupal's bug to fix? And if it is (i.e. if we want to work around the multiple webhead issue), shouldn't it be fixed somewhere in that code instead?

Fixing it higher up in the code has been attempted in core but hasn't gotten anywhere.
http://mikeytown2.drupalgardens.com/content/all-file-operations-future-v...
#1377740: file_unmanaged_move() should issue rename() where possible instead of copy() & unlink()
I agree that fixing it in the image code isn't ideal. Can you point me to what code is actually ran when $image->save($derivative_uri) is called so that I can mark this issue a duplicate of the correct issue or create a new one?

Noted that this might be fixed in D8 if $image->save($derivative_uri) uses atomic file operations. Also noted that a fix could also be present depending on how

  if ($success) {
    $image = $this->imageFactory->get($derivative_uri);
    $uri = $image->getSource();

works. Example atomic save using D7 API's in advagg_save_data()

claudiu.cristea’s picture

@mikeytown2,

Here's the generic Image::save() — called in $image->save($derivative_uri).

This one calls the toolkit specific save. By default GDToolkit::save() is called.

mikeytown2’s picture

Status: Needs work » Closed (duplicate)

Going to mark this issue as a duplicate of #1377740: file_unmanaged_move() should issue rename() where possible instead of copy() & unlink() as GDToolkit::save() uses file_unmanaged_move(). If this is indeed a separate issue, please reopen with a good example/explanation.