Hi,

the new release might have introduced a regression in regard to CPU consumption/processing time; 6.x.3.0 was running robust on a dozend sites; the new release gives me sporadic WSODs when running ./admin/settings/file-system/transliteration.

The WSOD is probably caused by a timeout; Apache's error log reports:

[Sun Jun 03 16:10:24 2012] [error] [client 12.34.56.78] PHP Fatal error: Maximum execution time of 60 seconds exceeded in /var/www/drupal/includes/database.mysqli.inc on line 142, referer: http://{mydomain}/admin/settings/file-system/transliteration

When running the transliteration, the CPU load spikes to 5-6 on a dedicated Xeon server, when the normal load is 1-2. With 6.x.3.0, I have never encountered similar behaviour, neither those load spikes, nor timeouts.

So far I can not provide a procedure to reproduce the timeout; when lloking through the changelog, #1576986: Allow conversion of existing filenames to lower-case might be causing the WSOD; if this assumption is correct, the issue probably would only occur once per site; I'll try to verify this.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

amateescu’s picture

Yep, that change has quite an impact because now the filename case is respected in MySQL and that could bring up a lot of previously untransliterated files.

If you can, please do investigate further :)

asb’s picture

Category: bug » task

I did some tests to see if the WSOD is a "once in a lifetime" error, or if it disrupts day to day operation as well.

Small site:

  • Before running the transliteration: "The database currently lists 7,753 filenames containing non-ASCII characters."
  • CPU load before transliteration was started: 0.7
  • CPU load during transliteration: ~1.0

Result: No WSOD, no PHP fatal error; everything's fine.

Medium-sized site:

Test case a)

"The database currently lists 74,693 filenames containing non-ASCII characters."; CPU load before transliteration was started: 0.7.

Load of 2 after one minute; load of ~3 after 1:30; then staying around 2.5 for the next minutes, then WSOD with "PHP Fatal error: Maximum execution time of 30 seconds exceeded..." after about 5 minutes; the load stays around 2 for some more minutes and goes back to 0.7 after a total time of about 10 minutes.

Reloading ./admin/settings/file-system/transliteration now says: "The database currently lists 36397 filenames containing non-ASCII characters." Repeating the transliteration process once more behaves similarily to the first run: increased CPU load for several minutes with intense RAID activity, but this time no WSOD, but a long error screen listing the filenames that could not be transliterated ("Not all file names could be converted. The following files could not be accessed and have been ignored: [...]"/"Nicht alle Dateinamen konnte konvertiert werden. Auf die folgenden Dateien konnte nicht zugegriffen werden und sind ignoriert worden: [...]"); Now we're at "The database currently lists 1,594 filenames containing non-ASCII characters." Additional transliteration runs take only seconds, but the number of 1,594 filenames containing non-ASCII characters does not decrease.

Why transliteration fails on file names like 2002-03-03_Datum-einfügen.jpg (./sites/default/files/bilder/2002-03-03_Datum-einf) I do not understand. However, this is probably a completely different issue.

Assumption: "Transliterate" can process (on this system) around ~40k files until PHP bails out. Running transliteration a 2nd (3rd, 4th...) time hopefully finishes the process without leaving damage.

Test case b)

Before transliteration: "The database currently lists 105,052 filenames containing non-ASCII characters." The behaviour from test case a) repeats itself more or less; in this case I can only transliterate ~50% of the filenames. "The database currently lists 49,951 filenames containing non-ASCII characters" is the final number. Additional transliteration runs do not decrease this number.

Result: I assume that the WSOD is triggered by the new case conversion and does only occur a limited number of times, until the exisiting files are successfully transliterated. Probably depending on the server's performance, small sites probably won't encounter any WSODs at all, medium size sites will probably get a small number of "PHP Fatal errors"; more serious problems are to be expected with really image-heavy sites with several 100k files. Thus I'm changing the issue to a task: This behaviour is not a bug, but it should be mentioned on the project page. I don't know if something like Batch API would be an option, but timeouts, WSODs and PHP Fatal errors are really not optimal.

Very strange is the behaviour in regard to files that "could not be accessed and have been ignored"; at least some of them exist on the server and should be accessible (owned by the webserver user, readable & writable). However, if I have time I'll investigate this and open another isse, if necessary.

amateescu’s picture

Title: Possible regression: WSOD and PHP Fatal Error when transliterating files » Possible regression: WSOD and PHP Fatal Error when transliterating files, so implement Batch API for this operation

Indeed, seems that Batch API is our only option here. I'm not really sure why we're not using it already for this operation, though I bet the answer is: nobody bothered to implement it :)

Plazik’s picture

Hi.
I think I got a similar problem. After update to 6.x-3.1 I can't access to admin's pages. I got 500 error. If I delete transliteration 3.1 module site works fine.
PHP error logs:

[Mon Jun 04 20:35:17 2012] [warn] [client 111.111.111.111] mod_fcgid: read data timeout in 40 seconds
[Mon Jun 04 20:35:17 2012] [error] [client 111.111.111.111] Premature end of script headers: index.php
asb’s picture

Batch API: Probably this module was initially designed to just run when a (= one) file was uploaded; as far as I remember, it does not work with things like bulk image import (legacy 'image' module), because files are read from a directory on the server (somewhere there was an issue for that). Retroactive processing was probably added to the module later, but maybe even before Batch API was introduced (in D6, if I remember correctly). However, now we need volunteers to make 'Transliteration' use Batch API. For the meantime a warning notice on the project page or in README.TXT has to suffice ;)

#4 appears to be a totally different issue (other symptoms, and other errors in the logfiles. I'd suggest to open another issue for that.

asb’s picture

Priority: Normal » Major

Sorry, I have to push the priority for this a bit. Sadly, just running the transliteration a couple of times doesn't suffice; the database is left in an incoherent state, and this leads to a massive amount of errors. The most obvious symptom appears when just running ./admin/settings/file-system/transliteration again. This gives a massive list of invalid file paths and the error message:

Not all file names could be converted. The following files could not be accessed and have been ignored:

Example for the listed files (the listing actually includes 2,000 missing files):

sites/default/files/bilder/Darjeeling_Limited_01_700x468.jpg
sites/default/files/bilder/Darjeeling_Limited_02_700x468.jpg
sites/default/files/bilder/Darjeeling_Limited_03_700x468.jpg
sites/default/files/bilder/Darjeeling_Limited_04_468x700.jpg
sites/default/files/bilder/Darjeeling_Limited_05_700x468.jpg
sites/default/files/bilder/Darjeeling_Limited_06_700x468.jpg
sites/default/files/bilder/Darjeeling_Limited_07_700x468.jpg

However, these files do exist, and are read- and writable for the webserver user. Please notice the case:

# ls -lah darj*.jpg
-rw-rw-r-- 1 www-data www-data 325K  3. Jun 15:58 darjeeling_limited_01_700x468.jpg
-rw-rw-r-- 1 www-data www-data 321K  3. Jun 15:58 darjeeling_limited_02_700x468.jpg
-rw-rw-r-- 1 www-data www-data 348K  3. Jun 15:58 darjeeling_limited_03_700x468.jpg
-rw-rw-r-- 1 www-data www-data 190K  3. Jun 15:58 darjeeling_limited_04_468x700.jpg
-rw-rw-r-- 1 www-data www-data 282K  3. Jun 15:58 darjeeling_limited_05_700x468.jpg
-rw-rw-r-- 1 www-data www-data 331K  3. Jun 15:58 darjeeling_limited_06_700x468.jpg
-rw-rw-r-- 1 www-data www-data 353K  3. Jun 15:58 darjeeling_limited_07_700x468.jpg

So 'Transliteration' has transliterated and lower-cased some files, but didn't update the 'files' table properly:

mysql> select filename,filepath from files where filename like 'Darj%';
+-----------------------------------+--------------------------------------------------------------+
| filename                          | filepath                                                     |
+-----------------------------------+--------------------------------------------------------------+
| Darjeeling_Limited_01_700x468.jpg | sites/default/files/bilder/darjeeling_limited_01_700x468.jpg |
| Darjeeling_Limited_02_700x468.jpg | sites/default/files/bilder/darjeeling_limited_02_700x468.jpg |
| Darjeeling_Limited_03_700x468.jpg | sites/default/files/bilder/darjeeling_limited_03_700x468.jpg |
| Darjeeling_Limited_04_468x700.jpg | sites/default/files/bilder/darjeeling_limited_04_468x700.jpg |
| Darjeeling_Limited_05_700x468.jpg | sites/default/files/bilder/darjeeling_limited_05_700x468.jpg |
| Darjeeling_Limited_06_700x468.jpg | sites/default/files/bilder/darjeeling_limited_06_700x468.jpg |
| Darjeeling_Limited_07_700x468.jpg | sites/default/files/bilder/darjeeling_limited_07_700x468.jpg |
+-----------------------------------+--------------------------------------------------------------+
7 rows in set (0.06 sec)

I tried to fix this with the lower() function of MySQL:

Test:

mysql> select lower(filename),filepath from files where filename like 'Darj%';
+-----------------------------------+--------------------------------------------------------------+
| lower(filename)                   | filepath                                                     |
+-----------------------------------+--------------------------------------------------------------+
| darjeeling_limited_01_700x468.jpg | sites/default/files/bilder/darjeeling_limited_01_700x468.jpg |
| darjeeling_limited_02_700x468.jpg | sites/default/files/bilder/darjeeling_limited_02_700x468.jpg |
| darjeeling_limited_03_700x468.jpg | sites/default/files/bilder/darjeeling_limited_03_700x468.jpg |
| darjeeling_limited_04_468x700.jpg | sites/default/files/bilder/darjeeling_limited_04_468x700.jpg |
| darjeeling_limited_05_700x468.jpg | sites/default/files/bilder/darjeeling_limited_05_700x468.jpg |
| darjeeling_limited_06_700x468.jpg | sites/default/files/bilder/darjeeling_limited_06_700x468.jpg |
| darjeeling_limited_07_700x468.jpg | sites/default/files/bilder/darjeeling_limited_07_700x468.jpg |
+-----------------------------------+--------------------------------------------------------------+
7 rows in set (0.03 sec)

Appears to work; make it permanent:

mysql> update files set filename = lower(filename) where filename like 'Darj%';
Query OK, 7 rows affected (0.14 sec)
Rows matched: 7  Changed: 7  Warnings: 0

mysql> select filename,filepath from files where filename like 'Darj%';
+-----------------------------------+--------------------------------------------------------------+
| filename                          | filepath                                                     |
+-----------------------------------+--------------------------------------------------------------+
| darjeeling_limited_01_700x468.jpg | sites/default/files/bilder/darjeeling_limited_01_700x468.jpg |
| darjeeling_limited_02_700x468.jpg | sites/default/files/bilder/darjeeling_limited_02_700x468.jpg |
| darjeeling_limited_03_700x468.jpg | sites/default/files/bilder/darjeeling_limited_03_700x468.jpg |
| darjeeling_limited_04_468x700.jpg | sites/default/files/bilder/darjeeling_limited_04_468x700.jpg |
| darjeeling_limited_05_700x468.jpg | sites/default/files/bilder/darjeeling_limited_05_700x468.jpg |
| darjeeling_limited_06_700x468.jpg | sites/default/files/bilder/darjeeling_limited_06_700x468.jpg |
| darjeeling_limited_07_700x468.jpg | sites/default/files/bilder/darjeeling_limited_07_700x468.jpg |
+-----------------------------------+--------------------------------------------------------------+
7 rows in set (0.03 sec)

However, 'Transliteration' still claims not to be able to access files like "Darjeeling_Limited_01_700x468.jpg". Also, lower() wouldn't handle incomplete transliterations.

How do I fix this properly?
Thanks!

8ballsteve’s picture

Version: 6.x-3.1 » 7.x-3.x-dev
FileSize
151.34 KB

I've created a version of the transliteration module (albeit for D7) that uses the batch API to meet a client requirement.

Hopefully we can get some of the functionality into the dev branch - feel free to back port for D6 if the attached code helps.

derhasi’s picture

Version: 7.x-3.x-dev » 7.x-3.2
Issue summary: View changes
Status: Active » Needs review
FileSize
8.46 KB

I created a patch for D7 to implement the Batch API for admin/config/media/file-system/transliteration. See attached file.

asb’s picture

Issue still exists in D6 version. Any chance for a backport?

xamount’s picture

Status: Needs review » Needs work

I tried the patch in comment #8 but I am not seeing any option for batch operations.

xamount’s picture

Status: Needs work » Needs review

My apologies, I tried the patch again and it worked. Well almost.

I am trying to backport about 23,000 images. It worked for about 14,000 images and then I got a http 500 error.

xamount’s picture

I tried this patch again. I initially have approx 34,000 images.

The batch process starts, then I eventually get this error:

An AJAX HTTP error occurred. HTTP Result Code: 500 Debugging information follows. Path: /batch?id=35241&op=do StatusText: Internal Server Error ResponseText: PDOException: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'public://h7.jpg_0.jpg' for key 'uri': UPDATE {file_managed} SET uid=:db_update_placeholder_0, filename=:db_update_placeholder_1, uri=:db_update_placeholder_2, filemime=:db_update_placeholder_3, filesize=:db_update_placeholder_4, status=:db_update_placeholder_5, timestamp=:db_update_placeholder_6, type=:db_update_placeholder_7 WHERE (fid = :db_condition_placeholder_0) ; Array ( [:db_update_placeholder_0] => 1 [:db_update_placeholder_1] => H7.jpg.jpg [:db_update_placeholder_2] => public://h7.jpg_0.jpg [:db_update_placeholder_3] => image/jpeg [:db_update_placeholder_4] => 5443 [:db_update_placeholder_5] => 1 [:db_update_placeholder_6] => 1363180888 [:db_update_placeholder_7] => image [:db_condition_placeholder_0] => 5830 ) in drupal_write_record() (line 7278 of /srv/bindings/337667c549a94e50b05e6046e5dae58d/code/includes/common.inc).

At the page to restart the patch, I realise about 2000 images were transliterated. So this is what I meant by "almost works". I have to keep re-running the process every time I get the error. This may or may not be related to my environment, I'm not quite sure..

cmseasy’s picture

Same as #12 with 18000 images

cmseasy’s picture

Is this related to duplicate image names from other sources (ftp),
like: Image.jpg IMAGE.jpg and image.jpg?

Chris Matthews’s picture

Version: 7.x-3.2 » 7.x-3.x-dev
Priority: Major » Normal
Status: Needs review » Needs work
Issue tags: +Needs reroll, +Needs rework

The 4 year old patch in #8 does not apply to the latest 7.x-3.x development snapshot and is probably too old to reroll, but I went ahead and tagged the issue accordingly.

Checking patch transliteration.admin.inc...
error: while searching for:
 * Form submit function; retroactively transliterates existing file names.
 *
 * @see transliteration_retroactive()
 */
function transliteration_retroactive_submit($form, &$form_state) {
  $count = 0;
  $errors = array();
  $result = transliteration_file_query()->execute();

  while ($file = $result->fetchObject()) {
    $wrapper = file_stream_wrapper_get_instance_by_uri($file->uri);
    $scheme = file_uri_scheme($file->uri);

    // Missing implementation.
    if (!$wrapper) {
      $errors[] = file_uri_target($file->uri);
      continue;
    }

    // Skip non-writable stream wrappers.
    $writeable_stream_wrappers = file_get_stream_wrappers(STREAM_WRAPPERS_WRITE);
    if (!isset($writeable_stream_wrappers[$scheme])) {
      continue;
    }

    // Missing file.
    if (!file_exists($wrapper->realpath())) {
      $errors[] = file_uri_target($file->uri);
      continue;
    }

    // Sanitize file name.
    $filename = transliteration_clean_filename(basename($file->uri));
    // Build destination URI.
    $destination = file_stream_wrapper_uri_normalize(drupal_dirname($file->uri) . '/' . $filename);
    $destination = file_destination($destination, FILE_EXISTS_RENAME);

    // Rename and update the file record accordingly.
    if ($wrapper->rename($file->uri, $destination)) {
      // If possible, add a url redirect to handle old URL references.
      if (module_exists('redirect')) {
        $redirect = new stdClass();
        redirect_object_prepare($redirect, array(
          'source' => $wrapper->getDirectoryPath() . '/' . file_uri_target($file->uri),
          'redirect' => $wrapper->getDirectoryPath() . '/' . file_uri_target($destination),
          'status_code' => 301)
        );
        redirect_save($redirect);
      }
      $file->uri = $destination;
      // Don't use file_save() as it modifies the timestamp.
      drupal_write_record('file_managed', $file, 'fid');
      // Inform modules that the file has been updated.
      module_invoke_all('file_update', $file);
      $count++;
    }
    else {
      $errors[] = file_uri_target($file->uri);
    }
  }

  if ($errors) {
    $message = t('Not all file names could be converted. The following files could not be accessed and have been ignored:');
    $message .= theme('item_list', array('items' => $errors));
    drupal_set_message($message, 'error');
  }
  else {
    drupal_set_message(t('@filenames have been successfully transliterated.', array('@filenames' => format_plural($count, '1 file name', '@count file names'))));
  }

  // Flush page cache.
  cache_clear_all();
}

/**
 * Builds a query that returns all file names from the database containing non-ASCII characters.
 *
 * @return
 *   SelectQuery
 */
function transliteration_file_query() {
  // Regular expressions are not supported by Drupal's database layer and

error: patch failed: transliteration.admin.inc:54
error: transliteration.admin.inc: patch does not apply