Hi,

the new release might have introduced a regression in regard to CPU consumption/processing time; 6.x.3.0 was running robust on a dozend sites; the new release gives me sporadic WSODs when running ./admin/settings/file-system/transliteration.

The WSOD is probably caused by a timeout; Apache's error log reports:

[Sun Jun 03 16:10:24 2012] [error] [client 12.34.56.78] PHP Fatal error:  Maximum execution time of 60 seconds exceeded in /var/www/drupal/includes/database.mysqli.inc on line 142, referer: http://{mydomain}/admin/settings/file-system/transliteration

When running the transliteration, the CPU load spikes to 5-6 on a dedicated Xeon server, when the normal load is 1-2. With 6.x.3.0, I have never encountered similar behaviour, neither those load spikes, nor timeouts.

So far I can not provide a procedure to reproduce the timeout; when lloking through the changelog, #1576986: Allow conversion of existing filenames to lower-case might be causing the WSOD; if this assumption is correct, the issue probably would only occur once per site; I'll try to verify this.

Files: 
CommentFileSizeAuthor
#7 transliteration.zip151.34 KB8ballsteve

Comments

Yep, that change has quite an impact because now the filename case is respected in MySQL and that could bring up a lot of previously untransliterated files.

If you can, please do investigate further :)

Category:bug» task

I did some tests to see if the WSOD is a "once in a lifetime" error, or if it disrupts day to day operation as well.

Small site:

  • Before running the transliteration: "The database currently lists 7,753 filenames containing non-ASCII characters."
  • CPU load before transliteration was started: 0.7
  • CPU load during transliteration: ~1.0

Result: No WSOD, no PHP fatal error; everything's fine.

Medium-sized site:

Test case a)

"The database currently lists 74,693 filenames containing non-ASCII characters."; CPU load before transliteration was started: 0.7.

Load of 2 after one minute; load of ~3 after 1:30; then staying around 2.5 for the next minutes, then WSOD with "PHP Fatal error: Maximum execution time of 30 seconds exceeded..." after about 5 minutes; the load stays around 2 for some more minutes and goes back to 0.7 after a total time of about 10 minutes.

Reloading ./admin/settings/file-system/transliteration now says: "The database currently lists 36397 filenames containing non-ASCII characters." Repeating the transliteration process once more behaves similarily to the first run: increased CPU load for several minutes with intense RAID activity, but this time no WSOD, but a long error screen listing the filenames that could not be transliterated ("Not all file names could be converted. The following files could not be accessed and have been ignored: [...]"/"Nicht alle Dateinamen konnte konvertiert werden. Auf die folgenden Dateien konnte nicht zugegriffen werden und sind ignoriert worden: [...]"); Now we're at "The database currently lists 1,594 filenames containing non-ASCII characters." Additional transliteration runs take only seconds, but the number of 1,594 filenames containing non-ASCII characters does not decrease.

Why transliteration fails on file names like 2002-03-03_Datum-einfügen.jpg (./sites/default/files/bilder/2002-03-03_Datum-einf) I do not understand. However, this is probably a completely different issue.

Assumption: "Transliterate" can process (on this system) around ~40k files until PHP bails out. Running transliteration a 2nd (3rd, 4th...) time hopefully finishes the process without leaving damage.

Test case b)

Before transliteration: "The database currently lists 105,052 filenames containing non-ASCII characters." The behaviour from test case a) repeats itself more or less; in this case I can only transliterate ~50% of the filenames. "The database currently lists 49,951 filenames containing non-ASCII characters" is the final number. Additional transliteration runs do not decrease this number.

Result: I assume that the WSOD is triggered by the new case conversion and does only occur a limited number of times, until the exisiting files are successfully transliterated. Probably depending on the server's performance, small sites probably won't encounter any WSODs at all, medium size sites will probably get a small number of "PHP Fatal errors"; more serious problems are to be expected with really image-heavy sites with several 100k files. Thus I'm changing the issue to a task: This behaviour is not a bug, but it should be mentioned on the project page. I don't know if something like Batch API would be an option, but timeouts, WSODs and PHP Fatal errors are really not optimal.

Very strange is the behaviour in regard to files that "could not be accessed and have been ignored"; at least some of them exist on the server and should be accessible (owned by the webserver user, readable & writable). However, if I have time I'll investigate this and open another isse, if necessary.

Title:Possible regression: WSOD and PHP Fatal Error when transliterating filesPossible regression: WSOD and PHP Fatal Error when transliterating files, so implement Batch API for this operation

Indeed, seems that Batch API is our only option here. I'm not really sure why we're not using it already for this operation, though I bet the answer is: nobody bothered to implement it :)

Hi.
I think I got a similar problem. After update to 6.x-3.1 I can't access to admin's pages. I got 500 error. If I delete transliteration 3.1 module site works fine.
PHP error logs:

[Mon Jun 04 20:35:17 2012] [warn] [client 111.111.111.111] mod_fcgid: read data timeout in 40 seconds
[Mon Jun 04 20:35:17 2012] [error] [client 111.111.111.111] Premature end of script headers: index.php

Batch API: Probably this module was initially designed to just run when a (= one) file was uploaded; as far as I remember, it does not work with things like bulk image import (legacy 'image' module), because files are read from a directory on the server (somewhere there was an issue for that). Retroactive processing was probably added to the module later, but maybe even before Batch API was introduced (in D6, if I remember correctly). However, now we need volunteers to make 'Transliteration' use Batch API. For the meantime a warning notice on the project page or in README.TXT has to suffice ;)

#4 appears to be a totally different issue (other symptoms, and other errors in the logfiles. I'd suggest to open another issue for that.

Priority:Normal» Major

Sorry, I have to push the priority for this a bit. Sadly, just running the transliteration a couple of times doesn't suffice; the database is left in an incoherent state, and this leads to a massive amount of errors. The most obvious symptom appears when just running ./admin/settings/file-system/transliteration again. This gives a massive list of invalid file paths and the error message:

Not all file names could be converted. The following files could not be accessed and have been ignored:

Example for the listed files (the listing actually includes 2,000 missing files):

sites/default/files/bilder/Darjeeling_Limited_01_700x468.jpg
sites/default/files/bilder/Darjeeling_Limited_02_700x468.jpg
sites/default/files/bilder/Darjeeling_Limited_03_700x468.jpg
sites/default/files/bilder/Darjeeling_Limited_04_468x700.jpg
sites/default/files/bilder/Darjeeling_Limited_05_700x468.jpg
sites/default/files/bilder/Darjeeling_Limited_06_700x468.jpg
sites/default/files/bilder/Darjeeling_Limited_07_700x468.jpg

However, these files do exist, and are read- and writable for the webserver user. Please notice the case:

# ls -lah darj*.jpg
-rw-rw-r-- 1 www-data www-data 325K  3. Jun 15:58 darjeeling_limited_01_700x468.jpg
-rw-rw-r-- 1 www-data www-data 321K  3. Jun 15:58 darjeeling_limited_02_700x468.jpg
-rw-rw-r-- 1 www-data www-data 348K  3. Jun 15:58 darjeeling_limited_03_700x468.jpg
-rw-rw-r-- 1 www-data www-data 190K  3. Jun 15:58 darjeeling_limited_04_468x700.jpg
-rw-rw-r-- 1 www-data www-data 282K  3. Jun 15:58 darjeeling_limited_05_700x468.jpg
-rw-rw-r-- 1 www-data www-data 331K  3. Jun 15:58 darjeeling_limited_06_700x468.jpg
-rw-rw-r-- 1 www-data www-data 353K  3. Jun 15:58 darjeeling_limited_07_700x468.jpg

So 'Transliteration' has transliterated and lower-cased some files, but didn't update the 'files' table properly:

mysql> select filename,filepath from files where filename like 'Darj%';
+-----------------------------------+--------------------------------------------------------------+
| filename                          | filepath                                                     |
+-----------------------------------+--------------------------------------------------------------+
| Darjeeling_Limited_01_700x468.jpg | sites/default/files/bilder/darjeeling_limited_01_700x468.jpg |
| Darjeeling_Limited_02_700x468.jpg | sites/default/files/bilder/darjeeling_limited_02_700x468.jpg |
| Darjeeling_Limited_03_700x468.jpg | sites/default/files/bilder/darjeeling_limited_03_700x468.jpg |
| Darjeeling_Limited_04_468x700.jpg | sites/default/files/bilder/darjeeling_limited_04_468x700.jpg |
| Darjeeling_Limited_05_700x468.jpg | sites/default/files/bilder/darjeeling_limited_05_700x468.jpg |
| Darjeeling_Limited_06_700x468.jpg | sites/default/files/bilder/darjeeling_limited_06_700x468.jpg |
| Darjeeling_Limited_07_700x468.jpg | sites/default/files/bilder/darjeeling_limited_07_700x468.jpg |
+-----------------------------------+--------------------------------------------------------------+
7 rows in set (0.06 sec)

I tried to fix this with the lower() function of MySQL:

Test:

mysql> select lower(filename),filepath from files where filename like 'Darj%';
+-----------------------------------+--------------------------------------------------------------+
| lower(filename)                   | filepath                                                     |
+-----------------------------------+--------------------------------------------------------------+
| darjeeling_limited_01_700x468.jpg | sites/default/files/bilder/darjeeling_limited_01_700x468.jpg |
| darjeeling_limited_02_700x468.jpg | sites/default/files/bilder/darjeeling_limited_02_700x468.jpg |
| darjeeling_limited_03_700x468.jpg | sites/default/files/bilder/darjeeling_limited_03_700x468.jpg |
| darjeeling_limited_04_468x700.jpg | sites/default/files/bilder/darjeeling_limited_04_468x700.jpg |
| darjeeling_limited_05_700x468.jpg | sites/default/files/bilder/darjeeling_limited_05_700x468.jpg |
| darjeeling_limited_06_700x468.jpg | sites/default/files/bilder/darjeeling_limited_06_700x468.jpg |
| darjeeling_limited_07_700x468.jpg | sites/default/files/bilder/darjeeling_limited_07_700x468.jpg |
+-----------------------------------+--------------------------------------------------------------+
7 rows in set (0.03 sec)

Appears to work; make it permanent:

mysql> update files set filename = lower(filename) where filename like 'Darj%';
Query OK, 7 rows affected (0.14 sec)
Rows matched: 7  Changed: 7  Warnings: 0
mysql> select filename,filepath from files where filename like 'Darj%';
+-----------------------------------+--------------------------------------------------------------+
| filename                          | filepath                                                     |
+-----------------------------------+--------------------------------------------------------------+
| darjeeling_limited_01_700x468.jpg | sites/default/files/bilder/darjeeling_limited_01_700x468.jpg |
| darjeeling_limited_02_700x468.jpg | sites/default/files/bilder/darjeeling_limited_02_700x468.jpg |
| darjeeling_limited_03_700x468.jpg | sites/default/files/bilder/darjeeling_limited_03_700x468.jpg |
| darjeeling_limited_04_468x700.jpg | sites/default/files/bilder/darjeeling_limited_04_468x700.jpg |
| darjeeling_limited_05_700x468.jpg | sites/default/files/bilder/darjeeling_limited_05_700x468.jpg |
| darjeeling_limited_06_700x468.jpg | sites/default/files/bilder/darjeeling_limited_06_700x468.jpg |
| darjeeling_limited_07_700x468.jpg | sites/default/files/bilder/darjeeling_limited_07_700x468.jpg |
+-----------------------------------+--------------------------------------------------------------+
7 rows in set (0.03 sec)

However, 'Transliteration' still claims not to be able to access files like "Darjeeling_Limited_01_700x468.jpg". Also, lower() wouldn't handle incomplete transliterations.

How do I fix this properly?
Thanks!

Version:6.x-3.1» 7.x-3.x-dev
StatusFileSize
new151.34 KB

I've created a version of the transliteration module (albeit for D7) that uses the batch API to meet a client requirement.

Hopefully we can get some of the functionality into the dev branch - feel free to back port for D6 if the attached code helps.