So i was thinking about further performance tuning and I came to a conclusion (and I was thinking about this for a long time) that queries like following are never going to perform well:

SELECT DISTINCT s.sid, s.value, t.tid, t.language, t.translation, t.uid_entered, t.uid_approved, t.time_entered, t.time_approved, t.has_suggestion, t.is_suggestion, t.is_active FROM 2_l10n_community_string s LEFT JOIN 2_l10n_community_translation t ON s.sid = t.sid AND t.language = 'cs' AND t.is_active = 1 AND t.is_suggestion = 0 WHERE (s.value LIKE '% contents of this block updates%' OR t.translation LIKE '% contents of this block updates%') LIMIT 0, 10;

My laptop is 2GHz Core 2 Duo, 4MB cache so we may conclude that most server may be 20-30 % faster, but my laptop is not doing anything else while computing these queries...With 5 million translations, this one takes around 20 seconds (Searching for a string at Edit screen).

My first concept was to split databases for every language which immediatelly came to be very dull. My second concept - the current one is much more simple and actually also much more simple to implement:

Split l10n_community_translation table per language

With 5.x and 6.x, we have around 130 000 strings, 180 (constant) languages. If every string in every language is translated, that means 23 million translations. That can happend easily on d.org - not every language and every string but more suggestions per string and release will roll the same number.

What is the gain of splitting? 180 tables, probably less than 200 000 strings in every, that means that the query above will not scan 23m but 200k rows and that's an significant improvement. Actually, these are the numbers for query above:

5m translations: 8 seconds
260k translations: 2 seconds

That is a good improvement...

Pros:
- Much better performance, usually a constant because number of translations for one language wouldn't grow over 200 000 and when that time happens, we will have much different computing power :)
- User is probably willing to wait 2 seconds for a page but not 8.
- Easy to implement basic functionality

Cons and/or coding inconveniences:
- We need an upgrade path. Should be easy to loop through all languages, create tables for them and move their translations there but it is going to take serious amount of time. But we can split that into separate update_60NN() functions for every language and batch API will take care. Is there any other way?
- Computing statistics will be more difficult but it doesn't matter when we cache everything.
- locale_add_language() doesn't give any API so we can't react on creating a language by creating it's translations table. Therefore all tables for every language needs to be created during l10n installation.

What do you guys think?

CommentFileSizeAuthor
#2 l10n_tables.patch17.04 KBmeba
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Gábor Hojtsy’s picture

Status: Postponed (maintainer needs more info) » Active

Not a bad idea, but there are of course consequences. The translations table at this point in time also stores information on all previous translations and suggestions, which might also make sense to move out to another history table (per language). The history could grow quite big, and at the moment, we are not displaying or counting is_active = 0 items anywhere, they are just kept for historical reasons. We can either move out the older stuff to the history right when they become history (on the is_active transition), or in some batch processing later, to save some time for editing. Right now they are stored in the same table to save time on SQL updates. We do loose considerable time on queries however.

Separation per language, and then separation per is_active would be two different patches, so let's not fold that in here :)

As far as your comments go:
- The upgrade path. It could be made quite quick with "INSERT INTO .. SELECT .." and "DELETE ..." with which we can quickly move over data matching our criteria. Existing servers run a small number of languages, so this should run pretty quick.
- Stats computing will be harder. Also any cross-language operation you might want to do (like how many languages "Home" is translated to"?) will be harder. Such cross-language operations don't exist right now in the code, so we don't feel an immediate pain. Not sure we need to support such cross-language operations well, when it goes against our core job of doing translations quick.
- Oh, no, we can whip up a quick admin page, where you can enable certain languages for l10n_server. No need to set up 180 language tables on each l10n_server instance, when they would only run 1 or at most a couple languages. Even the drupal.org server itself will not run this many languages (just look at the translations CVS directory for a running list of translations we have now).

So all-in-all I think this is a workable way, but we should not neglect improving performance in whatever other ways we can.

meba’s picture

FileSize
17.04 KB

Attaching a first version of a patch to:

  • hook to a #submit of language enable/disable screen - create translation tables for enabled language
  • hook to a #submit of language delete screen - delete all translations for this language
  • supports d6 upgrade path - create translation tables during update.php
  • uses translation tables on View & Edit translation screens per language

TODO:

  • Ajax.inc needs a lot of love. These functions are using $tid, but $tid is not unique now, it needs to use $tid + $langcode
  • hook to a #submit of language add screen
  • statistics don't work at all
  • delete all old translations after upgrade
  • This patch provides about 100 % performance gain on View & Edit screens where page load time has gone from 8 to 4 seconds which is time i am not satisfied with yet :)

    I am attaching the patch now to ensure I won't accidentally delete it (AGAIN) :-)

    Specific questions: is l10n_community_translation_table secure enough? I didn't use db_escape_string because it probably wouldn't work with table names.

Gábor Hojtsy’s picture

Title: Performance: split translations table » Split translations table into per-language tables
Status: Active » Needs work

Updating as a possible future thing to explore.

andypost’s picture

Update function is there but no changes in schema is done.

Suppose l10n_community_translation_table should care about fall-back to original table while l10n_community_update_6005() is not applied or if there's only one language in system

SebCorbin’s picture

Version: 6.x-1.x-dev » 7.x-1.x-dev
Issue tags: +Performance

Bumping to 7.x