locale.module: adding an index on the source field improves speeds
janekb - December 27, 2005 - 19:44
| Project: | Drupal |
| Version: | x.y.z |
| Component: | locale.module |
| Category: | task |
| Priority: | normal |
| Assigned: | Stefan Nagtegaal |
| Status: | closed |
Description
Location module creates queries which perform full table scan of locales_source table:
SELECT s.lid, t.translation FROM locales_source s INNER JOIN locales_target t ON s.lid = t.lid WHERE s.source = 'Users can use their own name or handle and can fine tune some personal configuration settings through their individual my account page. Registered users need to authenticate by supplying either a local username and password, or a remote username and password such as DelphiForums ID, or one from a Drupal powered website. A visitor accessing your website is assigned an unique ID, the so-called session ID, which is stored in a cookie. For security\'s sake, the cookie does not contain personal information but acts as a key to retrieve the information stored on your server. ' AND t.locale = 'pl'Solution:
Add index on 'source' field for 'locales_source' table
(for MySQL)
ALTER TABLE `schema_name`.`locales_source` DROP INDEX `source`,
ADD INDEX `SOURCE`(`source`(30));
#1
Problems are only when location module is enabled and there are some untranslated strings. In such case no caching is performed and Drupal atempts to look up translation in this table.
Patch hasn't been tested with PostgresSQL (I'm not shure if index can be created on 'text' type fields).
#2
Thanks, can you provide a patch that includes an update for updates.inc?
#3
Here it is, a new formed patch includes:
- updates database.mysql and database.pgsql;
- updates update.inc!
This does _really_ speed up your site when using the locale.module..
Please try, and see if we can get this little speed improvement into 4.7
#4
As an example:
I took at the following query:
"SELECT s.lid, t.translation FROM locales_source s INNER JOIN locales_target t ON s.lid = t.lid WHERE s.source = '\n <p>Welcome to your new <a href=\"%drupal\">Drupal</a>-powered website. This message will guide you through your first steps with Drupal, and will disappear once you have posted your first piece of content.</p>\n <p>The first thing you will need to do is <a href=\"%register\">create the first account</a>. This account will have full administration rights and will allow you to configure your website. Once logged in, you can visit the <a href=\"%admin\">administration section</a> and <a href=\"%config\">set up your site\'s configuration</a>.</p>\n <p>Drupal comes with various modules, each of which contains a specific piece of functionality. You should visit the <a href=\"%modules\">module list</a> and enable those modules which suit your website\'s needs.</p>\n <p><a href=\"%themes\">Themes</a> handle the presentation of your website. You can use one of the existing themes, modify them or create your own from scratch.</p>\n <p>We suggest you look around the administration section and explore the various options Drupal offers you. For more information, you can refer to the <a href=\"%handbook\">Drupal handbooks online</a>.</p>' AND t.locale = 'nl'"It takes 20.22 ms to execute before this patch was in. After that, it only took 1.02 ms..
That would mean that the query has been executed 20 times faster as before, which means a tremendous speed up for sites which do use the locale.module..
Steef
#5
Postgresql part looks good, but you should use
switchstatement instead ofifand also includemysqlicase (same as mysql), thus setting to 'needs work'#6
Would this also be a good place to check/find out if there is any reason why we use a BLOB here (MySQL) instead of TEXT ?
-K
#7
Caveats:
You can add an index on a column that can have NULL values only if you are using MySQL 3.23.2 or newer and are using the MyISAM, InnoDB, or BDB storage engine. This is also true for MEMORY tables as of MySQL 4.0.2. You can only add an index on a BLOB or TEXT column if you are using MySQL 3.23.2 or newer and are using the MyISAM or BDB storage engine, or MySQL 4.0.14 or newer and the InnoDB storage engine.In MySQL 5.0:
* You can add an index on a column that can have NULL values only if you are using the MyISAM, InnoDB, BDB, or MEMORY storage engine.
* You can add an index on a BLOB or TEXT column only if you are using the MyISAM, BDB, or InnoDB storage engine.
The field length also bears investigating along with the NULL/duplicates issue.
Cheers
-K
#8
#9
Would this also be a good place to check/find out if there is any reason why we use a BLOB here (MySQL) instead of TEXT ?
@Zen: IMO that should be a new issue. I don't want to cover more than absolutely nessecary in this issue because I would like to see this committed against HEAD before release of 4.7.
@Zen: What do you ment with your comment about the caveats? I don't know if I understand what you mean.. Does it give you any errors?
@Cvbge: Formatting of updates.inc has been improved as you pointed out.
Both, thanx for the review, new patch is attached.
#10
Setting status back to "patch (code needs review)"..
#11
I did an apache benchmark. And attached the outcome. AFAIK the patch had no real effect, the unpatched version looks faster even.
Disclaimer, i am clueless when it comes to benchmarks ;)
#12
Can you do the test again with ab -n 200 -c1? Your test did indeed no tell very much. You did have locale enabled, did you? I think the patch will only affect pages where there are long strings (strlen > 75) which will be retrieved directly from the db.
#13
http://drupal.org/node/50669 <-- This patch makes MySQL support engine agnostic, basically stating that Drupal will work on all (common) engines on MySQL v3.23.58+. As stated above, innoDB does not support TEXT/BLOB indexing in versions before 4.0.14, thus making this patch rather dicey.
Just a thought.
-K
#14
And: have you had cache enabled? (should be disabled). How many translations (different languages) you had imported? (I suppose 1 non-english translation is most common, but testing with 2-3 could be good too).
Also, I wonder if index on the first 30 characters is optimal.
#15
The rest of my comment appears to have been eaten up.
The issue in my last statement makes MySQL support in Drupal engine agnostic. So we are essentially supporting all mainstream MySQL engines (and innoDB is one of them) on versions of MySQL in v.3.23.58 onwards. As stated above, innoDB does not support TEXT/BLOB field indexing on MySQL versions older than 4.0.14, thereby making this patch a little on the dicey side.
Cheers
-K
#16
I will try to rerun that benchmark later.
1) no cache enabled
2) I ran it on ?q=tracker. I am not aware of any page with large strings that are available to anonymous. Ill create a php page for that.
3) I did the c4 on purpose; in my system the slow queries, those wayting while the lock is on affect speed most.
4) modules: all standard modules + locale + statsictics (the imo one-but biggest slow-downer)
5) 3 translations. approx 600 strings 'imported'.
6) only four strnigs were translated. That could be the cause?
Bèr
#17
if the locale_target table is empty an index cannot have an impact.
#18
tavon tested changing blob to text in both source and target tables and it's ok.
#19
I've included the change the blob to text change
#20
AFAIK we use blob because mysql treats text in a case insensitive fashion.
#21
Don't think so:
http://dev.mysql.com/doc/refman/4.1/en/blob.html
BLOB columns have no character set, and sorting and comparison are based on the numeric values of the bytes in column values. TEXT columns have a character set, and values are sorted and compared based on the collation of the character set assigned to the column as of MySQL 4.1. Before 4.1, TEXT sorting and comparison are based on the collation of the server character set.
#22
Seems Steven was right and I wasn't.
we have two choices:
1) not apply the patch
2) apply it (minus the blob2text change), and change the system requiremts to state that you can't use old innodb and locales.
#23
3) make the locales_source table explicitly MyISAM again.
#24
I run a french localized site. A lot of admin pages are not translated, and the views/edit page for instance (quite heavy, all untranslated strings) was taking ages to build.
I tested the patch (well I just created the index - no blob2text thing)
This is in no way a regular benchmark - i just rely on query times displayed by devel.module
Before :
40 to 50 ms (!!) for every locale query
After :
all these queries drop to 2 or 3 ms...
local server (Win), MySQL5
one language and 3000 records in locale_target
I guess my personnal opinion is settled : commit it or not, I keep it ! :-)
Naive question : why don't we store in cache the fact that a string has no translation ?
We have to go through all those unsuccessful queries on each page load ?
views/edit page triggers something like 200+ of these queries, that's half a second...
#25
So killes, what method do you prefer and would let you commit this to HEAD before the release of drupal 4.7?
I'll write the patch which is needed, if time permits...
Steef
#26
This has been postponed to 4.8 - I've created a documentation issue so that this goes into the handbooks for now.. http://drupal.org/node/56943
-K
#27
Let's get this into the next version!
#28
IIRC the problem was that adding an index on blob colums wasn't supported for all database backends we currently support. IIRC, InnoDB on mysql 3 was not supported.
#29
+1 for getting some work on this ... the speed improvements are really cool > from ~2500ms to ~550ms page generation time on /node for my homepage for example, more than 4 times faster, so less load on the server, so my hosting provider is happier too
IMO it's really important too fix this at least for incomplete translations, because many translations are incomplete as soon as you use contrib modules in a site. For example translating of admin pages is not a priority for most modules, and even worse translation sources will not be deleted when you remove a module.
Another question: why are translations not cached when a translation is incomplete anyway?
This has significant impact, and only if you really need all settings pages of a site translated does it have a real reason. I would not mind that some untranslated strings are cached if that speeds up my site, as long as the cache get refreshed when I change strings (which would not be a large technical problem to do, and should already work that way)
#30
applying the patch gave an amazing increased of speed for my site (speed was on of the reasons not to upgrade my other sites)
If you guys are serious about 4.7 this patch is a must. Or someone needs to rewrite locale.....
#31
Dries, now that time is on my side I'm willing to update the patch.
Question remains how you want this to be implemented..
#32
OT:
@jadwigo,
@yched:
Just out of interest: could it be possible that you suffer from this bug? Are you using Organic Groups or SMTP module together with localization?
#33
Patch no longer applies.
#34
@smk-ka : no OG nor SMTP on my side.
I'm not home right now, so I cant't really check for other possible modules triggering the bug you mention
#35
@smk-ka
no og.module or smtp.module on my side
#36
killes wrote "IIRC, InnoDB on mysql 3 was not supported.". I can't find proof of that ... do you have a pointer or a reference?
#37
Just re-roll the patch. The index is added in a seperate SQL statement so in the event that your database doesn't allow an index to be added, the table would still be created.
#38
These queries are not using indexes either:
SELECT s.lid, t.translation FROM locales_source s INNER JOIN locales_target t ON s.lid = t.lid WHERE s.source = '' AND t.locale = 'cs'
SELECT locale, name FROM locales_meta WHERE enabled = 1 ORDER BY isdefault DESC, name ASC;
Could anyone update the patch to include indexes for these queries?
#39
Since weve decided o drop mysql 3 support, my objections from #28 are no longer valid.
#40
Since weve decided o drop mysql 3 support, my objections from #28 are no longer valid.
#41
#42
Probably needs to be re-rolled. Let's get this patch in ...
#43
Updated to apply against HEAD.
#44
Committed to CVS HEAD.
#45