locale.module: adding an index on the source field improves speeds

janekb - December 27, 2005 - 19:44
Project:Drupal
Version:x.y.z
Component:locale.module
Category:task
Priority:normal
Assigned:Stefan Nagtegaal
Status:closed
Description

Location module creates queries which perform full table scan of locales_source table:

SELECT s.lid, t.translation FROM locales_source s INNER JOIN locales_target t ON s.lid = t.lid WHERE s.source = 'Users can use their own name or handle and can fine tune some personal configuration settings through their individual my account page. Registered users need to authenticate by supplying either a local username and password, or a remote username and password such as DelphiForums ID, or one from a Drupal powered website. A visitor accessing your website is assigned an unique ID, the so-called session ID, which is stored in a cookie. For security\'s sake, the cookie does not contain personal information but acts as a key to retrieve the information stored on your server. ' AND t.locale = 'pl'

Solution:
Add index on 'source' field for 'locales_source' table
(for MySQL)

ALTER TABLE `schema_name`.`locales_source` DROP INDEX `source`,
ADD INDEX `SOURCE`(`source`(30));

#1

janekb - January 27, 2006 - 21:23
Priority:critical» normal
Assigned to:Anonymous» janekb
Status:active» patch (code needs review)

Problems are only when location module is enabled and there are some untranslated strings. In such case no caching is performed and Drupal atempts to look up translation in this table.
Patch hasn't been tested with PostgresSQL (I'm not shure if index can be created on 'text' type fields).

AttachmentSize
locales_soruce_db_patch.txt1.05 KB

#2

killes@www.drop.org - January 27, 2006 - 21:26
Title:Full table scan in location module» Full table scan in locale module
Status:patch (code needs review)» patch (code needs work)

Thanks, can you provide a patch that includes an update for updates.inc?

#3

Stefan Nagtegaal - March 17, 2006 - 23:03
Title:Full table scan in locale module» locale.module tremendous speedup!
Version:x.y.z» 4.7.0-beta6
Assigned to:janekb» Stefan Nagtegaal
Status:patch (code needs work)» patch (code needs review)

Here it is, a new formed patch includes:
- updates database.mysql and database.pgsql;
- updates update.inc!

This does _really_ speed up your site when using the locale.module..

Please try, and see if we can get this little speed improvement into 4.7

AttachmentSize
locales_speed_improvement.diff2.26 KB

#4

Stefan Nagtegaal - March 17, 2006 - 23:12

As an example:

I took at the following query:
"SELECT s.lid, t.translation FROM locales_source s INNER JOIN locales_target t ON s.lid = t.lid WHERE s.source = '\n <p>Welcome to your new <a href=\"%drupal\">Drupal</a>-powered website. This message will guide you through your first steps with Drupal, and will disappear once you have posted your first piece of content.</p>\n <p>The first thing you will need to do is <a href=\"%register\">create the first account</a>. This account will have full administration rights and will allow you to configure your website. Once logged in, you can visit the <a href=\"%admin\">administration section</a> and <a href=\"%config\">set up your site\'s configuration</a>.</p>\n <p>Drupal comes with various modules, each of which contains a specific piece of functionality. You should visit the <a href=\"%modules\">module list</a> and enable those modules which suit your website\'s needs.</p>\n <p><a href=\"%themes\">Themes</a> handle the presentation of your website. You can use one of the existing themes, modify them or create your own from scratch.</p>\n <p>We suggest you look around the administration section and explore the various options Drupal offers you. For more information, you can refer to the <a href=\"%handbook\">Drupal handbooks online</a>.</p>' AND t.locale = 'nl'"

It takes 20.22 ms to execute before this patch was in. After that, it only took 1.02 ms..

That would mean that the query has been executed 20 times faster as before, which means a tremendous speed up for sites which do use the locale.module..

Steef

#5

Cvbge - March 18, 2006 - 07:24
Version:4.7.0-beta6» x.y.z
Category:bug report» feature request
Status:patch (code needs review)» patch (code needs work)

Postgresql part looks good, but you should use switch statement instead of if and also include mysqli case (same as mysql), thus setting to 'needs work'

#6

Zen - March 18, 2006 - 07:35

Would this also be a good place to check/find out if there is any reason why we use a BLOB here (MySQL) instead of TEXT ?

-K

#7

Zen - March 18, 2006 - 08:03
Category:feature request» task

Caveats:

  • http://dev.mysql.com/doc/refman/4.1/en/create-index.html : You can add an index on a column that can have NULL values only if you are using MySQL 3.23.2 or newer and are using the MyISAM, InnoDB, or BDB storage engine. This is also true for MEMORY tables as of MySQL 4.0.2. You can only add an index on a BLOB or TEXT column if you are using MySQL 3.23.2 or newer and are using the MyISAM or BDB storage engine, or MySQL 4.0.14 or newer and the InnoDB storage engine.
  • http://dev.mysql.com/doc/refman/5.0/en/create-index.html :
    In MySQL 5.0:

        * You can add an index on a column that can have NULL values only if you are using the MyISAM, InnoDB, BDB, or MEMORY storage engine.
        * You can add an index on a BLOB or TEXT column only if you are using the MyISAM, BDB, or InnoDB storage engine.

The field length also bears investigating along with the NULL/duplicates issue.

Cheers
-K

#8

Zen - March 18, 2006 - 08:12
Title:locale.module tremendous speedup!» locale.module: adding an index on the source field improves speeds

#9

Stefan Nagtegaal - March 18, 2006 - 12:42

Would this also be a good place to check/find out if there is any reason why we use a BLOB here (MySQL) instead of TEXT ?
@Zen: IMO that should be a new issue. I don't want to cover more than absolutely nessecary in this issue because I would like to see this committed against HEAD before release of 4.7.

@Zen: What do you ment with your comment about the caveats? I don't know if I understand what you mean.. Does it give you any errors?

@Cvbge: Formatting of updates.inc has been improved as you pointed out.

Both, thanx for the review, new patch is attached.

AttachmentSize
locale_speed_improvement.diff2.3 KB

#10

Stefan Nagtegaal - March 18, 2006 - 12:42
Status:patch (code needs work)» patch (code needs review)

Setting status back to "patch (code needs review)"..

#11

Bèr Kessels - March 18, 2006 - 13:30

I did an apache benchmark. And attached the outcome. AFAIK the patch had no real effect, the unpatched version looks faster even.

Disclaimer, i am clueless when it comes to benchmarks ;)

AttachmentSize
42463_ab_log2.95 KB

#12

killes@www.drop.org - March 18, 2006 - 13:52

Can you do the test again with ab -n 200 -c1? Your test did indeed no tell very much. You did have locale enabled, did you? I think the patch will only affect pages where there are long strings (strlen > 75) which will be retrieved directly from the db.

#13

Zen - March 18, 2006 - 14:52

http://drupal.org/node/50669 <-- This patch makes MySQL support engine agnostic, basically stating that Drupal will work on all (common) engines on MySQL v3.23.58+. As stated above, innoDB does not support TEXT/BLOB indexing in versions before 4.0.14, thus making this patch rather dicey.

Just a thought.
-K

#14

Cvbge - March 18, 2006 - 14:53

And: have you had cache enabled? (should be disabled). How many translations (different languages) you had imported? (I suppose 1 non-english translation is most common, but testing with 2-3 could be good too).
Also, I wonder if index on the first 30 characters is optimal.

#15

Zen - March 18, 2006 - 15:05

The rest of my comment appears to have been eaten up.

The issue in my last statement makes MySQL support in Drupal engine agnostic. So we are essentially supporting all mainstream MySQL engines (and innoDB is one of them) on versions of MySQL in v.3.23.58 onwards. As stated above, innoDB does not support TEXT/BLOB field indexing on MySQL versions older than 4.0.14, thereby making this patch a little on the dicey side.

Cheers
-K

#16

Bèr Kessels - March 18, 2006 - 15:58

I will try to rerun that benchmark later.
1) no cache enabled
2) I ran it on ?q=tracker. I am not aware of any page with large strings that are available to anonymous. Ill create a php page for that.
3) I did the c4 on purpose; in my system the slow queries, those wayting while the lock is on affect speed most.
4) modules: all standard modules + locale + statsictics (the imo one-but biggest slow-downer)
5) 3 translations. approx 600 strings 'imported'.
6) only four strnigs were translated. That could be the cause?

Bèr

#17

killes@www.drop.org - March 18, 2006 - 16:02

if the locale_target table is empty an index cannot have an impact.

#18

chx - March 18, 2006 - 16:12

tavon tested changing blob to text in both source and target tables and it's ok.

#19

killes@www.drop.org - March 18, 2006 - 16:56

I've included the change the blob to text change

AttachmentSize
locales_speed_improvement.patch2.67 KB

#20

Steven - March 18, 2006 - 17:14

AFAIK we use blob because mysql treats text in a case insensitive fashion.

#21

killes@www.drop.org - March 18, 2006 - 19:58

Don't think so:

http://dev.mysql.com/doc/refman/4.1/en/blob.html

BLOB columns have no character set, and sorting and comparison are based on the numeric values of the bytes in column values. TEXT columns have a character set, and values are sorted and compared based on the collation of the character set assigned to the column as of MySQL 4.1. Before 4.1, TEXT sorting and comparison are based on the collation of the server character set.

#22

killes@www.drop.org - March 24, 2006 - 19:01

Seems Steven was right and I wasn't.

we have two choices:

1) not apply the patch
2) apply it (minus the blob2text change), and change the system requiremts to state that you can't use old innodb and locales.

#23

killes@www.drop.org - March 24, 2006 - 19:11

3) make the locales_source table explicitly MyISAM again.

#24

yched - March 24, 2006 - 22:23

I run a french localized site. A lot of admin pages are not translated, and the views/edit page for instance (quite heavy, all untranslated strings) was taking ages to build.

I tested the patch (well I just created the index - no blob2text thing)
This is in no way a regular benchmark - i just rely on query times displayed by devel.module

Before :
40 to 50 ms (!!) for every locale query

After :
all these queries drop to 2 or 3 ms...

local server (Win), MySQL5
one language and 3000 records in locale_target

I guess my personnal opinion is settled : commit it or not, I keep it ! :-)

Naive question : why don't we store in cache the fact that a string has no translation ?
We have to go through all those unsuccessful queries on each page load ?
views/edit page triggers something like 200+ of these queries, that's half a second...

#25

Stefan Nagtegaal - March 25, 2006 - 10:22

So killes, what method do you prefer and would let you commit this to HEAD before the release of drupal 4.7?

I'll write the patch which is needed, if time permits...

Steef

#26

Zen - April 3, 2006 - 14:00
Status:patch (code needs review)» postponed

This has been postponed to 4.8 - I've created a documentation issue so that this goes into the handbooks for now.. http://drupal.org/node/56943

-K

#27

Dries - May 2, 2006 - 12:13
Status:postponed» patch (code needs work)

Let's get this into the next version!

#28

Gerhard Killesreiter - May 2, 2006 - 12:20

IIRC the problem was that adding an index on blob colums wasn't supported for all database backends we currently support. IIRC, InnoDB on mysql 3 was not supported.

#29

jadwigo - May 8, 2006 - 18:21

+1 for getting some work on this ... the speed improvements are really cool > from ~2500ms to ~550ms page generation time on /node for my homepage for example, more than 4 times faster, so less load on the server, so my hosting provider is happier too

IMO it's really important too fix this at least for incomplete translations, because many translations are incomplete as soon as you use contrib modules in a site. For example translating of admin pages is not a priority for most modules, and even worse translation sources will not be deleted when you remove a module.

Another question: why are translations not cached when a translation is incomplete anyway?
This has significant impact, and only if you really need all settings pages of a site translated does it have a real reason. I would not mind that some untranslated strings are cached if that speeds up my site, as long as the cache get refreshed when I change strings (which would not be a large technical problem to do, and should already work that way)

#30

bwynants - May 23, 2006 - 21:13

applying the patch gave an amazing increased of speed for my site (speed was on of the reasons not to upgrade my other sites)

If you guys are serious about 4.7 this patch is a must. Or someone needs to rewrite locale.....

#31

Stefan Nagtegaal - May 26, 2006 - 21:02

Dries, now that time is on my side I'm willing to update the patch.
Question remains how you want this to be implemented..

#32

smk-ka - May 27, 2006 - 08:47

OT:

@jadwigo,
@yched:
Just out of interest: could it be possible that you suffer from this bug? Are you using Organic Groups or SMTP module together with localization?

#33

Dries - May 27, 2006 - 11:00

Patch no longer applies.

#34

yched - May 27, 2006 - 23:55

@smk-ka : no OG nor SMTP on my side.
I'm not home right now, so I cant't really check for other possible modules triggering the bug you mention

#35

jadwigo - May 28, 2006 - 16:13

@smk-ka
no og.module or smtp.module on my side

#36

Dries - July 4, 2006 - 14:02

killes wrote "IIRC, InnoDB on mysql 3 was not supported.". I can't find proof of that ... do you have a pointer or a reference?

#37

Dries - July 10, 2006 - 19:15

Just re-roll the patch. The index is added in a seperate SQL statement so in the event that your database doesn't allow an index to be added, the table would still be created.

#38

meba - July 12, 2006 - 17:23

These queries are not using indexes either:

SELECT s.lid, t.translation FROM locales_source s INNER JOIN locales_target t ON s.lid = t.lid WHERE s.source = '' AND t.locale = 'cs'
SELECT locale, name FROM locales_meta WHERE enabled = 1 ORDER BY isdefault DESC, name ASC;

Could anyone update the patch to include indexes for these queries?

#39

killes@www.drop.org - July 27, 2006 - 08:24

Since weve decided o drop mysql 3 support, my objections from #28 are no longer valid.

#40

killes@www.drop.org - July 27, 2006 - 08:24

Since weve decided o drop mysql 3 support, my objections from #28 are no longer valid.

#41

drumm - July 27, 2006 - 08:27
Status:patch (code needs work)» patch (code needs review)

#42

Dries - July 27, 2006 - 09:42

Probably needs to be re-rolled. Let's get this patch in ...

#43

Heine - August 27, 2006 - 08:05

Updated to apply against HEAD.

AttachmentSize
locale_index.patch.txt2.19 KB

#44

Dries - August 27, 2006 - 09:33
Status:patch (code needs review)» fixed

Committed to CVS HEAD.

#45

Anonymous - September 10, 2006 - 09:45
Status:fixed» closed
 
 

Drupal is a registered trademark of Dries Buytaert.