Replace LOWER() with db_select() and LIKE() where possible [#279851]

Comment	File	Size	Author
#158	279851-more-lower-2.patch	10.03 KB	c960657
#154	279851-more-lower-1.patch	10.19 KB	c960657
#150	279851-150.patch	5.31 KB	catch
#145	279851-remove-lower-145.patch	5.06 KB	coltrane
#139	lower.patch	2.36 KB	catch
#138	lower.patch	1.55 KB	catch
#136	lower.patch	832 bytes	catch
#110	remove_lower_5.patch	18.91 KB	asimmonds
#108	remove_lower_3.patch	17.45 KB	asimmonds
#101	remove_lower.patch	16.44 KB	david strauss
#99	remove_lower.patch	16.44 KB	david strauss
#98	remove_lower.patch	16.28 KB	david strauss
#95	remove_lower.patch	16.28 KB	david strauss
#91	lower.patch	18.94 KB	robertdouglass
#75	user_lower.patch	20.16 KB	catch
#75	user_lower.patch	18.82 KB	catch
#74	user_lower.patch	17.66 KB	catch
#55	user-lower-279851-55.patch	62.06 KB	pwolanin
#48	user_lower_3.patch	12.71 KB	drawk
#45	user_lower.patch	12.81 KB	johnalbin
#44	user_lower.patch	13.04 KB	catch
#38	user_lower.patch	13.18 KB	catch
#36	user_lower.patch	13.15 KB	catch
#32	userlower_11.patch	12.85 KB	damien tournoud
#31	userlower_10.patch	12.58 KB	catch
#28	userlower_8.patch	12.48 KB	catch
#27	userlower.patch	12.44 KB	catch
#24	userlower.patch	12.39 KB	catch
#23	userlower-5.patch	10.16 KB	damien tournoud
#21	userlower.patch	9.46 KB	catch
#20	userlower.patch	8.5 KB	catch
#18	userlower.patch	8.14 KB	catch
#10	userlower.patch	7.84 KB	catch
#7	userlower.patch	7.86 KB	catch
	userlower.patch	7.86 KB	catch

Comment #1

senpai commented 8 July 2008 at 18:27

Subscribing for a review.

Log in or register to post comments

Comment #2

cburschka

they

commented 8 July 2008 at 20:05

You didn't mention whether {user}.username will be constrained to lower case, but I presume that that's how you got rid of the LOWER() query?

I'll review and run some test cases...

Log in or register to post comments

Comment #3

cburschka

they

commented 8 July 2008 at 20:58

I'm getting the same 6 fails in registration, plus 25 fails in uploading a picture (haven't compared with the unpatched site yet). I'll investigate a bit.

Log in or register to post comments

Comment #4

catch

he/him

English

commented 8 July 2008 at 21:22

{user}.username will be constrained to lower case, yeah.

Log in or register to post comments

Comment #5

cburschka

they

commented 8 July 2008 at 21:23

False alarm on upload. It's just curl being stupid again, returning zero-length responses. When I'm done with this, I'll rewrite my copy of simpletest to use drupal_http_request; it's really completely unusable as it is for me.

This is the content of a debug file for user_check_password:

Password: foo
Account:  {"uid":"3","name":"simpletest_zyFE","username":"simpletest_zyfe","pass":"$P$Cy92GxnBfhOWNhWMZJq7wXKTQ3J13I.","mail":"simpletest_zyFE@example.com","theme":"","signature":"","created":"1215551808","access":"0","login":"0","status":"1","timezone":"3600","language":"","picture":"","init":"simpletest_zyFE@example.com","data":null}
Old hash: $P$Cy92GxnBfhOWNhWMZJq7wXKTQ3J13I.
New hash: $P$Cy92GxnBfSETyVzJNGe2gNaGPEMKBQ1

Password: 4xB6FEL37E
Account:  {"uid":"3","name":"simpletest_zyFE","username":"simpletest_zyfe","pass":"$P$Cy92GxnBfhOWNhWMZJq7wXKTQ3J13I.","mail":"simpletest_zyFE@example.com","theme":"","signature":"","created":"1215551808","access":"1215551816","login":"1215551816","status":"1","timezone":"3600","language":"","picture":"","init":"simpletest_zyFE@example.com","data":null,"roles":{"2":"authenticated user"}}
Old hash: $P$Cy92GxnBfhOWNhWMZJq7wXKTQ3J13I.
New hash: $P$Cy92GxnBfI76LOCSfUyxhbe1n0no1K1

Password: 4xB6FEL37E
Account:  {"uid":"3","name":"simpletest_zyFE","username":"simpletest_zyfe","pass":"$P$Cy92GxnBfhOWNhWMZJq7wXKTQ3J13I.","mail":"simpletest_zyFE@example.com","theme":"","signature":"","created":"1215551808","access":"1215551816","login":"1215551816","status":"1","timezone":"3600","language":"","picture":"","init":"simpletest_zyFE@example.com","data":null}
Old hash: $P$Cy92GxnBfhOWNhWMZJq7wXKTQ3J13I.
New hash: $P$Cy92GxnBfI76LOCSfUyxhbe1n0no1K1

I'll look at _crypt_password next...

Log in or register to post comments

Comment #6

pwolanin commented 8 July 2008 at 22:37

Status:

Needs review

» Needs work

bug here:

+    else if ($key == 'username') {
+      $query[] = "username = LOWER '%s')";
+      $params[] = $value;
+    }

Log in or register to post comments

Comment #7

catch

he/him

English

commented 8 July 2008 at 23:14

Status:

Needs work

» Needs review

Status	File	Size
new	userlower.patch	7.86 KB

Fixed that bug. Also spoke to pwolanin about this in irc, and he suggested a way to avoid drupal_strotolower in user_save() - which means that all the lowercasing is handled by the database now instead of some php, some database.

Still 6 fails in user registration tests. Thanks Arancaytar for the debugging :)

Log in or register to post comments

Comment #8

pwolanin commented 9 July 2008 at 01:55

perhaps this line in the test should change too?

+    $this->assertEqual($user->username, drupal_strtolower($name), t('Username matches. '));

Log in or register to post comments

Comment #9

Crell commented 9 July 2008 at 04:14

Bless you. :-) Visual review:

- I still do not like not having a unique constraint on the display name (nee name). As coded, what would happen if someone set their name to "Crell" and someone else to "crell"? The unique key is on username, which is the case-folded version. So what would happen is username would get set to "crell" for both of them, which would create an SQL error, wouldn't it?

- This code:

+  // Set the username to a lower case version of $user->name if not set.
+  if (empty($array['username'])) {
+    $array['username'] = db_result(db_query("SELECT LOWER('%s')", $array['name']));
+  }

Why are you using the database for the lowercasing? Would its result be any different than doing it in PHP space and avoiding the extra query?

- So then for a contrib module to modify the name value, it would need to:
+ form_alter to show "username"
+ form_alter to NOT show "name"
+ inject its own validate and submit hooks *before* the default ones to set the "name" field by its own logic.

Or alternatively, do the same in hook_user validate/save. Right?

Otherwise I think this is a good direction at this point.

Log in or register to post comments

Comment #10

catch

he/him

English

commented 9 July 2008 at 08:38

Status	File	Size
new	userlower.patch	7.84 KB

@Crell: I've added the unique constraint back to users.name - if a contrib module wants to handle that other issue that's been discussed elsewhere, they could remove the unique key via schema API.

Using the database for lowercasing exclusively was pwolanin's idea, which I reluctantly agree with. There's a chance that certain collations on certain databases are going to operate differently to mb_strtolower (I don't have enough practical experience with collations to know how much of a risk this is) - but by using the database we ensure consistency. Also the database is only being used for lowercasing on writes now.

Additionally, php.net suggests that LOWER() is actually faster than mb_strtolower, so for micro-optimisation there's probably not much in it.

Contrib needs to do exactly what you suggest, yes.

I'll probably update user.test to attempt creating users with siMiLar usernames if that's not in there already. Still at a loss as to why the password hashing is failing.

Log in or register to post comments

Comment #11

dries commented 9 July 2008 at 09:37

Personally, I'd find it very confusing if my display name was 'Dries Buytaert' but my login/username was 'dries'. I'd consider that to be a serious usability regression. Unfortunately, this patch seems to take us down that road and I'm not a big fan of that. Maybe I misunderstood where this was heading though.

Log in or register to post comments

Comment #12

cburschka

they

commented 9 July 2008 at 10:08

I would argue that it makes sense for login names to be permanent and displayed names to be changeable. The big problem with changing login names is that it is hell for the password manager - you don't necessarily want a different login each time the displayed name changes...

Log in or register to post comments

Comment #13

damien tournoud commented 11 July 2008 at 08:53

Edit: I misunderstood the scope of this patch.

@Dries: the new 'username' column is just pregenerated to 'LOWER(name)', in order to avoid a LOWER operation during SELECTs (that can't be indexed).

There is no change whatsoever here on the UI or user-experience part.

@others: would it be possible to improve that patch a little by removing spaces from the username column? That way you will not be able to register both 'Dries Buytaert' and 'DriesBuytaert'.

Did a quick visual review of the code, and it looks great.

Log in or register to post comments

Comment #14

catch

he/him

English

commented 9 July 2008 at 10:46

@Dries, my primary concern here is to remove the LOWER() in user queries. Robert's comparisons from 2006 show the difference in performance with 500k users. And the queries haven't changed since then:

in a typical case we're comparing 1273.97 ms to 0.37 ms

I agree 100% on the 'display name' issue, however, there's a lot of demand for it (including from people who wouldn't argue for it without good reason like Crell and Webchick). This issue is an enabler for that - either in core or contrib, but I'm absolutely not going to introduce any explicit support for such a thing with this patch - there are zero API or UI changes here. The only connection is that this patch doesn't introduce the extra column in such a way that it would block the other issue (by requiring yet another column or something).

@Damien - existing sites might already have a Dries Buytaert and DriesBuytaert - which'd give us a duplicate key error on update and lock one of them out. This patch currently has zero effect on the username policies of any existing sites, and I'd like to keep it that way.

@Arancaytar: #102679: Add a Display Name field to core in addition to Username is over there ;)

Log in or register to post comments

Comment #15

cburschka

they

commented 9 July 2008 at 14:19

Sorry, I misunderstood the scope of this patch.

How severe should we make this uniqueness constraint, as in how much should be stripped from the username field? Beyond spaces, we could strip everything but letters, so that "Dries Buytaert02345" would be unavailable as well. However, the stronger the constraint, the more problems people will have when updating their databases that may violate the new rules.

Log in or register to post comments

Comment #16

Crell commented 9 July 2008 at 15:01

@Dries: There are actually extremely good reasons to allow that distinction. In my case, I can't control the login name people choose when signing up. However, nobody on my site knows each other by their login name. We actually use real mother-given names for everything. Then when I try to select someone's name using the auto-complete author field, how do I know that "Dries Buytaert", who I'm trying to set as the author of a node, is really "trkfan849"? I don't. Allowing these fields to be split is a usability improvement, because right now if you use theme_username then you get a disconnect between user's identifying strings in different contexts. That said, this patch as pwolanin said doesn't change the default behavior other than silently setting up a pre-lowercased username for performance. I suspect Drupal.org would never override that.

@pwolanin: If that's the reason for using LOWER() in SQL, that should be documented in the code.

Also, having written a D5 module that tries to custom-set the login name field, I can tell you that it is not as easy as that to set a custom value there because the user workflow sucks. :-) I'd have to try it with this mechanism, but let's be careful that "just form alter it" isn't our answer unless we know that will actually work. Anyone have time to write up a demo module as proof of concept? (I wish I did, but I probably don't.)

Log in or register to post comments

Comment #17

catch

he/him

English

commented 9 July 2008 at 15:16

The unique constraint is exactly like it is now - a unique index on the column and form validation to avoid duplicates with case variations. This means you can't have Dries and dries, but you can have Dries1 etc. - again, exactly the same as now. Anyone wanting to force more unique usernames also needs to find another issue to do so in, since it's not within scope.

Log in or register to post comments

Comment #18

catch

he/him

English

commented 9 July 2008 at 15:26

Status	File	Size
new	userlower.patch	8.14 KB

@Crell. I've added documentation to both user.module and user.test to explain the justification for using LOWER(). If someone wants to come along and tell me it's not necessary, that's fine too.

Log in or register to post comments

Comment #19

johnalbin

they/他

English

Taipei, Taiwan

commented 9 July 2008 at 19:51

I think this patch also dovetails nicely as an excellent first step on the road to "display name" capabilities in core (and that’s the last I’ll say about it in this issue.) Given how out of control the others issues have become, we definitely need to limit the scope of this issue.

Catch, are you re-rolling the patch after your IRC chat with chx? Or did I misread that discussion?

Log in or register to post comments

Comment #20

catch

he/him

English

commented 9 July 2008 at 19:34

Status	File	Size
new	userlower.patch	8.5 KB

Following discussion with chx, we keep LOWER() within the database during user_save(). I've also modified user.test to use a real SELECT query as well.

Also, this version of the patch passes all user tests now (as a result of that change), and doesn't harm any other tests either.

Log in or register to post comments

Comment #21

catch

he/him

English

commented 9 July 2008 at 20:34

Status	File	Size
new	userlower.patch	9.46 KB

Last patch had some silly errors in it, this one should be good.

Log in or register to post comments

Comment #22

dahacouk commented 11 July 2008 at 08:43

@catch: Well done for coming up with this approach! Will your patch make it easier for a contrib modules to add non-unique display name functionality (emphasis on "non-unique")? In other words will $user->name (the display name) be able to be made non-unique if desired?

@Dries in #11: There are many webadmins calling out for display names being distinct from login names. There are many hacks being produced for what, some see, as a core function of a web site: being able to display a real name as different from their login name. This scenario would be optional for webadmins. And there's no reason why the current situation (of just one display/login name) couldn't be maintained too. But it's good to get some standardisation for this functionality into core so that developers can start to deliver more consistent solutions. Have a peek at #102679: Add a Display Name field to core in addition to Username to see more people wanting display name functionality in core. Sure, this patch doesn't go all the way but it's a start.

@Arancaytar in #12: Why is the "big problem with changing login names... hell for the password manager"? Surely, before a login name change request is actioned, the password manager just has to check that there are no duplicates. How is that "hell"? I say keep things flexible unless there's a real security issue or big hit on performance.
-1 to permanent login names

@Damien Tournoud in #13: Why "removing spaces"? If you are talking about "removing spaces from the username" - as in display name - then we wont be able to mimic real names. So, I see no advantage here.
-1 to removing spaces from the username column

@JohnAlbin #19: Nicely restrained! ;-) You're right, it does seem that we need to do this in small steps. Looks like we'll get "display name" capabilities in core around Drupal9. ;-)

Cheers Daniel

Log in or register to post comments

Comment #23

damien tournoud commented 11 July 2008 at 10:18

Status	File	Size
new	userlower-5.patch	10.16 KB

@dahacouk: I misunderstood the scope of this patch at first, please don't broaden it. This patch will ease the job of contrib modules to implement display names distinct from login names. It is only a first step in that direction.

Here is an updated version of catch#21:

Change queries in user_save() to use uid not name as the key,
Change user_is_blocked() to use the username, (this needs to be discussed)
Fix query in user_search(),
Fix _user_edit_validate() to allow modules to implement not distinct names.

Log in or register to post comments

Comment #24

catch

he/him

English

commented 11 July 2008 at 10:50

Title:

Add column 'username' to {user}, remove LOWER(name)

» Performance: Add column 'username' to {user}, remove LOWER(name)

Status	File	Size
new	userlower.patch	12.39 KB

Changing the title to be more descriptive.

I checked coverage.cwgordon.com and noticed that some of the user queries aren't being tested yet (i.e. user search), so this adds a test for search/user to verify they work as expected.

I'm not really set up for benchmarks, but will see if I can do some basic comparisons.

Log in or register to post comments

Comment #25

johnalbin

they/他

English

Taipei, Taiwan

commented 11 July 2008 at 11:03

Title:	Performance: Add column 'username' to {user}, remove LOWER(name)	» Add column 'username' to {user}, remove LOWER(name)
Status:	Needs review	» Needs work

Visual review:

user_is_blocked still has a $name variable in it. Should be $username.
In _user_edit_validate, it should be $has_username = !empty($edit['username']); note the missing “!”.
Also, Damien, I think your additions to _user_edit_validate to allow non-distinct names is scope creep.
So I’m looking at the un-patched user_is_blocked and I’m wondering how WHERE status = 0 AND name = LOWER('%s') even works; shouldn't that be WHERE status = 0 AND LOWER(name) = LOWER('%s')?

Log in or register to post comments

Comment #26

damien tournoud commented 11 July 2008 at 11:11

@JohnAlbin:

1. user_is_blocked still has a $name variable in it. Should be $username.
2. In _user_edit_validate, it should be $has_username = !empty($edit['username']); note the missing “!”.

Indeed, thanks.

3. Also, Damien, I think your additions to _user_edit_validate to allow non-distinct names is scope creep.

We are not far from being able to drop the unique constraint on {user}.name, why not following the route a little further? At the minimum, we need to only validate the uniqueness of {user}.username, because if we don't, contrib modules will not be able to drop the constraint themselves.

4. So I’m looking at the un-patched user_is_blocked and I’m wondering how WHERE status = 0 AND name = LOWER('%s') even works; shouldn't that be WHERE status = 0 AND LOWER(name) = LOWER('%s')?

It works on MySQL because the default collation is case-insensitive. Obviously there is a bug here.

Log in or register to post comments

Comment #27

catch

he/him

English

commented 11 July 2008 at 11:16

Title:	Add column 'username' to {user}, remove LOWER(name)	» Performance: Add column 'username' to {user}, remove LOWER(name)
Status:	Needs work	» Needs review

Status	File	Size
new	userlower.patch	12.44 KB

Crossposted with Damien, and looks like JohnAlbin crossposted with me. #23 introduces 5 test failures which aren't in #24 and also uses uid as a string in user_save.

Rolled the user_search fix from #23 into this one, which otherwise is identical to #24. I'm not knee-jerk against the other changes, but please no test regressions or scope creep.

Log in or register to post comments

Comment #28

catch

he/him

English

commented 11 July 2008 at 11:19

Status	File	Size
new	userlower_8.patch	12.48 KB

Or typos ;)

Log in or register to post comments

Comment #29

johnalbin

they/他

English

Taipei, Taiwan

commented 11 July 2008 at 11:30

Status:

Needs review

» Needs work

Trying VERY hard not to cross post again…

My review in #25 was of Damien's patch in #23. :-p

We are not far from being able to drop the unique constraint on {user}.name, why not following the route a little further? At the minimum, we need to only validate the uniqueness of {user}.username, because if we don't, contrib modules will not be able to drop the constraint themselves.

Again, Damien, this idea is scope creep. Dropping the unique constraint on {user}.name is highly debatable (even for contrib.) But the debate is out-of-scope for this issue. We can revisit your proposed change to _user_edit_validate in a follow up issue. :-)

Catch, I think Damien is correct; the user_save db queries do need to do WHERE uid = %d instead of WHERE name = '%s'", $array['name']. That’s what user_save uses currently: http://api.drupal.org/api/function/user_save/7 Or does that cause simpletest breakage? I haven't run them as I’m still visually reviewing the patches.

Also patch in #28 has “No newline at end of file”. But that’s highly pedantic, so you can ignore me on that one if you want.

Log in or register to post comments

Comment #30

damien tournoud commented 11 July 2008 at 11:33

Dropping the unique constraint on {user}.name is highly debatable (even for contrib.) But the debate is out-of-scope for this issue. We can revisit your proposed change to _user_edit_validate in a follow up issue. :-)

Well, we can drop the INDEX constraint on the {user}.name column while keeping it on {user}.username. In the standard behavior, the uniqueness of the {user}.name column is still guaranteed (because out of the box the {user}.username column is the lowercase form of {user}.name, thus if the former is unique, the latter also is).

Log in or register to post comments

Comment #31

catch

he/him

English

commented 11 July 2008 at 11:42

Status:

Needs work

» Needs review

Status	File	Size
new	userlower_10.patch	12.58 KB

Bug in the test. Should try to get patches right harder than I try not to crosspost.

@JohnAlbin's question in #25 - using uid instead of $array['name'] there caused simpletest breakage, yeah. Ways around this much appreciated.

Log in or register to post comments

Comment #32

damien tournoud commented 13 July 2008 at 14:09

Status	File	Size
new	userlower_11.patch	12.85 KB

Catch: sorry for #23 I should have indicated that those were completely untested.

Here is a update from #31, including the change to use uid as a key in user_save().

Edit: fix bad spelling that hurt my eyes.

Log in or register to post comments

Comment #33

catch

he/him

English

commented 11 July 2008 at 12:18

#32 is fine. Thanks Damien :)

Here's a quick comparison for performance on 98,000 users generated by devel, these run from the command line:

SELECT * FROM users WHERE LOWER(name) = LOWER('nogahi');
1 row in set (0.14 sec)

Subsequent queries for different names always take between 0.13 to 0.14 seconds. query_cache is only used if the name is identical.

SELECT * FROM users WHERE username = LOWER('nogahi');
1 row in set (0.04 sec)

query_cache gets used even for different names - so 0.00 seconds for subsequent similar queries.

So we get a double improvement - 3-4 times faster on these selects the first time they're run, but the query_cache in MySQL will handle most subsequent requests.

If someone has a decent benchmarking setup, it'd be great to get more scientific data than this - or with some advice I can probably do something a bit better at home.

Log in or register to post comments

Comment #34

pwolanin commented 13 July 2008 at 13:51

minor concerns re: text/code comments:

The schema description text seems a little incorrect or misleading:

-        'description' => t('Unique user name.'),
+        'description' => t('User name for display.'),
+      ),
+      'username' => array(
+        'type' => 'varchar',
+        'length' => 60,
+        'not null' => TRUE,
+        'default' => '',
+        'description' => t('Unique user name for login.'),

$user->name *is* still constrained to be unique, so I think the description does not need to change at all.
The username description should reference the fact that it's a LOWER() version of $user->name used to speed lookups during login, search, etc.

+    // We use the database to do the lowercasing to ensure that there are\
+    // no discrepencies in collation handling between the database and PHP.

I think should reference character handling or character set, not collation, since I think collation just determines sort order.

something like:

+    // We use the database to do the lowercasing to ensure correct searches
+    // in case of any discrepancy in character handling between the database and PHP.

Log in or register to post comments

Comment #35

damien tournoud commented 13 July 2008 at 14:15

@pwolanin: in MySQL terminology, "collations" refers to the set of rules for comparing characters (for both inequalities and equalities). Collations for are used for sorting, for matching and for normalizing strings before putting them into index classes.

This said, I like the wording of your comment better :)

Log in or register to post comments

Comment #36

catch

he/him

English

commented 13 July 2008 at 15:49

Status	File	Size
new	user_lower.patch	13.15 KB

Those changes are fine, and while collation is correct, I agree with Damien that the revised version explains the motivation better.

New patch attached.

Log in or register to post comments

Comment #37

pwolanin commented 13 July 2008 at 17:38

to be consistent with other schema comments, this:

t('LOWER() version of $user->name for login, search performance.'),

should be written more like:

t('LOWER() version of {users}.name for login, search performance.'),

Log in or register to post comments

Comment #38

catch

he/him

English

commented 13 July 2008 at 20:05

Status	File	Size
new	user_lower.patch	13.18 KB

Yes. good point.

Log in or register to post comments

Comment #39

R.Muilwijk commented 14 July 2008 at 07:28

Tests look good to me , ran them all. It doesn't break any tests that don't fail normally. I also like the idea of not having to use a LOWER() on the database. So +1 from me!

Log in or register to post comments

Comment #40

senpai commented 14 July 2008 at 20:09

Status:

Needs review

» Reviewed & tested by the community

Tested against a fresh checkout of HEAD, but not a clean DB.

I created 10 users with all manner of mixed case, spaces, symbols, and the like. All form submissions were treated as they should have been, with illegal symbols being prohibited, and mixed-case or spaces being allowed. In each instance, I checked the database to verify core's actions and they were good.

I then deleted nine of those users, and in each case the deletion happened flawlessly as it should have. I kept the last user for a Change Own Username test, and it passed that with flying colors.

This patch allows users to login with either their uppercase name, lowercase name, or any mixed-case variant as long as the $user->name field matches the same sequence of letters and numbers stored in $user->username. I don't see that as a problem, but some people might. It basically allows me to login as either 'Dries Buytaert' or 'dries buytaert' but not 'driesbuytaert' because there's a space in the original $user->name.

Simpletests
User.mod == 231 passes, 0 fails, 0 exceptions
(should we be testing for user name changes as well?)

I'm good with this patch. RTBC.

Log in or register to post comments

Comment #41

dahacouk commented 15 July 2008 at 06:13

Is this patch meant to enable me to have a (display name) $user->name = "Daniel Harris" and a (login name) $user->username = "dahacouk" - as I usually like to do?

Or is $user->username only just a lowercase representation of $user->name?

Log in or register to post comments

Comment #42

damien tournoud commented 15 July 2008 at 09:50

@dahacouk: this is out of this patch's scope, but this patch do open the way for contribution modules to implement what you describe.

Log in or register to post comments

Comment #43

senpai commented 15 July 2008 at 23:00

Daniel, This patch creates a duplicate username of every user in the system, and automatically lowercases it. That's all it does. It's an enabler, if you will. A stepping stone to further greatness. It has to go into core before anything like the Add a Display Name to core issue could even begin to be realized.

The great thing about this patch is that it speeds up Drupal by a magnitude or two when trying to locate or manage users within the system. Once this is committed, other patches can surface to handle what actually happens to these usernames. For now, we're all going to be extremely happy when this is committed.

Log in or register to post comments

Comment #44

catch

he/him

English

commented 17 August 2008 at 18:14

Status	File	Size
new	user_lower.patch	13.04 KB

Re-rolled following #237381: user_save parameter poorly named

Log in or register to post comments

Comment #45

johnalbin

they/他

English

Taipei, Taiwan

commented 17 August 2008 at 21:53

Status	File	Size
new	user_lower.patch	12.81 KB

The short description of a function's docblock should be on one line, so this updated patch only changes the docblock for user_update_7002(). No code changes from catch's patch, so credits still go to him. :-)

Re-ran ALL tests; they all still pass after the patch. “4940 passes, 0 fails, 0 exceptions”

Log in or register to post comments

Comment #46

senpai commented 23 August 2008 at 15:40

Bumping this as an RRTBCTRBC (Really Really Reviewed and Tested by the Community and Thoroughly Ready to be Committed).

Commit commit commit!

Log in or register to post comments

Comment #47

maartenvg commented 7 September 2008 at 11:52

Status:

Reviewed & tested by the community

» Needs work

Patch no longer applies.

patching file modules/system/system.install
Hunk #1 FAILED at 366.
1 out of 1 hunk FAILED -- saving rejects to file modules/system/system.install.rej
patching file modules/user/user.install
patching file modules/user/user.module
Hunk #7 succeeded at 1545 (offset 2 lines).
patching file modules/user/user.pages.inc
Hunk #1 FAILED at 12.
1 out of 1 hunk FAILED -- saving rejects to file modules/user/user.pages.inc.rej
patching file modules/user/user.test
Hunk #3 FAILED at 468.
1 out of 3 hunks FAILED -- saving rejects to file modules/user/user.test.rej

Log in or register to post comments

Comment #48

drawk commented 7 September 2008 at 13:36

Status:

Needs work

» Needs review

Status	File	Size
new	user_lower_3.patch	12.71 KB

Reroll

Log in or register to post comments

Comment #49

catch

he/him

English

commented 7 September 2008 at 23:15

This probably ought to be re-rolled again for the new db layer changes. However, since it was at RTBC for a very long time with no comment, I'd quite like view from webchick or Dries on the general approach - which I don't expect to change after conversion.

Log in or register to post comments

Comment #50

BioALIEN commented 8 September 2008 at 15:45

@catch: It's a performance improvement, but I guess they want to see some more benchmarks before they commit this? What you've covered in #33 is great. Can we see some benchmarks on bigger (or even tiny) sites to better understand the performance gain?

+1 on testing this out on scratch.d.o

Log in or register to post comments

Comment #51

pwolanin commented 8 September 2008 at 18:47

The patch adds a new unique key for 'username' but if we are doing all look ups using the new column is there any reason to retain the unique key on user.name?

Log in or register to post comments

Comment #52

david strauss

he/him

commented 8 September 2008 at 19:57

@pwolanin: There is no need to keep any key/index on {user}.name. We should never be putting conditions or sort criteria on {user}.name after this patch.

Log in or register to post comments

Comment #53

Crell commented 8 September 2008 at 20:46

Not true. I may want to list all users alphabetically by their public name as part of a member directory. At minimum having an index on that field will keep such queries from killing the server. (I've already said my piece on the uniqueness question.)

Log in or register to post comments

Comment #54

pwolanin commented 8 September 2008 at 21:29

@Crell - I don't understand your comment. This patch maintains a 1-to-1 mapping of name to username.

Log in or register to post comments

Comment #55

pwolanin commented 9 September 2008 at 00:58

Status	File	Size
new	user-lower-279851-55.patch	62.06 KB

I re-rolled the patch to remove the index on name, and altered the code in other modules so all WHERE or ORDER BY should be using username rather than name.

Ran all tests - all tests pass with this patch (5431 passes, 0 fails, 0 exceptions).

Log in or register to post comments

Comment #56

maartenvg commented 9 September 2008 at 10:07

It applies fine, and I get no errors. But I was wondering: although this is a database change, there is nothing for update.php to do, is that by design?

Log in or register to post comments

Comment #57

pwolanin commented 9 September 2008 at 11:36

If you already applied this patch before, the update might not run.

Log in or register to post comments

Comment #58

maartenvg commented 9 September 2008 at 13:09

Nope, that is not the case, I applied it to a fresh install.
- download fresh 7x tarball
- extract
- install
- apply patch
- run update.php

The last step says "No pending updates."

I've checked and the installation does not detect the presence of user_update_7002() (database reports last scheme to be 7001). I already came across the same strange behavior when creating a patch for #165786: DB Silliness: aggregator_item.description ought to be called body, but thought it might be my fault.

I believe this behavior was introduced when #286035: Remove update.php number dropdowns went in, not sure though.

EDIT: I've traced this back to http://drupal.org/cvs?commit=137218, therefore it is unrelated to this issue. You can ignore my comments about not functioning updates.

Log in or register to post comments

Comment #59

johnalbin

they/他

English

Taipei, Taiwan

commented 9 September 2008 at 16:00

Status:

Needs review

» Needs work

Yes, Maarten, #286035: Remove update.php number dropdowns is the culprit for the "no pending updates" issue. But it has just now been fixed (again).

@pwolanin: re: Crell's last comment. He meant he's already talked about the uniqueness several times. starting with #9 above in this issue and all over #102679: Add a Display Name field to core in addition to Username.

And, after a really, REALLY long discussion, we finally got everyone in #102679 to agree that name needs to remain unique.

In addition, removing the unique constraint on name isn't needed for a performance patch. Let's keep the scope focused so as not to repeat #102679.

Log in or register to post comments

Comment #60

catch

he/him

English

commented 9 September 2008 at 16:22

Status:

Needs work

» Needs review

Unnecessary indexes are a performance hit on insert, there's no reason to add that hit if we don't need it. Uniqueness is enforced in PHP anyway, at least without contrib intervention, so I'm marking this back to needs review.

Log in or register to post comments

Comment #61

david strauss

he/him

commented 9 September 2008 at 17:07

@Crell: I do agree that a unique index on the case-preserved name would enhance data integrity a little, but it's going to be inconsistent. The index is built off the collation that DB server is using, so the uniqueness on "name" will be case-insensitive on MySQL and case-sensitive on PostgreSQL. Trying to create equivalent contraints on "name" brings us back to the problem of making a portable, case-insensitive index, which is the problem this patch is trying to solve.

Log in or register to post comments

Comment #62

Crell commented 9 September 2008 at 20:07

OK, I think everyone misunderstood what I was saying in #53.

1) I've said my piece on having a UNIQUE constraint on the name column, so I'm not going to say more on it here.

2) There are use cases where you will want to ORDER BY name, so having an index (unique or otherwise) on that field will help performance in those cases. So -1 on not having an index on the name column at all. That field won't be modified often enough that the extra cost on INSERT will matter, IMO.

Log in or register to post comments

Comment #63

maartenvg commented 9 September 2008 at 20:42

I agree with Crell on at least 2). There are enough use cases where we would want to sort on the `name` column, rather than the `username` column. Especially with the future (by core or contributions) in mind where `name` won't be a 1-to-1 mapping to `username` anymore.

Log in or register to post comments

Comment #64

dahacouk commented 16 September 2008 at 11:11

@JohnAlbin in #59: I am not trying to "repeat #102679" here - this is a performance patch after all.

But I must take exception to your statement "And, after a really, REALLY long discussion, we finally got everyone in #102679 to agree that name needs to remain unique."

Firstly, I did not agree to uniqueness of name so please either remove that statement for the record or qualify who exactly you mean by "everyone". Lets be accurate folks!

#102679 and this issue have different goals, remember. This issue does not solve the issue of #102679. Please don't muddy the waters.

After this patch has been accepted I'll be excited to find out how it can really help fulfil #102679.

Full steam ahead! ;-)

Cheers Daniel

Log in or register to post comments

Comment #65

johnalbin

they/他

English

Taipei, Taiwan

commented 16 September 2008 at 14:37

Re: catch: "Unnecessary indexes are a performance hit on insert, there's no reason to add that hit if we don't need it."

Agreed, but I don't feel this is an unnecessary index.

Since core won't be using the name column for any of its querying (just for displaying a capitalization-preserved version of username), it opens up the possibility that contrib can extend the data of that field. But actively deciding to remove the existing index means we have to fight to get it back in another issue, because (regardless if its unique or not) that column would need an index to query on it. And if the length of #102679 is any indication, getting it back will never, never happen.

And, yes, I know my argument is scope-creep. But I'm not asking for a new feature of the patch, I'm asking for you to not remove an existing index.

Also, how big is this hit for a db with a large number of users? seconds? miliseconds? http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html shows that, for a table with 320,000,000 records (everyone in the US), each index insertion would be log(320,000,000) = 8.5 slower than an empty table, but that page doesn't show actual numbers. When I tested inserting a single user into a 200,000 users table with and without a name index, both results were the same "Query OK, 1 row affected (0.01 sec)". That, of course, is nowhere near exhaustive testing, but extrapolating to a table with 320 million users means the performance hit would be somewhere less than 0.085 seconds. Unless my math is wrong, which it could be, leaving in the index seems a negligible performance "hit".

Log in or register to post comments

Comment #66

Crell commented 16 September 2008 at 15:04

JohnAlbin said what I was trying to say far better than I did. :-)

Let's also remember we're talking about a table that on a super-busy site like d.o has had ~300,000 inserts in 7 years. It's not like we're talking about hammering watchdog. If we lose a tenth of a second when creating a new user but avoid a full table scan for any contrib that wants to do something useful with that column, that's a huge win.

Really, removing the index is unnecessary premature optimization.

Log in or register to post comments

Comment #67

catch

he/him

English

commented 16 September 2008 at 16:53

Status:

Needs review

» Needs work

I'm fine with putting the index back in. This needs re-rolling for dbtng as well.

Log in or register to post comments

Comment #68

webchick

she/they

English

Vancouver 🇨🇦

commented 17 September 2008 at 02:17

Holy cow, guys. ;)

Ok. My take on this patch is:

The only point of this patch is to add a lowercase username column is for improved query performance. That's it. It's not for a usability improvement. It's not so we can do non-unique display names. It's not so users can have separate display names from internal usernames. It's not for enhanced security. It's not for $insert_pet_registration_feature_here. Performance! That's it. Any other related features/changes enabled by this patch should be discussed in other issues. Thanks for doing your part to keep patches to one context.
It should not affect any of the existing behaviour of the registration/login system, from a user perspective. They should not see the user.username field exposed to them, and should not be required to log in with anything but their normal users.name. This was Dries's concern, and it's mine as well. The patch as it stands does not (or at least doesn't intend to; haven't tested it myself yet to make sure it works), and that is wonderful and as it should be.
I'm not that fussed about index performance on inserts, to be honest. I agree that there are many use cases, particularly with Views, where pulling up a list of users sorted by users.name, so having an index on it probably makes a lot of sense. The important thing to me, again, is to NOT change the behaviour of the current code in any way.

I think that's everything that needs weighing in on here. Hope that helps.

Log in or register to post comments

Comment #69

pwolanin commented 17 September 2008 at 03:09

In terms of an extra and unneeded index - I think the size in memory of the index is much more of a concern than insert speed - hence I'd assert that #52 is correct. No query should be putting a condition on name after this patch, only on username. Hence we don't need the index.

Log in or register to post comments

Comment #70

robertdouglass commented 17 September 2008 at 09:35

I'm thrilled with the direction of this patch. Thorn in my side since 2006.

Log in or register to post comments

Comment #71

catch

he/him

English

commented 17 September 2008 at 19:13

Since username and name with be identical in core, if contrib wants to try to do something funky with name, they could also add the index back via schema API. If I can find the motivation I'll re-roll with both options though if no-one beats me to it.

Log in or register to post comments

Comment #72

Crell commented 17 September 2008 at 22:22

admin/user/user in core allows sorting on the username. Presumably that will use "name" not "login name" in the new setup, so that when contribs do fork off the display name it will make sense. (And let's be honest, that's not the main purpose of this patch but we all know that is going to happen.) Thus, a core page will be sorting by name, not login name. Thus, we do need the index on that column.

Log in or register to post comments

Comment #73

pwolanin commented 18 September 2008 at 00:56

@Crell - see above - a contrib module that needs a different index can add it. Let's deal with what core needs.

Log in or register to post comments

Comment #74

catch

he/him

English

commented 18 September 2008 at 01:47

Status	File	Size
new	user_lower.patch	17.66 KB

Here's a re-roll to re-sync with HEAD and use named placeholders in all cases except the user_load query builder (which needs updating in a separate patch) and wildcards since I'm not sure how they work now. I'm getting ten fails with dblog.test and some in the user tests which I'll track down tomorrow. Looks like user.admin.inc was missed in pwolanin's patch to me, we should be able to use username for sorting there too really.

Log in or register to post comments

Comment #75

catch

he/him

English

commented 18 September 2008 at 12:40

Status:

Needs work

» Needs review

Status	File	Size
new	user_lower.patch	18.82 KB
new	user_lower.patch	20.16 KB

Fixed the typo that was causing test failures and updated user.admin.inc to use the correct column for tablesort_sql. So here's a fully functional patch for review.

Log in or register to post comments

Comment #76

yoroy commented 18 September 2008 at 12:43

Status:

Needs review

» Needs work

Catch polled me about what made moste sense to me in the sorting: A-Z, then a-z,or a combined Aa - Zz.
I'd strongly suggest using Aa - Zz, otherwise you basically get 2 seperate lists.

Log in or register to post comments

Comment #77

catch

he/him

English

commented 18 September 2008 at 13:05

Status:

Needs work

» Needs review

Sticking back to needs review since this is the behaviour in the current patch, I think yoroy cross-posted.

Log in or register to post comments

Comment #78

yoroy commented 18 September 2008 at 14:04

yes, I didn't change status, must've been a crosspost.

Log in or register to post comments

Comment #79

chx commented 18 September 2008 at 17:42

Status:

Needs review

» Reviewed & tested by the community

This patch is a good step forward. While I am not entirely surely this is the best direction we can go, there are indeed a big number of advantages of this, namely the possibility of having a separate username (edit:), the small code footprint, the simplicity. By fulfilling this popular request, you nicely sidestepped the necessity to sync the username and LOWER(name) . For now, I think it will do. We can always do more later.

Log in or register to post comments

Comment #80

webchick

she/they

English

Vancouver 🇨🇦

commented 18 September 2008 at 18:17

Status:

Reviewed & tested by the community

» Needs review

So. Since the upgrade path from D6 => D7 is horribly broken right now, I /know/ none of you all tested the upgrade path. :P

Can someone please bring over their users table from a production D5/D6 site with lots of real life examples including names like "Bill O'Malley" and "SüP3RguRL27! and run user_update_7002() and report results?

Log in or register to post comments

Comment #81

catch

he/him

English

commented 18 September 2008 at 19:28

hmm, I tested the upgrade path from d7 - d7 yesterday, but not 6-7. I'm also pretty sure I tested the updates before the 6-7 path got broken (last time it was RTBC), but it seems I didn't post about it and I wouldn't want to swear 100% I actually did.

Log in or register to post comments

Comment #82

maartenvg commented 18 September 2008 at 19:39

I did a test on a user base (about 1200 users). Everything went smooth but during the update, as was the endresult in the database itself. Although I have to admit that I have nothing as extreme as "SüP3RguRL27!, most users have numbers, there are a lot with punctuations (_.:;'^<>) and some use the occasional é and è.

Perhaps someone using a non-Latin alphabet can confirm my positive results?

EDIT: It was a user table from D5, that I copied into the D7 database (and changed the schema a bit to match that of the old D7)

Log in or register to post comments

Comment #83

webchick

she/they

English

Vancouver 🇨🇦

commented 18 September 2008 at 19:43

Ok, and those people with punctuation and és are able to log in okay, and are displayed properly in the user list at admin/user/user?

Log in or register to post comments

Comment #84

maartenvg commented 18 September 2008 at 20:35

I'm getting some conflicting results. I'll do some more rigorous testing.

Log in or register to post comments

Comment #85

damien tournoud commented 18 September 2008 at 21:09

I upgraded the whole users table from drupalfr.org (5000+) users. All went smoothly. The conversion was done properly.

But users with special characters in their names could not login... That's a different issue thou: #310447: Add back SET NAMES="utf8".

With SET NAMES back, all works as expected.

Log in or register to post comments

Comment #86

xzilla commented 18 September 2008 at 21:12

I'm curious if anyone has done testing to see if there is a performance regression by copying all of the usernames into a second column within the table, thereby adding additional footprint and/or i/o load to all of the other queries referencing the table? I ask because I have addressed the above problem (slow lower(name) queries) by adding a function index on lower(name) which solves this specific problem much better while not cuasing additional overhead where it isn't needed.

Log in or register to post comments

Comment #87

maartenvg commented 18 September 2008 at 21:37

Ok. The problems I am having are not related to this issue, but with collation and other char set problems during importing .e.d.

But when I manually create a few users with strange names, and then apply the patch, I do not get problems with displaying or editing these users.

Log in or register to post comments

Comment #88

chx commented 18 September 2008 at 22:30

Status:

Needs review

» Reviewed & tested by the community

Setting status based on the latest comments especially #85 .

Log in or register to post comments

Comment #89

lilou commented 10 October 2008 at 17:27

Status:

Reviewed & tested by the community

» Needs work

Patch #75 cannot apply against CVS/HEAD :

drupal\7.x\modules\system\system.install (Cannot apply hunk @@ 17 )
drupal\7.x\modules\user\user.module (Cannot apply hunk @@ 7 )

Back to CNW.

Log in or register to post comments

Comment #90

robertdouglass commented 12 October 2008 at 09:19

In system.install, why is status left out of this query?

+  db_query("INSERT INTO {users} (name, username, mail, created, data) VALUES(:placeholder, :placeholder, :placeholder, :time, :serial)", array(':placeholder' => 'placeholder-for-uid-1', ':time' =>  REQUEST_TIME, ':serial' => serialize(array())));
-  db_query("INSERT INTO {users} (name, mail, created, status, data) VALUES('%s', '%s', %d, %d, '%s')", 'placeholder-for-uid-1', 'placeholder-for-uid-1', REQUEST_TIME, 1, serialize(array()));

Log in or register to post comments

Comment #91

robertdouglass commented 12 October 2008 at 09:33

Status:

Needs work

» Needs review

Status	File	Size
new	lower.patch	18.94 KB

Rerolled.

Log in or register to post comments

Comment #92

robertdouglass commented 12 October 2008 at 09:55

100% tests pass: 6988 passes, 0 fails, 0 exceptions

Log in or register to post comments

Comment #93

catch

he/him

English

commented 12 October 2008 at 09:57

Status:

Needs review

» Needs work

We should be using db_insert() for that query too.

Log in or register to post comments

Comment #94

c960657 commented 30 November 2008 at 17:39

Patch no longer applies to HEAD.

Log in or register to post comments

Comment #95

david strauss

he/him

commented 7 December 2008 at 12:04

Status	File	Size
new	remove_lower.patch	16.28 KB

Here's a re-roll. I have not tested it.

Log in or register to post comments

Comment #96

david strauss

he/him

commented 7 December 2008 at 12:05

Status:

Needs work

» Needs review

Log in or register to post comments

Comment #97

swentel commented 7 December 2008 at 13:46

Status:

Needs review

» Needs work

We don't use t() anymore in schema descriptions, so that has to go out.

Log in or register to post comments

Comment #98

david strauss

he/him

commented 7 December 2008 at 21:16

Status:

Needs work

» Needs review

Status	File	Size
new	remove_lower.patch	16.28 KB

Re-roll to remove t() from schema description.

Log in or register to post comments

Comment #99

david strauss

he/him

commented 7 December 2008 at 21:19

Status	File	Size
new	remove_lower.patch	16.44 KB

Let's also fix the erroneous description of the old "name" column.

Log in or register to post comments

Comment #100

7 December 2008 at 21:25

Status:

Needs review

» Needs work

The last submitted patch failed testing.

Log in or register to post comments

Comment #101

david strauss

he/him

commented 7 December 2008 at 22:39

Status:

Needs work

» Needs review

Status	File	Size
new	remove_lower.patch	16.44 KB

Re-roll with non-conflicting update.

Log in or register to post comments

Comment #102

7 December 2008 at 22:55

Status:

Needs review

» Needs work

The last submitted patch failed testing.

Log in or register to post comments

Comment #103

david strauss

he/him

commented 7 December 2008 at 23:24

From what I can tell, we seem to be experiencing a bug in the test framework. It doesn't make sense for user login to fail only under the picture submission test case. The exceptions result from trying to insert "u" as the line number into the test results table.

Log in or register to post comments

Comment #104

c960657 commented 7 December 2008 at 23:41

This is due to #328781: Fix horrible things in the error reporting. If you apply the patch from that issue, you will get better error messages when running the tests.

Log in or register to post comments

Comment #105

david strauss

he/him

commented 7 December 2008 at 23:45

All tests pass if we apply #328781, though we still get "Undefined index: name" exceptions in various places.

Log in or register to post comments

Comment #106

stewsnooze

he / him

commented 26 December 2008 at 22:59

subscribe

Log in or register to post comments

Comment #107

dave reid

he/him

English

Nebraska USA

commented 20 January 2009 at 05:22

Issue tags:

+Performance

Log in or register to post comments

Comment #108

asimmonds commented 22 January 2009 at 21:38

Status:

Needs work

» Needs review

Status	File	Size
new	remove_lower_3.patch	17.45 KB

Rerolled for current HEAD.

The "Undefined index: name" exceptions were from that user_save() can be called with only what needs to be updated in the $edit array. For example, in /admin/user when you block/unblock multiple users, $edit only has the status attribute set, and not the user or pass attributes.

Log in or register to post comments

Comment #109

27 January 2009 at 00:55

Status:

Needs review

» Needs work

The last submitted patch failed testing.

Log in or register to post comments

Comment #110

asimmonds commented 7 February 2009 at 10:08

Status:

Needs work

» Needs review

Status	File	Size
new	remove_lower_5.patch	18.91 KB

Rerolled for current HEAD.

Log in or register to post comments

Comment #111

dave reid

he/him

English

Nebraska USA

commented 7 February 2009 at 15:27

Status:

Needs review

» Needs work

1) These in system_install() will fail on either PostgreSQL or SQLite because of a bug in PDO that placeholders cannot be reused:

+  db_query("INSERT INTO {users} (name, username, mail) VALUES(:empty, :empty, :empty)", array(':empty' => ''));
+  db_query("INSERT INTO {users} (name, username, mail, created, data) VALUES(:placeholder, :placeholder, :placeholder, :time, :serial)", array(':placeholder' => 'placeholder-for-uid-1', ':time' =>  REQUEST_TIME, ':serial' => serialize(array())));

2) Having the following code twice in user_save is dubious:

+    // Ensure $user->username is lowercase if it's the same as $user->name.
+    // We use the database to do the lowercasing to ensure correct searches
+    // in case of any discrepencies in character handling between the
+    // database and PHP.
+    // LOWER() is also slightly faster than mb_strtolower() for this task.
+    if (!empty($edit['name']) && $edit['username'] == $edit['name']) {
+      db_query("UPDATE {users} SET username = LOWER(username) WHERE uid = :uid", array(':uid' => $account->uid));
+    }

I don't see why we can't do something in the beginning of user_save() instead of having to add an extra query to every call to user_save(). Are the comments still valid even though we now require PHP 5.2 and have drupal_strtolower()? In fact, you have an added section earlier in user_save:

+  // If username isn't set, populate it with name.
+  if (!empty($edit['name']) && empty($edit['username'])) {
+    $edit['username'] = $edit['name'];
+  }

That should be simply:

  $edit['username'] = drupal_strtolower($edit['name']);

3) On the new test case, class UserSearchTestCase needs a phpdoc block right above it, getInfo() does not, setUp() does not, and testUserSearch() does need a phpdoc block. See the testing code standards at http://drupal.org/node/325974.

4) In addition, there are some sections in testUserSearch() that look smooshed together.

5) A few coding issues:

+  db_query("UPDATE {users} SET uid = uid - uid WHERE username = :empty", array(':empty' =>''));

Log in or register to post comments

Comment #112

Crell commented 7 February 2009 at 20:45

To be clear, the non-reusability of placeholders in PDO is a design decision, not a bug. Some databases support reuse, some don't. Earlier versions of PDO tried to work around that limitation for those databases that didn't, and failed miserably. So in later PDO versions they ripped that out and declared placeholders non-reusable on any DB. (I spoke with Wez Furlong about that about 2 years ago. :-))

Log in or register to post comments

Comment #113

catch

he/him

English

commented 7 February 2009 at 21:00

@davereid, we can't guarantee that the result of drupal_strtolower() and LOWER() will be exactly the same on all collations due to (potentially) different handling of characters in their lowercase versions. I haven't yet seen an example of where this would actually break at all, but it's one of the annoyances which makes both this and the equivalent taxonomy issue ugly. If anyone's got decent information on that subject - whether it's critical, whether we could just document the potential behaviour, that'd be handy.

Log in or register to post comments

Comment #114

catch

he/him

English

commented 27 March 2009 at 19:49

Marked #83738: LOWER(name) queries perform badly; add column name_lower and index. as duplicate despite being the older issue to concentrate efforts in one area.

Log in or register to post comments

Comment #115

markus_petrux commented 27 March 2009 at 20:16

Minor issue about the patch in comment #110: it mixes 'Username' and 'User name' in translatable strings. Which one is correct?

Log in or register to post comments

Comment #116

kbahey commented 27 March 2009 at 21:27

Subscribing.

(Before dww fixes +1 subscribe)

Log in or register to post comments

Comment #117

chx commented 5 April 2009 at 04:04

I think the db_query("UPDATE {users} SET username = LOWER(username) WHERE uid = :uid", array(':uid' => $edit['uid'])); piece needs to go in a helper function. It can't be avoided, agreed.

Log in or register to post comments

Comment #118

catch

he/him

English

commented 5 May 2009 at 11:29

Title:

Performance: Add column 'username' to {user}, remove LOWER(name)

» Performance: Add column 'username' to {users}, remove LOWER(name)

Log in or register to post comments

Comment #119

blake_ commented 8 May 2009 at 14:20

I don't think we should be storing the LOWER() value in the database at all, especially if there's a unique index on it. For any characters that aren't ASCII, each different database may handle the LOWER call differently. For example: MySQL & MSSQL use whatever collation is currently relevant to lower characters, I couldn't find out what Postgres does, SQLlite only lowers ASCII.

My view is that we're introducing here a new column, to provide a sort of "normalised name" that can be used to search for users / prevent a user creating a new account with "similar" characters. I think that the column should be named after what it actually does (something like "normalized_name", "standard_name" etc), and there should be a function (in PHP) that generates this value that then gets stored in the DB when we insert a user. To start with this function can just use PHP string lower function.

The advantages are:

We don't depend on an external function (LOWER) that could be implemented differently between different databases.
A module can strengthen the function if desired. The admin might decide that spaces should be removed, or that vowels with accents/umlauts/macrons were treated as undecorated, or that the characters I l 1 | all got transformed into one value etc. This would reduce similar looking usernames on such sites.
We actually fix this performance issue! We will just be using an index over values in a column, rather than computing something every row.
It extends to any future database engine. It is only another database column. We don't need to write triggers, and we don't need to worry about DB case sensitivity/insensitivity.

My $0.02 :) Discuss.

Log in or register to post comments

Comment #120

joshmiller

he/him

English

Indianapolis, Indiana, USA

commented 23 August 2009 at 23:47

Just read through this post and wanted to find out how it went from RTBC to needs work and not updated for some time. So here's where we are (for all of us):

Why should this go in? A stat in by catch #14 and a quote by Angie in #65

in a typical case we're comparing 1273.97 ms to 0.37 ms

The only point of this patch is to add a lowercase username column is for improved query performance. That's it.

Issue summary since last RTBC #89

#85 through #111 - Issues raised have been addressed
#111 - Dave Reid has some issues (see below for summary)
#112 - Crell says #111-01 issue is real and should be addressed (but, BTW it is by design) -- I added my own "reading between the lines" filter
#113 - catch raises a new issue for LOWER()
#118 - chx agrees that #111-02 is the way to go
#119 - blake_ points out that we could just make it a php function to avoid #113

What should the next patch include to help it be RTBC?

Ok, so that's the summary, and depending on how people feel about #119, we could potentially find another RTBC prior to 9/1 by addressing the following from #111 and #112

Queries cannot include multiple placeholders
We have duplicated code in user_save
On the new test case, class UserSearchTestCase needs a phpdoc block right above it, getInfo() does not, setUp() does not, and testUserSearch() does need a phpdoc block.
Code formatting issues in testUserSearch()
There is a code problem with an SQL statement
Document or fix the LOWER() issue raised by catch in #113

That's what the next patch needs. Hopefully this helps. Also, lets keep in mind Angie's guidance in #65

Log in or register to post comments

Comment #121

joshmiller

he/him

English

Indianapolis, Indiana, USA

commented 24 August 2009 at 00:10

Ignore my last comment, as we have a much better solution provided by catch in IRC:

"I think we can just convert these queries to db_select(), use LIKE which is case insenstive and mapped to ILIKE on postgresql and that'll work without the extra column."

Josh

Log in or register to post comments

Comment #122

joshmiller

he/him

English

Indianapolis, Indiana, USA

commented 24 August 2009 at 00:11

Title:

Performance: Add column 'username' to {users}, remove LOWER(name)

» Rework username queries to improve performance

changing title

Log in or register to post comments

Comment #123

Crell commented 24 August 2009 at 04:08

That particular feature of DBTNG was added with this issue in mind. I am not convinced that it is a good solution, mind you, and a lookup column would likely be faster and more flexible, but in theory it should work.

Log in or register to post comments

Comment #124

kbahey commented 24 August 2009 at 04:49

I am actually in favor of a solution that does not add an extra column. An extra column can introduce inconsistencies, and complicates teh schema. Since only PostgreSQL is case sensitive and the ILIKE solves that, I like that solution since it will remove a slow query for MySQL.

And it seems that we already solved it this way when catch did the user_load_multiple() patch. It already uses db_select(). So the one that caused the most trouble is already out of the way!

    // If the conditions array is populated, add those to the query.
    if ($conditions) {
      // TODO D7: Using LIKE() to get a case insensitive comparison because Crell <--- COMMENT
      // and chx promise that dbtng will map it to ILIKE in postgres. <--- COMMENT
      if (isset($conditions['name'])) {
        $query->condition('u.name', $conditions['name'], 'LIKE');
        unset($conditions['name']);
      }
      if (isset($conditions['mail'])) {
        $query->condition('u.mail', $conditions['mail'], 'LIKE');
        unset($conditions['mail']);
      }
      foreach ($conditions as $field => $value) {
        $query->condition('u.' . $field, $value);
      }
    }

It just needs cleanup by taking out the comment above.

The others that are done by the admin (e.g. searching for a user, ...etc.) are infrequent and not a cause for performance issues, but for the sake of code consistency and DX, it is a nice to have.

There is one in comment that is checked when a comment is posted. So only applicable for sites that get very frequent comments.

Log in or register to post comments

Comment #125

robertdouglass commented 24 August 2009 at 07:02

Sorry. Kbahey, you've stated this many times in the history of the issue:

I am actually in favor of a solution that does not add an extra column. An extra column can introduce inconsistencies, and complicates teh schema.

But you've never really explained what these inconsistencies are? Or how the "complication" of the schema is so evil that it offsets the known performance gain?

I'm glad we've come to a possible solution with LIKE() that we think will make everyone happy. Maybe, like kbahey says, we're just a cleanup patch away from closing it. It seems that we did so without any benchmarking, so I hope that it actually does make Drupal faster.

Log in or register to post comments

Comment #126

robertdouglass commented 24 August 2009 at 07:02

And joshmiller, thank you for the excellent summary!

Log in or register to post comments

Comment #127

kbahey commented 24 August 2009 at 17:07

@Robert

It is all about data redundancy vs normalization. If you have the same data in different columns, then there are risks that contradict good database design. For example, there is the risk that at some point a new piece of code will update one place and forget the other, leading to inconsistency. Also looking at different places gives different answers. There is also the added storage overhead. This is why deriving data that can be derived is better, for consistency, than storing multiple pieces.

Having triggers and stored procedure in the database help minimize this risk, since no external code need to be modified, but we are not using any of those, and they have their own challenges (database portability, opaqueness, ...).

I understand that breaking normalization rules is necessary in some cases. Done right, they are a great help (e.g. tracker2, prev_next, ...). They are sort of external to the schema and the data is totally replaceable any time.

Since we have better options here, this is a case where an extra column is bad.

Yes, we only need cleanup for this just for DX and consistency. The LOWER that is used now is only for admin pages, which are infrequently used.

Log in or register to post comments

Comment #128

pwolanin commented 2 September 2009 at 13:17

Just discussed this problem with Davide Strauss - our consensus was that we should add a case insensitive comparison operator to DBTNG and push this into the driver layer.

This would allow us to optimize this for MySQL (at least), would allow this to be used also for terms, etc, where we are doing text-based lookups and would be readily available for contrib.

postgres version would initially at least just use the LOWER() = LOWER(). So performance would be the same as it is currently in core.

Log in or register to post comments

Comment #129

pwolanin commented 2 September 2009 at 13:43

talking with catch, he confirms what Khalid says above - essentially this is in core already - so this is really a cleanup issue to make sure it works right in SQLite3 and that we are removing LOWER() where we don't need it in core queries.

The bad part is that we are still generally doing %string% matches, which cannot be indexed.

Log in or register to post comments

Comment #130

Dave Cohen commented 2 September 2009 at 14:54

Do I understand correctly that with this patch username is used for authentication and searches, while name is used for display and third-party module can choose to make name non-unique?

If so, FINALLY!

Log in or register to post comments

Comment #131

pwolanin commented 2 September 2009 at 20:20

@Dave Cohen - that has been proposed elsewhere, and there was no consensus. This issue is just about finding ways to improve query performance.

Log in or register to post comments

Comment #132

dahacouk commented 3 September 2009 at 14:41

Display Name Code Sprint Saturday September 5th...

Dave Cohen and I just had a really good face to face discussion with Angie (webchick). In summary, we are going to use the Saturday code sprint to make sure that Drupal 7 will facilitate us building contributed modules to implement display names.

This may mean altering some parts of core in a very small way. If you are passionate about cleaning up "theme_username" throughout core and making it easier for contributed modules to implement display names in a clean and consistent way, then join us at DrupalCon Paris or over the wires.

Saturday September 5th 2009 10:00 to 17:00 CEST
Location: DrupalCon Paris and online elsewhere

If you use IRC then #kendra is ready and waiting for you during those times.

I'll make sure developers are kept well feed and watered. So, if you are at DrupalCon and passionate about making it easy to implement display names then get in touch.

This is pretty much our last chance for making Drupal 7 display name friendly! Let's make it happen!

Cheers Daniel

mobile: +44 7853 627 355
username: dahacouk on skype, jabber, irc...
Kendra Initiative - http://www.kendra.org.uk

Log in or register to post comments

Comment #133

dahacouk commented 3 September 2009 at 14:47

Log in or register to post comments

Comment #134

jromine commented 4 September 2009 at 22:16

subscribe

Log in or register to post comments

Comment #135

dahacouk commented 5 September 2009 at 10:34

OK. So, the strategy over the next few weeks (during code sludge) is to get theme_username() used consistently in core. From there contributed modules will be able to implement display names with more ease.

So, if you have time over next few weeks please help Dave Cohen work on the following issue:

#192056: User's raw login name should not be output directly

Many thanks,

Daniel

Log in or register to post comments

Comment #136

catch

he/him

English

commented 1 December 2009 at 07:09

Title:	Rework username queries to improve performance	» Replace LOWER() with db_select() and LIKE() where possible
Assigned:	catch	» Unassigned
Category:	task	» bug

Status	File	Size
new	lower.patch	832 bytes

Now we have LIKE/ILIKE support in the database layer, it's a bug not to be using it where appropriate.

Here's a start on a conversion, hoping someone else can pick up the other places in core where this needs doing.

Wherever you see LOWER(:placeholder) this can be converted to db_select() and LIKE. The only exception is LIKE LOWER(%:placeholder%) which has its own problems we can't deal with here.

catch@catch-laptop:~/www/7$ grep -rl "LOWER(" *
modules/system/system.install
modules/user/user.module
modules/comment/comment.module
modules/taxonomy/taxonomy.module
modules/statistics/statistics.admin.inc
modules/profile/profile.admin.inc
modules/profile/profile.pages.inc
modules/profile/profile.module

Marking CNR to give test bot a go at it.

Log in or register to post comments

Comment #137

catch

he/him

English

commented 1 December 2009 at 07:35

Status:

Needs work

» Needs review

Log in or register to post comments

Comment #138

catch

he/him

English

commented 1 December 2009 at 08:11

Status	File	Size
new	lower.patch	1.55 KB

Did a couple more.

TODO:
user_autocomplete()
profile_admin_settings_autocomplete()
profile_autocomplete()

Log in or register to post comments

Comment #139

catch

he/him

English

commented 1 December 2009 at 11:15

Status	File	Size
new	lower.patch	2.36 KB

Somehow managed not to inciude the previous patch, todos remain the same.

Also user_is_blocked() should cast to bool, but keeping this to minimum necessary changes for now.

Log in or register to post comments

Comment #140

meba commented 1 December 2009 at 13:04

A small review:

- Applied succesfully
- Created a new account with some uppercase characters - works
- Logged in as this user - works
- Created a new article and commented on it - works
- Created a profile field with a category, including some uppercase characters. Created second with in the same category - works
- Blocked my new user, tried to log in as the blocked user - doesn't allow login - works

Log in or register to post comments

Comment #141

c960657 commented 1 December 2009 at 18:18

Status:

Needs review

» Needs work

The wildcard characters % and _ need to be escaped. Otherwise confusing things happend if they appear in the user-supplied string. I suggest something like this:

-    $query->where('LOWER(category) = LOWER(:category)', array(':category' => $category));
+    $query->condition('category', addcslashes($category, '%_\\'), 'LIKE');

AFAICT SQL-92 says that there is no escape character if ESCAPE is omitted in the LIKE expression (... WHERE foo LIKE 'abc%' ESCAPE '\'), though MySQL and PostgreSQL uses \. In order to make all databases behave the same, we can make the following change to DatabaseCondition::mapConditionOperator() in database/query.inc:

@@ -1326,15 +1326,15 @@ class DatabaseCondition implements Query
       'NOT IN' => array('delimiter' => ', ', 'prefix' => ' (', 'postfix' => ')'),
       'IS NULL' => array('use_value' => FALSE),
       'IS NOT NULL' => array('use_value' => FALSE),
       // These ones are here for performance reasons.
       '=' => array(),
       '<' => array(),
       '>' => array(),
       '>=' => array(),
       '<=' => array(),
-      'LIKE' => array(),
+      'LIKE' => array('postfix' => " ESCAPE '\\''"),
     );
     if (isset($specials[$operator])) {
       $return = $specials[$operator];
     }
     else {

This pattern should be applied elsewhere in core, e.g. in various autocomplete callbacks.

Alternatively, perhaps we can make a new pseudo-operator in mapConditionOperator() that handles escaping automatically:

+    $query->condition('category', $category, 'CASE_INSENSITIVE_COMPARE');

It would make the code easier to read at a glance, though the reader may have trouble figuring out where that operator is defined. If the schema API some day supports case-sensitivity to be specified for each column individually, we cannot necessarily rely on LIKE (on MySQL the behaviour of LIKE depends on the column's collation), and then a special operator may be useful.

Log in or register to post comments

Comment #142

Crell commented 1 December 2009 at 21:25

There was an issue open for a year to figure out what to do with LIKE vs. ILIKE and similar mapping and no one cared about it so we just left it at "fold everything to use case-insensitive LIKE like MySQL". As it's an API change to rewrite that mapping I don't think we're able to do that at this point.

SelectQuery is not cheap, and does incur an additional PHP-space cost due to all the stack calls. I'd want to see benchmarks before we start using it everywhere.

The % question is... ugh. I'm not sure what the correct way to handle that is, since we *do* want to allow both real wildcards and % in a literal. That's probably separate from this thread, though ,which is a disturbingly long performance thread.

Log in or register to post comments

Comment #143

catch

he/him

English

commented 2 December 2009 at 01:53

@Crell I'm pretty sure there's benchmarks earlier in this issue or its predecessors, db_select() takes at most 1-2ms, LOWER() can take hundreds of milliseconds or seconds afaik on large datasets.

We're already relying on LIKE/ILIKE mapping elsewhere in core, this issue is just applying it consistently. That means we probably need a spin-off for the % and _ issue which is nasty, and perhaps CASE_INSENSITIVE_COMPARE - both of these would be API additions rather than changes as such, and fixing a critical performance issue, so doesn't seem too bad at all.

Log in or register to post comments

Comment #144

c960657 commented 8 December 2009 at 20:30

I have spun off the LIKE xxx ESCAPE '\\' issue into #654662: No way to escape wildcard characters in LIKE queries.

Log in or register to post comments

Comment #145

coltrane

he/him

commented 9 December 2009 at 22:10

Status:

Needs work

» Needs review

Status	File	Size
new	279851-remove-lower-145.patch	5.06 KB

While #654662 is pending here's 139 rerolled with the following done

user_autocomplete()
profile_admin_settings_autocomplete()
profile_autocomplete()

I removed LOWER, which I think is OK in db_query_range() but let me know if not.

Log in or register to post comments

Comment #146

catch

he/him

English

commented 10 December 2009 at 01:58

In db_query_range() we still need LIKE() - so those would have to instead be converted to a db_select() with ->limit().

Log in or register to post comments

Comment #147

chx commented 10 December 2009 at 06:40

with ->range. limit is a --badly named-- pager method.

Log in or register to post comments

Comment #148

c960657 commented 14 December 2009 at 16:44

Status:

Needs review

» Needs work

#654662: No way to escape wildcard characters in LIKE queries has now landed.

Log in or register to post comments

Comment #149

sun

German

Karlsruhe

commented 29 December 2009 at 09:35

subscribing

Log in or register to post comments

Comment #150

catch

he/him

English

commented 30 December 2009 at 03:30

Status:

Needs work

» Needs review

Status	File	Size
new	279851-150.patch	5.31 KB

Updated this after #654662: No way to escape wildcard characters in LIKE queries - haven't run all tests although the ones I did passed, see what the test bot says.

Our code style for db_select() doesn't work well for if ($query) { so I've left that on one line in the case that appeared since it's chained and everything else I tried looked worse.

Log in or register to post comments

Comment #151

chx commented 12 January 2010 at 17:04

Status:

Needs review

» Reviewed & tested by the community

OK!

Log in or register to post comments

Comment #152

webchick

she/they

English

Vancouver 🇨🇦

commented 13 January 2010 at 06:45

Status:	Reviewed & tested by the community	» Needs work
Issue tags:		+Needs documentation

I read through this extremely long issue again. Thanks for the summary mid-way through, Josh!

I too have witnessed LOWER() queries wreaking absolute havoc on large datasets (I think Robert and I are both thinking of the same project ;)). So I'd like to see this fixed.

The gist is that the first solution for this was an extra DB column to store the lowercase version of the username. This felt wrong for a variety of reasons: you'd need a term_lower, an aggregator_item_title_lower, etc. which is just not scalable to all of core/contrib.

This new approach takes the existing db_like() operator, which is already in core, and applies it consistently, instead of using LOWER() in queries. This means we can optimize MySQL/PostgreSQL to use the behaviour that makes the most sense.

Syntax-wise, it looks like:

->condition('name', db_like($name), 'LIKE')

Which is kind of a WTF, esp when compared with a straight-up LOWER() query. But I can't think of anything better, and as noted in the op, many, many, many issues have been started over the past 3.5 years to solve this problem. :P

So.... Committed to HEAD. This practice will need to be documented in the module upgrade guide.

Log in or register to post comments

Comment #153

catch

he/him

English

commented 13 January 2010 at 08:17

Status:

Needs work

» Fixed

Three years, woot! Added docs http://drupal.org/update/modules/6/7#nomorelower

Log in or register to post comments

Comment #154

c960657 commented 13 January 2010 at 19:28

Status:

Fixed

» Needs review

Status	File	Size
new	279851-more-lower-1.patch	10.19 KB

The patch left a number of LOWER() calls untouched. Was that intentional?

This patch removes the remaining calls. It also introduces a 'NOT LIKE' operator.

Which is kind of a WTF, esp when compared with a straight-up LOWER() query. But I can't think of anything better,

#141 suggests an approach that at least is easier to read. This may even be extended to STARTS_WITH, CONTAINS and ENDS_WITH operators. But still somewhat WTF.

Log in or register to post comments

Comment #155

catch

he/him

English

commented 13 January 2010 at 19:34

Somewhat intentional yeah.

If you use a LIKE query with a string like '%foo', that also doesn't use an index. I've not benchmarked using LOWER() vs. not using LOWER() in that scenario but as far as I know it won't make a lot of difference - would need to be tested on a very large dataset. If for some reason it does make a difference, or if we just wanted to remove LOWER() completely for consistency (or because we hate it so much) I'd be happy though.

Log in or register to post comments

Comment #156

webchick

she/they

English

Vancouver 🇨🇦

commented 13 January 2010 at 19:45

I think I'd prefer the consistency, personally. I read your description of when to use it and when not to use it like 5 times and I still don't get it. ;)