See http://api.drupal.org/api/HEAD/function/user_load
The code
else {
$query[]= "LOWER($key) = LOWER('%s')";
$params[] = $value;
}
applies to the search on the name. Although names are often case insensitive due to the default collations in e.g. MySQL, nothing forbids the site admin to allow names that differ only in case by changing the collation for the table user from e.g. utf8_general_ci to utf8_bin.
Moreover, enabling the utf8_bin collation in MySQL is often unavoidable while migrating from the earlier versions of MySQL, and this post explain why.
There are three levels of fixes possible:
- Make a separate condition for the
($key == 'name) case
- Get rid of
LOWER() in the else part
- Review the rest of the code to drop the LOWER() function when it duplicates the database functionality.
Comments
Comment #1
ChrisKennedy commentedThis is by design to prevent usernames that differ only by capitalization. It would be confusing to others and would make it difficult to differentiate the posts between users. Are you sure that having the same username but different capitalizations is something that would be useful to you as a drupal administrator?
Comment #2
mikhailian commentedThe database software (e.g.MySQL) already enforces this constraint through unique indexes. It goes even further by making this constraint configurable, allowing to specify different collations for any single charset, enabling the admins to adapt the application for the innumerable uses.
A collation does not only specify the case sensitiveness but other relationships between characters. As an example, cyrillic e and ë are considered one character or two depending on the collation. An official site would rather use one collation, a literary site would prefer another.
An admin of an english site would probably configure the system to make e, ê è and é to be recognized one and the same letter, for the sake of simplicity. He could have also chosen to differ e, ê, é and è everywhere except the
search_indexandsearch_totaltables. A french admin would definitely want to differ these e-letters everywhere.Let's go back to the case-sensitiveness. The use of LOWER() breaks the flexibility provided by the collation mechanism of the database. This is why I would rather consider the use LOWER() as a bug.
It became apparent as people started to migrate their UTF-8 date from the latin1 to the utf8 datastore (see the link in my original bugreport) as enabling case-sensitiveness is the only way to keep the data intact.
Comment #3
chx commentedYes. It should be the task of the database and not Drupal to decide on collation (for example lowercase collates with uppercase). Also, these
LOWERhurt performance, too.Comment #4
dries commentedI don't think it would be a good idea if there was a 'dries' on drupal.org (in addition to 'Dries'), or both an 'Admin' and an 'admin'. People really expect case-insensitivity.
So for the behavior to remain the same, the database needs to use the utf8_general_ci collation, not? How do we ensure that is the case before after we applied this patch?
Comment #5
chx commentedWe can provide an update path to utf8_general_ci but I think that's the default AND http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html there are a great many utf8_something_ci sets so upping to general might not be ideal.
Comment #6
chx commentedTo further clarify:
and the default is _ci , so I believe we do not need to make any special steps because the only way it can be _cs or _bin is when the admin sets up so and then we should not override that. In short: being ci or cs is not Drupal's to decide.
Comment #7
dries commentedAt table creation time, we now use:
I'd still like to see if we can change something at table creation time to enforce '_ci'.
Also, what does postgresql think of this? :)
Comment #8
chx commentedWe could but we do not want to. Leaving out explicit collation definiton from there, we honor whatever the mysql administrator set up. But, if you really want to , I can add collate utf8_general_ci there but that somewhat defies the point of this patch.
Comment #9
chx commentedpostgresql people can solve for themselves:
from http://developer.postgresql.org/index.php/Todo:ICU
from http://www.thescripts.com/forum/thread173373.html
Comment #10
chx commentedThere are many solutions for postgresql even when some OS does not support proper strcoll() like OpenBSD, http://gborg.postgresql.org/project/citext/projdisplay.php for text and for example http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg37618.html for varchar. I believe those who run that obscure database system can deal with their obscure problems.
Comment #11
sammys commentedHi all,
I'm looking at this with a flu affected head so please bear with me. :)
I agree that code shouldn't make assumptions on how data is stored.
I confirm that PostgreSQL doesn't currently support a table specific (nor field specific) collation setting making the collation setting used at database creation time effective for all tables. This is a major problem for sites with PostgreSQL. Considering the efficiency benefit is huge enough, something has to be done to ensure PostgreSQL users don't get a surprise "feature" of case sensitivity on usernames.
One solution is to keep the LOWER() parts for PostgreSQL only. Another possible fix, albeit a hack, for this LOWER() performance problem was suggested in http://drupal.org/node/83738 where a lowercase version of the string is placed in another field. This would cover any of the other DBMSs automatically should they not implement table/field based collation settings. It was shot down by Dries understandably for its hackness. However, it does provide a fix for the problem regardless of collation and DBMS.
I'm going to sit on the fence and let you guys debate which way to go. I'll just provide a patch, if necessary, for my favourite obscure DBMS.
--
Sammy Spets
Synerger
http://synerger.com
Comment #12
catchduplicate of: http://drupal.org/node/83738
Comment #13
catchComment #14
mikhailian commentedWhat would you say about 'Driеs', 'Drіes' and 'Drіеs' in addition to 'Dries'? These are perfectly valid even within utf8_general_ci ;-)
P.S. I used cyrillic 'i' and 'e'.
Comment #15
mikhailian commentedComment #16
richmck commentedAre there any mysql optimizers that are smart enough to *NOT* do a table scan (i.e. ignore LOWER()) when LOWER() is applied to a case-insensitive column within a where clause?
This could be, perhaps, done at the mysql client library level?
Comment #17
mikhailian commentedShouldn't we solve this in 7.x before it is too late again?
Comment #18
damien tournoud commentedOn the contrary, we *should* make assumptions about case sensitivity in the database. Some matching should be case insensitive (username, etc.), while other shouldn't (paths, cache ids, locale source strings, etc.). To do that, we will have to manage collation in both the Schema API and the query builder, and I'm pretty confident we will manage to do that in D7. But this issue by itself is a won't fix.
Comment #19
mikhailian commentedDamien,
When you say "we *should* make assumptions about case sensitivity in the database", you fail to provide further reasoning.
I already demonstrated a few comments above that checking case-sensitivity will not help to fight impersonation. I do not see any other reason checking case-sensitivity on usernames should be mandatory and not left as an option to the one who builds a drupal-based site.
For the list of items where you suggest not checking case-sensitivity, I see several problems:
Comment #20
mikhailian commentedI realise that I use wrong wording. Case-sensitivity does not require checking. There is no action needed from the developer to have it. It is case-insensitivity that bring problems and requires development effort, and, in case of Drupal, creates a preformance hit that is hard to solve elegantly.
Comment #21
lilou commentedAdd tag.
Comment #22
mikhailian commentedAdded two more tags
Comment #23
damien tournoud commentedI think we agree here, even if @mikhailian understood the opposite of what I was saying in #19.
Comment #24
mikhailian commentedCan someone else from the core developers look at this issue? Damien and me have opposing views (that probably originate in the different ways we use Drupal).
I would like to hear more opinions.
Comment #25
damien tournoud commentedThere is no real point in keeping this issue open. Those LOWER() calls will only go away if we introduce collation support (and the associated operator support) to the Schema API. That whole issue is a portability nightmare:
Bumping to D8.
Comment #26
mikhailian commentedI see that this issue has been solved in #279851: Replace LOWER() with db_select() and LIKE() where possible, closing this one as well.