Make all varchar columns case sensitive (regression) (was make page cache case sensitive)

Scott Reynolds - November 11, 2008 - 19:12
Project:Drupal
Version:7.x-dev
Component:base system
Category:bug report
Priority:critical
Assigned:Unassigned
Status:active
Issue tags:caching
Description

Introduced here: http://drupal.org/node/99970
The issue noted above introduced case sensitive path caching. Drupal 6 removes this case sensitive patch caching. Are paths case sensitive? Should this be part of Drupal 6?

And if this shouldn't be part of Drupal 6, then an update() needs to happen to update Drupal 5 page_cache tables

#1

m3avrck - November 11, 2008 - 19:15

Subscribing...

#2

Gábor Hojtsy - January 8, 2009 - 11:48
Title:Page Cache Binary» Page cache should be case sensitive (regression)

Oh, ok, this looks like a regression, since #99970: Page cache matching is case insensitive is indeed not reflected in the Drupal 6 schema.

#3

Damien Tournoud - January 8, 2009 - 12:00
Version:6.x-dev» 7.x-dev

That's a fun issue: the schema API introduced in Drupal 6 doesn't know anything about collation, and can't set the "BINARY" flag. Bumping to 7. This is a critical issue to solve and backport to 6.

#4

Scott Reynolds - January 9, 2009 - 17:25

Wow never thought this would ever get any attention. So I never explain a bit more.

With the new menu system, not sure that u can have case sensitive paths any more. For instance in Drupal 6, /NODE and /node are the same on MySQL. It is actually a lot more involved then just adding a feature to schema. Menu system needs to updated first, before changing the page_cache table. Otherwise, u end up caching NoDE and node but both line up with the same callback and result. This then puts two records in the page cache that are redundant. There is no difference in Drupal between NoDE and node, so caching two sets is useless.

Now this behaviour I believe will be different on Postgres as it is case sensitive ? Thus the menu system will behave like it does in D5( NoDE and node will be different).

#5

c960657 - March 1, 2009 - 14:14

#6

c960657 - March 5, 2009 - 20:57

This patch makes all varchar columns case-sensitive. AFAIK this makes MySQL behave like PostgreSQL. Amazingly enough all tests still pass (I have only tested with MySQL).

So far the patch does not convert existing tables. In fact, making URLs case-insentive may break some non-standard URLs when upgrading from D6->D7 (e.g. http://example.com/NODE/1), unless some other trickery is done.

Also, I have no idea about the consequences for other parts of Drupal.

AttachmentSize
case-sensitive-1.patch 57.1 KB
Testbed results
case-sensitive-1.patchfailedFailed: Failed to apply patch. Detailed results

#7

c960657 - March 5, 2009 - 20:56

Sorry, wrong patch.

AttachmentSize
case-sensitive-2.patch 1.13 KB
Testbed results
case-sensitive-2.patchfailedFailed: Failed to apply patch. Detailed results

#8

c960657 - March 27, 2009 - 00:01
Status:active» needs review

This also makes existing tables case-sensitive during upgrade from D6.

I chose to upgrade tables from D6 core modules. I couldn't find a way to get the tables belonging to the currently installed modules only - drupal_get_schema() reflects the table names after the upgrade process has completed, but I think it's better to modify tables belonging to external modules before their hook_update_N has been invoked (so they have a chance to change back the collation if actually really want the case-insensitive behaviour and are MySQL-only). Another option is to modify all tables in the database, but that is probably a bit to bold.

The system_update_N function was added as 0012 in order to appear prior to any table renames.

Note that to test this, you may need to comment out system_update_7019 after having applied the patch (see #363262: Missing index on url_alias table).

AttachmentSize
case-sensitive-3.patch 7.87 KB
Testbed results
case-sensitive-3.patchfailedFailed: Failed to apply patch. Detailed results

#9

Dries - March 29, 2009 - 13:52

At some point, it is probably useful to have a bit more control over this at the API level but right now, this patch improves the current situation. This patch looks good to me, but

1. Let's add a code comment to:
+      $table['mysql_suffix'] = 'DEFAULT CHARACTER SET UTF8 COLLATE utf8_bin';
That code comment should capture some of the knowledge in this issue.

2. Please also add a test case for this problem to support future database backends; e.g. it helps make sure that new backends works as expected.

Thanks!

#10

Scott Reynolds - March 29, 2009 - 17:05

Do we really want to make every VARCHAR case sensitive ? Kindof a scary statement to me. I think it probably warrants investigation into how it affects other things.

My first reaction to this issue was to make the menu router and the page cache case sensitive, not all VARCHARs.

#11

c960657 - March 29, 2009 - 22:34

I became aware that at least one place in core (user_load_multiple()) currently relies on LIKE being case-insensitive. With this patch, this is no longer the case. This can be fixed (I think) by adding COLLATE utf8_general_ci to mapConditionOperator() for LIKE, though this prevents index usage. But do we want to keep LIKE case-insensitive? If we make it case-sensitive, people needing to do a case-insensitive match will need to resort to e.g. LOWER(username) = LOWER('foo') - is that acceptable? If there is a great need to this, possibly the query builder could offer support for ILIKE.

Also note that SQLite does not support proper case-insensitive comparison for strings containing non-ASCII characters.

Do we need to do something to catch old non-standard URLs and redirect them to the proper URL (e.g. from /nOdE/1 to /node/1) to prevent the D6->D7 upgrade from breaking existing URLs?

@Scott Reynolds: I agree that it's scary. But on PostgreSQL all VARCHARs are case-sensitive, so we have to move in that direction in order to keep consistency across databases.

#12

Scott Reynolds - March 30, 2009 - 03:02

Ok couple interesting questions that not prepared to answer myself at this time.

  • In PostgreSQL, can I make a user with name Kathy and another with kathy? Is this desirable and intended?
  • Performance impact? Easy case to examine user name auto complete.
  • does the PostgreSQL behave the same as MySQL? Does PostgreSQL on some fields case insensitive and can we emulate that behavior in MySQL?

Ok so thats all I got, if that makes sense to anybody else then /me shakes ur hand. I will take a look at these issues this week and come up with answers. But if anyone else beats me too it plz post.

#13

c960657 - March 30, 2009 - 23:15

Usernames are made case-insensitive using LOWER(), also on PostgreSQL, so you cannot have both Kathy and kathy. This behaviour is consistent between MySQL and PostgreSQL, and the patch doesn't change this.

I changed user_load_multiple() to use LOWER() like elsewhere in core. This will have a negative performance impact on MySQL. This may be reduced by #279851: Performance: Add column 'username' to {users}, remove LOWER(name). Autocomplete already uses LOWER(), so no performance impact there.

I also added some tests as requested, and removed the explicit use of mysql_type in locale.install.

AttachmentSize
case-sensitive-4.patch 21.12 KB
Testbed results
case-sensitive-4.patchfailedFailed: Failed to apply patch. Detailed results

#14

Dries - March 31, 2009 - 02:35

The last version of this patch looks good and addresses my issues. I will commit it tomorrow if no one objects. Thanks.

#15

System Message - March 31, 2009 - 23:35
Status:needs review» needs work

The last submitted patch failed testing.

#16

c960657 - April 1, 2009 - 15:32
Status:needs work» needs review

Reroll.

AttachmentSize
case-sensitive-5.patch 21.15 KB
Testbed results
case-sensitive-5.patchfailedFailed: Failed to apply patch. Detailed results

#17

Dries - April 1, 2009 - 20:01
Status:needs review» fixed

Committed to CVS HEAD. Thanks!

#18

catch - April 2, 2009 - 12:38
Title:Page cache should be case sensitive (regression)» Make all varchar columns case sensitive (regression) (was make page cache case sensitive)
Status:fixed» needs work

Seems like I missed the boat on voicing objections, but I've got to re-open this:

I became aware that at least one place in core (user_load_multiple()) currently relies on LIKE being case-insensitive. With this patch, this is no longer the case. This can be fixed (I think) by adding COLLATE utf8_general_ci to mapConditionOperator() for LIKE, though this prevents index usage. But do we want to keep LIKE case-insensitive? If we make it case-sensitive, people needing to do a case-insensitive match will need to resort to e.g. LOWER(username) = LOWER('foo') - is that acceptable? If there is a great need to this, possibly the query builder could offer support for ILIKE.

Also note that SQLite does not support proper case-insensitive comparison for strings containing non-ASCII characters.

I changed user_load_multiple() to use LOWER() like elsewhere in core. This will have a negative performance impact on MySQL. This may be reduced by #279851: Performance: Add column 'username' to {user}, remove LOWER(name). Autocomplete already uses LOWER(), so no performance impact there.

It's not acceptable to use LOWER(foo) like we currently do - this is the cause of slow queries on a lot of Drupal sites and there's only two ways to fix it.

1. Add an extra column
--- previously this was the only cross-database compatible way to deal with this issue. You'll note that #279851: Performance: Add column 'username' to {users}, remove LOWER(name) is a spin-off of at least 3 similar issues which tried to do the same thing - all have so far been rejected by Dries or stalled for months or years. I don't think that issue is ever going to be committed. If Dries is prepared to look at the approach on there and +1 it for at least users and taxonomy terms, then let's close this one again, but given previous feedback on adding extra columns I don't have high hopes.

2. Map LIKE() to case-insensitive operators (ILIKE in postgresql) - which was removed in this patch, and only made possible by dbtng.
--- doing it that way gives us indexable queries on MySQL - something we've never had before, without requiring more than a few lines additional logic in database drivers. It also isn't any penalty in other database systems. I much prefer this approach to #1 personally, this patch makes it impossible to achieve.

SQLITE not having case-insensitive LIKE() for non-ASCII characters is a red-herring, it doesn't do LOWER() for non-ascii characters either unless you load the icu extension. http://www.sqlite.org/lang_corefunc.html - so it's going to be equally broken either way.

We can fix the page cache issue without completely hamstringing our only real opportunity to remove a major cause of slow queries in core. I don't see why it's necessary to conflate the two especially when c960657 already picked up potential issues with it.

#19

catch - April 2, 2009 - 12:41

Here's at least one of the taxonomy issues which is now unfixable without an additional column #277209: Remove lower from taxonomy autocomplete

#20

Damien Tournoud - April 2, 2009 - 12:59

Oh my... this is a big change, nearly not tested. No input was taken from any of the database maintainers. I believe we should revert that, and think about it a little bit more.

#21

catch - April 2, 2009 - 13:19
Status:needs work» needs review

Here's a patch to revert the change.

AttachmentSize
case-sensitive-revert.patch 14.14 KB
Testbed results
case-sensitive-revert.patchfailedFailed: Failed to apply patch. Detailed results

#22

catch - April 2, 2009 - 22:08

And a pending patch with tests which now fail due to the change - #343788: Taxonomy module doesn't use it's own APIs (yes we could use LOWER() there if we really have to, but can you tell I was hoping not...)

#23

Dries - April 3, 2009 - 20:19

I agree that we should revert this patch and discuss it more. It's broken indeed -- too bad the tests didn't catch this regression. The patch in #21 that reverts the change should be rolled back.

#24

System Message - April 4, 2009 - 23:50
Status:needs review» needs work

The last submitted patch failed testing.

#25

c960657 - April 17, 2009 - 23:12
Status:needs work» needs review

Updated patch for reverting the change.

AttachmentSize
case-sensitive-revert-1.patch 18.99 KB
Testbed results
case-sensitive-revert-1.patchpassedPassed: 10802 passes, 0 fails, 0 exceptions Detailed results

#26

catch - April 18, 2009 - 11:35
Status:needs review» reviewed & tested by the community

Since this is a straight revert and Dries suggested rolling this back, marking to RTBC.

#27

webchick - April 20, 2009 - 02:23
Status:reviewed & tested by the community» active

Ok, committed the rollback patch. Setting back to active, since it sounds like we're back to the drawing board on this. What a thorny issue. :\

 
 

Drupal is a registered trademark of Dries Buytaert.