Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Nick_vh’s picture

I heavily support this and willing to fight for this! :-)
So, the idea would be a checkbox next to the search box that enables/disables the case-sensitive search
This has been an issue since Drupal 6, sometimes you do not know if a certain word started with a capital letter, so it makes sense to allow this.

By default, it should be case-sensitive

User interface translation | drupal8.dev_.jpg

svenryen’s picture

I'm one of "a number of us"..

This makes sense because the search field should return all relevant forms of the search string, allowing translators to translate both upper and lower case variations at the same time.

Otherwise, the way it's implemented now (wrongly), a translator will have to search for both upper and lower case variations to make sure everything is translated correctly across all strings on the site.

Nick_vh’s picture

Issue tags: +D8MI, +language-ui
svenryen’s picture

Comment/question to the mockup in comment #1 by Nick_vh

Do we need a checkbox?
Or should we just return both the upper and lower case versions when somebody performs a search?

I can't see any use case when somebody does NOT want to see both the upper and lower case versions of what they search for, so to me it makes most sense to always return any case variation of the string somebody searches for.

Nick_vh’s picture

depends, from what point on does this become a performance issue? Do we have data that can support this?

LoMo’s picture

Case sensitivity does have an impact on performance. Less specific matches require more processing. But of course it should be an option, anyway.

I can't see any use case when somebody does NOT want to see both the upper and lower case versions of what they search for, so to me it makes most sense to always return any case variation of the string somebody searches for.

That presumes that a user is not searching for a specific string that is in upper case (or whatever); i.e. they just want to translate the string they copied from the interface and replace it (or fix it's translation)—not any other variants or similar strings. This is not at all an uncommon use case, IMHO.

I think it might make most sense for the default to be non-case sensitive (as suggested) and add a checkbox for case-sensitive.

I'm also open to making it non-default, i.e. providing a checkbox: "Ignore case" (e.g. as this function is labeled in TextMate’s search). Either way should be fine as long as it's a filter value that is "remembered" along with the other filter settings and as long as it's an option that's provided.

svenryen’s picture

Nick_vh.. Never meant to insinuate it was a performance issue, and after reading my post again I can't see how that came up... Anyway, I'll try to reiterate so that the issue becomes clearer.

The issue is that users need to perform *two searches* to catch both the capitalized and lower case versions of any word.
This can be avoided if the search simply returns all variations that matches the letters and not the particular case.

For example if I want to translate a site containing the strings "Log in" (as a link) and "to log in..." (in some sort of description) the search needs to be performed both with "log" and "Log" to make sure I catch everything while I'm translating.

Gábor Hojtsy’s picture

Ok, so the source and translation columns are stored as BLOBs rather than TEXTs if you check out your schemas. The reason for this that I recall is that for the 99% lookup (t(), format_plural() and friends), we always need to do a case sensitive lookup, so rather than doing the case sensitivity runtime (with the BINARY keyword), we store the value itself as binary, so it is naturally looked up like that. Doing case insensitive lookups on BLOB fields sounds like a performance problem, since you need to uppercase or lowercase each column value and then compare it to the uppercase/lowercase value of your search string respectively. That could be a pretty slow operation.

So to solve this I see two options:

A. We either convert the BLOB columns to TEXT columns and do a BINARY match from t() et. al. Needs benchmarks that this does not slow down performance. It might or might not be equivalent to current performance.

B. Or we keep the BLOB column but do uppercase/lowercase likes. This is likely more of a hit for performance BUT not in the crucial frontend part of the site, its merely for this single admin UI. However, the performance decrease can still be prohibitive.

So needs performance testing either way.

Also, for the UI I think making it just do case insensitive searches with no option for case sensitive is best. No need to clutter up the UI with more options, and people definitely rather have insensitive searches.

Gábor Hojtsy’s picture

Component: translation.module » locale.module

Putting in the right module queue too.

Désiré’s picture

Issue tags: +sprint

tagging

Désiré’s picture

Assigned: Unassigned » Désiré

assign the issue to myself

Désiré’s picture

I've made some quick performance testing, on a locale_source table with about 15000 record: I've searched for 5200 existing partial string, the results are the following (with MySQL):

For the original method (case sensitive search in blob field): 14.24 sec
db_query('SELECT source FROM locales_sources_orig WHERE source LIKE \'%' . $string . '%\'');

Case insensitive, lower case strings stored in blobs: 20.92 sec
db_query('SELECT source FROM locales_sources_2blob WHERE source_lower LIKE \'%' . strtolower($string) . '%\'');

Case insensitive, lower case strings stored in text: 90.14 sec
db_query('SELECT source FROM locales_sources_string WHERE source_lower LIKE \'%' . strtolower($string) . '%\'');

So I think we should use an other blob field to store the lowercase values. This can a little slows down the searching on the GUI, but it will not touch the t() and related searching.

Gábor Hojtsy’s picture

That sounds like could slow down importing considerably too :/

Désiré’s picture

Here is the first patch for the lowercase blob based insensitive searching.

- schema updated
- update hook
- search in the new fields
- some test updated too

still missing:
- update tests
- tests for searching

Désiré’s picture

As I see for importing we need to write an other blob field converted to lowercase, so I've made a quick performance test for this too:

I've write about 12000 real string (taken from a working D7 site), and write them to the locales_source table:

With the original table (only the original strings are stored as blob) the writing was 8.69 sec.
With the additional blob field for searching it was 8.88 sec.

Status: Needs review » Needs work
Issue tags: -D8MI, -sprint, -language-ui
Désiré’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work
Désiré’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work
Issue tags: +D8MI, +sprint, +language-ui
jthorson’s picture

Note: Ignore the failure result text in the above patch ... the test was manually killed because it was cycling indefinately on one of our testbots; and the result text above is due to how we killed it.

Not sure why it was re-queued if it failed testing the first two times ... if there was something strange about the test results on those first two runs, please let us know.

Gábor Hojtsy’s picture

@jthorson: Thanks for the feedback! I guess it looked puzzling how would any patch cause testbot to not be able to check out Drupal from git. How could this patch cause it? If it could not, then retesting it should possibly find testbot at a happier place and work, or at least give real testing feedback. Do you know how would the git checkout failures be related to the patch and what can we do about it in the patch?

Désiré’s picture

There was a fatal error because of a MySQL insert error in the t(), but only in certain cases (this needs automated tests)

Status: Needs review » Needs work
pp’s picture

+++ b/core/modules/locale/lib/Drupal/locale/PoDatabaseWriter.php
@@ -283,6 +286,7 @@ class PoDatabaseWriter implements PoWriterInterface {
diff --git a/core/modules/locale/locale.install b/core/modules/locale/locale.install

diff --git a/core/modules/locale/locale.install b/core/modules/locale/locale.install
index 334b401..b9a291d 100644

index 334b401..b9a291d 100644
--- a/core/modules/locale/locale.install

--- a/core/modules/locale/locale.install
+++ b/core/modules/locale/locale.install

+++ b/core/modules/locale/locale.install
+++ b/core/modules/locale/locale.install
@@ -64,6 +64,12 @@ function locale_schema() {

@@ -64,6 +64,12 @@ function locale_schema() {
         'not null' => TRUE,
         'description' => 'The original string in English.',
       ),
+      'source_lower' => array(
+        'type' => 'text',
+        'mysql_type' => 'blob',
+        'not null' => TRUE,
+        'description' => 'The lowercase string in English for searching on the UI translation page.',
+      ),

@@ -623,6 +636,52 @@ function locale_update_8010() {
+function locale_update_8011() {
+  // Sets field for lowercase strings in locales_source table.
+  $spec = array(
+    'type' => 'text',
+    'mysql_type' => 'blob',
+    'not null' => TRUE,
+    'description' => 'The lowercase string in English for searching on the UI translation page.',
+  );
+  db_add_field('locales_source', 'source_lower', $spec);

BLOB can not be NOT NULL, because you can add default value. Set it to FALSE.

+++ b/core/modules/locale/locale.install
@@ -623,6 +636,52 @@ function locale_update_8010() {
+  while($source = $results->fetchAssoc()) {
+    db_update('locales_source')
+      ->fields(array('source_lower' => strtolower($source['source'])))
+      ->condition('lid', $source['lid'])
+      ->execute();
+  }
+
+  // Updates locales_target table with the lowercase translations.
+  $results = db_select('locales_target', 's')
+    ->fields('s', array('lid', 'translation', 'language'))
+    ->execute();
+
+  // Updates translation records one by one because LOWER() isn't a standard SQL function.
+  while($translation = $results->fetchAssoc()) {
+    db_update('locales_target')
+      ->fields(array('translation_lower' => strtolower($translation['translation'])))
+      ->condition('lid', $translation['lid'])
+      ->condition('language', $translation['language'])
+      ->execute();
+  }
+}

Why not use $sandbox? If locales_source contains many string it will run to long.

pp’s picture

Corrected the not null problem.

Status: Needs review » Needs work
Désiré’s picture

OK, I have no idea why the updates fails:

We have a 'blob' and 'not null' in the schema, and it works on new install.
But with db_add_field() in the update hook it throws an error. BUT: If I just create a new table with a 'blob', 'not null' field and then add an another to the table, it works... So I'll keep searching, but now here is a patch without the update hook, just to test the other parts of the patch.

Status: Needs review » Needs work
Gábor Hojtsy’s picture

We discussed it is possible to do the comparison on the fly, and the performance for that did not seem to be prohibitive. The only "little" (hahaha) issue is that the code devised was only working on MySQL. Asked Désiré to post updates, so we can point SQL experts here to help figure this out in a more compatible way :)

Gábor Hojtsy’s picture

Issue tags: -sprint

Moving off of sprint given no activity and not being critical to drive more activity here ATM.

YesCT’s picture

I just ran into this, searching for "read more" when I should have searched for "Read more".

YesCT’s picture

Issue summary: View changes

Added brackets around issue number

MantasK’s picture

I have created some custom translation interface in d7 and to make search case insensitive I did this:
$query->where("CONVERT(r.source USING utf8) like :search", array(':search'=> '%' . db_like($filterString) . '%'));

regarding speed:
SELECT source FROM locales_source WHERE source LIKE '%string%' 0.0129 sec
SELECT source FROM locales_source WHERE CONVERT( source USING utf8 ) LIKE '%string%' 0.0321 sec
actually speed is not accurate. depending on word it was different. and sometimes one query is faster than another. but no big difference. It would go between 0.01 and 0.06s
and I have 45,757 records

geek-merlin’s picture

nicrodgers’s picture

In case anyone lands on this page via Google and is looking for a quick and easy way to make the existing string search case insensitive, I've created a sandbox module that does exactly that (for Drupal 7):

https://www.drupal.org/sandbox/nicrodgers/2593305

Tested on MySQL, and from limited performance tests on sites with between 20,000 and 50,000 strings, search performance is still speedy.

It's obviously just an interim quick-fix solution until it can be addressed with a more permanent solution in locale itself.

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

nsputnik’s picture

@MantasK Where do you place that line?

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

jkdev’s picture

Hi,

I have looked in the code, and found that the reason this behaviour is happening is because of this line:

locale/src/StringDatabaseStorage.php:441 - StringDatabaseStorage:dbStringSelect

$filter->condition($this->dbFieldTable($field) . '.' . $field, '%' . db_like($string) . '%', 'LIKE');

Now, if this function serves the admin pages only, and not site wide t() function,
We could theoretically change it to:

$filter->where('LOWER(' . $this->dbFieldTable($field) . '.' . $field . ') LIKE \'%' . db_like(strtolower($string)) . '%\'');

(the quotes looks rather ugly.. but you get the idea)

This should have minimal impact of performance.
further more, as we support mysql, sqlite3 and pgsql, all of those engines use LOWER function, so that supposed to be safe to use.

Version: 8.6.x-dev » 8.8.x-dev

Drupal 8.6.x will not receive any further development aside from security fixes. Bug reports should be targeted against the 8.8.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.9.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.7 was released on June 3, 2020 and is the final full bugfix release for the Drupal 8.8.x series. Drupal 8.8.x will not receive any further development aside from security fixes. Sites should prepare to update to Drupal 8.9.0 or Drupal 9.0.0 for ongoing support.

Bug reports should be targeted against the 8.9.x-dev branch from now on, and new development or disruptive changes should be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.2.x-dev

Drupal 8 is end-of-life as of November 17, 2021. There will not be further changes made to Drupal 8. Bugfixes are now made to the 9.3.x and higher branches only. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.2.x-dev » 9.3.x-dev
jboxberger’s picture

Hello,

i made up an easy solution and wanted to share it with you. Since there is no issue with the BINARY format itself i do not see a task to change it here. The problem is the search query. So i created a view on the locales_source which formats the "source" column in utf8mb4 and then made an override for the StringDatabaseStorage.php in my custom module. Now i can route the query against the locales_source_ci view where the 'LIKE' behaves case-insensitive as wanted.

  CREATE VIEW locales_source_ci as
      SELECT lid, CONVERT(source USING utf8mb4) as source, context, version
      FROM locales_source
  protected function dbStringSelect(array $conditions, array $options = []) {
    // Start building the query with source table and check whether we need to
    // join the target table too.
    $query = $this->connection->select('locales_source_ci', 's', $this->options)
      ->fields('s');

I made a module out of that with auto installing an dropping the view and also a dropdown for searching case-sensitive and case-insensitive, so if there is any interest in it i can upload it somehow.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.15 was released on June 1st, 2022 and is the final full bugfix release for the Drupal 9.3.x series. Drupal 9.3.x will not receive any further development aside from security fixes. Drupal 9 bug reports should be targeted for the 9.4.x-dev branch from now on, and new development or disruptive changes should be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.9 was released on December 7, 2022 and is the final full bugfix release for the Drupal 9.4.x series. Drupal 9.4.x will not receive any further development aside from security fixes. Drupal 9 bug reports should be targeted for the 9.5.x-dev branch from now on, and new development or disruptive changes should be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.