A number of us agree that it would be useful to allow searching UI strings without case-sensitivity when translating the UI. This should not be too hard to implement.
Depends on #1452188: New UI for string translation being committed.
Comments
Comment #1
Nick_vh commented: I heavily support this and am willing to fight for it! :-)
So the idea would be a checkbox next to the search box that enables/disables the case-sensitive search.
This has been an issue since Drupal 6: sometimes you do not know whether a certain word starts with a capital letter, so it makes sense to allow this.
By default, it should be case-sensitive.
Comment #2
svenryen CreditAttribution: svenryen commented: I'm one of "a number of us"...
This makes sense because the search field should return all relevant forms of the search string, allowing translators to translate both upper and lower case variations at the same time.
Otherwise, as it is implemented now, a translator has to search for both the upper and lower case variations to make sure everything is translated correctly across all strings on the site.
Comment #3
Nick_vh
Comment #4
svenryen CreditAttribution: svenryen commented: A comment/question on the mockup in comment #1 by Nick_vh:
Do we need a checkbox?
Or should we just return both the upper and lower case versions when somebody performs a search?
I can't see any use case where somebody does NOT want to see both the upper and lower case versions of what they search for, so to me it makes the most sense to always return any case variation of the search string.
Comment #5
Nick_vh commented: It depends; at what point does this become a performance issue? Do we have data that can support this?
Comment #6
LoMo CreditAttribution: LoMo commented: Case sensitivity does have an impact on performance. Less specific matches require more processing. But of course it should be an option anyway.
That presumes the user is not searching for a specific string that is in upper case (or whatever); i.e. they just want to translate the string they copied from the interface and replace it (or fix its translation), not any other variants or similar strings. This is not at all an uncommon use case, IMHO.
I think it might make the most sense for the default to be non-case-sensitive (as suggested) and to add a checkbox for case-sensitive.
I'm also open to making it non-default, i.e. providing an "Ignore case" checkbox (e.g. as this function is labeled in TextMate's search). Either way should be fine, as long as it's a filter value that is "remembered" along with the other filter settings and as long as the option is provided.
Comment #7
svenryen CreditAttribution: svenryen commented: @Nick_vh: I never meant to insinuate it was a performance issue, and after reading my post again I can't see how that came up. Anyway, I'll try to restate the issue so it becomes clearer.
The issue is that users need to perform *two searches* to catch both the capitalized and lower case versions of any word.
This can be avoided if the search simply returns all variations that match the letters, regardless of case.
For example, to translate a site containing the strings "Log in" (as a link) and "to log in..." (in some sort of description), the search needs to be performed with both "log" and "Log" to make sure I catch everything while translating.
Comment #8
Gábor Hojtsy commented: OK, so the source and translation columns are stored as BLOBs rather than TEXTs, if you check the schemas. The reason for this, as I recall, is that for the 99% lookup case (t(), format_plural() and friends) we always need a case-sensitive lookup, so rather than forcing case sensitivity at runtime (with the BINARY keyword), we store the value itself as binary, so it is naturally looked up that way. Doing case-insensitive lookups on BLOB fields sounds like a performance problem, since you need to uppercase or lowercase each column value and then compare it to the uppercased/lowercased search string. That could be a pretty slow operation.
So to solve this I see two options:
A. We convert the BLOB columns to TEXT columns and do a BINARY match from t() et al. This needs benchmarks to show it does not slow down performance; it might or might not be equivalent to current performance.
B. We keep the BLOB column but do uppercased/lowercased LIKEs. This is likely more of a performance hit, BUT not in the crucial frontend part of the site; it's merely for this single admin UI. However, the performance decrease could still be prohibitive.
So needs performance testing either way.
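For illustration, the two options might look roughly like this (a sketch only; the table and column names come from the core locales_source schema, the exact queries in core differ, and the BINARY/LOWER syntax shown is MySQL's):

```php
// Option A (sketch): source stored as TEXT; t() lookups forced back to an
// exact byte-wise match with the BINARY keyword.
$result = db_query(
  "SELECT lid, source FROM {locales_source} WHERE BINARY source = :source",
  array(':source' => 'Log in')
);

// Option B (sketch): source stays a BLOB; only the admin UI search
// case-folds both sides, so t() lookups are untouched.
$result = db_query(
  "SELECT lid, source FROM {locales_source} WHERE LOWER(source) LIKE LOWER(:search)",
  array(':search' => '%' . db_like('Log in') . '%')
);
```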
Also, for the UI I think just doing case-insensitive searches, with no option for case-sensitive, is best. No need to clutter the UI with more options, and people definitely prefer insensitive searches.
Comment #9
Gábor Hojtsy commented: Putting in the right module queue too.
Comment #10
Désiré CreditAttribution: Désiré commented: Tagging.
Comment #11
Désiré CreditAttribution: Désiré commented: Assigning the issue to myself.
Comment #12
Désiré CreditAttribution: Désiré commented: I've done some quick performance testing on a locale_source table with about 15,000 records. I searched for 5,200 existing partial strings; the results are the following (with MySQL):
For the original method (case-sensitive search in a blob field): 14.24 sec
db_query('SELECT source FROM locales_sources_orig WHERE source LIKE \'%' . $string . '%\'');
Case insensitive, lower case strings stored in blobs: 20.92 sec
db_query('SELECT source FROM locales_sources_2blob WHERE source_lower LIKE \'%' . strtolower($string) . '%\'');
Case insensitive, lower case strings stored in text: 90.14 sec
db_query('SELECT source FROM locales_sources_string WHERE source_lower LIKE \'%' . strtolower($string) . '%\'');
So I think we should use another blob field to store the lowercase values. This slows down searching in the GUI a little, but it will not touch t() and the related lookups.
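Using the source_lower column from the benchmark queries above, the intended split could be sketched like this (hypothetical Drupal code; only the UI search reads the extra column):

```php
// At import/save time (sketch): keep a lower-cased copy in source_lower.
db_update('locales_source')
  ->fields(array('source_lower' => drupal_strtolower($source)))
  ->condition('lid', $lid)
  ->execute();

// In the translate UI search (sketch): match against the pre-lowered blob,
// so t() still hits the original binary source column unchanged.
$result = db_query(
  "SELECT source FROM {locales_source} WHERE source_lower LIKE :search",
  array(':search' => '%' . db_like(drupal_strtolower($search)) . '%')
);
```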
Comment #13
Gábor Hojtsy commented: That sounds like it could slow down importing considerably too :/
Comment #14
Désiré CreditAttribution: Désiré commented: Here is the first patch for the lowercase-blob-based insensitive searching.
- schema updated
- update hook
- search in the new fields
- some tests updated too
still missing:
- update tests
- tests for searching
Comment #15
Désiré CreditAttribution: Désiré commented: As I see it, for importing we need to write another blob field converted to lowercase, so I've made a quick performance test for this too:
I took about 12,000 real strings (from a working D7 site) and wrote them to the locales_source table:
With the original table (only the original strings are stored as blob) the writing was 8.69 sec.
With the additional blob field for searching it was 8.88 sec.
Comment #17
Désiré CreditAttribution: Désiré commented: #14: 1635298-Allow_non_case_sensitive_searching_of_strings_in_UI_translation-14-8.x.patch queued for re-testing.
Comment #19
Désiré CreditAttribution: Désiré commented: #14: 1635298-Allow_non_case_sensitive_searching_of_strings_in_UI_translation-14-8.x.patch queued for re-testing.
Comment #21
jthorson CreditAttribution: jthorson commented: Note: ignore the failure result text in the above patch ... the test was manually killed because it was cycling indefinitely on one of our testbots, and the result text above is due to how we killed it.
Not sure why it was re-queued if it failed testing the first two times ... if there was something strange about the test results on those first two runs, please let us know.
Comment #22
Gábor Hojtsy commented: @jthorson: Thanks for the feedback! I guess it looked puzzling how any patch could cause the testbot to be unable to check out Drupal from git. How could this patch cause that? If it could not, then retesting should hopefully find the testbot in a happier place and work, or at least give real testing feedback. Do you know how the git checkout failures would be related to the patch, and what we can do about it in the patch?
Comment #23
Désiré CreditAttribution: Désiré commented: There was a fatal error because of a MySQL insert error in t(), but only in certain cases (this needs automated tests).
Comment #25
pp CreditAttribution: pp commented: A BLOB column cannot be NOT NULL here, because you cannot give it a default value. Set it to FALSE.
Why not use $sandbox? If locales_source contains many strings, the update will run too long.
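A batched update along the lines pp suggests could be sketched with $sandbox like this (hypothetical hook number and column name; the real patch may differ):

```php
function locale_update_8xyz(&$sandbox) {
  if (!isset($sandbox['last_lid'])) {
    $sandbox['last_lid'] = 0;
    $sandbox['max'] = db_query('SELECT COUNT(lid) FROM {locales_source}')
      ->fetchField();
    $sandbox['done'] = 0;
  }
  // Process 50 rows per pass so a big locales_source table does not time out.
  $rows = db_query_range(
    'SELECT lid, source FROM {locales_source} WHERE lid > :lid ORDER BY lid',
    0, 50, array(':lid' => $sandbox['last_lid'])
  )->fetchAllKeyed();
  foreach ($rows as $lid => $source) {
    db_update('locales_source')
      ->fields(array('source_lower' => drupal_strtolower($source)))
      ->condition('lid', $lid)
      ->execute();
    $sandbox['last_lid'] = $lid;
    $sandbox['done']++;
  }
  $sandbox['#finished'] = $rows ? $sandbox['done'] / max(1, $sandbox['max']) : 1;
}
```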
Comment #26
pp CreditAttribution: pp commented: Corrected the NOT NULL problem.
Comment #28
Désiré CreditAttribution: Désiré commented: OK, I have no idea why the update fails:
We have a 'blob' and 'not null' in the schema, and it works on a new install.
But with db_add_field() in the update hook it throws an error. BUT: if I just create a new table with a 'blob', 'not null' field and then add another one to the table, it works... So I'll keep searching, but for now here is a patch without the update hook, just to test the other parts of the patch.
Comment #30
Gábor Hojtsy commented: We discussed that it is possible to do the comparison on the fly, and the performance for that did not seem prohibitive. The only "little" (hahaha) issue is that the code devised was only working on MySQL. I asked Désiré to post updates, so we can point SQL experts here to help figure this out in a more compatible way :)
Comment #31
Gábor Hojtsy commented: Moving off the sprint, given there is no activity and it is not critical enough to drive more activity here ATM.
Comment #32
YesCT CreditAttribution: YesCT commented: I just ran into this, searching for "read more" when I should have searched for "Read more".
Comment #32.0
YesCT CreditAttribution: YesCT commented: Added brackets around the issue number.
Comment #33
MantasK CreditAttribution: MantasK commented: I have created a custom translation interface in D7, and to make the search case-insensitive I did this:
$query->where("CONVERT(r.source USING utf8) like :search", array(':search'=> '%' . db_like($filterString) . '%'));
Regarding speed:
SELECT source FROM locales_source WHERE source LIKE '%string%': 0.0129 sec
SELECT source FROM locales_source WHERE CONVERT( source USING utf8 ) LIKE '%string%': 0.0321 sec
Actually, the speed figures are not precise: they differed depending on the word, and sometimes one query was faster than the other, but there was no big difference; it ranged between 0.01 and 0.06 s.
And I have 45,757 records.
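For context, a condition like that sits inside a D7 dynamic query; a minimal sketch (the variable and alias names are illustrative, and CONVERT ... USING is MySQL-only):

```php
// Case-insensitive string search on locales_source (MySQL-only sketch).
$query = db_select('locales_source', 'r')
  ->fields('r', array('lid', 'source'));
$query->where("CONVERT(r.source USING utf8) LIKE :search",
  array(':search' => '%' . db_like($filter_string) . '%'));
$result = $query->execute();
```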
Comment #34
geek-merlin commented: Big +1 from me.
Also crosslinking related patch: #2269503: Translation overview page: Allow exact match (or better, anchoring at start or end)
Comment #35
nicrodgers commented: In case anyone lands on this page via Google and is looking for a quick and easy way to make the existing string search case-insensitive, I've created a sandbox module that does exactly that (for Drupal 7):
https://www.drupal.org/sandbox/nicrodgers/2593305
Tested on MySQL; from limited performance tests on sites with between 20,000 and 50,000 strings, search performance is still speedy.
It's obviously just an interim quick fix until it can be addressed with a more permanent solution in locale itself.
Comment #38
nsputnik CreditAttribution: nsputnik commented: @MantasK Where do you place that line?
Comment #43
jkdev CreditAttribution: jkdev commented: Hi,
I have looked at the code and found that this behaviour comes from this line:
locale/src/StringDatabaseStorage.php:441 - StringDatabaseStorage::dbStringSelect
Now, if this function serves only the admin pages and not the site-wide t() function, we could theoretically change it to compare with LOWER() on both sides (the quotes look rather ugly, but you get the idea).
This should have minimal impact on performance.
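A hedged sketch of what such a LOWER()-based condition might look like with the database query builder (the alias, variable, and placeholder names are assumptions, not the actual code in dbStringSelect()):

```php
// Case-fold both sides of the comparison; LOWER() is available on MySQL,
// SQLite and PostgreSQL alike, so this stays cross-database.
$query->where('LOWER(s.source) LIKE LOWER(:pattern)',
  array(':pattern' => '%' . db_like($filter) . '%'));
```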
Furthermore, as we support MySQL, SQLite3 and PostgreSQL, and all of those engines provide the LOWER function, it should be safe to use.
Comment #48
jboxberger CreditAttribution: jboxberger as a volunteer commented: Hello,
I put together an easy solution and wanted to share it with you. Since there is no issue with the BINARY format itself, I do not see a need to change it here. The problem is the search query. So I created a view on locales_source which converts the "source" column to utf8mb4, and then made an override for StringDatabaseStorage.php in my custom module. Now I can route the query against the locales_source_ci view, where LIKE behaves case-insensitively, as wanted.
I made a module out of that which auto-installs and drops the view and also has a dropdown for searching case-sensitively or case-insensitively, so if there is any interest in it I can upload it somewhere.
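The view-based approach could be sketched as follows (the column list is assumed from the core locales_source schema, the view name follows the comment above, and CREATE VIEW with CONVERT is MySQL-specific):

```php
// One-time setup (sketch): a view that re-collates source so LIKE becomes
// case-insensitive; the overridden storage then queries this view instead
// of locales_source directly.
db_query("
  CREATE OR REPLACE VIEW locales_source_ci AS
  SELECT lid, CONVERT(source USING utf8mb4) AS source, context, version
  FROM {locales_source}
");
```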