Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Problem/Motivation
Æ
has a few accented variants, these do not get their accents removed because the transliteration component returns AE
and the removeDiacritics method only consider single letter replacements exactly in order to avoid replacing Æ itself with AE while removing diacritics.
Proposed resolution
Map these back. The consequences are minor: searching for Æ will now find Ǣ where previously it wouldn't.
I attached an image showing these were the only ones having accents on the expected side and what the change is really.
Remaining tasks
Does this need a change notice ?
Comment | File | Size | Author |
---|---|---|---|
#31 | 3151364-31-9.1.x.patch | 3.25 KB | amateescu |
#27 | interdiff.txt | 840 bytes | Ghost of Drupal Past |
#27 | 3151364_27.patch | 3.28 KB | Ghost of Drupal Past |
#3 | 2020-06-13 03_15_31-StrokesPlus.png | 145.09 KB | Ghost of Drupal Past |
Comments
Comment #2
Ghost of Drupal PastComment #3
Ghost of Drupal PastComment #4
Ghost of Drupal PastMoved the map into a constant.
Comment #5
jhodgdonThis change makes a ton of sense. It even has a test.
As a note, at least on my system, when I looked at the patch file in the browser, the characters were weird. But when I applied the patch and looked at the files in my terminal or text editor, the characters were readable and the patch looks correct to me.
Comment #6
alexpottThis map is now added to the global namespace but it is (necessarily) a very incomplete map. It'd be nice to add protected in front of it but then it's going to be Drupal 9 only because scoped constants were only introduced in PHP 7.1. Maybe we can consider a name that doesn't imply a complete map or somehow describes what it maps.
Another question I have is will this list grow? Because if if does then we might want to consider placing it somewhere else because then we don't have to instantiate it when we load this class.
Comment #7
Ghost of Drupal PastThis list can't grow because removeDiacritics is operating on a very small set of characters (323 of them) and of those, these six have this problem.
A better name eludes me. fix_transliterate_for_remove_diacritics?? The crux of the matter is removeDiacritics using the transliteration datasets for its work which is basically a crafty trick which happens to be correct except for these six. We totally could supply a different dataset for removeDiacritics but it'd be a waste to double maintain that. Look at Component/Transliteration/data/x01.php on how the transliterate dataset is supplied.
Let me describe the bug again by describing what happens when removeDiacritics is ran over a few specific characters:
I achieved this by using a simple map: AE gets mapped back to Æ. The new code flow is:
Note the test is total. Every character affected by removeDiacritics is tested.
We could eliminate the constant and make it a local variable again, it's just an array of six, not a biggie, I felt a constant is more expressive.
Comment #8
alexpottHow about making the list a protected property - that's supported and then it does not bleed into the public API.
Comment #9
Ghost of Drupal PastMoved to a property. Thanks for the guidance. The interdiff is not meaningful so I haven't attached it.
Comment #10
jhodgdonIt looks to me as though the concerns of alexpott have been addressed. I still like the patch. The tests still pass. No coder errors. And this is a sensible fix to removeDiacritics(), with a very small impact (removes diacritics from a total of 4 characters that were incorrectly being passed up in the previous code). Back to RTBC.
Comment #11
alexpottOne more thing to consider. Re-indexing. I think this change makes a re-index necessary because it changes search input and the text that is indexed... yes there probably aren't too many ǢǣǼǽǮǯ characters in people's text (this statement might very well be a statement of my ignorance) but this change will breaking searching for them on an already indexed site.
Comment #12
Ghost of Drupal PastComment #13
alexpottLet's do this in a post update - this is not a schema change and it also doesn't force our hand wrt to D8 and D9 update numbers for the search module.
Comment #14
Ghost of Drupal PastComment #15
Ghost of Drupal PastComment #16
Ghost of Drupal PastComment #17
jpatel657 CreditAttribution: jpatel657 at Trigyn Technologies Ltd commentedComment #18
jpatel657 CreditAttribution: jpatel657 at Trigyn Technologies Ltd commentedComment #19
jpatel657 CreditAttribution: jpatel657 at Trigyn Technologies Ltd commentedPatch is working for 9.1.x
Comment #20
Ghost of Drupal PastI do not quite understand why #16-19 was posted. The patches are identical in #16 and #19.
Comment #21
Ghost of Drupal PastComment #22
alexpott@jpatel657 the patch in #16 applied to 9.1.x already. You could have used the add test button to kick off an 9.1.x test against that branch. Since #19 is exactly the same as #16 I'm removing issue credit.
Comment #23
alexpottAlso since this is a bug that occurs in 8.9.x that's the correct branch for this patch. It's great that the same patch applies to 9.1.x so we can applied the same patch to all branches.
Comment #24
jhodgdonApparently, post update functions go into the module_name.post_update.php file. Other than that, looks OK to me.
Comment #25
alexpott@jhodgdon is correct - that's exactly where they go. Nice catch.
Comment #26
munish.kumar CreditAttribution: munish.kumar as a volunteer and at Srijan | A Material+ Company for Drupal India Association commentedComment #27
Ghost of Drupal PastThe interdiff is against #9 when the patch was last RTBC.
Comment #28
jibranBack to RTBC as all the outstanding feedback is addressed.
Comment #29
catchThis looks good to me, but there's a conflict in search.post_update.php now.
Comment #30
Ghost of Drupal PastAlas, someone else will need to fix this now.
Comment #31
amateescu CreditAttribution: amateescu as a volunteer commentedHere's a patch for 9.1.x, and #27 still applies to 8.9.x.
Comment #32
Krzysztof Domański@jhodgdon #2922638: No charset on response for patch & text files
Comment #33
Krzysztof DomańskiComment #35
catchCommitted/pushed to 9.1.x and backported to 8.9.x, thanks!