Problem/Motivation
Drupal core is being shipped with a set of scripts. One particular is transliteration_data.php.txt (added in #2097587: Follow-up: Add transliteration data maintenance script to scripts directory)
That script is dangerous thus the decision was made to ship it with a txt extension, so it cannot be run via browser or executed accidentally. However, this file is exposed to the web, as it is not protected by htaccess for example resulting in false-positive during automated security checks with a message like:
Source code disclosure
Looks like the source code for this script is available.
Steps to reproduce
Open any D8/9 website and open the /core/scripts/transliteration_data.php.txt
For example https://dri.es/core/scripts/transliteration_data.php.txt
Proposed resolution
Scenario A:
- Remove txt extension and make it a normal PHP file
- Add protection from running via browser (check whether it is being run via CLI)
- Comment out actual code execution part, i.e. force user to uncomment it before it can be run
- Create a follow-up issue to refactor the script
Scenario B:
- Remove the script (as @Charlie ChX Negyesi suggested)
Remaining tasks
- Agree on scenario A or B
- Review & approve the patch
User interface changes
None
API changes
None
Data model changes
None
Release notes snippet
TBD
Comment | File | Size | Author |
---|---|---|---|
#17 | drupal-delete-file-3205578-17.patch | 22.26 KB | kevinn |
|
Issue fork drupal-3205578
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #4
Taran2LComment #5
longwaveThe PHP file doesn't follow our coding standards, this wasn't a problem when it was .txt as it wasn't detected as PHP.
See https://www.drupal.org/pift-ci-job/2067433 for the list of errors
Comment #6
Taran2LThanks, @longwave, please check the updated version.
Comment #7
Ghost of Drupal PastI have so many questions.
Why is this file shipping with Drupal core? Are we running it regularly with updated datasets? git log tells me the code have seen zero changes since 2013 (comments did, once, in a mass search-and-replace). So perhaps just remove it, we can resurrect it at will with git if we ever want to re-run it? Or put it in some other repository where it can languish with the same purpose?
On the other hand, if we do run it regularly, then please do not make it harder to run automatically by forcing users to edit the source code. Making the script do a dry run by default and requiring something like --do-patch-core I believe would be better.
Comment #8
Ghost of Drupal PastReviewing
git log --oneline core/lib/Drupal/Component/Transliteration/data/
finds all improvements to the dataset were done manually by editing the generated files and not by running the script. This means any run of the script would destroy these improvements.Recommendation: remove the script.
Comment #9
Ghost of Drupal PastAlternatively, investigate the previous issues compared to our data sources especially to ICU / intl as PHP calls it. In the "I have so many questions" rubric we can add: why did we need to edit so when ICU is the Unicode reference implementation? How do the other projects we investigated when transliteration was added dealt with these problems? Perhaps the solution to these problems could be generalized somehow and so the script could be amended and perhaps even more fixes could be generated this way? My notes on Unicode / ICU transliteration are in https://www.drupal.org/project/drupal/issues/3099668#comment-13700693
This might be a separate issue and then this becomes postponed on it -- I believe first it needs to be decided whether we want to fix or remove the script.
Comment #10
Taran2Lhi @Charlie ChX Negyesi, thanks for your review.
I am totally fine with removing the thing altogether, as, in my opinion, it's being rarely used (if used at all).
All your questions and concerns are definitely valid, I had the same when looking at it. However, refactoring the script itself is out of scope for this issue imo and requires a separate follow-up issue.
Thus, there are 2 options:
Comment #12
andypostInstead of removal it could use to move in context of related #3181570: Move scripts outside of /core/ that are not part of the end product.
Comment #14
cydharttha CreditAttribution: cydharttha commentedOn a similar note, I had a security audit come back today which flagged {site}/core/scripts/drupal as PHP Code Exposure since it's publicly accessible.
Comment #17
kevinn CreditAttribution: kevinn commentedHere is a patch that removes the file.
Comment #18
kevinn CreditAttribution: kevinn commentedComment #19
longwaveI agree with @chx, I don't think this has actually been used in several years - let's just delete this file.
Comment #21
alexpottRe-read #3099668: Expand the remove diacritics feature to other scripts - there is something pretty interesting going on there that I think might be worth pursuing for Drupal 11. I think we should completely remove our transliteration component. We could rely on Symfony's poor persons transliteration and tell people if they want better transliteration to have the INTL extension installed. Then we'd no longer be responsible for much of this problem-space which seems ideal to me.
Wrt to this file I agree that removing it seems the best course of action.
Committed and pushed 35e72cbe16 to 10.1.x and 4be39e854b to 10.0.x and 8effb421ba to 9.5.x. Thanks!