Problem/Motivation

Drupal core is being shipped with a set of scripts. One particular is transliteration_data.php.txt (added in #2097587: Follow-up: Add transliteration data maintenance script to scripts directory)

That script is dangerous thus the decision was made to ship it with a txt extension, so it cannot be run via browser or executed accidentally. However, this file is exposed to the web, as it is not protected by htaccess for example resulting in false-positive during automated security checks with a message like:

Source code disclosure
Looks like the source code for this script is available.

Steps to reproduce

Open any D8/9 website and open the /core/scripts/transliteration_data.php.txt

For example https://dri.es/core/scripts/transliteration_data.php.txt

Proposed resolution

Scenario A:

  • Remove txt extension and make it a normal PHP file
  • Add protection from running via browser (check whether it is being run via CLI)
  • Comment out actual code execution part, i.e. force user to uncomment it before it can be run
  • Create a follow-up issue to refactor the script

Scenario B:

  • Remove the script (as @Charlie ChX Negyesi suggested)

Remaining tasks

  1. Agree on scenario A or B
  2. Review & approve the patch

User interface changes

None

API changes

None

Data model changes

None

Release notes snippet

TBD

Issue fork drupal-3205578

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Taran2L created an issue. See original summary.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Taran2L’s picture

Status: Active » Needs review
longwave’s picture

Status: Needs review » Needs work

The PHP file doesn't follow our coding standards, this wasn't a problem when it was .txt as it wasn't detected as PHP.

See https://www.drupal.org/pift-ci-job/2067433 for the list of errors

Taran2L’s picture

Status: Needs work » Needs review

Thanks, @longwave, please check the updated version.

Ghost of Drupal Past’s picture

I have so many questions.

Why is this file shipping with Drupal core? Are we running it regularly with updated datasets? git log tells me the code have seen zero changes since 2013 (comments did, once, in a mass search-and-replace). So perhaps just remove it, we can resurrect it at will with git if we ever want to re-run it? Or put it in some other repository where it can languish with the same purpose?

On the other hand, if we do run it regularly, then please do not make it harder to run automatically by forcing users to edit the source code. Making the script do a dry run by default and requiring something like --do-patch-core I believe would be better.

Ghost of Drupal Past’s picture

Reviewing git log --oneline core/lib/Drupal/Component/Transliteration/data/ finds all improvements to the dataset were done manually by editing the generated files and not by running the script. This means any run of the script would destroy these improvements.

  1. #2926187: Better Greek transliteration
  2. #2932249: Incorrect transliteration of some Russian Cyrillic characters
  3. #3169212: Improve transliteration of Ukrainian letters

Recommendation: remove the script.

Ghost of Drupal Past’s picture

Alternatively, investigate the previous issues compared to our data sources especially to ICU / intl as PHP calls it. In the "I have so many questions" rubric we can add: why did we need to edit so when ICU is the Unicode reference implementation? How do the other projects we investigated when transliteration was added dealt with these problems? Perhaps the solution to these problems could be generalized somehow and so the script could be amended and perhaps even more fixes could be generated this way? My notes on Unicode / ICU transliteration are in https://www.drupal.org/project/drupal/issues/3099668#comment-13700693

This might be a separate issue and then this becomes postponed on it -- I believe first it needs to be decided whether we want to fix or remove the script.

Taran2L’s picture

Issue summary: View changes

hi @Charlie ChX Negyesi, thanks for your review.

I am totally fine with removing the thing altogether, as, in my opinion, it's being rarely used (if used at all).

All your questions and concerns are definitely valid, I had the same when looking at it. However, refactoring the script itself is out of scope for this issue imo and requires a separate follow-up issue.

Thus, there are 2 options:

  1. Work towards getting this patch committed (and then refactor/fix it in a follow-up)
  2. Remove script altogether

andypost’s picture

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

cydharttha’s picture

On a similar note, I had a security audit come back today which flagged {site}/core/scripts/drupal as PHP Code Exposure since it's publicly accessible.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

kevinn’s picture

Version: 10.1.x-dev » 9.4.x-dev
FileSize
22.26 KB

Here is a patch that removes the file.

kevinn’s picture

Version: 9.4.x-dev » 10.1.x-dev
longwave’s picture

Status: Needs review » Reviewed & tested by the community

I agree with @chx, I don't think this has actually been used in several years - let's just delete this file.

  • alexpott committed 35e72cb on 10.1.x
    Issue #3205578 by Taran2L, kevinn, Charlie ChX Negyesi, longwave: Source...
alexpott’s picture

Version: 10.1.x-dev » 9.5.x-dev
Status: Reviewed & tested by the community » Fixed

Re-read #3099668: Expand the remove diacritics feature to other scripts - there is something pretty interesting going on there that I think might be worth pursuing for Drupal 11. I think we should completely remove our transliteration component. We could rely on Symfony's poor persons transliteration and tell people if they want better transliteration to have the INTL extension installed. Then we'd no longer be responsible for much of this problem-space which seems ideal to me.

Wrt to this file I agree that removing it seems the best course of action.

Committed and pushed 35e72cbe16 to 10.1.x and 4be39e854b to 10.0.x and 8effb421ba to 9.5.x. Thanks!

  • alexpott committed 4be39e8 on 10.0.x
    Issue #3205578 by Taran2L, kevinn, Charlie ChX Negyesi, longwave: Source...

  • alexpott committed 8effb42 on 9.5.x
    Issue #3205578 by Taran2L, kevinn, Charlie ChX Negyesi, longwave: Source...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.