Problem/Motivation
For the readiness checks and even during applying of updates, we want to make sure we aren't modifying a file that is already changed from its default. Call it patched, hacked, whatever. To do that, we should run a hash check against the files and compare it against... something.
It should look like: http://mirrors.us.kernel.org/ubuntu-releases/19.04/SHA256SUMS
Proposed resolution
Rudimentary profiling of sha256 vs sha512. The sums were run against all the files on the disk inside of a docker instance running on Linux. A full Drupal 8.8 clone with all test dependencies + 10-15 contrib modules on a fairly powerful 64bit laptop running SSD.
hash_file('sha256', $file_path)
: 1ms
hash_file('sha512', $file_path)
: 60ms
sha256 is definitely faster, but not majorly so. Let's stay with 512 unless we discover a reason to go with 256.
Also, I ran the numbers on generating the hashes.
$ time find . -type f -print0 | xargs -0 sha256sum >SHA256UMS
real 0m1.472s
user 0m1.365s
sys 0m0.175s
$ time find . -type f -print0 | xargs -0 sha512sum >SHA512SUMS
real 0m0.978s
user 0m0.857s
sys 0m0.172s
Results of running the SHA sums should be stored on ftp.drupal.org in nested folders:
project_name/version-string/SHA512SUMS
project_name/version-string/SHA512SUMS-package - this one has the hashes after packaging adds its extra data
Examples:
- https://updates.drupal.org/release-hashes/drupal/8.7.4/contents-sha256su...
- https://updates.drupal.org/release-hashes/drupal/8.7.4/contents-sha256su...
- https://updates.drupal.org/release-hashes/drupal/8.7.3/contents-sha256su...
- https://updates.drupal.org/release-hashes/drupal/8.7.3/contents-sha256su...
- https://updates.drupal.org/release-hashes/token/8.x-1.5/contents-sha256s...
- https://updates.drupal.org/release-hashes/token/8.x-1.5/contents-sha256s...
- https://updates.drupal.org/release-hashes/bootstrap/8.x-3.20/contents-sh...
- https://updates.drupal.org/release-hashes/bootstrap/8.x-3.20/contents-sh...
Remaining tasks
How does this handle when drupal 9 is released? Will all modules just continue using a single version for modules that don't include the major version? Decide.- Code up a generation of the sha sums.
- Only generate the sums for D7,D8 projects. No need to retrofit older projects. No need to handle this for dev releases.
- For core, will also need to include the vendor folder. See comment #34.
User interface changes
API changes
Data model changes
Release notes snippet
Comment | File | Size | Author |
---|---|---|---|
#44 | drupal-8.7.3-SHA512SUMS-package.txt | 3.07 MB | drumm |
#44 | drupal-8.7.3-SHA512SUMS.txt | 2.64 MB | drumm |
#44 | drupal-8.7.4-SHA512SUMS-package.txt | 3.07 MB | drumm |
#44 | drupal-8.7.4-SHA512SUMS.txt | 2.64 MB | drumm |
#30 | interdiff_19-30.txt | 1.76 KB | heddn |
Comments
Comment #2
heddnComment #3
heddnComment #4
heddnAdding a file so I can start testing with something while we figure more things out.
Comment #5
heddnComment #6
greg.1.anderson CreditAttribution: greg.1.anderson at Pantheon commentedWordPress has a service like this, and it is very helpful. See wp-cli core verify-checksums.
If this were a separate service, then we could provide an equivalent Drush command.
Comment #7
drummIf this is stored separately, drupalorg will be the issue queue for the implementation.
Comment #8
heddnAfter more research, I think a SHA512SUMS is more appropriate.
Comment #9
heddnSome other notes,
hash_file('sha512', $file)
works on very old php versions, see https://3v4l.org/VRm0u. I feel like it should be an auto generated file during package creation. Although it would be nice to get it via a URL too. And even better if the sum file was gpg signed. See http://mirrors.us.kernel.org/ubuntu-releases/19.04/SHA256SUMS.gpgComment #10
catchI like the idea of a service for this.
While we should never have to patch a .info.yml file for a sec release (I guess there's always a first time), we'll need to account for composer-built sites vs. tarballs given the packaging script changes the contents.
Comment #11
heddnre #10, long term I think we want to add general update support, not just security. And in d7, I think we have added some class files in security updates. We might have loaded them via include_once, or not. I'm not sure. Fact is, we need to handle it both ways.
Comment #12
heddnI'm mentally working through this today I'm not sure we can easily calculate a SHA512SUMS for vendor files. Nor after we have that hash, would it be all the feasible to compare it. Mainly because of versions of the vendor files could vary widely and contrib modules do not have an approved, standard way to "require" 3rd party files. For php, we have some method using composer. But it isn't fully supported. And what about the js and other libraries that folks use/require for the module?
I'm more inclined to think we can hash/compare files explicitly inside an extension but any 3rd party dependencies, things in vendor, we can't do as much.
Putting this out for input/thoughts from others.
Comment #13
mbaynton#12 isn't this just the limitation of no composer support in the initial in-place updater, whose consequence I understood to be "if /any/ php file that occurs in the Drupal release archive has been modified, your site can't use automatic updates 1.0"?
Certainly we also can't start switching out things in vendor/ if Composer has had a crack at it on a particular site and put a non-Drupal release version there.
So don't we actually just treat vendor like all other things for now?
Comment #14
heddn#3050804: Modified code checker just turned green. It is using a dummy SHA512SUMS stream and demonstrates how I envision seeing us generate and comsume a hash. @drumm, should I move this over to the d.o. intrastructure queue to work out a solution, especially considering the issues with .info files being modified by packager? I don't account for it yet in that other issue, but we could use an alternative endpoint for packager built sha sums, since we are very capable to distinguish if the info.yml was from a tarball or git clone.
Comment #15
drummFrom #7 - drupalorg will be the issue queue for the implementation
Specifically, the packaging script is https://git.drupalcode.org/project/drupalorg/blob/dev/drupalorg_project/...
Are we thinking of changing the modifications, or do we just need hashes of the modified as packaged files available?
Comment #16
heddnThe simple solution is to have 2 endpoints. One that shows the hash w/o the packaging changes and one with. Then if we have an info with the timestamp and version details added to it, then use the packaging endpoint. If we don't want the modifications, that seems like it won't happen to me.
Comment #17
heddnFor pre-info file modifications it looks like we could add an implementation of
hook_drupalorg_package_release
that sha512sums and outputs to a specific location.I don't see a hook for post info file processing. Maybe somewhere around https://git.drupalcode.org/project/drupalorg/blob/dev/drupalorg_project/... we could add a post-info hook to generate sha512sums with the .info file changes.
Comment #18
heddnMoving queues.
Comment #19
drummMoving to the correct project.
Comment #20
heddnHere's a PoC. To give a general idea.
Comment #21
drummThis is already site-specific code, we don’t need the extensibility/complexity of a hook system here. Just add to the class, using comments, and maybe a private method like you’ve added, to explain and organize things.
Makes sense to me. So this is a single file per-package, with the sha512 of every file in that package? Could both the packaged and original .info file hashes be stuffed into the same file to indicate which is which?
Probably outside the scope of things, but drush, maybe only drush make, does modify info files similarly:
Comment #22
heddndrush make will kill the hash then :(
Having a single file to loop over is by far the easiest solution. These are very small files. We'll probably also want to add gpg signing as well. But I'm keeping it simple for now.
Comment #23
drummFor putting these on ftp.drupal.org, we should probably have these in subdirectories per-project, since https://ftp.drupal.org/files/projects/ having all the files is not ideal. And a call to
drupalorg_fastly_purge_url()
when this is generated, everything on this domain is cached with a long expiration date and actively purged when needed.Comment #24
heddnAgreed w/ #23. I'm currently keying the urls off
project_name/version/SHA512SUMS
. And we could probably also do something likeproject name/version/SHA512SUMS.packager
for the tarballs.Comment #25
mbayntonRe: eventual signing of the SHA512SUMS files, having the signatures verifiable with php is important, as is having better infrastructure that we currently do to manage the signing. Relevant discussion from today's slack meeting:
Note also that WordPress has gone this route with their crypto: https://www.zdnet.com/article/wordpress-finally-gets-the-security-featur...
Comment #26
heddnWe have a small dilemma as clearly identified in #3050804-16: Modified code checker. Not all projects' have a module with the same name as the project. The example I gave there was: varnish_purge vs module varnish_purger.
Surely we've solved this with
drush update
in the past. What did we do there?Comment #27
heddnI think I found a solution, packaging for tarballs adds a project value to .info files. For composer, we can parse the composer.json for the project name and use
ocramius/package-versions
for the version number.Comment #28
heddnBased on my reading of https://github.com/WordPress/wordpress-develop/blob/master/src/wp-admin/... and the trac ticket https://core.trac.wordpress.org/ticket/39309 that @mbaynton found, I'm going to document some findings.
This is also based on the findings from https://3v4l.org/VRm0u. for file hashes, we could use SHA-256 or SHA-512. For the 32-bit drupal installs, maybe going with SHA-256 is a better fit than 512. Yes we could have collisions, but we're not dealing with crypto here. We're trying to decide if files are hacked. Anyone have an opinion?
The rest of the findings in WP's implementation are around cryptographic algorithms. Which for file hashing don't apply. It sounds like we want to use
paragonie/sodium_compat
anded25519
for signing the hash files.Comment #29
heddnWould 256 scale better on 32-bit OSes. And do sufficiently better on 64bit that it outweighs the benefit of the extra few bits?
Comment #30
heddnI've updated the IS with some of my recommendations. One out standing question that occurred to me:
How does the ftp folder structure need to adjust when drupal 9 is released? Will all contrib modules just continue using a single version that don't include the major version?
Comment #31
drummLet’s leave the API compatibility out of this and only use the full version number, whether it is a legacy version number like 8.x-2.x-dev, or after #3054391: [META] Support semantic versioning on drupal.org for contributed projects lands, the new format.
Examples from the issue summary would be
entity/7.x-1.9/SHA512SUMS
entity/7.x-1.9/SHA512SUMS-package
entity/8.x-1.0-rc2/SHA512SUMS
entity/8.x-1.0-rc2/SHA512SUMS-package
drupal/7.67/SHA512SUMS
drupal/7.67/SHA512SUMS-package
drupal/8.7.1/SHA512SUMS
drupal/8.7.1/SHA512SUMS-package
Comment #32
heddnUpdated IS with the suggested paths.
Comment #33
heddnComment #34
heddnI'm working on sandboxing what an in place upgrade might look like. It just so happened that the first "upgrade" I tried to do was 8.7.0 => 8.7.1. That upgrade was 100% vendor library updates. Meaning, changes to composer.lock and composer.json. When building the hash, I think we will need to hash the vendor folder too. Maybe others already knew this. But let's make that assumption explicit.
Comment #35
drummThe non-
…-package
files will not have the vendor directory since vendor isn’t in Git, correct?Comment #36
heddnRe #35: that is a tricky question. Maybe it makes sense not to build the non
-package
hashes at this point in time. Since we don't really know everything about how we'll use them. I could see arguments for running for and against, and the non packaged isn't really in scope at this point.Comment #37
heddnAnother idea, rather than generating sum files for everything initially... what if we only did it for 8.7.0, 8.7.1 point releases? Then if we like it, we could roll it out further. This whole auto update stuff seems very experimental and I'd really like to learn some things before we have hundreds of hash files rolling out across the ftp site.
Comment #38
heddnComment #39
drummI’m assuming we should only be generating hashes of tagged releases, correct?
We’ll have to decide what, if anything, to do for distributions, like https://www.drupal.org/project/lightning/releases/8.x-4.001. They have 3 sets of packaged files, the
…-core
and…-no-core
ones in addition to base packaging.Comment #40
heddnTagged releases was my assumption too. Distributions and profiles will be tricky...
Comment #44
drummFor distributions, I’m skipping them for now. Only the bare package, without drush make run, gets the same treatment. If these are needed in the future, we can decide on a naming convention and generate them.
Attached are samples from the generation so far.
I went over this with nnewton and mlhess. Our consensus so far is:
All together, that is tentatively:
Comment #46
drummThose 4 examples are now delivering in production:
And I got a module and a theme for examples, too:
Remaining work:
Comment #47
heddnTaking a look at #46.
Comment #48
heddnI'm starting to make some modifications in #3071193: Use live SHA256SUMS to use these files. There's some things that aren't working, namely because they are edges cases I didn't cover in my unit tests. The nice thing though is that the hash files themselves work great.
Give me a few more days of playing with these formats. To make sure nothing doesn't surface with more testing. But with the first look, things are looking very promising.
Comment #50
drummThe examples in #46 are now regenerated with the BSD-style “tags” output of the hashes.
I had hoped to get these signed with temporary keys for testing, but that will have to wait for an OS upgrade or #3076191: Build a signing oracle server & HSM infrastructure.
These should be verifiable using https://github.com/drupalassociation/php-signify/blob/master/src/Verifie.... Note that method is currently protected, probably because it is intended to enforce only checking signed data. Maybe it can be made public with an exception to discourage misuse, and get a start on finding any gaps in its parsing and real-world use.
These commits also show that CDN purging is working well.
Comment #52
drummI created csig-signed versions of the test checksum lists:
The public key for the totally temporary root key used is https://git.drupalcode.org/project/drupalorg/blob/dev/drupalorg/temporar...
Comment #54
drummPer #3077737: [Policy, no patch] use SHA256 vs SHA512 hashing, we actually want sha256
Comment #57
drummI cleared out the old examples and have a new set:
Comment #58
heddnCSIG sha256 sums are now also generated for:
Comment #60
drummWe now have fully working package contents hashes, except that their generation isn’t automatically triggered. #3099876: Establish a packaging pipeline queue will help handle new releases.
Comment #65
drummNow all release contents hashes are generated, except for 52 releases where the Git tag has gone missing; and new releases will get their contents hashed when packaging is done and the release is published.