Problem/Motivation

For the readiness checks and even during applying of updates, we want to make sure we aren't modifying a file that is already changed from its default. Call it patched, hacked, whatever. To do that, we should run a hash check against the files and compare it against... something.

It should look like: http://mirrors.us.kernel.org/ubuntu-releases/19.04/SHA256SUMS

Proposed resolution

Rudimentary profiling of sha256 vs sha512. The sums were run against all the files on the disk inside of a docker instance running on Linux. A full Drupal 8.8 clone with all test dependencies + 10-15 contrib modules on a fairly powerful 64bit laptop running SSD.

hash_file('sha256', $file_path): 1ms
hash_file('sha512', $file_path): 60ms

sha256 is definitely faster, but not majorly so. Let's stay with 512 unless we discover a reason to go with 256.

Also, I ran the numbers on generating the hashes.

$ time find . -type f -print0  | xargs -0 sha256sum >SHA256UMS

real	0m1.472s
user	0m1.365s
sys	0m0.175s
$ time find . -type f -print0  | xargs -0 sha512sum  >SHA512SUMS

real	0m0.978s
user	0m0.857s
sys	0m0.172s

Results of running the SHA sums should be stored on ftp.drupal.org in nested folders:

project_name/version-string/SHA512SUMS
project_name/version-string/SHA512SUMS-package - this one has the hashes after packaging adds its extra data

Examples:

Remaining tasks

  • How does this handle when drupal 9 is released? Will all modules just continue using a single version for modules that don't include the major version? Decide.
  • Code up a generation of the sha sums.
  • Only generate the sums for D7,D8 projects. No need to retrofit older projects. No need to handle this for dev releases.
  • For core, will also need to include the vendor folder. See comment #34.

User interface changes

API changes

Data model changes

Release notes snippet

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

heddn created an issue. See original summary.

heddn’s picture

Issue summary: View changes
heddn’s picture

Issue summary: View changes
heddn’s picture

FileSize
216 bytes

Adding a file so I can start testing with something while we figure more things out.

heddn’s picture

Issue summary: View changes
greg.1.anderson’s picture

WordPress has a service like this, and it is very helpful. See wp-cli core verify-checksums.

If this were a separate service, then we could provide an equivalent Drush command.

drumm’s picture

I'm adding this here instead of another queue, because we should first decide if the hash file should be autogenerated and stored separately from git and not included in version control.

If this is stored separately, drupalorg will be the issue queue for the implementation.

heddn’s picture

After more research, I think a SHA512SUMS is more appropriate.

heddn’s picture

Title: Add SHA256SUMS file of all files » Add SHA512SUMS file of all files

Some other notes, hash_file('sha512', $file) works on very old php versions, see https://3v4l.org/VRm0u. I feel like it should be an auto generated file during package creation. Although it would be nice to get it via a URL too. And even better if the sum file was gpg signed. See http://mirrors.us.kernel.org/ubuntu-releases/19.04/SHA256SUMS.gpg

catch’s picture

I like the idea of a service for this.

While we should never have to patch a .info.yml file for a sec release (I guess there's always a first time), we'll need to account for composer-built sites vs. tarballs given the packaging script changes the contents.

heddn’s picture

re #10, long term I think we want to add general update support, not just security. And in d7, I think we have added some class files in security updates. We might have loaded them via include_once, or not. I'm not sure. Fact is, we need to handle it both ways.

heddn’s picture

I'm mentally working through this today I'm not sure we can easily calculate a SHA512SUMS for vendor files. Nor after we have that hash, would it be all the feasible to compare it. Mainly because of versions of the vendor files could vary widely and contrib modules do not have an approved, standard way to "require" 3rd party files. For php, we have some method using composer. But it isn't fully supported. And what about the js and other libraries that folks use/require for the module?

I'm more inclined to think we can hash/compare files explicitly inside an extension but any 3rd party dependencies, things in vendor, we can't do as much.

Putting this out for input/thoughts from others.

mbaynton’s picture

#12 isn't this just the limitation of no composer support in the initial in-place updater, whose consequence I understood to be "if /any/ php file that occurs in the Drupal release archive has been modified, your site can't use automatic updates 1.0"?

Certainly we also can't start switching out things in vendor/ if Composer has had a crack at it on a particular site and put a non-Drupal release version there.

So don't we actually just treat vendor like all other things for now?

heddn’s picture

#3050804: Modified code checker just turned green. It is using a dummy SHA512SUMS stream and demonstrates how I envision seeing us generate and comsume a hash. @drumm, should I move this over to the d.o. intrastructure queue to work out a solution, especially considering the issues with .info files being modified by packager? I don't account for it yet in that other issue, but we could use an alternative endpoint for packager built sha sums, since we are very capable to distinguish if the info.yml was from a tarball or git clone.

drumm’s picture

From #7 - drupalorg will be the issue queue for the implementation

Specifically, the packaging script is https://git.drupalcode.org/project/drupalorg/blob/dev/drupalorg_project/...

especially considering the issues with .info files being modified by packager?

Are we thinking of changing the modifications, or do we just need hashes of the modified as packaged files available?

heddn’s picture

The simple solution is to have 2 endpoints. One that shows the hash w/o the packaging changes and one with. Then if we have an info with the timestamp and version details added to it, then use the packaging endpoint. If we don't want the modifications, that seems like it won't happen to me.

heddn’s picture

For pre-info file modifications it looks like we could add an implementation of hook_drupalorg_package_release that sha512sums and outputs to a specific location.

I don't see a hook for post info file processing. Maybe somewhere around https://git.drupalcode.org/project/drupalorg/blob/dev/drupalorg_project/... we could add a post-info hook to generate sha512sums with the .info file changes.

heddn’s picture

Project: Automatic Updates »
Version: 8.x-1.x-dev »

Moving queues.

drumm’s picture

Project: » Drupal.org customizations
Version: » 7.x-3.x-dev

Moving to the correct project.

heddn’s picture

FileSize
2.2 KB

Here's a PoC. To give a general idea.

drumm’s picture

This is already site-specific code, we don’t need the extensibility/complexity of a hook system here. Just add to the class, using comments, and maybe a private method like you’ve added, to explain and organize things.

The simple solution is to have 2 endpoints. One that shows the hash w/o the packaging changes and one with. Then if we have an info with the timestamp and version details added to it, then use the packaging endpoint. If we don't want the modifications, that seems like it won't happen to me.

Makes sense to me. So this is a single file per-package, with the sha512 of every file in that package? Could both the packaged and original .info file hashes be stuffed into the same file to indicate which is which?

Probably outside the scope of things, but drush, maybe only drush make, does modify info files similarly:

; Information added by drush on 2019-04-16
version = "7.x-2.0-alpha3+80-dev"
project = "project"
datestamp = "1555429111"
heddn’s picture

drush make will kill the hash then :(

Makes sense to me. So this is a single file per-package, with the sha512 of every file in that package? Could both the packaged and original .info file hashes be stuffed into the same file to indicate which is which?

Having a single file to loop over is by far the easiest solution. These are very small files. We'll probably also want to add gpg signing as well. But I'm keeping it simple for now.

drumm’s picture

For putting these on ftp.drupal.org, we should probably have these in subdirectories per-project, since https://ftp.drupal.org/files/projects/ having all the files is not ideal. And a call to drupalorg_fastly_purge_url() when this is generated, everything on this domain is cached with a long expiration date and actively purged when needed.

heddn’s picture

Agreed w/ #23. I'm currently keying the urls off project_name/version/SHA512SUMS. And we could probably also do something like project name/version/SHA512SUMS.packager for the tarballs.

mbaynton’s picture

Re: eventual signing of the SHA512SUMS files, having the signatures verifiable with php is important, as is having better infrastructure that we currently do to manage the signing. Relevant discussion from today's slack meeting:

dts   [2 hours ago]
The reason I'm more +1 on libsodium + Ed25519 is because there's a good implementation built into modern PHP and a polyfill for older.
dts   [1 hour ago]
Some HSMs do support it, just not affordable cloud ones.
mixologic   [1 hour ago]
pardon my ignorance, but whats an HSM again?
mixologic   [1 hour ago]
nevermind. google worked when I thought it wouldnt
mbaynton   [1 hour ago]
heh I mean ask @dts, but from what I've learned from him and some light reading, it's basically a dedicated hardware device that provides a much higher degree of security and auditability than like a linux box for signing your things
mixologic   [1 hour ago]
yep. Really, thats the answer to my question for "where are the signatures going to come from"
dts   [1 hour ago]
I think the only reasonable HSM offering available today that supports Ed25519 is the one on Azure.
mixologic   [1 hour ago]
Any reason we'd avoid that?
dts   [1 hour ago]
It's really expensive.
mixologic   [1 hour ago]
The DA sales team are really good at getting deals from partners.
dts   [1 hour ago]
You're basically paying Azure to rack a hardware HSM.
dts   [1 hour ago]
I agree that, if we can get a deal with Microsoft, it's worth pursuing.
mixologic   [1 hour ago]
Ah, okay, so really theres an avenue we can pursue.
dts   [1 hour ago]
This is the product: https://docs.microsoft.com/en-us/azure/dedicated-hsm/
mbaynton   [1 hour ago]
Just wondering if there might be other open-source orgs out there who already have HSMs because they already have to sign things, that would be able to allow the DA use theirs
dts   [1 hour ago]
That's actually a neat idea.
dts   [1 hour ago]
It would have to be really modern to support Ed25519.
dts   [1 hour ago]
But, maybe, we could even upgrade another project's?

Note also that WordPress has gone this route with their crypto: https://www.zdnet.com/article/wordpress-finally-gets-the-security-featur...

heddn’s picture

We have a small dilemma as clearly identified in #3050804-16: Modified code checker. Not all projects' have a module with the same name as the project. The example I gave there was: varnish_purge vs module varnish_purger.

Surely we've solved this with drush update in the past. What did we do there?

heddn’s picture

I think I found a solution, packaging for tarballs adds a project value to .info files. For composer, we can parse the composer.json for the project name and use ocramius/package-versions for the version number.

heddn’s picture

Based on my reading of https://github.com/WordPress/wordpress-develop/blob/master/src/wp-admin/... and the trac ticket https://core.trac.wordpress.org/ticket/39309 that @mbaynton found, I'm going to document some findings.

This is also based on the findings from https://3v4l.org/VRm0u. for file hashes, we could use SHA-256 or SHA-512. For the 32-bit drupal installs, maybe going with SHA-256 is a better fit than 512. Yes we could have collisions, but we're not dealing with crypto here. We're trying to decide if files are hacked. Anyone have an opinion?

The rest of the findings in WP's implementation are around cryptographic algorithms. Which for file hashing don't apply. It sounds like we want to use paragonie/sodium_compat and ed25519 for signing the hash files.

heddn’s picture

SHA-256 and SHA-512 are novel hash functions computed with 32-bit and 64-bit words

Would 256 scale better on 32-bit OSes. And do sufficiently better on 64bit that it outweighs the benefit of the extra few bits?

heddn’s picture

Issue summary: View changes
FileSize
2.27 KB
1.76 KB

I've updated the IS with some of my recommendations. One out standing question that occurred to me:

How does the ftp folder structure need to adjust when drupal 9 is released? Will all contrib modules just continue using a single version that don't include the major version?

drumm’s picture

How does this handle when drupal 9 is released? Will all modules just continue using a single version for modules that don't include the major version? Decide.

Let’s leave the API compatibility out of this and only use the full version number, whether it is a legacy version number like 8.x-2.x-dev, or after #3054391: [META] Support semantic versioning on drupal.org for contributed projects lands, the new format.

Examples from the issue summary would be

entity/7.x-1.9/SHA512SUMS
entity/7.x-1.9/SHA512SUMS-package
entity/8.x-1.0-rc2/SHA512SUMS
entity/8.x-1.0-rc2/SHA512SUMS-package

drupal/7.67/SHA512SUMS
drupal/7.67/SHA512SUMS-package
drupal/8.7.1/SHA512SUMS
drupal/8.7.1/SHA512SUMS-package

heddn’s picture

Issue summary: View changes

Updated IS with the suggested paths.

heddn’s picture

Issue summary: View changes
heddn’s picture

Issue summary: View changes

I'm working on sandboxing what an in place upgrade might look like. It just so happened that the first "upgrade" I tried to do was 8.7.0 => 8.7.1. That upgrade was 100% vendor library updates. Meaning, changes to composer.lock and composer.json. When building the hash, I think we will need to hash the vendor folder too. Maybe others already knew this. But let's make that assumption explicit.

drumm’s picture

The non-…-package files will not have the vendor directory since vendor isn’t in Git, correct?

heddn’s picture

Re #35: that is a tricky question. Maybe it makes sense not to build the non -package hashes at this point in time. Since we don't really know everything about how we'll use them. I could see arguments for running for and against, and the non packaged isn't really in scope at this point.

heddn’s picture

Another idea, rather than generating sum files for everything initially... what if we only did it for 8.7.0, 8.7.1 point releases? Then if we like it, we could roll it out further. This whole auto update stuff seems very experimental and I'd really like to learn some things before we have hundreds of hash files rolling out across the ftp site.

drumm’s picture

Assigned: Unassigned » drumm

I’m assuming we should only be generating hashes of tagged releases, correct?

We’ll have to decide what, if anything, to do for distributions, like https://www.drupal.org/project/lightning/releases/8.x-4.001. They have 3 sets of packaged files, the …-core and …-no-core ones in addition to base packaging.

heddn’s picture

Tagged releases was my assumption too. Distributions and profiles will be tricky...

  • drumm committed 2f6b73e on 7.x-3.x, dev
    Issue #3053199 by heddn, drumm: Add drush command to generate SHA512SUMS...

drumm credited mlhess.

drumm credited nnewton.

drumm’s picture

For distributions, I’m skipping them for now. Only the bare package, without drush make run, gets the same treatment. If these are needed in the future, we can decide on a naming convention and generate them.

Attached are samples from the generation so far.

I went over this with nnewton and mlhess. Our consensus so far is:

  • These should have a slightly different filename scheme from the http://mirrors.us.kernel.org/ubuntu-releases/19.04/SHA256SUMS example. The common use of files named like that is hashes of files being downloaded, not their contents.
  • updates.drupal.org makes the most sense to host these, since they are metadata about releases.

All together, that is tentatively:

https://updates.drupal.org/release-hashes/drupal/8.7.4/contents-sha512sums.txt
https://updates.drupal.org/release-hashes/drupal/8.7.4/contents-sha512sums-packaged.txt

  • drumm committed 1903e05 on 7.x-3.x, dev
    Issue #3053199: Update filenames for release content hashes
    
drumm’s picture

heddn’s picture

Taking a look at #46.

heddn’s picture

I'm starting to make some modifications in #3071193: Use live SHA256SUMS to use these files. There's some things that aren't working, namely because they are edges cases I didn't cover in my unit tests. The nice thing though is that the hash files themselves work great.

Give me a few more days of playing with these formats. To make sure nothing doesn't surface with more testing. But with the first look, things are looking very promising.

  • drumm committed 0ad4941 on 7.x-3.x
    Issue #3053199: Use BSD-style checksum formatting
    
  • drumm committed 1644e46 on 7.x-3.x
    Issue #3053199: Purge caches when generating release contents hash files
    
drumm’s picture

The examples in #46 are now regenerated with the BSD-style “tags” output of the hashes.

I had hoped to get these signed with temporary keys for testing, but that will have to wait for an OS upgrade or #3076191: Build a signing oracle server & HSM infrastructure.

These should be verifiable using https://github.com/drupalassociation/php-signify/blob/master/src/Verifie.... Note that method is currently protected, probably because it is intended to enforce only checking signed data. Maybe it can be made public with an exception to discourage misuse, and get a start on finding any gaps in its parsing and real-world use.

These commits also show that CDN purging is working well.

  • drumm committed 64813d9 on 7.x-3.x, dev
    Issue #3053199: Add test csig signing
    

  • drumm committed 938aed6 on 7.x-3.x, dev
    Issue #3053199: Add the composer.json file to require drupal/php-signify
    
  • drumm committed c2f2b02 on 7.x-3.x, dev
    Issue #3053199: Verify the signed data before writing it
    
drumm’s picture

Title: Add SHA512SUMS file of all files » Add SHA256SUMS file of all files

  • drumm committed c5519f3 on 7.x-3.x, dev
    Issue #3053199: Use sha256 instead of 512
    

  • drumm committed d08d338 on 7.x-3.x, dev
    Issue #3053199: $valid_through is an optional parameter
    
heddn’s picture

CSIG sha256 sums are now also generated for:

8.6.0
8.6.1
8.6.10
8.6.11
8.6.12
8.6.13
8.6.14
8.6.15
8.6.16
8.6.17
8.6.2
8.6.3
8.6.4
8.6.5
8.6.6
8.6.7
8.6.8
8.6.9
8.7.0
8.7.1
8.7.2
8.7.3
8.7.4
8.7.5
8.7.6
8.7.7
8.7.8

  • drumm committed 50d2e2b on 7.x-3.x
    Issue #3053199: Remove --force parameter, files will be regularly...
drumm’s picture

We now have fully working package contents hashes, except that their generation isn’t automatically triggered. #3099876: Establish a packaging pipeline queue will help handle new releases.

  • drumm committed f079d68 on 7.x-3.x
    Issue #3053199: Do not hard-code the signing queue write timeout
    

  • drumm committed 31336dc on 7.x-3.x
    Issue #3053199: Motivate the data to be utf8, throw exception on JSON...

  • drumm committed 332b1d7 on 7.x-3.x
    Issue #3053199: Queue release contents hash generation when a release...
  • drumm committed f99c055 on 7.x-3.x
    Issue #3053199: Release contents hashing logic cleanups:
    
    - Do not...

  • drumm committed 617154a on 7.x-3.x
    Issue #3053199: Only queue release contents hashing for tagged static...
drumm’s picture

Status: Needs review » Fixed

Now all release contents hashes are generated, except for 52 releases where the Git tag has gone missing; and new releases will get their contents hashed when packaging is done and the release is published.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.