Problem/Motivation

In #3053199: Add SHA256SUMS file of all files, there was a decision to use SHA512 hashing. There was some logic put forth at the time. We'd like to rediscuss this decision.

This is mainly an issue with the core hash file. With SHA512 it comes in at roughly 2.8 MB. Contrib projects, even large ones like bootstrap, are nominal in size (bootstrap is ~60 KB with SHA512). If we used SHA256 we'd cut the size by roughly 1/3, to around 1.9 MB.

SHA512 means a larger download. On the other hand, the client-side hash comparison takes longer with SHA256: some research shows it is 25% or 50% slower. This project encourages sites to run cron at least every 6 hours to validate that the site is always ready for an automatic update, so we'd be running the slower SHA256 comparison roughly 4 times a day.

Another consideration: truncated SHA512 hashes do not seem to be supported by signify.

Proposed resolution

Some research/resources: https://medium.com/@davidtstrauss/stop-using-sha-256-6adbb55c608

Remaining tasks

Decide which is preferred; SHA512 or SHA256.

User interface changes

API changes

Data model changes

Release notes snippet


Comments

heddn created an issue. See original summary.

heddn’s picture

Slack conversation:

pwolanin (he/him) 09:48
@heddn any thought to using sha256 to reduce the manifest size a bit?
heddn:mtech: 09:50
on the majority of machines, 512 takes less time to process client side than 256 (from my limited research)
hestenet (he/him) 09:52
^^ I may be totally misremembering, but I feel like I remember something about using a 512 because of processing time, but having the option to truncate for manifest size? I may be way off base though..

heddn:mtech: 09:52
it was my primary reason for 512. would dropping to 256 cut the 2.8mb manifest roughly in half? that has compelling arguments given the large number of poorly connected countries in the world. (edited)
fastly doesn't charge for the extra download size, so it is purely a client download size issue
we'd have to profile these 2 things, but 256 on 32bit was faster than 512 but on 64 bit 512 is much faster. as opposed to 1.4 or 1.5mb download w/ 256 vs 2.8mb with 512. Not sure which has the biggest wins.
I'm just using some rough numbers from my research and assumptions that 256 would be roughly half the size. @pwolanin (he/him)
pwolanin (he/him) 09:56
@heddn the hash output is half, the file paths may still be long, so I’d expect a roughly 1/3 decrease
@hestenet (he/him) sha512 is “faster” but I doubt it matters for this use case
heddn:mtech: 09:57
got it. so maybe ~2mb instead of ~2.8
pwolanin (he/him) 09:57
I don’t think signify supports 512 truncated to 256 bits, which we discussed at MWDS (edited)
hestenet (he/him) 09:57
Ahhh.. okay
heddn:mtech: 10:01
so "faster" 512 client side vs smaller hash file. which is better? core is the only place this matters. bootstrap is a nominal 60k, so I'd consider most contrib projects a non-issue.
drumm 10:03
These are served by a CDN, so it should be quick worldwide, depending on your local connection.
pwolanin (he/him) 10:03
the shasums only needs to be run once during build, so I think the speed and B/W savings of a million downloads is very compelling to reduce the file size
if nothing else, it’s an environmental issue
heddn:mtech: 10:05
we run the preflights in cron. the sums file is locally cached.
so it is run every 6 hours by default
B/W of 1M downloads vs 4 times daily sha256 :shrug: seems like a toss-up (edited)
preflight informs a site owner if they are ready to run an update. one of those checks is if the site has any modified files for any and all projects, including core (edited)
pwolanin (he/him) 10:08
@heddn why would you re-download them repeatedly?
or you mean you are running the hashes 4x daily?
heddn:mtech: 10:09
@pwolanin (he/him) the latter
it's a one time download for each version of core or a module. and we could (eventually) package it w/ the actual code itself, no?
pwolanin (he/him) 10:11
yes
heddn:mtech: 10:11
vs running that hash check multiple times a day
pwolanin (he/him) 10:11
so the web says on a 64 bit machine, sha512 can be typically 1.5x faster
heddn:mtech: 10:11
and most sites I'd assume run on 64 bit linux
pwolanin (he/him) 10:12
maybe?
but could be a VM, so who knows how it really performs (edited)
locally in Docker it’s like 12 sec vs 17 sec to run the cli utility on core files
heddn:mtech: 10:13
yeah, I don't know. it seems like there's trade offs with either approach. and not a really good way to decide. I take it you prefer 256 @pwolanin (he/him)? I'm fine w/ switching if that's what we prefer.
pwolanin (he/him) 10:13
wait, I lied - these numbers are all over
anyhow

heddn’s picture

Issue summary: View changes
David Strauss’s picture

SHA512 is faster to compute, but the hashes it produces are excessively long. I suspect, given our use of PHP for the validations, that working with the longer hash lengths more than negates the performance advantages of computing them. There are standard (and non-standard but still safe) ways to shorten the SHA-512 hash output, but they're not supported consistently by standard utilities (e.g. sha512sum). So, I'd lean toward SHA-256 for the files. Also, OpenBSD's use of Signify seems to center more around SHA-256, despite most implementations having undocumented support for SHA-512 as well.

heddn’s picture

Issue summary: View changes

I guess I'm more in favor of 512, but not massively. I'd be happy to be convinced 256 is better.

drumm’s picture

I think 512 is okay. The only advantage I can see to 256 is bandwidth.

For bandwidth, https://updates.drupal.org/release-hashes/drupal/8.7.4/contents-sha512su... is 2,911,975 bytes uncompressed. Moving to 256 would save 64 bytes × 14,105 lines, making the new size 2,009,255 bytes. Core just has a lot of files, and long filenames. Depending on the HTTP client, this should be compressed when transferred.

I'd say 1.9M is in the same range as 2.8M to not make much difference for bandwidth.
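The arithmetic above can be checked with a small sketch. It assumes the manifest stores hashes as hex strings (two characters per byte of hash output); the byte and line counts are the ones quoted in this comment, and the variable names are illustrative only.

```python
import hashlib

# Hex digests use two characters per byte of hash output.
SHA512_HEX_LEN = hashlib.sha512().digest_size * 2  # 128 characters
SHA256_HEX_LEN = hashlib.sha256().digest_size * 2  # 64 characters

# Figures quoted in the comment above for the Drupal 8.7.4 manifest.
manifest_bytes_sha512 = 2_911_975
manifest_lines = 14_105

# Each line shrinks by the difference in hex digest length.
saved_per_line = SHA512_HEX_LEN - SHA256_HEX_LEN
manifest_bytes_sha256 = manifest_bytes_sha512 - saved_per_line * manifest_lines

print(saved_per_line)         # 64
print(manifest_bytes_sha256)  # 2009255
```

That is a saving of 902,720 bytes, which is where the "roughly 1/3" estimate from the Slack conversation comes from.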

David Strauss’s picture

Moving 256 would save 64 bytes × 14,105 lines, making the new size 2,009,255 bytes. Core just has a lot of files, and long filenames. Depending on the HTTP client, this should be compressed when transferred.

Because the difference in size comes from the hash lengths (which are effectively random), I don't expect we'll see any benefit from compression on the extra megabyte from using SHA-512. Since the file grows by 0.9M from the 256 to 512 switch, it's likely that 1.8M (2x0.9M) of the 2.8M file is hash content. If the remaining content compresses by 80% (which is a guess but an informed one), we can expect 2M for SHA-512 and 1.1M for SHA-256.

I do think that's meaningful, but I still don't have a super strong opinion given the full situation.
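The estimate above can be written out as a small calculation. This is only a sketch of the reasoning in this comment: it takes the assumption that hash content does not compress at all and the 80% guess for everything else at face value, and the function name is hypothetical.

```python
def estimate_compressed_mb(total_mb, hash_mb, other_ratio=0.8):
    """Estimate transfer size, assuming hash content stays as-is and the
    remaining content shrinks by other_ratio (the 80% guess above)."""
    other_mb = total_mb - hash_mb
    return hash_mb + other_mb * (1 - other_ratio)

sha512_total_mb, sha256_total_mb = 2.8, 1.9

# The file grows by 0.9M when switching 256 -> 512, so SHA-512 hash
# content is about twice that, and SHA-256 hash content is about 0.9M.
sha512_hash_mb = 2 * (sha512_total_mb - sha256_total_mb)
sha256_hash_mb = sha512_hash_mb / 2

sha512_estimate = round(estimate_compressed_mb(sha512_total_mb, sha512_hash_mb), 1)
sha256_estimate = round(estimate_compressed_mb(sha256_total_mb, sha256_hash_mb), 1)

print(sha512_estimate)  # 2.0
print(sha256_estimate)  # 1.1
```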

pwolanin’s picture

A side note - we should NOT be using constant-time hash comparison, but rather faster plain string comparison for checking the file hashes, so the time to calculate them (and the PHP memory to hold that set of strings) are the main things. It would be good to know if there is any real performance difference in hash_file() between 256 and 512.
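A minimal sketch of the kind of check described here, written in Python rather than the module's PHP, so it is an illustration and not the module's implementation. The helper names `file_hexdigest` and `find_modified` are hypothetical. It hashes each file and compares against the expected digest with an ordinary string comparison, which is reasonable when the expected hashes are published values rather than secrets.

```python
import hashlib
import os
import tempfile

def file_hexdigest(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large files are not read into memory at once."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

def find_modified(root, expected, algorithm="sha256"):
    """Return the relative paths whose digest differs from the expected one."""
    modified = []
    for rel_path, digest in expected.items():
        actual = file_hexdigest(os.path.join(root, rel_path), algorithm)
        if actual != digest:  # plain string comparison, not constant-time
            modified.append(rel_path)
    return modified

def demo():
    """Build two files, record their hashes, tamper with one, and re-check."""
    with tempfile.TemporaryDirectory() as root:
        names = ("a.txt", "b.txt")
        for name, body in zip(names, (b"alpha", b"beta")):
            with open(os.path.join(root, name), "wb") as handle:
                handle.write(body)
        expected = {n: file_hexdigest(os.path.join(root, n)) for n in names}
        with open(os.path.join(root, "b.txt"), "wb") as handle:
            handle.write(b"tampered")
        return find_modified(root, expected)

print(demo())  # ['b.txt']
```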

heddn’s picture

We have https://updates.drupal.org/release-hashes/drupal/8.7.3/contents-sha512su.... @drumm could you create a sha256 version so we can do some profiling?

David Strauss’s picture

@pwolanin I doubt there's a meaningful performance difference between the hash types for our purposes here. However, I was mostly referring to the extra overhead in copying around and manipulating the strings, not the time for the final comparison. You're correct that we don't need to do a constant-time comparison. Even if we needed to, we could choose however many bits we cared to compare (over a sensible minimum). We could, say, only compare the first half of the SHA-512 hashes and still be fine. I think the performance improvements wouldn't outweigh the complexity.

I think we should make this decision based on what fits best with the tools we use or want to use and costs like bandwidth. In the case of Signify on BSD, my understanding is that they use SHA-256 for the file checksum lists and that the support for SHA-512 is de facto in the Signify utilities. My understanding for the bandwidth costs is that it's seen as not an issue for the expected difference in size.

drumm’s picture

@drumm could you create a sha256 version so we can do some profiling?

Done, the raw file sizes are:

$ ls -lh */*
bootstrap/8.x-3.20:
total 200K
-rw-rw-r-- 1 bender bender 37K Sep 19 15:37 contents-sha256sums.csig
-rw-rw-r-- 1 bender bender 37K Sep 19 15:37 contents-sha256sums-packaged.csig
-rw-rw-r-- 1 bender bender 58K Aug 31 00:06 contents-sha512sums.csig
-rw-rw-r-- 1 bender bender 58K Aug 31 00:06 contents-sha512sums-packaged.csig

drupal/8.7.3:
total 11M
-rw-rw-r-- 1 bender bender 2.0M Sep 19 15:40 contents-sha256sums.csig
-rw-rw-r-- 1 bender bender 2.3M Sep 19 15:40 contents-sha256sums-packaged.csig
-rw-rw-r-- 1 bender bender 2.8M Aug 31 00:06 contents-sha512sums.csig
-rw-rw-r-- 1 bender bender 3.3M Aug 31 00:06 contents-sha512sums-packaged.csig

drupal/8.7.4:
total 11M
-rw-rw-r-- 1 bender bender 2.0M Sep 19 15:40 contents-sha256sums.csig
-rw-rw-r-- 1 bender bender 2.3M Sep 19 15:40 contents-sha256sums-packaged.csig
-rw-rw-r-- 1 bender bender 2.8M Aug 31 00:04 contents-sha512sums.csig
-rw-rw-r-- 1 bender bender 3.3M Aug 31 00:04 contents-sha512sums-packaged.csig

token/8.x-1.5:
total 56K
-rw-rw-r-- 1 bender bender 8.1K Sep 19 15:41 contents-sha256sums.csig
-rw-rw-r-- 1 bender bender 8.2K Sep 19 15:41 contents-sha256sums-packaged.csig
-rw-rw-r-- 1 bender bender  13K Aug 31 00:06 contents-sha512sums.csig
-rw-rw-r-- 1 bender bender  13K Aug 31 00:06 contents-sha512sums-packaged.csig

(bender is the robot)

heddn’s picture

Could you upload contents-sha256sums-packaged.csig and contents-sha512sums-packaged.csig for 8.7.4? I want to profile the hash compare times on macOS, a Linux laptop and a 1GB DigitalOcean droplet.

drumm’s picture

heddn’s picture

FileSize
4.72 MB

Here's a route controller that should let us have various folks on different platforms give us some stats on runtime comparison between SHA256 vs SHA512.

This assumes you have a full download of Drupal 8.7.4. Download this project, apply the patch, composer require all the dependencies, including drupal/php-signify. Then install the module and visit the route automatic-updates/checksum-comparison.

The patch is a bit large, only so it is a simple install of a patch instead of doing lots of manual steps. It contains the sha256 and sha512 files and the root signature for validation.

heddn’s picture

FileSize
4.72 MB

The previous patch was missing the routing.yml file. Here are the results from a 2018 MacBook running docker (ddev).

modules/automatic_updates/artifacts/contents-sha512sums.csig took 53.453179121017 seconds
modules/automatic_updates/artifacts/contents-sha256sums.csig took 21.417464017868 seconds
modules/automatic_updates/artifacts/contents-sha512sums.csig took 25.940617799759 seconds
modules/automatic_updates/artifacts/contents-sha256sums.csig took 31.243373155594 seconds
modules/automatic_updates/artifacts/contents-sha256sums.csig took 32.105424880981 seconds
modules/automatic_updates/artifacts/contents-sha512sums.csig took 27.207813024521 seconds

To me, it seems that file system caching matters more than anything else. Note how the first csig run takes significantly longer than the later ones. Overall, it seems that it is 1-3 seconds faster to run SHA512 over SHA256 (on average). The 256 file is 0.9MB smaller than 512. On a decently speedy computer w/ decent internet speed, that shakes out to be a wash when you compare the two on a one time basis.

The other aspect to consider is that the hash will run every few hours. So while it might be only 1-3 seconds in one run, it will run several times a day. And the hash won't change for several weeks/months, until the next point release of Drupal is provided and the site is upgraded.
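For anyone who wants to separate raw hashing speed from the file system caching effect noted above, here is a rough sketch in Python (not the route controller from the patch). It hashes an in-memory buffer, so disk reads are out of the picture; the function names and the 8 MB buffer size are arbitrary choices for illustration, and results will vary by CPU and by how the hash library was built.

```python
import hashlib
import time

def time_hash(algorithm, data, repeats=5):
    """Return the best wall-clock time in seconds to hash data once."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(algorithm, data).digest()
        best = min(best, time.perf_counter() - start)
    return best

def compare(size_bytes=8 * 1024 * 1024):
    """Time SHA-256 and SHA-512 over the same zero-filled in-memory buffer."""
    data = bytes(size_bytes)
    return {name: time_hash(name, data) for name in ("sha256", "sha512")}

results = compare()
for name, seconds in results.items():
    print(f"{name}: {seconds:.4f} s")
```

Taking the best of several repeats reduces noise from other processes; it does not model the cost of reading some 14,000 small files from disk, which is what the numbers above include.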

xjm’s picture

heddn’s picture

Status: Active » Reviewed & tested by the community
Issue tags: -Needs performance review

So, some performance testing was already done in #15. And a patch is available for others to test. But the TLDR; is that we're dealing with something that is marginally faster on 64-bit (SHA512) and which is marginally larger to download (SHA512). One could also look at this from the other perspective where SHA256 is marginally slower on 64-bit and marginally smaller to download.

However, based on the thorough explanation by David and Neil in Slack, it seems like consensus has built around using SHA256. Based on that conversation and the existing performance numbers, I'm going to suggest RTBC and remove the profiling tag.

xjm 13:47
I think the good reason not to use 512 is performance?
dts 13:49
512 is not meaningfully more secure than 256
xjm 13:49
@catch also ^ -- we are talking about https://www.drupal.org/project/automatic_updates/issues/3077737
drumm 13:49
I think the summary is 512 is actually marginally faster, unless you have a 32 bit processor or something, 256 is less data.
dts 13:50
An attack that would undermine 256 but not 512 isn't very plausible, as they're constructed the same way.
xjm 13:50
I'm going to tag it for performance review
dts 13:50
Specifically, an attack that undermines enough bits of 256 to render it insecure but not the 512 variant is implausible.
xjm 13:51
So speed vs. data/size is the question? (edited)
dts 13:51
Yes
The speed for our purposes is also not meaningfully different, at least in terms of hash computation.
If I could pick anything, I'd pick SHA-512/256 or SHA-512/224, but those aren't supported well across tools.
(Those variants would allow us to maximize both hashing performance and brevity.) (edited)
There is one other thing: I think OpenBSD uses SHA-256 in their Signify files, and I think the SHA-512 support in Signify is slightly less official (though broadly available).

However, the signing algorithm Signify uses relies on SHA-512, so any Signify implementation also has to contain a SHA-512 implementation (though it may not get wired up to support the file checksums).
So, I don't think this is a major factor. It's more of a factor where "all other things being equal, let's do what OpenBSD did because we're using a utility/format from their ecosystem."

dts 14:24
So, my vote is for SHA-256 here, despite not loving it in general.
It's what OpenBSD is using for their files: https://www.openbsd.org/papers/bsdcan-signify.html
And it has the edge in bandwidth and string comparison overhead.
SHA-512 is not meaningfully more secure or faster for our purposes, so its strengths aren't compelling here.
We can also switch later, if we ever want to. It's not a closely coupled part of the design to other parts.
drumm 14:29
Once Drupal.org is providing the API in a production capacity, that’s essentially locking it in. We’ll pretty much always have to provide what we provide.
dts 14:30
I think we can particularly switch later because the hash formats are explicit in the tagged (BSD style) format.
It would only break compatibility with a client that only supports/implements SHA-256 for some reason
Our client should support both, and most utilities that can handle the tagged checksum format do as well (edited)
That said, ability to change later isn't a major factor in why I think SHA-256 is a very slightly better choice here.
drumm 15:06
256 works for me.
I know what we need to do
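The point in the conversation above about the tagged (BSD-style) checksum format can be illustrated with a short sketch. It assumes checksum lines of the form `SHA256 (path) = hexdigest`, as produced by tools such as `sha256sum --tag`; it handles only the checksum lines, not the Signify signature, and the function names and the sample path are hypothetical. Because the algorithm is named on each line, one client can accept either hash.

```python
import hashlib
import re

# One tagged checksum line: "<ALGORITHM> (<path>) = <hex digest>".
TAGGED_LINE = re.compile(r"^(SHA256|SHA512) \((.+)\) = ([0-9a-f]+)$")
EXPECTED_HEX_LEN = {"SHA256": 64, "SHA512": 128}

def parse_tagged_line(line):
    """Split a tagged checksum line into (algorithm, path, digest)."""
    match = TAGGED_LINE.match(line.strip())
    if not match:
        raise ValueError(f"not a tagged checksum line: {line!r}")
    algorithm, path, digest = match.groups()
    if len(digest) != EXPECTED_HEX_LEN[algorithm]:
        raise ValueError(f"wrong digest length for {algorithm}")
    return algorithm.lower(), path, digest

def matches(line, content):
    """Check content against a tagged line, using the algorithm the line names."""
    algorithm, _path, digest = parse_tagged_line(line)
    return hashlib.new(algorithm, content).hexdigest() == digest

content = b"hello\n"
line256 = f"SHA256 (core/README.txt) = {hashlib.sha256(content).hexdigest()}"
line512 = f"SHA512 (core/README.txt) = {hashlib.sha512(content).hexdigest()}"

print(matches(line256, content), matches(line512, content))  # True True
```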

dstol’s picture

Status: Reviewed & tested by the community » Fixed

Looks like the consensus agreement is SHA-256. Marking as fixed.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.