Problem/Motivation
In #3053199: Add SHA256SUMS file of all files, there was a decision to use SHA512 hashing. There was some logic put forth at the time. We'd like to rediscuss this decision.
This is mainly an issue with the core hash file. With SHA512 it comes in at roughly 2.8 MB. Contrib projects, even large ones like bootstrap, are nominal in size (bootstrap is ~60k w/ SHA512). If we used SHA256 we'd cut the size by roughly 1/3, to around 1.9 MB.
SHA512 means a larger download. On the other hand, processing time for hash comparison on the client side takes longer with SHA256: some research shows SHA256 is 25% or 50% slower. This project encourages sites to run cron at least every 6 hours and validate that the site is always ready for an automatic update, so we'd be running the slower SHA256 comparison roughly 4 times a day.
Another consideration, using truncated 512 hashes does not seem to be supported with signify.
Proposed resolution
Some research/resources: https://medium.com/@davidtstrauss/stop-using-sha-256-6adbb55c608
Remaining tasks
Decide which is preferred: SHA512 or SHA256.
User interface changes
API changes
Data model changes
Release notes snippet
Comment | File | Size | Author |
---|---|---|---|
#15 | 3077737-15.patch | 4.72 MB | heddn |
Comments
Comment #2
heddn
Slack conversation:
Comment #3
heddn
Comment #4
David Strauss
SHA512 is faster to compute, but the hashes it produces are excessively long. I suspect, given our use of PHP for the validations, that working with the longer hash lengths more than negates the performance advantages of computing them. There are standard (and non-standard but still safe) ways to shorten the SHA-512 hash output, but they're not supported consistently by standard utilities (e.g. sha512sum). So, I'd lean toward SHA-256 for the files. Also, OpenBSD's use of Signify seems to center more around SHA-256 despite most implementations having undocumented support for SHA-512 as well.
Comment #5
heddn
I guess I'm more in favor of 512, but not massively. I'd be happy to be convinced 256 is better.
Comment #6
drumm
I think 512 is okay. The only advantage I can see to 256 is bandwidth.
For bandwidth, https://updates.drupal.org/release-hashes/drupal/8.7.4/contents-sha512su... is 2,911,975 bytes uncompressed. Moving to 256 would save 64 bytes × 14,105 lines, making the new size 2,009,255 bytes. Core just has a lot of files, and long filenames. Depending on the HTTP client, this should be compressed when transferred.
I'd say 1.9M is close enough to 2.8M that it doesn't make much difference for bandwidth.
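The size arithmetic in this comment can be checked with a short sketch. It assumes each line of the list carries one hex-encoded digest (128 characters for SHA-512, 64 for SHA-256) and that nothing else on the line changes; the function name is hypothetical and only the two input figures come from the comment.

```python
# Figures quoted in the comment for the Drupal 8.7.4 core hash list.
SHA512_FILE_BYTES = 2_911_975
LINES = 14_105

# Hex digests use one character per 4 bits of hash output.
SHA512_HEX_CHARS = 512 // 4  # 128 characters
SHA256_HEX_CHARS = 256 // 4  # 64 characters


def estimated_sha256_size(sha512_bytes: int, lines: int) -> int:
    """Estimate the SHA-256 list size by shortening the digest on every line.

    Assumption: one digest per line, everything else on the line unchanged.
    """
    saved_per_line = SHA512_HEX_CHARS - SHA256_HEX_CHARS
    return sha512_bytes - saved_per_line * lines


if __name__ == "__main__":
    size = estimated_sha256_size(SHA512_FILE_BYTES, LINES)
    print(f"saved: {SHA512_FILE_BYTES - size:,} bytes")
    print(f"estimated SHA-256 list: {size:,} bytes")
```

This reproduces the 902,720 bytes saved and the 2,009,255 byte estimate for the uncompressed file.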
Comment #7
David Strauss
Because the difference in size comes from the hash lengths (which are effectively random), I don't expect we'll see any benefit from compression on the extra megabyte from using SHA-512. Since the file grows by 0.9M from the 256 to 512 switch, it's likely that 1.8M (2 × 0.9M) of the 2.8M file is hash content. If the remaining content compresses by 80% (which is a guess but an informed one), we can expect 2M for SHA-512 and 1.1M for SHA-256.
I do think that's meaningful, but I still don't have a super strong opinion given the full situation.
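The estimate above can be written out as a small calculation. This is only a sketch of the comment's own reasoning: it takes as given that the hash content does not compress at all and that everything else compresses by 80%, both of which are the comment's assumptions rather than measurements. The function and parameter names are hypothetical.

```python
def estimate_compressed_sizes(sha512_mb=2.8, growth_mb=0.9, other_ratio=0.8):
    """Return estimated transfer sizes in MB for both hash list variants.

    Assumptions (taken from the comment, not measured):
    - hash content is treated as incompressible;
    - the non-hash content (file names etc.) shrinks by `other_ratio`.
    """
    # SHA-512 digests are twice as long as SHA-256 digests, so the growth
    # between the two files equals the SHA-256 hash content.
    sha256_hash_mb = growth_mb
    sha512_hash_mb = 2 * growth_mb

    # Whatever is not hash content is the same in both files.
    other_mb = sha512_mb - sha512_hash_mb
    other_compressed_mb = other_mb * (1 - other_ratio)

    return {
        "sha512": round(sha512_hash_mb + other_compressed_mb, 1),
        "sha256": round(sha256_hash_mb + other_compressed_mb, 1),
    }


if __name__ == "__main__":
    for name, size in estimate_compressed_sizes().items():
        print(f"{name}: ~{size} MB compressed")
```

With the default inputs this gives about 2.0 MB for SHA-512 and 1.1 MB for SHA-256, matching the figures in the comment.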
Comment #8
pwolanin
A side note - we should NOT be using constant time hash comparison, but rather faster string comparison for checking the file hashes, so the time to calculate them (and the PHP memory to hold that set of strings) are the main things. It would be good to know if there is any real performance difference for hash_file() for the 256 vs 512.
Comment #9
heddn
We have https://updates.drupal.org/release-hashes/drupal/8.7.3/contents-sha512su.... @drumm could you create a sha256 version so we can do some profiling?
Comment #10
David Strauss
@pwolanin I doubt there's a meaningful performance difference between the hash types for our purposes here. However, I was mostly referring to the extra overhead in copying around and manipulating the strings, not the time for the final comparison. You're correct that we don't need to do a constant-time comparison. Even if we needed to, we could choose however many bits we cared to compare (over a sensible minimum). We could, say, only compare the first half of the SHA-512 hashes and still be fine. I think the performance improvements wouldn't outweigh the complexity.
I think we should make this decision based on what fits best with the tools we use or want to use and costs like bandwidth. In the case of Signify on BSD, my understanding is that they use SHA-256 for the file checksum lists and that the support for SHA-512 is de facto in the Signify utilities. My understanding for the bandwidth costs is that it's seen as not an issue for the expected difference in size.
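The comparison points raised in #8 and #10 can be sketched as follows. Python stands in here for the PHP used by the module, and the helper names file_digest and matches are hypothetical. The sketch hashes a file with either algorithm, compares digests with an ordinary (non-constant-time) string comparison, and optionally compares only a leading prefix, such as the first half of a SHA-512 digest.

```python
import hashlib
import os
import tempfile
from typing import Optional


def file_digest(path: str, algorithm: str) -> str:
    """Return the hex digest of a file, read in chunks to bound memory use."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches(expected_hex: str, actual_hex: str,
            prefix_chars: Optional[int] = None) -> bool:
    """Plain string comparison of two hex digests.

    If prefix_chars is given, only that many leading hex characters
    are compared (e.g. 64 for the first half of a SHA-512 digest).
    """
    if prefix_chars is not None:
        expected_hex = expected_hex[:prefix_chars]
        actual_hex = actual_hex[:prefix_chars]
    return expected_hex.lower() == actual_hex.lower()


if __name__ == "__main__":
    # Hypothetical sample file, created only for the demonstration.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"example file contents")
        sample = tmp.name
    try:
        for algorithm in ("sha256", "sha512"):
            digest = file_digest(sample, algorithm)
            print(algorithm, len(digest), matches(digest, digest))
    finally:
        os.remove(sample)
```

The prefix option is there only to illustrate the idea from #10; as that comment says, the added complexity is probably not worth it.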
Comment #11
drumm
Done, the raw file sizes are:
(bender is the robot)
Comment #12
heddn
Could you upload contents-sha256sums-packaged.csig and contents-sha512sums-packaged.csig for 8.7.4? I want to profile the hash compare times on macOS, a Linux laptop, and a 1GB DigitalOcean droplet.
Comment #13
drumm
Those are available provisionally in production, same URLs as #3053199-52: Add SHA256SUMS file of all files, s/512/256/, like https://updates.drupal.org/release-hashes/drupal/8.7.4/contents-sha256su...
Comment #14
heddn
Here's a route controller that should let various folks on different platforms give us some stats on runtime comparison between SHA256 and SHA512.
This assumes you have a full download of Drupal 8.7.4. Download this project, apply the patch, and composer require all the dependencies, including drupal/php-signify. Then install the module and visit the route automatic-updates/checksum-comparison.
The patch is a bit large, only so that it is a simple install of a patch instead of lots of manual steps. It contains the sha256 and sha512 files and the root signature for validation.
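For anyone who wants a rough feel for the comparison without installing the module, a standalone sketch of the same idea follows. It is not the patch's controller: it is written in Python rather than PHP, it skips signature validation entirely, and every function name in it is hypothetical. It builds a checksum list for a directory tree, re-verifies the tree against that list, and times the verification for each algorithm.

```python
import hashlib
import tempfile
import time
from pathlib import Path
from typing import Dict, List, Tuple


def digest(path: Path, algorithm: str) -> str:
    """Hex digest of one file, read in chunks."""
    hasher = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def build_checksums(root: Path, algorithm: str) -> Dict[str, str]:
    """Map each file's path (relative to root) to its hex digest."""
    return {
        path.relative_to(root).as_posix(): digest(path, algorithm)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_mismatches(root: Path, expected: Dict[str, str],
                    algorithm: str) -> List[str]:
    """Return relative paths that are missing or whose digest differs."""
    mismatches = []
    for relative, expected_hex in expected.items():
        target = root / relative
        if not target.is_file() or digest(target, algorithm) != expected_hex:
            mismatches.append(relative)
    return mismatches


def time_check(root: Path, algorithm: str) -> Tuple[float, List[str]]:
    """Time one verification pass; building the list is not timed."""
    expected = build_checksums(root, algorithm)
    start = time.perf_counter()
    mismatches = find_mismatches(root, expected, algorithm)
    return time.perf_counter() - start, mismatches


if __name__ == "__main__":
    # Synthetic tree for the demo; point root at a real checkout instead.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for i in range(200):
            (root / f"file{i}.txt").write_bytes(bytes([i % 256]) * 4096)
        for algorithm in ("sha256", "sha512"):
            elapsed, bad = time_check(root, algorithm)
            print(f"{algorithm}: {elapsed:.4f}s, {len(bad)} mismatches")
```

Timings from a sketch like this depend heavily on the platform and on file system caching, so it is worth running each algorithm more than once and in both orders before drawing conclusions.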
Comment #15
heddn
Missing the routing.yml file. Here are the results from a 2018 MacBook running Docker (ddev).
To me, it seems that file system caching matters more than anything else. Note how the first csig takes significantly longer than the later ones. Overall, it seems that SHA512 runs 1-3 seconds faster than SHA256 (on average). The 256 file is 0.9MB smaller than the 512 file. On a decently speedy computer with decent internet speed, that shakes out to be a wash when you compare the two on a one-time basis.
The other aspect to consider is that the hash will run every few hours. So while it might be only 1-3 seconds in one run, it will run several times a day. And the hash won't change for several weeks/months, until the next point release of Drupal is provided and the site is upgraded.
Comment #16
xjm
Comment #17
heddn
So, some performance testing was already done in #15, and a patch is available for others to test. But the TL;DR is that we're dealing with something that is marginally faster on 64-bit (SHA512) and marginally larger to download (SHA512). One could also look at this from the other perspective, where SHA256 is marginally slower on 64-bit and marginally smaller to download.
However, based on the thorough explanation by David and Neil in slack, it seems like consensus has built around using SHA256. Based on that conversation and the existing performance numbers, I'm going to suggest RTBC and remove the profiling tag.
Comment #18
dstol
Looks like the consensus agreement is SHA-256. Marking as fixed.