Problem/Motivation

At some point in the Drupal 7 cycle, a non-formal policy was introduced of not using PHP's md5() or sha1() functions.

This is being questioned for non-security-related use cases.

See #1807662-115: Built-in APCu support in core (PHP 5.5 only) through #1807662-125: Built-in APCu support in core (PHP 5.5 only).

There are two concrete technical cases where this policy has caused technical regressions to be committed to core or proposed in issues:

https://drupal.org/node/2268875#comment-8792075

#2224847-26: Automatically shorten cid's in Cache\DatabaseBackend and PhpBackend md5() is a 32 character hash. On that issue we were discussing keeping as much of the cid as possible, then hashing the rest). A 64 char length requirement for the hash used either limits the length of the cid that can be preserved, or doubles the index requirements for a dedicated column. It's not about which hash is 'faster'. So following this rule without considering context results in solutions that are worse.

The other is #723802-92: convert to sha-256 and hmac from md5 and sha1 #723802-94: convert to sha-256 and hmac from md5 and sha1: - our password implementation is no longer a proper implementation of PHPASS since the hashes are only portable in one direction (of course you can argue that also 'increases adoption' since of course they're portable to Drupal but not away from it).

Proposed resolution

Articulate a formal policy.

Remaining tasks

TBD

User interface changes

None.

API changes

None.

Comments

Gábor Hojtsy’s picture

Well, see #723802: convert to sha-256 and hmac from md5 and sha1.

FIPS 180-3 and FIPS 198-1 require transition to sha-2 family functions for use in US government applications in 2010. Let's update Drupal 7 to use these more secure algorithms so that Drupal is not excluded from being used for federal sites.

Gábor Hojtsy’s picture

Also from #46 there:

The goal is indeed to have grep -r "md5(" * return nothing on core - and ideally nothing on D7 contrib modules. It's not a silly goal because this will greatly ease adoption by government agencies and large companies that need to meet those standards. Discussing this patch with people who have to meet those standards, it doesn't matter whether a particular use of md5/sha1 is relevant to the FIPS standard - even if it's not you have to spend time documenting why it's not, how it won't be in the future, etc, and that will make it harder to get Drupal adopted. Hence, I plan to continue this patch until that grep only returns 1 result - in the password hashing code.

catch’s picture

I was close to quitting core development when this policy originally went in. Every time it comes up in an issue reminds me how ridiculous it is.

I frankly could not care less about the 'increased adoption' - we do not have an adoption problem for Drupal core, we do have a tonne of technical problems which when this policy comes up it only hampers.

Additionally, the 'increased adoption' has not remotely been proved. Having md5() in the code base does not mean 'can never be used by US public bodies', it just means you might have to do some extra paperwork, poor US government contractors :(

Not only that, but:

~/7.x catch$ grep -r "md5(" *
includes/password.inc:    // have 'U' added as the first character and need an extra md5().
includes/password.inc:    $password = md5($password);
includes/update.inc:  return 'SESS' . md5($session_name);
modules/filter/filter.module:    $hash = md5($content);
modules/simpletest/tests/password.test:    $account = (object) array('name' => 'foo', 'pass' => md5($password));
scripts/generate-d6-content.sh:  $pass = md5("test PassW0rd $i !(.)");
scripts/generate-d7-content.sh:  $pass = md5("test PassW0rd $i !(.)");

Was never completely removed from 7.x in the first place, very much still there.

Damien Tournoud’s picture

FIPS/NIST recommendations and requirements obviously only apply to cryptographic applications. You are still free to use any hash function (even modulo or CRC) for non-cryptographic applications.

Which means that HMAC should never use MD5, but straight hashing for non-cryptographic uses can.

xjm’s picture

Title: Policy on using md5()/sha1() in non-security contexts » [Policy, no patch] Using md5()/sha1() in non-security contexts
pwolanin’s picture

We made this change for Drupal 7 for several reasons. I am strongly opposed to changing the current policy and coding standard which is to use sha-256 or sha-512 for everything for Drupal 7+.

1) The average developer cannot distinguish what is or is not a security context. Thus, as a coding standard we must adopt a "secure by default" posture, and it significantly lessens the potential for needless debate in core issues about what hash is appropriate to use.

2) The US and other governments and security-conscious (or paranoid) enterprises forbid or will less readily adopt software using md5 or sha1. Randomly using them for no actual benefit impedes Drupal adoption.

3) Related to #2 it makes security audits looking at hash functions at least nosier and hence more expensive and also impedes Drupal adoption.

4) There is no significant performance impact, especially on 64 bit hardware.

Current symfony has removed md5 and sha1 as of 9 months ago and is in the 2.4 branch: https://github.com/symfony/symfony/pull/8609

pwolanin’s picture

@catch - most of the uses you found are in upgrade or migration code, but not used during normal site operation. If there are any that are not we should fix those.

greggles’s picture

To say that Drupal doesn't have an adoption problem is a bit myopic. Drupal is where it is today because regardless of the scale at that time people weighed adoption as part of pros/cons of new ideas. In other words: if you have a job working with Drupal today, among the people you can thank is someone who piped 10 years ago and said "X will help adoption, let's do it!" Further, there has been a lot of adoption of Drupal by USA federal government websites which is a virtuous circle that drives investment into improving Drupal by those organizations. Do we really want to say we don't care about that? I sure don't.

re #3 - can you clarify how this policy makes core's technical problems harder to solve?

I agree very much with pwolanin's point in #7 that adopting a strong hash across the board will make it harder for someone to accidentally introduce a mistake. Even on very large/big budget sites there is a fair amount of "copy what core does." Hashing is an area that is not well understood, so people are unlikely to know the difference between md5 and sha-256 the mistake can be costly. I agree that standardizing (e.g. making it part of the coder rules, for example) on *not* using md5/sha1 then we increase the likelihood that all Drupal sites will be safe.

For the people who want to use md5: I don't see a benchmark comparing md5 to sha-256 inside of the context of a page request. Unless there's a tight loop hashing a ton of things, the md5/sh2 performance difference is noise. I believe we should use the stronger hash until/unless it's a clear performance problem. And then when it is a clear performance problem, we should add comments explaining why this use of md5 is OK in this instance with a link to the policy on hash functions.

dstol’s picture

As someone who has had to patch core to remove md5() usages, I'll tell you why. It's some security person's job to run some incredibly expensive security scan suite to just produce a report for C&A. These suites flag these types of things as critical regardless of context.

While I totally with #4, it's sometimes not obvious to non-technical stakeholders what any of that means. All they see is their security team sent a report, generated with a very expensive piece of software, with a critical issue.

Sometimes these organizations have misguided policies like "no usage of md5/sha1." There is no exception for non-crypto usages. And in cases where you pursue exceptions, it becomes CIO or higher level approval, with justifications, paperwork, weeks to months of headaches. The primary stakeholder doesn't want their project on the CIOs desk for any reason let alone something that says exemption from a critical security issue.

pwolanin’s picture

Issue summary: View changes
alexpott’s picture

Assigned: Unassigned » Dries
Issue summary: View changes

So the plus of not using md5 or sha1 anywhere is that if someone copies and pastes and the new use case is security related then we do not introduce a new security hole. The minus is that it takes away the developer's autonomy of using which ever hash function they like.

Apart from not wanting to be told how to code by misguided policies like "no usage of md5/sha1" - why should we care? To me the only possible reason would be performance. So I've done some investigation.

I've hashed every single yaml file in core using the following script https://gist.github.com/alexpott/202a520d3bec69a8d1d6

MD5 in milliseconds

[min] => 0.0000
[max] => 0.0708
[mean] => 0.0026
[median] => 0.0012
[95th] => 0.0079

SHA512 in milliseconds

[min] => 0.0010
[max] => 0.1352
[mean] => 0.0052
[median] => 0.0029
[95th] => 0.0160

So yes MD5 is twice as quick as SHA512 but as @greggles points out at these speeds this is just going to be noise with respect to the rest of the request time.

Interestingly SHA256 is slower...

SHA256 in milliseconds

[min] => 0.0010
[max] => 0.2069
[mean] => 0.0069
[median] => 0.0031
[95th] => 0.0239

I'm assigning this to Dries to make a decision for the project. Technically I think the requirement to not use md5 as a blanket rule is silly. But as a decision for the project I think it is savvy. One of the goals of Drupal is to be easy to use and to adopt. We should accept that people are going to run security scanning tools over core and these will throw up warnings about the use of md5 and sha1. I think that minimising such warnings is in our best interest since it makes it easier for people with less technical knowledge choose Drupal. Furthermore, it costs us nothing.

Crell’s picture

FWIW, I agree with #9/#11/#12.

The people who will reject Drupal "because it uses md5 and md5 is insecure" are people who don't know the difference between a "cryptographic context" and a non-cryptographic context. That is, the kind of people who are responsible for 6- and 7-figure implementation budgets but are not themselves technical. Which is increasingly the people making the call about what CMS to use.

"Make doing the right thing easy" is a key principle of UX and DX. That means don't make people think about "cryptographic context" or not if we don't have to, especially when the cost to being cautious-by-default is so low per #12.

Technically I think the requirement to not use md5 as a blanket rule is silly. But as a decision for the project I think it is savvy.

^^ This.

Dries’s picture

There are a lot of reasons why it is valuable for Drupal to succeed in government and why we should consider to make certain trade-offs:

  1. We care about government adoption because they too host or sponsor Drupal Camps around the world.
  2. We care about government adoption because they contribute modules back to Drupal.
  3. We care about government adoption because we can provide visually impaired people a better experience.
  4. We care about government adoption because they directly and indirectly pay for core development.
  5. We care about government adoption because them adopting Drupal encourages tens of thousands of other organizations to adopt Drupal.
  6. We care about government adoption because we don't want them to waste our tax money on crappy and expensive proprietary software.
  7. We care about government adoption because some of the most awesome contributors in our community are employed by the government; we want to look after them.
  8. We care about government adoption because many of us worked incredibly hard to get Drupal adopted in the government.
  9. Last but not least, we care about government adoption because they use Drupal to better the lives of its citizens around the world. We can help them be more efficient at that.

Also, this "problem" is not limited to government; other organizations use scanners like this.

While it is not "technically pure", this seems like an easy change with a lot of upside if you consider the big picture.

pwolanin’s picture

@Dries - from my perspective we made the change already in Druapl 7.x four years ago, but it seems we need to affirm and document that this is an official core policy.

greggles’s picture

#14 is reasons to make a tradeoff. But there isn't any material tradeoff here.

#15 it is documented. I guess the pro-md5 crowd wants that documentation re-affirmed?

catch’s picture

@greggles #9 there are two cases where this has had a negative technical impact (that I can immediately think of), one was in the original issue too.

#3 was referring to #2224847-26: Automatically shorten cid's in Cache\DatabaseBackend and PhpBackend. md5() is a 32 character hash. On that issue we were discussing keeping as much of the cid as possible, then hashing the rest). A 64 char length requirement for the hash used either limits the length of the cid that can be preserved, or doubles the index requirements for a dedicated column. It's not about which hash is 'faster'. So following this rule without considering context results in solutions that are worse.

The other is #723802-92: convert to sha-256 and hmac from md5 and sha1 #723802-94: convert to sha-256 and hmac from md5 and sha1 (and etc.) - our password implementation is no longer a proper implementation of PHPASS since the hashes are only portable in one direction (of course you can argue that also 'increases adoption' since of course they're portable to Drupal but not away from it).

This was documented on the original issue, although largely glossed over (and has not been mentioned here either). This is not and has never been a neutrally technical policy - it has always had a negative impact on the actual software since the day it went in, so constantly claiming that there is 'no material trade-off' is demonstrably false.

@pwolanin's #8, if the objective of the policy is so that automated security scanners don't ever find md5() in core, then surely if grep finds md5() that means it has not met that objective. If we can explain why it's used on those contexts, then we can also explain in other contexts where it makes sense.

@Dries which governments? All of them? What if they have policies that contradict each other?

catch’s picture

Issue summary: View changes
catch’s picture

Issue summary: View changes
catch’s picture

Issue summary: View changes
chx’s picture

Oh, FIPS! Everyone loves FIPS.

I notice nobody claims that there's any intrinsic value to FIPS mode. It's widely recognized as a worthless checkbox; now it's time to stand up to the clowns in charge and tell them the same thing. It's funny to compare how many people like to quote Gandhi's "Be the change that you wish to see in the world." with how few people actually like to be the change.
LibreSSL - FIPS mode is not coming back

pwolanin’s picture

@catch - MD5 is known to be somewhat flawed as a hash function - not as collision-resistant as it should be.

If we want a shorter hashed string, we can should a substring of sha-256 or sha-512. That is the technically better solution also. e.g. see:

http://crypto.stackexchange.com/questions/9435/is-truncating-a-sha512-ha...

compared to using sha-1, "truncating one of the SHA-2 functions to 160 bits is around 2^20 times stronger when it comes to collision resistance."

pwolanin’s picture

Regarding the code to support importing PHPASS passwords using md5, since the password class is readily swappable as a service, we could move that support to contrib. I'd be curious to know if anyone has seen a use in production Drupal 7.

pwolanin’s picture

@chx "FIPS mode" in openssl is not the same as the FIPS standards in general. The rest of that comment indicates it was satisfied by forcing the use of outdated algorithms in openssl.

MD5 and SHA1 have been considered less collision resistant than they should be for 9 years or more. So, it's not the best technical solution to use them for anything new.

pwolanin’s picture

sha1 is also being actively deprecated even though it's not as broken as md5:

https://konklone.com/post/why-google-is-hurrying-the-web-to-kill-sha-1

https://www.schneier.com/blog/archives/2013/11/microsoft_retir.html

yes, those relate to security uses, but again there is absolutely no technically correct or performance reason to use md5 or sha1, so we should never use them in Drupal code.

effulgentsia’s picture

So, the issue summary says:

There are two concrete technical cases where this policy has caused technical regressions to be committed to core or proposed

The first is about wanting to use 32 character hashes for cache IDs. And #22 answers that we can truncate SHA-512 to achieve that, and that doing so offers superior collision resistance than a 32 character md5.

The second is about wanting Drupal to play nicely with others by default: in other words, store hashed passwords that can be migrated to non-Drupal systems that implement PHPASS, and that SHA-512 isn't one of the algorithms implemented by default in http://www.openwall.com/phpass/ implementations. Two questions about that:

  • Is that still true? Perhaps by now, SHA-512 has become more standard?
  • If it is true that SHA-512 is still not included in standard implementations of PHPASS, then what about switching D8's default PHPASS generation to CRYPT_BLOWFISH as originally requested in #723802-94: convert to sha-256 and hmac from md5 and sha1? That was rejected for D7 due to the PHP 5.3 requirement, but that's not a problem for D8.

there is absolutely no technically correct or performance reason to use md5 or sha1, so we should never use them in Drupal code

What's the suggestion for how to remove the remaining usages of md5()? Such as:

  • \Drupal\Component\Diff\Engine\DiffEngine::_line_hash(): this is a copy of an old algorithm/library. Do we fork it, try to fix upstream, or what?
  • \Drupal\Core\Password\PhpassHashedPassword::check(): the md5() in there is to allow user authentication against passwords migrated from a D6 site. Do we move that to a contrib project, so if you want to migrate a site from D6 to D8 (or from D7 to D8 where the D7 site was itself at one time updated from D6) you then also need a contrib module? Doable, but then wouldn't this be a case of creating a worse site building experience just to satisfy an overly draconian security audit?
  • PHPUnit_Framework_MockObject_Generator::generate() (and other vendor libraries): do we need to get all our vendor projects to comply with the "no md5" policy in order for Drupal to comply with automated security audits? Will #1475510: Remove external dependencies from the core repo and let Composer manage the dependencies instead affect that answer?

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

pwolanin’s picture

Title: [Policy, no patch] Using md5()/sha1() in non-security contexts » [Policy, no patch] Using md5()/sha1()/crc32b in Drupal code
pwolanin’s picture

Status: Active » Closed (duplicate)

made a new issue in the coding standards queue: #2833544: [Policy, no patch] Reflect secure coding policy in coding standard of only using Crypt::hashBase64() for hashes - I think restarting in that frame may be a better idea and this issue is a bit stale.