FIPS 180-3 and FIPS 198-1 require transition to sha-2 family functions for use in US government applications in 2010. Let's update Drupal 7 to use these more secure algorithms so that Drupal is not excluded from being used for federal sites.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

pwolanin’s picture

Discussed this over e-mail with Solrdiz. In terms of interoperability and security there is actually no reason to replace the core password hashing (stretched, salted MD5) especially since an alternate .inc file can be substituted which uses a stronger password storage mechanism.

Switching to sha-2 isn't technically important in all cases, so we might consider the NIST standard is as much political as technical.

grendzy’s picture

Title: Require hash module and convert to sha-256 and hmac from md5 and sha1 » convert to sha-256 and hmac from md5 and sha1
Status: Active » Needs review
Issue tags: +Security improvements, +Security Advisory follow-up
FileSize
809 bytes

We already require PHP 5.2 (see INSTALL.txt), so this doesn't introduce any new system requirements.

This patch converts drupal_get_token to use HMAC-SHA256. This function is used for many security-related purposes, for example preventing cross-site request forgeries.

The hash is base64-encoded, which makes it one third shorter than a hex string.

pwolanin’s picture

Looks like a good starting place. PHP can be compiled without the hash module, so we should check for it in system_requirements.

pwolanin’s picture

Looks like a good starting place. PHP can be compiled without the hash module, so we should check for it in system_requirements.

pwolanin’s picture

pwolanin’s picture

Hmm - actually patch above has a serious flaw - it does not include the session ID in the hmac.

pwolanin’s picture

FileSize
4.07 KB

Status: Needs review » Needs work
Issue tags: -Security improvements, -Security Advisory follow-up

The last submitted patch, sha256_723802-7.patch, failed testing.

pwolanin’s picture

Status: Needs work » Needs review
Issue tags: +Security improvements, +Security Advisory follow-up

#7: sha256_723802-7.patch queued for re-testing.

pwolanin’s picture

FileSize
6.88 KB

need drupal_random_bytes() available when session.inc is loaded - so move it to bootstrap.inc.

note we should improve session generation due to: http://seclists.org/fulldisclosure/2010/Mar/519

pwolanin’s picture

interestingly 'hash' is already a required extension in system_requirements() - so that work is done unless we want to double-check that 'sha256' is an available algorithm.

pwolanin’s picture

FileSize
14.58 KB

This converts more, adds a helper function to make a base64 encoded sha256, and adds a performance enhancement to drupal_random_bytes() after I noticed that every call takes ~1 ms locally due to the system I/O reading from /dev/urandom. Hence, we can usually avoid doing that more than once per page by reading an extra 1000 bytes.

jbrown’s picture

Great!

See also #761820: Warning: getmypid() has been disabled for security reasons in drupal_random_bytes()

getmypid() should not be used for the following reasons:

  • it isn't always available
  • it is a bad source of entropy
  • it requires a system call
  • it isn't necessary as mt_rand() is self seeding
  • it definitely isn't necessary if we are using /dev/urandom

Sometimes you are using drupal_random_bytes(55) and sometimes drupal_random_bytes(64) - the rationale behind this needs to be documented.

pwolanin’s picture

There have been issues with the robustness of mt_rand(), etc, but I'd agree that we don't want to do a system call every page load that's usually not needed.

55 vs 64 is a bit random. Solrdiz suggested 55 was faster for md5 due to the internals of the PH md5 implementation. No reason necessarily to use it - but unclear if we are better off using some value that is not an a multiple of the block size for the hash function.

pwolanin’s picture

FileSize
14.69 KB

This patch removes the system call.

jbrown’s picture

Thanks!

I'd prefer it if $random_state was only initialised if it is going to be used - no point in wasting cycles for the preferred case of /dev/urandom.

pwolanin’s picture

@jbrown - I don't think any further micro-optimization in going to help much there. Also, avoiding a couple function calls is not necessarily of value compared to keeping the code flow simple.

jbrown’s picture

Okay - thats fine! :-)

pwolanin’s picture

FileSize
14.7 KB

base64 encodes the session IDs also.

Status: Needs review » Needs work

The last submitted patch, sha256_723802-19.patch, failed testing.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
28.36 KB

Further updates and moved a couple more little functions to bootstrap.inc to they are available to session.inc

aggregator is going to be a bit of a pain to update - probably just have to force all feeds to be re-fetched.

pwolanin’s picture

On the hosts where getmypid() is disabled - does function_exists('getmypid') return TRUE or FALSE? Solardiz sugested to me in e-mail that while not a great source of entropy, it's still valuable to have it. Please see this comment: http://drupal.org/node/29706#comment-746593

Quick local testing also suggests this call is about as fast as a normal function call, so not really a worry in terms of performance.

pwolanin’s picture

FileSize
28.44 KB

Some changes to the RNG generation function per e-mail with Solardiz.

Damien Tournoud’s picture

Migrating all the md5() calls to sha256() doesn't really make sense in term of security (most of those are used as non-cryptographic hashes), but moving to drupal_hash_base64() is a nice clean-up, because it is easier to understand and that base64 is shorter then the standard hex-encoding.

In that case, let's migrate *all* those:

-  return empty($drupal_hash_salt) ? sha1(serialize($databases)) : $drupal_hash_salt;
+  return empty($drupal_hash_salt) ? hash('sha256', serialize($databases)) : $drupal_hash_salt;
-    $key = md5(drupal_random_bytes(64));
+    $key = hash('sha256', drupal_random_bytes(64));
-    'value'    => sha1(drupal_random_bytes(64)),
+    'value'    => hash('sha256', drupal_random_bytes(64)),
-    $cid = 'menu_item:' . md5($path);
+    $cid = 'menu_item:' . hash('sha256', $path);
-  return 'links:' . $menu_name . ':tree-data:' . $GLOBALS['language']->language . ':' . md5(serialize($data));
+  return 'links:' . $menu_name . ':tree-data:' . $GLOBALS['language']->language . ':' . hash('sha256', serialize($data));
-      $tree_cid = 'links:' . $item['menu_name'] . ':subtree-data:' . md5(serialize($data));
+      $tree_cid = 'links:' . $item['menu_name'] . ':subtree-data:' . hash('sha256', serialize($data));

Some additional remarks:

+      $desired_bytes = ($count <= 4096) ? 4096 : $count;

What about min($count, 4096)?

-  $private_key = drupal_get_private_key();
-  // A single md5() is vulnerable to length-extension attacks, so use it twice.
-  // @todo:  add md5 and sha1 hmac functions to core.
-  return md5(drupal_get_hash_salt() . md5(session_id() . $value . $private_key));
+  return drupal_hmac_base64(session_id() . $value, drupal_get_private_key() . drupal_get_hash_salt());

I would move session_id() into the key.

+    // Less random sessions (which are much faster to generate) are used for
+    // anonymous users than are generated in drupal_session_regenerate() when
+    // a user becomes authenticated.
+    session_id(drupal_hash_base64(uniqid(mt_rand(), TRUE)));
vs.
+    $session_id = drupal_hash_base64(uniqid(mt_rand(), TRUE) . drupal_random_bytes(55));

What makes you think that 4 bytes of entropy + microsecond for session IDs is not enough?

pwolanin’s picture

@Damien - posts such as http://seclists.org/fulldisclosure/2010/Mar/519 suggest that we don't always get the entropy we expect from, e.g., uniqid(). I'd rather invoke some overkill than have to do a SA later.

pwolanin’s picture

FileSize
28.49 KB

Fixed most of the suggested changes in #24 - might be a couple more needed.

andypost’s picture

I think it's a bad idea to change behaviour for actions, especially with:

-  return db_query("SELECT aid FROM {actions} WHERE MD5(aid) = :hash AND parameters <> ''", array(':hash' => $hash))->fetchField();
+  $result = db_query("SELECT aid FROM {actions} WHERE parameters <> ''")->fetchAll(PDO::FETCH_ASSOC);
+  foreach ($result as $row) {
+    if (drupal_hash_base64($row['aid']) == $hash) {
+      $aid = $row['aid'];
+      break;
+    }
+  }
+  return $aid;

Why not leave actions hash as is? This could simplify patch a lot.

grendzy’s picture

As I see it, as a political move, it would be nice to eliminate all md5() calls in core. Sure, lots of people will say md5 is a perfectly valid way to generate non-cryptographic identifiers. And there's nothing wrong with that.

On the other hand, anytime someone wants to audit Drupal, all the md5 calls will need to be evaluated individually to determinate if their use is cryptographic in nature or not. Much better IMO to always use a secure hash, and remove all doubt.

pwolanin, looking forward to reviewing this in detail soon!

andypost’s picture

My opinion about actions is related to database level and currrent API implementation for actions and triggers. This patch changes a form generation and tests - this as a serious API change.

As we talk about possibility to change hashing all over core then algorithm should be swapable. And should not affect a code change.

pwolanin’s picture

@andypost - the really unfortunate existing actions code depends on the DB layer being able to run the hash call. This is bad for any number of reasons. A proper fix would involve adding a new DB column or perhaps we can just remove actions from core - they should not have been added to Drupal 6 imho in this sorry shape.

@grendzy - yes, I agreee 100%. The point is to remove or absolutely minimize the use of non-recommended hash functions in core (and in contrib) so that each one does not have to be audited for each project.

andypost’s picture

@pwolanin I agree exactly about ugly actions code but since we past code-freeze I see no ability to improve it as I wish. Suppose Dries should decide on this.

pwolanin’s picture

I've already been making Dries aware of the issue - I'm pretty hopefully he'll support all the needed changes.

andypost’s picture

Great news! The only problem with actions it's their storage where {actions}.aid is string that could be an ID (for configurable actions) and String (for non configurable) so hash is used to hide this. So I think it's enough to separate this storage but I think it's better to file new issue or find old one where this ugly storage was invented.

Dries’s picture

Yes, I'd support this patch.

pwolanin’s picture

FileSize
28.46 KB

minor change suggested by solrdiz. Need still to tackle aggregator and node/filter modules.

mfer’s picture

Status: Needs review » Needs work

Don't forget about /includes/password.inc. There is md5() in there.

pwolanin’s picture

Status: Needs work » Needs review

@mfer - see above comment #1. That use is the one md5 not changing

mfer’s picture

@pwolanin - I see.

While I do not think performance should hold this up it is something to be aware of. According to test results posted to php.net sha256 is about 2.75 times slower than md5.

pwolanin’s picture

@mfer - well it's a more robust 256 bits of hash versus 128 bits, so I'd expect ~2 fold slower at least.

note, however, that we are almost always hashing short strings. I just re-ran that code from php.net using 512 bytes and the result is:

md5 (raw)                     8.106 microseconds
md5 (hex)                     9.059 microseconds

sha256 (raw)                  15.02 microseconds
sha256 (hex)                  15.974 microseconds

so roughly 2-fold, but given that we are talking about a few calls per per page that's only matter of 10s of microseconds.

mfer’s picture

@pwolanin Thanks for re-running the performance code. I'm starting to evaluate everything based on performance. Even when we think a change is good knowing the performance impact is good. :)

jlambert’s picture

I really think this is a good idea for a number of reasons (and nice work in this thread btw). But I wanted to raise the point that it's really important to be ahead of the curve with security, and while it *is* somewhat slower, Moore's law should make up for that in the d7 lifetime. From a Drupal marketing standpoint, it's really important that we start moving towards more secure encryption libraries. Are y'all know, md5 *has* been hacked, and while impractical, many of the larger clients we've been working with are looking for Drupal to be "one step ahead," and again Moore's law says that cracking md5 is going to be feasible quickly (check the playstation hack to get an idea: http://www.hardwareanalysis.com/content/topic/73234/).

Are you sure it makes sense to rely on urandom for this kind of core patch?

+ // /dev/urandom is available on many *nix systems and is considered the
+ // best commonly available pseudo-random source.
+ if ($fh = @fopen('/dev/urandom', 'rb')) {

It's somewhat optimistic (it's going to work great on *nix, and windows might work with CryptGenRandom, but it doesn't look like NotImplementedError is getting caught, and the function will fail without it). Maybe it's a good place to rely on phprand, even if it's not the best - at least it's cross platform.

catch’s picture

Please remove these from the patch. They are all in the critical path, they all have exactly zero to do with security hardening.

+++ includes/common.inc
+++ includes/common.inc
@@ -2883,7 +2883,7 @@ function drupal_aggregate_css(&$css_groups) {

@@ -2883,7 +2883,7 @@ function drupal_aggregate_css(&$css_groups) {
         if ($group['preprocess'] && $preprocess_css) {
           // Prefix filename to prevent blocking by firewalls which reject files
           // starting with "ad*".
-          $filename = 'css_' . md5(serialize($group['items'])) . '.css';
+          $filename = 'css_' . drupal_hash_base64(serialize($group['items'])) . '.css';
+++ includes/common.inc
@@ -2883,7 +2883,7 @@ function drupal_aggregate_css(&$css_groups) {
@@ -3694,7 +3694,7 @@ function drupal_get_js($scope = 'header', $javascript = NULL) {

@@ -3694,7 +3694,7 @@ function drupal_get_js($scope = 'header', $javascript = NULL) {
     // Prefix filename to prevent blocking by firewalls which reject files
     // starting with "ad*".
     foreach ($files as $key => $file_set) {
-      $filename = 'js_' . md5(serialize($file_set)) . '.js';
+      $filename = 'js_' . drupal_hash_base64(serialize($file_set)) . '.js';
+++ includes/common.inc
@@ -4252,10 +4213,7 @@ function drupal_get_private_key() {
@@ -5139,7 +5097,7 @@ function drupal_render_cache_set(&$markup, $elements) {

@@ -5139,7 +5097,7 @@ function drupal_render_cache_set(&$markup, $elements) {
 function drupal_render_cache_by_query($query, $function, $expire = CACHE_TEMPORARY, $granularity = NULL) {
   $cache_keys = array_merge(array($function), drupal_render_cid_parts($granularity));
   $query->preExecute();
-  $cache_keys[] = md5(serialize(array((string) $query, $query->getArguments())));
+  $cache_keys[] = drupal_hash_base64(serialize(array((string) $query, $query->getArguments())));
+++ includes/menu.inc
@@ -408,8 +408,8 @@ function menu_get_item($path = NULL, $router_item = NULL) {
     // Since there is no limit to the length of $path, but the cids are
-    // restricted to 255 characters, use md5() to keep it short yet unique.
-    $cid = 'menu_item:' . md5($path);
+    // restricted to 255 characters, use sha256 to keep it short yet unique.
+    $cid = 'menu_item:' . drupal_hash_base64($path);

Powered by Dreditor.

moshe weitzman’s picture

Yeah, lets only pay the performance penalty where the crypot actually matters. In other places we just need a comment reassuring the reader that we knowingly leave md5 there.

catch’s picture

Status: Needs review » Needs work

Yep, and that's especially true if we're leaving md5() in the password hashing framework, if the goal here is not be to have grep -r "md5(" * return nothing on Drupal core (and that would be a silly goal), then we should only do it where it makes sense.

mfer’s picture

The goal is to meeting crypto requirements for FIPS 180-3 and FIPS 198-1. Since I have not read these I am not in a position to evaluate in detail. From details in the initial issue it looks like they require the use of sha-2 crypto instead of md5 crypto. We may not be using it for security but it is a crypto algorithm. So, someone want to become more familiar with the requirements to know what we need to change?

pwolanin’s picture

@catch - The goal is indeed to have grep -r "md5(" * return nothing on core - and ideally nothing on D7 contrib modules. It's not a silly goal because this will greatly ease adoption by government agencies and large companies that need to meet those standards. Discussing this patch with people who have to meet those standards, it doesn't matter whether a particular use of md5/sha1 is relevant to the FIPS standard - even if it's not you have to spend time documenting why it's not, how it won't be in the future, etc, and that will make it harder to get Drupal adopted. Hence, I plan to continue this patch until that grep only returns 1 result - in the password hashing code.

The only reason to continue to use it in the password storage is that we iterate it a large number of times (hence it's easy to justify that it's not vulnerable to known md5 attacks), it falls outside the categories in the FIPS requirements, it's replaceable, and (most importantly) the present algorithm was adopted after long discussion as a standard that allows us greater cross-project compatibility (e.g. we can more easily migrate WP to Drupal).

catch’s picture

Switching to sha-2 isn't technically important in all cases, so we might consider the NIST standard is as much political as technical

If it results in pointless increasing the workload in the critical path with no technical justification at all, then it's a silly goal. I can think of stronger words to describe it as well. If you really want to replace md5() in the critical path, replace it with something that's equivalent performance wise.

Drupal doesn't have adoption issues at the moment, it does have performance issues. If someone is stupid or lazy enough to base their choice of software on grep -r "md5" then they'll surely don't give a crap about the password framework being pluggable. If a comment or justification is sufficient there, then it's sufficient for cache keys and aggregated filenames. Since all we do in those four places is run md5($some_stuff) to generate a unique cache key, why not a one-line drupal_cache_key($stuff); wrapper with a comment? Otherwise please split the non-cryptographic changes into a separate issue.

pwolanin’s picture

@catch - the NIST standard is coming into effect this year. So we will very soon face adoption issues for Drupal 6 based on on this issue.

Also, the point is that md5 does not have enough collision resistance - sha256 is the least expensive comparable function we can be replacing it with.

I realize that the goal I'm pushing here is not based on strictly technical merits, but it's rather based on wanting to make Drupal a viable option for more projects in from 2010 and forward.

If you think this is going to cause an actual performance effect, then I accept that this patch should not be considered done until we've done that testing and have quantified the effect.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
32.41 KB

removing a couple more - setting to CNR for testbot.

Status: Needs review » Needs work

The last submitted patch, sha256_723802-49.patch, failed testing.

andypost’s picture

As I see the main issue with SHS - we need to replace all hashes or make them swappable to conform more then US standards.

http://csrc.nist.gov/publications/PubsFIPS.html
FIPS 180-3 Secure Hash Standard (SHS)
FIPS 198-1 The Keyed-Hash Message Authentication Code (HMAC)

Damien Tournoud’s picture

@catch: the difference in performance between md5() and drupal_hash_base64() for small strings is very very tiny. We will be able to know for sure when Peter will have a working patch, but I don't believe that the changes in the critical paths will make any difference.

Plus, it buys us to nice improvements: (1) a longer key space and less possible collisions for those, with (2) a string that is only 30% longer (32 characters to 43), thanks to the base64() compression.

jbrown’s picture

#52 Even more space could be saved by using a binary field for the hash.

pwolanin’s picture

@jbrown - see Damien's comments about consistent usage. Also, we often want to be able to include these hashes in a URL or page content.

catch’s picture

pwolanin's most optimistic micro-benchmarking has it as twice as expensive- 16us vs. 8us, as opposed to the 2 1/2 times on php.net

Assuming that's about right, this means you only need 120 calls to get a 1ms penalty (8 * 120 = 960). My current core install has 39 calls to md5() on the front page. Add some more contextual links and drupal_render() caching and 120 isn't hard to reach. In fact I bumped the nodes per page to 30, and got 70 calls to md5().

Let's assume that drupal_render() caching gets you from core's current 15 requests per second on some pages up to 20 - so 50ms per request, you're then paying a 2% penalty. It's only a small penalty, but it's a pointless one for all practical purposes, and core gets slower 1-2% at a time. Talking to pwolanin in irc now, I'd obviously be fine with non-has cache key generation, as long as it's not slower.

Damien Tournoud’s picture

@catch: it all depends on the size of the hashed data. For short sizes (up to 100 bytes), I find the difference to be unmeasurable. Can you extract the total size of this hashed data?

Damien Tournoud’s picture

I measured the following:

0.080ms penalty for 10kB of data
0.65ms penalty for 100kB of data
6.5ms penalty for 1MB of data
Damien Tournoud’s picture

I suspect we mostly hash small strings, and for those, the cost of the function call is nearly as big as the cost of the hashing itself.

catch’s picture

@Damien: those numbers don't look so bad. It's likely we'd run into a reasonable penalty if we were still using the filter cache in check_markup(), which runs md5() on the whole string, but that's nearly always stuffed into the field cache by text module now instead.

So the main culprit in core is menu_get_item(), due to contextual links. Spoke to pwolanin and we reckon menu_get_item() could check for strlen() before hashing the path - which is going to work 99% of the time given very few paths if any are > 255 chars long.
That'd also be a small improvement in itself.

With both of those out of the picture, I only see a handful of calls when profiling, (mainly drupal_get_token()), which will be negligible whether there's a measurable difference per-call or not, so with the menu_get_item() change I'd be fine with this going in.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
32.75 KB

Adding such a change into this patch. A side-effect is that some other cache back-ends (e.g. memcache) may need to urlencode or hash all keys to insure there is no white space. Apparently memcache module doesn't do that now - I think it's just luck that it works usually. [edit] Indeed - here's bug report and patch for memcache that would hash all keys: http://drupal.org/node/525400

Since the calls to menu_get_item for contextual links are only for admins, I'm a little worried that we are adding unneeded complexity for an admin-only optimization. Or maybe I don't understand how the contextual links work.

catch’s picture

Well the other option for menu_get_item() would be microbenchmarks of md5() vs. drupal_hash_base_64() with some realistic menu paths, Damien's numbers suggest there might not be an appreciable difference.

pwolanin’s picture

FileSize
1.07 KB

@catch - I'm just wondering why contextual links would be available at all for 90% of users?

running hash functions 10^6 times on 32 bytes of hex data, I get average values of:

Results: 
   md5:  2.60 +/- 0.07 microseconds for 1000 32 byte values 1000 times each
  sha1:  2.99 +/- 0.08 microseconds for 1000 32 byte values 1000 times each
sha256:  3.38 +/- 0.21 microseconds for 1000 32 byte values 1000 times each
sha512:  7.60 +/- 0.63 microseconds for 1000 32 byte values 1000 times each

So as Damien suggests, there is a rather modest difference between md5 and sha-256 for short strings.

Adding the base64_encode() + strtr() calls increases the sha256 time to 10.16 +/- 0.36 microseconds, so that's much more of a penalty and suggests that we should perhaps not use them for CID values and the like.

(the code is attached - the output above manually formatted a bit beyond what it provides)

mfer’s picture

For the record, adding and changing features happens because of project requirements and because of technology changes. This change is due to an upcoming change in project requirements for a particular sector. This is not so much political but being targeted at requirements.

+++ modules/aggregator/aggregator.install
@@ -170,10 +170,10 @@ function aggregator_schema() {
-        'length' => 32,
+        'length' => 60,

Why is the hash length to store 60? Shouldn't it be 64?

+++ includes/form.inc
@@ -203,7 +203,7 @@ function drupal_build_form($form_id, &$form_state) {
+      $form_build_id = 'form-' . drupal_hash_base64(uniqid(mt_rand(), TRUE) . mt_rand());

I may have missed this, but why is uniqid used here and a couple other places?

+++ modules/file/file.module
@@ -411,7 +411,7 @@ function file_managed_file_process($element, &$form_state, $form) {
   // Add progress bar support to the upload if possible.
   if ($element['#progress_indicator'] == 'bar' && $implementation = file_progress_implementation()) {
-    $upload_progress_key = md5(mt_rand());
+    $upload_progress_key = mt_rand();
 

Why does the upload progress key move from a hash to a non-hash?

107 critical left. Go review some!

catch’s picture

At one point contextual links were used for all comment edit links, I protested against this and it was rolled back, but they're also used for edit links on node teasers now. I'm very wary about saying "this only affects admins" when we regularly end up unleashing stuff on non-admin users.

There's also the issue here of drupal_render cids, which isn't used much in core but will affect potentially any user, but that doesn't have the strlen() option at all so not sure what we can do there.

pwolanin’s picture

@mfer - For aggregator I was saving the bas64 encoded hash, so it was shorter - but likely making that column 64 and using the hex value per comment above is the better option.

we could likely use microtime() instead of uniqid() - the goal is to combine multiple difficult-to-guess values (pseudo-random ints plus microsecond timing).

I don't see any reason the upload progress key needs to be a hash instead of an int - the hash certainly adds no information.

moshe weitzman’s picture

This is getting crazy IMO. We're not going to slow down the user experience for thousands of our cherished admins so that we can get the string 'md5' out of out codebase. Those benchmarks that pwolanin just posted are quite bad, even for short strings. 15% slower counts as awful, not "modest" in my book. There has to be technical merit in order to justify degrading user experience for this group. We did a whole new admin theme and overlay and contextual links and toolbar for "just admins".

mfer’s picture

@moshe thats 15% slower for that one thing. The numbers above at 1000 32 byte values 1000 times each showed a .78 microsecond difference. That's a tiny change to meet the requirements of a particular sector.

This is not about technical merit to justify a change. This is about meeting project requirements. Yes, there is a performance hit. And, sometimes project requirements don't make sense (at least to us). But, that is to meet project requirements and sometimes changes to do that are good.

catch’s picture

Except it's not meeting the requirements of that sector. It's meeting the requirements of people in that sector who are too lazy to read a code comment to explaining why md5() isn't being used for cryptographic purposes and rely on scanning software to make their decisions about application security.

These are the same people who e-mail the security team about SQL injection for queries which are properly escaped and CRSF for requests which check a token on an almost weekly basis.

They are not the people I want core to go out of its way to cater too, especially when that causes a runtime penalty, however small.

I would far rather contracts were lost on this, and people built their crappy government sites on custom .NET, then came to Drupal 3-4 years later when they can't do what they want with it. Or that the companies going for these contracts had to spend a couple of extra minutes explaining why menu cids aren't a security threat, since they are the ones which will benefit financially from winning those contracts.

Let them bear the cost of their idiocy and sloth, or that of their clients, not all people who run Drupal sites around the world, some of whom really couldn't give a crap about the US government and large corporations.

If it was really necessary to remove md5() from all of core, then password.inc would be included here too, as probably the most security sensitive usage in core. Except that we can justify the usage in password.inc based on technical merit - an argument which far more applies to cache keys.

pwolanin’s picture

FileSize
46.08 KB

per #62, avoid the use of the base64 encode where not needed for values placed in a form or URL.

catch’s picture

I've opened #769598: Remove non-cryptographic uses of md5() from core, any non-crypto changes should be dealt with in a separate issue IMO.

catch’s picture

Status: Needs review » Needs work
pwolanin’s picture

Status: Needs work » Needs review

@catch - frankly I am unclear from this http://csrc.nist.gov/groups/ST/hash/policy.html whether there are any typical uses hash functions where we would not need to convert to sha-2 to comply. We can certainly benchmark the effect of these changes, but I still think the removing md5 everywhere approach is correct in terms of making Drupal 7 easier to adopt.

Status: Needs review » Needs work

The last submitted patch, sha256_723802-69.patch, failed testing.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
46.86 KB

Need to increase a field length in the {language} table.

catch’s picture

Status: Needs review » Needs work

I would rather not play status pong, but since you're not clear on the requirement either way, that's even more reason to split the issue per #70.

catch’s picture

Also, here's another option #769648: Make common.inc swappable.

pwolanin’s picture

@catch - not clear meaning I don't see any use in core that's excepted from the categories listed. And in any case, md5 is totally deprecated for any use as a hash function. (note again that the password storage is using it essentially as a computational barrier).

So, let me be more clear - I consider #769598: Remove non-cryptographic uses of md5() from core as duplicate, and that this is the correct approach.

I understand that this is a philosophical difference and that it might annoy you and others.

chx’s picture

Version: 7.x-dev » 8.x-dev

This is absolute, total nonsense. This late in the cycle barring OMGWTFBBQ issues (security holes, dire performance problems, horrible UX or DX). we are not

  1. changing schema this late
  2. moving even more and even more code to bootstrap, the poor thing is bloated enough.
  3. adding or modifying APIs

.

pwolanin’s picture

Version: 8.x-dev » 7.x-dev
Status: Needs work » Needs review

@chx - extending a varchar length is trivial change - there's no reason really these aren't all 255.

The code moved to bootstrap is needed for session - we could put logged-in sessions back to being weaker, but see link above about PHP session weaknesses. That is one relatively clear security weakness this patch addresses.

chx’s picture

Status: Needs review » Needs work

Trivial or not, you modify schema and you should not. IF there is a security weakness, address it and leave the rest. If you can convince people md5 is not a problem in password.inc then explaining the non-crypto usage of MD5s elsewhere should be no problem. BTW. if PHP has a security hole then PHP shall address it unless we have a quick and easy way out.

The standards you quote are US government specific -- and are from July/Oct 2008. If it was not superb important during the years of code thaw what's the rush now?

pwolanin’s picture

password.inc is the only place we iterate md5 a large number of times, and is a special case. If we see the value of cross-porject compatibility as something we can sacrifice, I'd be happy to alter that too.

The standard from 2008 goes into effect now - in 2010, hence the rush.

chx’s picture

You did not get what i said: your argument that MD5 is fine in password is a very strong contradiction to the stated goal of getting rid of MD5. Do or do not, there is no 'try'. Cross platform passwords, bleh! Just use a pluggable password inc. I still recommend the 'leave core alone' approach but i can, obviously, be overruled.

pwolanin’s picture

FileSize
17.17 KB

catch asked more more demonstration of the length dependence on hash execution time. Here's a plot - each point is the average of 1,000,000 hash executions (php 5.2.6 on Mac)

You can see that up to ~55 bytes, sha256, sha1, and m5 are pretty similar in execution time. So for menu_get_item it won't make much difference.

As we get to ~400 bytes and greater we see (as above) sha256 is about 2-fold in execution time to md5.

pwolanin’s picture

FileSize
17.14 KB
16.87 KB

comparisons on Ubuntu 8.04 with php 5.2.4 on 32 and 64 bit platforms. On 64 bit (larger cpu too) all these are much faster, and sha-512 and sha-256 are about the same speed (surprisingly). The 32 bit one looks close to my Mac.

More scatter on the 64 bit one - maybe more background processes.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
53.49 KB

@chx - you are right of course, so here's a possible change. Allows us to import phpass passwords or simple md5 passwords (so no HEAD-HEAD issues), but only generates new hashes using sha256.

pwolanin’s picture

FileSize
53.37 KB

Reworked password.inc to avoid the code duplication in #85.

pwolanin’s picture

FileSize
54.66 KB

Some minor cleanup for greater consistency.

mfer’s picture

After a quick review I have some questions...

+++ includes/bootstrap.inc
@@ -622,7 +622,7 @@ function drupal_settings_initialize() {
   $prefix = ini_get('session.cookie_secure') ? 'SSESS' : 'SESS';
-  session_name($prefix . md5($session_name));
+  session_name($prefix . substr(hash('sha256', $session_name), 0, 32));
 }

Why is a 32 character sub string being used? How collision safe is just using this part of the hash?

+++ includes/common.inc
@@ -4294,7 +4255,7 @@ function drupal_get_hash_salt() {
   if (!($key = variable_get('drupal_private_key', 0))) {
-    $key = md5(drupal_random_bytes(64));
+    $key = drupal_hash_base64(drupal_random_bytes(55));
     variable_set('drupal_private_key', $key);
   }

Why was 55 chosen as the number of bytes rather than 64 as it had previously been?

+++ includes/form.inc
@@ -203,7 +203,7 @@ function drupal_build_form($form_id, &$form_state) {
+      $form_build_id = 'form-' . drupal_hash_base64(uniqid(mt_rand(), TRUE) . mt_rand());

Why is mt_rand() used here and not drupal_random_bytes? I don't know that this question applies to this issue since drupal_random_bytes is not new in this patch and the system currently uses mt_rand() here.

+++ includes/menu.inc
@@ -407,9 +407,9 @@ function menu_get_item($path = NULL, $router_item = NULL) {
+    $cid = 'menu_item:' . hash('sha256', $path);

Why is hash called here and not drupal_hash_base64?

+++ includes/password.inc
@@ -16,15 +16,15 @@
-define('DRUPAL_MIN_HASH_COUNT', 7);
+define('DRUPAL_MIN_HASH_COUNT', 5);

Why was the DRUPAL_MIN_HAS_COUNT reduced to 5?

+++ includes/password.inc
@@ -113,19 +113,22 @@ function _password_generate_salt($count_log2) {
+function _password_crypt($algo, $password, $setting) {

API change? Is that allowed at this stage of the game? Or, does this mean we are officially moving this to drupal 8?

+++ includes/session.inc
@@ -206,7 +206,10 @@ function drupal_session_initialize() {
-    session_id(md5(uniqid('', TRUE)));
+    // Less random sessions (which are much faster to generate) are used for
+    // anonymous users than are generated in drupal_session_regenerate() when
+    // a user becomes authenticated.
+    session_id(drupal_hash_base64(uniqid(mt_rand(), TRUE)));

Why is mt_rand() added into the mix here?

+++ includes/session.inc
@@ -284,7 +287,7 @@ function drupal_session_regenerate() {
-    $session_id = md5(uniqid(mt_rand(), TRUE));
+    $session_id = drupal_hash_base64(uniqid(mt_rand(), TRUE) . drupal_random_bytes(55));

Why is drupal_random_bytes added here?

Some of these occur multiple times. So, if you change something be sure to look through it all. :)

110 critical left. Go review some!

pwolanin’s picture

@mfer:

We don't really have to worry about session name collision - it's based on the url. We only hash (I believe) to limit the name to alpha-numeric.

see plots above - 55 bytes is faster than 64 due to quirks of the internals of these hash functions. This is why it was in the original patch, since this quirk was communicated to us by Solardiz. In some cases it's not going to matter, but seemed like using it consistently would encourage contrib to do the same.

There was another issue asking for DRUPAL_MIN_HAS_COUNT to be reduced to speed up testbot runs - seemed easy to add it here.

any function with a leading _ is not officially part of the API, and in any case, this function should not be called from anywhere outside that file.

drupal_random_bytes() added to the session generation to harden it in response to: http://seclists.org/fulldisclosure/2010/Mar/519 and in response to discussion within the security team. Note that we don't really need to harden session generaiton for anonymous users, hence I only added the hardening where needed.

mfer’s picture

Status: Needs review » Reviewed & tested by the community

Did some brief testing of the most recent patch. These are the requests per second.

Head:
/node - 16.47
/node/1 - 15.87

With Patch:
/node - 16.63
/node/1 - 15.69

Any difference is withing the standard deviation of the tests. I toggled on/off features like js/css preprocessing and found everything within the std deviation. From my basic tests and what pwolanin previously did on the hashing algorithms I don't see a performance issue here.

The code looks good, the test pass, and I tested drupal with the patch and it seems to work fine. RTBC.

pwolanin’s picture

Status: Reviewed & tested by the community » Needs work

Solardiz says $2$ is already used as a different hash identifier, so setting to CNW to at least get feedback from him on how to fix that minor iteml

Owen Barton’s picture

I think moving to sha based hashes for sessions, tokens and other cryptographic uses is clearly beneficial. Performance checks aside I don't see any huge issues with moving to sha based hashes for most normal hashing usages, although I think the probability of collisions is tiny with frequency of use for most of these cases.

-define('DRUPAL_MIN_HASH_COUNT', 7);
+define('DRUPAL_MIN_HASH_COUNT', 5);

This is unrelated and should be in it's own issue, I think.

Note that we don't really need to harden session generaiton for anonymous users, hence I only added the hardening where needed.

I am not sure I agree with this - anonymous sessions can on occasion be used for things such as ecommerce, and as such I think they should be treated in the same manner as authenticated users. Given the lazy session generation in Drupal 7, the performance impact of this should be negligible.

-  return _password_crypt($password, _password_generate_salt($count_log2));
+  return _password_crypt_sha256($password, _password_generate_salt($count_log2));

There are serious repercussions of this that I have not seen discussed.

We already know that sha256 provides no security benefits over md5. What's more, there is a reasonable argument that because it is a (relatively) newer, and much more complex transform it is harder to guarantee that there will not be any radical improvements in the sha algorithm performance that could threaten security of the passwords at the iteration count we are using. What's more, if there was a radical improvement discovered then we would be stuck with insecure passwords until php implemented the improvement and it reached servers (we could not increase the iteration count to adapt whilst php had the slower algorithm). This scenario is much, much less likely with md5.

Another issue with this is data portability. Right now our md5 stretched/salted passwords adhere to popular portable hashing standards. Right now you should be able to move from Drupal to another system or authenticate against the users table from another system that supports these hashes (e.g. BB3, Wordpress, *nix passwd files etc). If we change to sha256 (which is not a standard I am aware of) we will force these users to hack the password hashing code of these other systems so that they can test against Drupal's non-standard hashes. Obviously there is no way they could convert from one to the other, without having all their users change their password.

For these reasons I would urge that we stick with standard and well tested md5 hashes as the default, but either allow contrib modules to use their own password.inc (as they can now, in fact), or commit the sha256 option but provide a variable_get wrapper so that government sites can switch to sha256 to satisfy their security "checks", rather than actually learning how password hashing works and leaving it at md5 :)

pwolanin’s picture

@Owen - the password include file is already swappable via a .inc file, as you know. If you read back you'll see that I was advocating the same position re: the password hashing, but I think there are good arguments to be made for not using it at all in core if possible.

I'm kind of tempted to go back to what you suggest - we can readily do the password.inc as a follow-up.

solardiz’s picture

'$2$' was used by the original bcrypt (Blowfish-based crypt(3) hashes) on OpenBSD in 1997 (IIRC). It was superseded by '$2a$' for the revised bcrypt that is being used nowadays (including by the unmodified phpass, which would use CRYPT_BLOWFISH when available), but both are still supported on *BSD systems.

I think '$S$' is not in use yet; you could have it mean "SHA".

The move to phpass-like "portable" hashes based on SHA-256 is really unfortunate, being made for non-technical reasons only, yet introducing very real problems.

If you feel you really must move off the MD5-based phpass "portable" hashes, then my recommendation is that you consider CRYPT_BLOWFISH, which is actually a security improvement (primarily because the password stretching loop is then implemented in C or even assembly code). CRYPT_BLOWFISH is guaranteed to be available on PHP 5.3.0+ (my implementation of it has been integrated into the PHP interpreter as of this version), and it is very often available on specific builds/installs of older versions as well. Can Drupal 7+ require PHP 5.3.0+?

If you strongly prefer SHA-2 family functions over Blowfish for the underlying primitive for password hashing (even though there's no security reason for that), then you can choose to require PHP 5.3.2+, which similarly includes SHA-2 based crypt() support. This similarly moves the password stretching loop into C code (maybe somewhat less efficient as of this writing, though).

Also, if you're somehow unhappy with Blowfish (which would be my recommendation), then it may be beneficial to use SHA-512 rather than SHA-256 as the underlying primitive for password hashing (and only for this purpose - not in the rest of Drupal). This is because SHA-512 makes better use of 64-bit CPUs (whereas SHA-256 leaves more of the silicon unused/idle), thereby reducing the cracking vs. validation speed ratio (bringing it closer to 1.0). The benefit may be as large as a factor of 2 (equivalent to stretching the passwords by 1 bit more), although this needs further research and benchmarking of actual cracking-optimized implementations (which don't exist yet).

Overall, I think we're not ready to make a fully-informed decision on this. Thus, my advice is that you leave password hashing alone for now, committing the rest of the MD5 to SHA-256 changes. Then open a separate issue for the password hashing discussion if you feel you must.

solardiz’s picture

PHP 5.3.3+ will support unbuffered reads, which you could use for reading fewer bytes from /dev/urandom (faster):

http://svn.php.net/viewvc/php/php-src/branches/PHP_5_3/ext/standard/stre...

mfer’s picture

@solardiz going with SHA-2 is due to the requirements of a customer sector. See the initial issue for spec details. We have a requirement for PHP 5.2 in Drupal 7 so we need to target a native solution there as well.

Part of this has to do with performance and SAH-256 is faster so I'd like to go with that. Speed is important.

I like having portable passwords. What about creating a project that provides SHA-2 password hashing along with this ticket? We could leave the MD5 version in core and provide a SHA-2 version in contrib? Or, providing both in core and default to MD5?

solardiz’s picture

@mfer - understood re: "the requirements of a customer sector". Yet I think it'd be best to separate this issue in two: commit the switch from MD5 to SHA-256 for everything but password hashing now, then open a separate issue for the password hashing.

A faster hash is beneficial for uses _other_ than password hashing. For password hashing, it's often the other way around, although what actually matters is how much use of the CPU a single computation of a password hash makes. (The next step, which is starting to be implemented with scrypt, is to use large amounts of memory as well. But we're not ready for that yet.) For this reason, SHA-512 may be preferable over SHA-256 for this purpose (if we optimize primarily for 64-bit CPUs, which makes sense to me). I recommend that you re-read "the original" password hashing discussion starting with this comment:

http://drupal.org/node/29706#comment-342450

and specifically the comments that discuss "password stretching".

I agree with the suggestion to provide non-MD5-based password hashing as a non-default option (not necessarily based on SHA-2 - maybe CRYPT_BLOWFISH if that's OK per the customer sector requirements).

In fact, if you went for PHP 5.3.0+ you could also make this the default, because it would actually be providing better security (higher password stretching iteration counts possible due to the native code) and because it would be producing hashes that are compatible with at least something external to Drupal. So if you really want to move off MD5 for password hashing, then I recommend that you seriously consider requiring PHP 5.3.0+ and using CRYPT_BLOWFISH or requiring 5.3.2+ and using CRYPT_SHA512 hashes.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
49.49 KB

Here's a re-roll of the patch with the changes to password algorithm removed.

chx’s picture

I am still at

If you can convince people md5 is not a problem in password.inc then explaining the non-crypto usage of MD5s elsewhere should be no problem.

catch’s picture

And I'm still wondering when Drupal became a vendor with customers. I don't spend my free time on Drupal core for customers, or 'customer sectors', and will have to seriously reconsider whether that's a good use of my time if changes get in for those reasons. If password.inc is better with md5(), then the consistency argument is out completely, so this seriously needs to be scaled back. #769648: Make common.inc swappable now has a patch.

mfer’s picture

@chx - I think the issue here is not convincing people that MD5 is ok in password.inc. Password hashing is swappable so someone could provide a drupal based solution that doesn't use MD5 because it was swapped out. The other places MD5 is used are not swappable.

@catch There are drupalers building drupal sites that need to meet FIPS 180-3 and FIPS 198-1. If this does not go in their options are to migrate away from drupal or run a hacked core.

solardiz’s picture

@chx - I wouldn't say that all non-password-hashing uses of MD5 in Drupal are non-crypto. Some could actually benefit from the switch to SHA-256. As to the rest of MD5 to SHA-256 changes, they can also be made for consistency and to simplify further code audits, unless there's a reason not to make them (there is such a reason for password hashing).

Thus, the patch from #98 looks good to me. And if you want better password hashing, use CRYPT_BLOWFISH or CRYPT_SHA512. Really.

If CRYPT_BLOWFISH or CRYPT_SHA512 is merely supported by Drupal, rather than required by it, then I think this would fully address the government sites issue, without requiring PHP 5.3.0+ for other installs of Drupal (these will be able to use the MD5-based phpass hashes).

In fact, if you simply used my original phpass code, which included optional CRYPT_BLOWFISH support, then you would not even have this issue. ;-) (This is assuming that any non-MD5 cryptographic primitive will do, not just SHA-2. I am not 100% sure whether this is the case or not - someone should actually read the standards.)

pwolanin’s picture

@chx - discussion with solardiz and Owen made me conclude that the changes to password.inc needs more discussion, but the rest of this patch is ready. Here's a separate issue for that: #775792: Use/include an additional evern more secure password hashing algorithm in core

re-re-thinking...

requiring PHP 5.3 is not possible at this point. Removing md5 except for reading upgraded/imported passwords is the clear goal, and we also need a default algorithm that we know any Drupal 7 install can use, especially since we often do development on slightly different local, staging, and deployment versions of PHP, OSs, etc. We want to be using the same password storage for all Drupal 7 instlals not specifically configured with some alternative implementation - i.e. we don't want any silent choices being made based on platform.

jringalls’s picture

As pwolanin said, the intent should be to support the reading of other MD5 hashed data, however, to comply with Federal Information Processing Standards (FIPS) developed by NIST, which all U.S. Federal agencies must comply with in order to operate Information Systems, D7 should really move to SHA-512 hashing functions. This provides clear, concise answers to any Certification and Accreditation question of whether D7 is using NIST approved hash functions, regardless of initial intent (whether the hash is used for crypto functions or just for uniqueness).

I realize that several of the functions D7 used MD5 for had nothing to do with actual non-repudiation requirements, or made the hashing function susceptible to Collision or B-day attacks. However, my advice would be to continue to support whatever OOS hashing standards you want to by allowing a read of the interative MD5 hashed password used, but rewrite it for D7 usage as SHA-512 hashed, and use only SHA-512 hashing for any and all has function requirements for D7.

grendzy’s picture

I'm going to test a PHP implementation of http://en.wikipedia.org/wiki/MurmurHash which is a fast, explicitly non-cryptographic hash. It's actually faster than md5. Though a PHP function (instead of a built-in) might suck. Look for results here soon...

grendzy’s picture

major fail on murmurhash. The algorithm requires fixed precision arithmetic (it's really designed for C), which isn't feasible in PHP. I also tried FNV-1, but in pure PHP it's 10X slower than md5().

We could consider hash('crc32', ....) for non-crypto uses in the critical path. For me it benchmarked at 15% faster than md5. This way there's no ambiguity, all hashes in core (passwords aside) would be either strong crypto, or clearly just checksums.

pwolanin’s picture

FileSize
54.42 KB

I'm torn about changing the password storage hashing, but after further (real, in-person) conversations today with jringalls, febbraro, and others I think #87 was the right thing to do for Drupal 7 - so we can import $P$ hashes, but only generate sha-256-based password hashes.

this is a reroll of #87 - essentially the same as the patch that was RTBC. Only changing '$2$' to '$S$' and omitting the change to the min hash count.

pwolanin’s picture

FileSize
54.44 KB

oops - wrong patch above (needs on more line changed).

grendzy’s picture

Just reviewed #98 line-by-line. The code all looks solid (and is mostly just simple substitutions). I also benchmarked this and could not discern any performance difference (with AB, logged as uid1, median response time for front page was 258ms).

I also don't see this issue as appeasing lazy outsiders, even I would have a hard time auditing all the md5 calls. Making the code easier to audit is good DX for everyone.

febbraro’s picture

After some good discussions today I'm a big thumbs up on #108 as it supports easier auditing and SHA-2 for some of our more "demanding" implementations.

andypost’s picture

Incredible, that nobody cares about countries that have limits on key-size so sha-256 prohibited to use.
http://www.apache.org/dev/crypto.html
ECCN 5D002 http://www.access.gpo.gov/bis/ear/txt/ccl5-pt2.txt
120.1 ITAR http://epic.org/crypto/export_controls/itar.html

EDIT: forget to say that all not about hash but password crypt.

mfer’s picture

@andypost Please be careful how you phrase things. It is not that nobody cares. There are enough requirements around that world that we cannot and do not know them all. On top of that, most developers are not familiar with ITAR or Export controls (or even what countries other than the US require) and what that means. Someone caring vs. someone having knowledge of and experience with are two entirely different things.

And, thank you for the links.

While the apache.org and epic.org links are not official if we take what is at the end of the apache.org link to be true we do not have an issue with usage of sha-2 as a hash. The issue at hand is with the crypto section from password.inc, if there even is an issue. Looking at the original issue that landed the current crypto in core (#29706: More secure password hashing) I do not see detail about ITAR/Export Controls.

Not sure where we go from here and I do not have the time right now to dig through the requirements to come to some kind of understanding on this.

Owen Barton’s picture

I need to read this a bit more carefully, and ask some people - but my first impression is that these rules don't apply to md5 or sha. We are using sha as a one way hash. This is not the same as either a symmetric or assymetric cypher (which is used for communication and hence not one way). The 256 in sha265 is the hash length, not the key length (indeed, there is no key).

mfer’s picture

@Owen Barton Be sure to look at what password.inc is doing. It does more than generate a hash using MD5 or SHA-2. This extra bit is what is in question.

Owen Barton’s picture

Could you be more specific about the "extra bit" in question? I understand how password.inc salts and stretches passwords, but I still don't see how those relate to the export rules, as it is still essentially one way algorithms, not reversible cyphers.

solardiz’s picture

@pwolanin - if you have decided to take this nasty move, which it sounds like you did, why not at least use SHA-512 (and not SHA-256) as the underlying primitive for password hashing (and only for it)? If you don't want the hashes to become overly-long, you may truncate them to 256 bits or less. As I mentioned above, the advantage of SHA-512 for this purpose (and only for it, as it relates to Drupal) is its more complete use of 64-bit CPUs. Another advantage, which I didn't mention yet, is its increased cost on 32-bit CPUs, which makes the relative PHP interpreter overhead somewhat lower. I understand that it is difficult to grasp how these are advantages, but they are.

Also, perhaps in a few months from now you'll find that requiring PHP 5.3+ is actually acceptable (and you'd require it for another reason anyway). Maybe this will even happen before the Drupal 7 release. Thus, the move to some "intermediate" hashes feels premature. CRYPT_BLOWFISH or CRYPT_SHA512 would be a whole lot better than a loop around hash('sha512', ...) in PHP. The difference is several bits of password stretching.

As to the export controls, as far as I'm aware they're irrelevant, for many reasons at once (any one of which would be sufficient). In no particular order: (1) we're not redistributing any crypto code (we're using code from PHP), (2) the code in PHP does not fall under the US export regulations, (3) the US export regulations have been relaxed a lot since 1990s, (4) the relevant code in PHP is either hashes (not ciphers) or it's been developed outside of the US (this would allow for redistribution per the old export regulations, although maybe not via sites hosted in the US; nowadays this is unimportant), and (5) indeed, we're not using/implementing any ciphers - we're merely computing hashes.

pwolanin’s picture

Looking at the summary at http://www.apache.org/dev/crypto.html I really don't see anything relevant to what we are doing.

mfer’s picture

@solardiz - 3 Things for you to consider.

1) If you think SAH-512 should be used instead show that it's faster (the numbers previously posted here show it to be slower) or make some other case for it. Suggesting some benefit that goes along with 64-bit processors is not a case to make a switch.

2) PHP 5.3+ is not and will not be a requirement for Drupal 7. During the lifecycle for Drupal 7 don't expect PHP 5.3. The only way that would change is if PHP 5.3 takes over the market and there is no reason to stay with PHP 5.2. As it stands PHP 5.3 has a very very very small market share. We cannot consider a PHP 5.3 specific solution at this time and will not be able to for the near future for core.

3) Please watch how you say things. Calling this a nasty move is subjective and rude. What is here is here for a good reason. SHA-2 is here to meet specific requirements. SHA-256 is used the way it is used based on providing high performance. This has been thought out. If you have a suggestion for a change make a case for it in the context and environment Drupal is used within and that will help meet the need outlined in the initial post for this issue.

solardiz’s picture

@mfer - Thank you for your comments. Obviously, when I talk in/to a certain community, I have to obey "the rules", or leave.

1. I do think that SHA-512 should be used, but for password hashing only - not in other places in Drupal. It is not faster than SHA-256. It is slower. I did make "some other case for it", several times by now, so I won't waste everyone's time by repeating that (and I do not mean this to sound rude, really, I just have difficulty wording it differently, yet convey the thought). As to "making a switch", you won't be making any extra switch if you simply choose SHA-512 over SHA-256 for password hashing now (and only for it), assuming that you're switching from MD5 anyway.

Anyhow, I don't have strong feelings on this matter. It is just roughly 1 extra bit of password stretching that you could get for free. If you don't want it, that's fine by me. (You could get more than that by going with CRYPT_BLOWFISH or CRYPT_SHA512 anyway, but I understand your position on PHP 5.3 now.)

2. Understood re: PHP 5.3. Thank you for stating this so clearly!

3. I am sorry for the "nasty move" words. I did refrain from saying that for a while (in fact, I refrained from commenting in here for a while). Yes, the words are subjective. I did not mean them to be rude; rather, I meant them as a way to emphasize this opinion (yes, my opinion, which was not quite humble for certain reasons). I do understand that you're making the change for non-security reasons, yet I wanted to help you improve security and maintain better interoperability as well. Once again, understood re: PHP 5.3.

Since it appears that my involvement is not of help any further (decision made, all info conveyed, etc.), I will just leave this discussion.

Thank you everyone for Drupal! :-)

pwolanin’s picture

@solrdiz - Owen and I were discussing at Drupalcon the suggestion to use sha512 for passwords today. One item of interest is the selinux uses sha512 by default for storing passwords. http://docs.fedoraproject.org/release-notes/f9/en_US/sn-Security.html I'm happy to re-roll with that added change if we think that's a good idea.

solardiz’s picture

@pwolanin - not SELinux, but recent versions of Fedora and Ubuntu do indeed use SHA-512 based hashes by default (these are equivalent to PHP 5.3.2+'s CRYPT_SHA512, which would be another reason to refrain from changing the password hashing until you can require PHP 5.3). I don't think my vote counts, but my opinion is that if you're moving away from MD5 anyway, then move to a SHA-512 based rather than a SHA-256 based algorithm, and maybe truncate the resulting hashes to 256 bits (to avoid changing the schema and avoid unnecessarily storing overly-long strings).

AlexisWilke’s picture

Not trying to extend the thread too much here... (and no, I did not read the 101 entries...) but it seems to me that the right solution is not to change the existing code with another hash! Instead we want to call a function that takes care of the hashing (i.e. in one single place, so site owners can choose what scheme to use!)

Also, if we want to avoid destroying the previous scheme(s)... you want to allow multiple hashing schemes in the same database. We could have a complex request to those who did not yet change their password (maybe on some specific triggers such as coming back to the site or trying to purchase something through some cart, or if the user did not change their password in X amount of time--oh! yeah we don't have that feature yet... hmmm...) But forcing millions to change their password at once could be a problem for many sites... This would mean the addition of one field to know what type of hash the password currently uses.

So anyway... I think it would be a much better idea to replace any md5($blah) call, with a drupal_password_hash($blash). And that function could then do whatever one wants. (keep md5() in countries where sha1/2 is forbidden, use sha1() or sha2(), or even your own recipe...) That function could even give modules a chances to do their own hashing (module_invoke()...) although that could be a bit far fetch I guess.

All of that may sound like a bit more work, but it would work for all future versions! Then you would not have to ask yourself the question. You'd just add the capability. (Actually you could have a drop down in the Install screen offering the user to choose the hashing function! Am I dreaming again?!)

Thank you for listening...
Alexis Wilke

Owen Barton’s picture

@AlexisWilke - we actually already have exactly the functionality you describe - contrib modules have the ability to switch out all of password.inc and replace it with any alternate hashing functionality they prefer. An example of this would be the existing phpass 6.x module which would likely be upgraded and has options to allow you to use the salted and stretched portable md5 hashes that 7.x uses now, or the stronger but (currently) less widely supported extended-DES or blowfish hashes.

Discussions have happened offline, but I think we are OK to move ahead with the change to SHA. I think it would be worth using SHA512, as this is a more widely supported standard than SHA256 (other webapps don't use this, but it appears that some OS level libraries use SHA512), appears to have some security advantages (see notes from Solar) and it is only a couple of lines of additional code.

Incidentally I came across http://people.redhat.com/drepper/SHA-crypt.txt which has some interesting discussion related to NIST and md5 for password hashes. The conclusion is much the same as ours, that while md5 based hashes (used correctly) is not a security issue at all, there are still benefits in terms of simplifying NIST audits in using SHA hashes, and minimal downsides.

edit: actually, it seems it is a bit more complex that the above link describes - there is no clear guideline regarding blowfish, and *BSD has decided that it seems like it is allowed. The only clear requirement is moving from sha1 to sha2, and even there there is an exception for KDFs (key derivation functions).

pwolanin’s picture

FileSize
54.51 KB

As suggested by solardiz and others, this changes the password hashing to use sha-512, with a truncation of the setting + hash to 60 chars (retains 36 bytes of info from the 64 byte sha-512 hash).

solardiz’s picture

@pwolanin - Overall, this looks fine to me (given the decision already made), but there are a couple of issues I noticed:

1. Truncating at exactly 60 chars leaves no room for a possible 'U' prefix (for hashes to be "upgraded" from raw MD5).

2. You still have a check for 55 in the code.

I suggest that you truncate the SHA-512 hash encoding string not including the setting to 43 chars (258 bits). Then the resulting string will be exactly 55 chars, like it would have been if you used SHA-256. Or you can just truncate the resulting string to 55 chars, much like you truncate it to 60 now. This will match the rest of your code well.

pwolanin’s picture

FileSize
54.89 KB

@solrdiz - good catch on the 55 mismatch. Note, we strip the 'U' frrom the setting string before proceeding, so it won't have an effect.

Adding a constant to eliminate the magic number 55 in code here, and truncating at 55 chars as suggested (so 256 bits of the hash is stored).

greggles’s picture

FWIW, I love this idea. As we researched the Security Report Ben and I talked about how this is really common in other systems and something we should really get done in 7 considering that 7 will be around for the next few years and this is an area where we are already behind the standard.

pwolanin’s picture

Status: Needs review » Reviewed & tested by the community

all the problems raised since #90 have been resolved, I believe.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Looks like there is fairly strong agreement, so committed to CVS HEAD. Personally, I also think this is a good idea -- better safe than sorry. Security matters. Thanks for the hard work and the guidance in this issue.

scor’s picture

Status: Fixed » Needs review
FileSize
961 bytes

fixes a couple of typos in the docs.

Status: Needs review » Needs work
Issue tags: -Security improvements, -Security Advisory follow-up

The last submitted patch, 723802_130_sha256_docs.patch, failed testing.

Owen Barton’s picture

Status: Needs work » Needs review
Issue tags: +Security improvements, +Security Advisory follow-up

#130: 723802_130_sha256_docs.patch queued for re-testing.

Damien Tournoud’s picture

Hm. Don't we need an upgrade path for the action MD5 hashes stored in the database?

Owen Barton’s picture

You can't upgrade in bulk because there is no way to easily reverse the md5 hashes (which is the point, of course) - but the patch (in fact the previous md5 based salt+stretch patch that went in early last year) has code to upgrade each old-style single-md5 hash as each user logs in. I believe there are existing "bulk password reset" type modules in contrib that could be used for sites that want to enforce the new hashes sitewide.

catch’s picture

That won't upgrade actions hashes though.

Status: Needs review » Needs work

The last submitted patch, 723802_130_sha256_docs.patch, failed testing.

pwolanin’s picture

@catch - I don't think the actions hashes are stored in the DB. I was afraid so before doing the patch, but it does not seem to be the case.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
961 bytes

dunno why scor's patch doesn't work for the testbot. Here's a re-roll

pwolanin’s picture

Status: Needs review » Reviewed & tested by the community

tests pass, only changes code comments

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS HEAD. Thanks.

jbrauer’s picture

jbrauer’s picture

Standardizing security tags.

andypost’s picture

What's about 'drupal_private_key' variable? Suppose it should be updated for old installs too.

pwolanin’s picture

@anypost - you mean old Drupal 7 installs or Drupal 6 updates?

Maybe we should add it for updates - given that we change the algorithms for the use of it, there' probably no utility in preserving it.

Dave Reid’s picture

Should we document this in the module upgrade guide that developer should stay away from md5() now?

pwolanin’s picture

@Dave Reid - yes, likely so - also we should add to the "writing secure code" part of the handbook.

mfb’s picture

Something I noticed while debugging another issue, and seeing various formats of session ids: If a user doesn't already have a session when they log in, the session id they are assigned is the "less random" one that is pre-generated for anonymous users. You can test this by appending some text onto the session id:

  // Less random sessions (which are much faster to generate) are used for
  // anonymous users than are generated in drupal_session_regenerate() when
  // a user becomes authenticated.
  session_id(drupal_hash_base64(uniqid(mt_rand(), TRUE)) . '-anon');

If an anonymous user already has an active session, then when logging in, her session will be regenerated via session_regenerate_id(), which allows us to use various PHP settings for improved randomness: session.entropy_file, session.entropy_length, session.hash_function.

Personally, I'd prefer to always have PHP generate highly random session ids using my hardened PHP session settings when users login.

If we leave the session id logic as is, at least the code comment should be clear about the fact that this "less random" session id will often be used for authenticated users.

Owen Barton’s picture

Well spotted - I think this is a significant issue. I am not sure if it is related to this patch or not, but either way I think this deserves it's own issue - I just opened one at #801278: Authenticated users getting "less random" session IDs.

andypost’s picture

@pwolanin I mean D6 updates and posibly previous alpha D7 (suppose head2head module)

lostchord’s picture

To expand on the comments in #111 the important thing is the Wassenaar Arrangement, which gets a brief mention in the Apache document. This international arrangement gives rise to such riveting reads as the Defence and Strategic Goods List which should give you a good idea of what to avoid if free distribution is your goal.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

thoth’s picture

I am writing a perl script to update an admin password temporarily for sites I am admin for. I know this could be done using an external call to the php script or drush. But I'd like to understand what's going on a bit more and wanted to use the Digest::SHA module inside of perl. Right now I'm doing this:

my $shahash = sha512_base64($string);
print "\n $shahash\n";
$mysqlupdatecmd = `mysql $mylogin $drupaldb -e "update users set pass=(\'$shahash\') where uid=\'1\'`;

Is there something I'm missing? (the print statement looks like a hash to me:[) I've been sorta kinda following this thread even though it doesn't pertain to SHA-2:

http://drupal.org/node/677756

Can anyone shed a bit more light on the drupal 7 situation? I'd even like to hear about any drupal 8 differences if any.

thoth’s picture

found my answers here if any wayward searcher finds themselves here:

http://api.drupal.org/api/drupal/includes--password.inc

the documentation in drupal 7 is really good!