After discussing this on the security list and in IRC, I am opening this issue report for discussion.

The password hashing function does use salted and iterated hashing
which is far better than not hashing.

However, it does not abide by industry standard stretching practices.

// We rely on the hash() function being available in PHP 5.2+.
 $hash = hash($algo, $salt . $password, TRUE);
 do {
   $hash = hash($algo, $hash . $password, TRUE);
 } while (--$count);

There are a few problems with a function such as this:

First, if there is ever a collision for $hash.$password, that
collision will propegate through to the result. This means that
increasing iteration counts can actually decrease the output space of
the function. This can theoretically lead to password collisions
(where multiple password inputs combinations lead to the same output
hash string).

Second, straight concatenation in the message of the hash function can
be susceptible to certain padding attacks. While the iterations make
the vector more difficult to execute, it's still theoretically
possible.

Suggestion:

Implement the industry standard PBKDF2 derivation function.
http://www.ietf.org/rfc/rfc2898.txt

A PHP implementation looks like this:

https://github.com/ircmaxell/PHP-CryptLib/blob/master/lib/CryptLib/Key/D...

public function derive($algorithm, $password, $salt, $iterations, $length) {
       $size   = strlen(hash($algorithm, '', true));
       $len    = ceil($length / $size);
       $result = '';
       for ($i = 1; $i <= $len; $i++) {
           $tmp = hash_hmac($algorithm, $salt . pack('N', $i),
$password, true);
           $res = $tmp;
           for ($j = 1; $j < $iterations; $j++) {
               $tmp  = hash_hmac($algorithm, $tmp, $password, true);
               $res ^= $tmp;
           }
           $result .= $res;
       }
       return substr($result, 0, $length);
   }

(Note, I tweaked it to be pure PHP rather than dependent upon the
CryptLib library).

This implementation is tested against the test
vectors for PBKDF2:
https://github.com/ircmaxell/PHP-CryptLib/blob/master/test/Unit/Key/Deri...

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

pwolanin’s picture

We discussed this in IRC last night.

In general the algorithm we are using meets the basic design principles:
https://www.owasp.org/index.php/Cryptographic_Storage_Cheat_Sheet#Rule_-...

The likelihood of the hash collision ought to be extremely small, though agree if we were starting over the XOR would be better, or we should e.g. include the $count in the hashed value or do something like crypt does (see below) to vary the hashed string per iteration.

Related links:

unclear whether a KDF is appropriate for password storage, though don't see any obvious reason why not:
http://security.stackexchange.com/questions/2051/is-pbkdf2-based-system-...

*nix has the "crypt" hashing algorithm using md5/sha2 that mixes the salt and other strings into different prime-numbered iterations which would substantially avoid the possibiity of the hash collision:
http://www.akkadia.org/drepper/sha-crypt.html
http://www.akkadia.org/drepper/SHA-crypt.txt

pwolanin’s picture

Title: Insecure Hashing Mechanism Used In Password.inc » Potential problems with Hashing Mechanism Used In Password.inc

Changing the title - I don't think the mechanism is fundamentally insecure - especially compared to Drupal <= 6

pwolanin’s picture

solardiz’s picture

Regarding the (non-)collisions issue, see the classic paper on password stretching - http://www.schneier.com/paper-low-entropy.html
The relevant places are page 3, "Narrow-Pipe Attacks" (in this case, our internal state is large enough) and page 6, "Time to Compute". A collision would amount to a break of SHA-512.

Yes, I am aware of length extension attacks on Merkle–Damgard hashes, and I was during the design of phpass. I published one of those myself in 2000 (in my analysis of Cisco's TACACS+); that one was purely theoretical, though. They are not a problem here.

Please don't move to PBKDF2. Yes, it was designed for the purpose, but you'd spend more of the (slow) PHP code to implement it, which will give offline attackers (who may run native machine code) a greater advantage over you. This is far more important than the purely theoretical risk of collisions.

There's definitely room for improvement, but PBKDF2 is just not it. When Drupal is able to require PHP 5.3.2+, it could move to CRYPT_SHA512. And work is underway on even better password hashing methods, such as based on the concept of sequential memory-hard functions introduced with scrypt: http://www.tarsnap.com/scrypt.html

grendzy’s picture

Component: other » base system

subscribe. (thanks solardiz for the feedback!)

ircmaxell’s picture

@solardiz:

I don't understand your rationale behind not using PBKDF2. It's not doing anything significant in PHP to implement. Yes, it does add 1 additional hash call per iteration, but that's done inside the C call to hash_hmac.

Now, it is slower than a native implementation of PBKDF2. However, so is the current implementation. A quick test shows that PBKDF2 is less than twice as slow as the current implementation. Given the 1 additional hash rounds involved, and the fact that the algorithm is vetted (and approved) for this use, and the fact that it's not even theoretically susceptible to any padding attacks, is that a significant margin?

After all, isn't the whole point of stretching to make generating the hash slower? And considering that PBKDF2 via the above implementation only takes around 73 milliseconds per execution (at 16384 iterations, the current setting Drupal is using), is it really that slow?

I have absolutely no issue going for something like scrypt, but considering it's not available in a PECL module yet, and even when it does it won't be portable, I'm not sure that's realistic in the short term. Not to mention that scrypt remains to be vetted and approved. Whereas PBKDF2 is simple to implement in PHP, and reasonably fast.

I guess I just don't understand the stance against PBKDF2 given the arguments provided.

pwolanin’s picture

@ircmaxwell - How fast is the PHP implementation of XOR?

In any case, I think we should move the relevant discussion points to the documentation - the the extent that the algorithm we have is reasonably secure, I don't know it's worth the effort to change the algorithm again.

Owen Barton’s picture

After all, isn't the whole point of stretching to make generating the hash slower? And considering that PBKDF2 via the above implementation only takes around 73 milliseconds per execution (at 16384 iterations, the current setting Drupal is using), is it really that slow?

Yes, the point of stretching is to make the hash slower - but I think Solar's point is that to implement PBKDF2 we may need to spend more CPU time proportionally on PHP overhead (the time taken to call functions, set variables etc) than the current implementation. If this is the case then even though it may be "slower" for PHP, it could take less time for highly optimized machine code which would not be subject to much of the PHP overheads, and hence would be less secure overall.

As a pedantic example, adding sleep() calls in PHP will also make it "slower", but won't make the hash any more secure - in the same way, keeping the implementation simple and tight avoids as much php overhead as possible and ensures as much of the time as possible is spent on actual hashing (which is obviously harder to optimize away for attackers).

ircmaxell’s picture

@pwolanin:

The XOR implementation is fast enough to not even bother discussing it.

@Owen:

While I completely understand your point (and agree with it as you made it), I fail to see the relevance to the discussion. The only PHP-land difference between PBKDF2 and the current implementation is the addition of an XOR. And considering you're removing a concat operation, that's likely a wash.

The added computational cost comes in the hash_hmac function which is all done at the C level. So the difference in PHP overhead is minimal (if any). The difference comes in compiled code (due to the added hash call which is done at the C level). And that is "good" added cost...

Dave Reid’s picture

Subscribing. Very nice research already on this.

solardiz’s picture

@ircmaxell - I have no time to continue this discussion (I was asked to comment, so I did), and I am afraid that the tone of my comment below may be a little inappropriate (but I can't make it any better while conveying the thoughts, except by starting with this comment, which I do). I am merely trying to help the project, and I don't treat any of this personal.

> A quick test shows that PBKDF2 is less than twice as slow as the current implementation.

The fact that it's slower at all - for almost the same number of SHA-512 hash computations, right? - is a drawback.

> Given the 1 additional hash rounds involved

Outside of the inner loop, so it does not double the SHA-512 hash computation count, right?

> and the fact that the algorithm is vetted (and approved) for this use

PBKDF2 is commonly used for deriving encryption keys. While the task and the criteria for algorithm selection are essentially the same as those for password hashing, PBKDF2 is not commonly used for the latter. Thus, moving to PBKDF2 is not obviously a step to make people who want something vetted and approved happier. You'd introduce yet another hash type incompatible with anyone else's, and you would not gain anything.

Another comment here refers to http://www.daemonology.net/blog/2009-06-11-cryptographic-right-answers.html - written by Colin Percival, FreeBSD Security Officer. I have a lot of respect for Colin, and he definitely knows what he writes about. However, I think you're misinterpreting what he wrote:

"Password handling: As soon as you receive a password, hash it using scrypt or PBKDF2 and erase the plaintext password from memory." - I think Colin meant this in context of key derivation for encryption, not password hashing. Although, once again, a good KDF is also a good password hashing method, and vice-versa, I think Colin meant to give a modern/better example (his scrypt) and a widely-accepted one (PBKDF2) specifically as examples of KDFs for encryption keys. He is aware that PBKDF2 is not the next-best of common KDFs or password hashes (per the comparison in his paper on scrypt, this second place is occupied by bcrypt, which is primarily a password hashing method) and that it is not commonly used for password hashing. FreeBSD uses an MD5-based password hashing method (yes, even MD5 is OK for the purpose if properly cooked, as discussed before), it supports bcrypt as an option, and it does not support PBKDF2 as a password hashing method. Now recall that Colin is FreeBSD Security Officer.

Now, I understand and agree that PBKDF2 is realistic to semi-efficiently implement in PHP, unlike bcrypt (other than by calling crypt() assuming that we have CRYPT_BLOWFISH). However, it is not an improvement over what we currently have, neither technically nor politically. Although if someone else suggests that Drupal moves to PBKDF2, I might agree that there's political demand. ;-)

> not even theoretically susceptible to any padding attacks

Are you able to demonstrate (fully describe) a specific padding attack against the algorithm we're using currently (a theoretical attack would be OK). Also, show how such an attack has any security implications (theoretical would be OK in this context) on our password hashes.

The closest match that I see would be upgrading a password hash to a higher iteration count, which could be a nice feature with no security implications, but even that actually doesn't work (because the new hash would have to use different $password on different ones of its iterations, so it wouldn't match a hash computed by our actual algorithm unless there's a collision).

> After all, isn't the whole point of stretching to make generating the hash slower?

I second Owen's comments on this. In short: the goal is to make the hash genuinely slower, not just make one of its implementations slower (and especially not the one that we use).

> going for something like scrypt

I am not proposing that. I mentioned it to illustrate that we'll have real improvements to make a bit later, and we should not be making pseudo-improvements now (merely complicating things for no good reason).

> the stance against PBKDF2

You're proposing making a change for no good reason (you think that you have good reasons, but I disagree). This change would be a (minor) annoyance to some users, and it would require more code (backwards compatibility code and new code). It also has practical drawbacks - more PHP overhead, hence hashes weaker against offline attack.

Once again, I am sorry for opposing your proposal. I felt it was my job to do so, bringing up the right counter-arguments. I don't treat any of this personal. I suggest that we close the issue now.

solardiz’s picture

@ircmaxell - I retract my comment re: worse code efficiency for PBKDF2. hash_hmac() actually involves two hash computations internally, so if the overall slowdown is less than 2x (and you say that it is), we have an efficiency improvement with that change. The rest of my comments/concerns still apply. If we were to make any change, I'd rather make a much bigger change.

If you proposed this change when Drupal was about to move from MD5 to SHA-512 for the cryptographic primitive used by the algorithm, I would likely support your proposal (as going one tiny step further in improving code efficiency through calling a more expensive function implemented in C). But now I think the change would do more harm (yet another hash type to support) than good (slightly more efficient code).

hash_hmac() didn't exist in older versions of PHP that phpass was meant to support, so it was out of consideration during my initial design of phpass.

ircmaxell’s picture

@solardiz: No hard feelings at all. Your initial comments were great until the line about PHP efficiency (which wasn't applicable due to the implementation comparison I tried to describe). And now that you realize the hmac call uses two internal hash rounds, that solves that problem.

As far as the benefits, I think there are benefits. That's what I was trying to get across. As far as are they worth while implementing, that's another story all together (and far more subjective).

In an ideal world, we'd just implement scrypt or bcrypt and be done. Unfortunately I am not sure that either will be ubiquitous enough (in a compiled form at least) in the near to mid future (next 5 years) to implement in a far reaching project such as PHP, yet alone Drupal. So that means the *best* generic solution right now is a PBKDF2 style algorithm. Now, how much better than the current one is another point, and if the justifications for it are outweighed by the negatives associated I leave up to you.

I do respect and will stand behind your opinion that "it's not worth doing now".

I'm happy for this to be closed.

solardiz’s picture

@ircmaxell: Yeah, sorry for my initial confusion regarding code efficiency. That comment was based primarily on my old experience while designing phpass, which pre-dates the introduction of hash_hmac() into PHP, and thus on expectation that an attempt to implement a more complicated algorithm in PHP would result in worse efficiency (but the introduction of hash_hmac() into PHP changed that).

You mention that your implementation of PBKDF2 at 16384 iterations ran in 73 milliseconds on your system. I am curious of the corresponding performance number for Drupal's current code (milliseconds for 16384 iterations on your system). I think this efficiency improvement is not worth the new hash type for Drupal now, but I am curious anyway.

So, you appear to be doing around 450 thousand SHA-512 computations per second. CRYPT_SHA512, which is available in PHP 5.3.2+, appears to do around 3.5 million SHA-512 computations per second on Core 2'ish CPUs. (Disclaimer: I ran my benchmarks on a different copy of the code, not the one in PHP, but I think it's almost the same, and the number looks about right to me. Also, I am assuming that the size of inputs to SHA-512 is usually such that only one block is processed.) So this will provide a further 7x efficiency improvement when Drupal is able to require PHP 5.3.2+. I think this may happen sooner than in the next 5 years. This would not necessarily be the best thing to do (I have several other options in mind), but it is the least Drupal could do and it would not introduce yet another hash type to this world (CRYPT_SHA512 is already widespread). So I propose that Drupal jumps over this PBKDF2 step.

Damien Tournoud’s picture

Title: Potential problems with Hashing Mechanism Used In Password.inc » Strenghten password hashing mechanism
Version: 7.x-dev » 8.x-dev
Category: bug » feature
Priority: Minor » Normal

Ok, given the analysis it seems fair to bump this to 8.x as a feature request. The 8.x branch requires PHP 5.3.2+, so have fun.

solardiz’s picture

Speaking of code efficiency and of what else we could do, one thing I regret not doing in phpass "portable hashes" is calling md5(), which Drupal would change to hash('sha512', ...), on much longer strings. Somehow this did not occur to me at the time, and I treated those hashes as last resort fallback, so I did not try to make them "perfect" (but then of course many projects preferred them for portability).

I've since done some benchmarking on short vs. long strings, on a 32-bit system. PHP's md5() vs. hash('sha512', ...) show about a 2x speed difference on short strings (such as those we use them on in phpass and in Drupal), but a 7.5x speed difference on strings of 4 KB or 8 KB (much longer, yet not exceeding the CPU's L1 cache size). This suggests that the PHP overhead seen on short strings (and thus on frequent function calls) is mostly avoided by the use of longer strings (and thus far less frequent function calls). With hash_hmac(), this further improves to a 8.5x difference between MD5 and SHA-512 on this 32-bit system (the difference should be a lot less on 64-bit), but only when both arguments to hash_hmac() are long strings (not the case in PBKDF2).

Thus, if we're to improve efficiency such that we gain a few extra bits of password stretching while not increasing PHP version requirements, then calling functions on longer strings and not merely moving to PBKDF2 is what we'd want to do. We could consider doing both, though - apply PBKDF2 to a string several kilobytes long derived from the password - to be able to say that we're using something standard (except for the initial step where we derive the long string).

Another aspect is that increasing memory needs makes our hashes somewhat unfriendly to GPU/FPGA/ASIC implementations. Since we typically don't have access to such hardware from Drupal, this is an advantage.

Thus, taking it a step further, we could deliberately use more memory. Implementing a true sequential memory-hard function similar to scrypt in PHP would reduce code efficiency, yet I have some ideas in this area (to be researched when/if I have time).

solardiz’s picture

I wrote: "except for the initial step where we derive the long string". Actually, no, we can use PBKDF2 in that initial step too, just with a low iteration count. So we'd be able to say that we build upon PBKDF2.

Overall, there's no immediate issue with the code currently in Drupal (other than that it could be better, but this will always be the case), and any improvements that would be worth it need further research. So I think that tentatively making 8.x the target for any changes in this area is about right.

ircmaxell’s picture

@solardiz: The time of the current Drupal implementation was about 40 ms.

As far as working on long strings, that's an interesting point. I wonder if it would be worth running a PBKDF2 style function (meaning one which has a variable length output) as a feedback loop. Each iteration effectively lengthening both the hashed content and the output hash. Then, at the end returning it to a fixed length with a simple sha512...

And as you talk about longer strings, that's where PBKDF2 can actually surpass bcrypt in efficiency and strength. bcrypt is limited to using the first 55 bytes of the input stream. Therefore once you surpass somewhere around 60 bytes or so, PBKDF2 (with sha512) will become stronger than bcrypt. In terms of a password being fed directly into the storage function this is of little concern. But if you want to stretch it before hand, it could wind up hurting (at least in comparison to other strengthening functions). Not nearly enough to justify general use, just something to realize when considering other mechanisms of making the password derivation memory-not-easy...

I was digging around in scrypt today, and in all honesty, I'm not seeing anything significant that would prevent implementing it in native PHP. Yes, it does use a bunch of memory, but strings in PHP are reasonably efficient at that. I'm going to try implementing it in PHP just to see if it's efficient enough to be worth while (I doubt it, but I feel like trying to better understand it anyway)... If I can get it working, I'll reply back here...

pwolanin’s picture

I'm going to open a separate issue to at double the iteration count for D7 and 4x for D8. That's a trivial step we can take to maintain security in the face of increasing CPU speed.

As much as the algorithm here has potential to be improved, I'm tempted to say it's even Drupal 9 material unless there is a flaw uncovered in the current.

pwolanin’s picture

solardiz’s picture

Correction: I made an incorrect statement in comment #11 on "upgrading a password hash to a higher iteration count". With the code we're discussing, it is actually not related to padding, it appears to be possible in the original phpass code (but requires knowledge of both the hash and the password, which makes it nearly pointless), but not in Drupal's revision (because only partial hashes are saved). This is of no relevance to what we're discussing now, but I felt I needed to post a correction anyway.

@ircmaxell: Thanks for the info. So it appears that hash_hmac() resulted in a 10% efficiency improvement in your case. This is useful to know, but it's not a reason for us to make any change now.

> Each iteration effectively lengthening both the hashed content and the output hash.

I had thought of that briefly. I think that, no, arriving at a fixed hash function input size quickly and staying at that in the costly loop is better, because we'll adjust the size to match actual machines well (via a configurable parameter to the hashing method, perhaps encoded along with the hash like we do for the iteration count).

> bcrypt is limited to using the first 55 bytes of the input stream.

This was an error in the original paper on bcrypt, which made its way into the recent paper on scrypt. The actual password length limit for bcrypt is 72 characters.

> I was digging around in scrypt today, and in all honesty, I'm not seeing anything significant that would prevent implementing it in native PHP.

As defined, it uses Salsa20/8, which I thought was not available in PHP (unless we implement it in PHP code, which would be slow). However, some Salsa hashes are in fact present in recent PHP (5.3+?)

I imagine that some folks will be unhappy about the choice of a crypto primitive not approved by NIST. We could calm them down by calling this (most important) step "non-crypto" and saying that we also have a PBKDF2 step with SHA-256 (scrypt has it) or SHA-512.

Then, as defined scrypt would be invoking Salsa20/8 on blocks that are too small to be efficient in a scripting language. This is BlockMix on page 10 of the scrypt paper, or blockmix_salsa8() in the reference implementation. It uses blkxor(), salsa20_8(), and blkcpy() on 64-byte blocks. I think that we'll need to make changes there, maybe after making sure our reimplementation matches scrypt test vectors first (before we break it).

And if we do make such changes, we may deviate from using Salsa20/8 as well (although that choice had been made for specific reasons). Ideally, we'd arrive at a hashing method that is implementable efficiently not only in PHP code (given what we have available in current versions of PHP), but also implementable even more efficiently in C with SIMD intrinsics. Salsa20/8 is limited to 128-bit SIMD vectors there, so we should do no worse (have no less parallelism at that level), and preferably do better (configurable amount of instruction-level parallelism). (scrypt has a configurable high-level parallelism parameter, but I am not sure if it'd be easy or difficult to bring that parallelism down to instruction level, which would be desirable when implementing scrypt for machines with wider than 128-bit SIMD vectors.)

I think we should approach this task outside of Drupal context, but considering Drupal as one of the prospective users. We actually have a relevant project underway, with a GSoC 2011 student focusing on the FPGA aspect of it (putting FPGA boards in authentication servers):

http://www.openwall.com/lists/crypt-dev/2011/04/05/2 - project rationale
http://www.openwall.com/lists/crypt-dev/2011/05/09/1 - alternative approach
http://www.openwall.com/lists/crypt-dev/2011/05/09/6 - comments on scrypt
http://www.openwall.com/lists/crypt-dev/2011/05/12/4 - using scrypt for user authentication

ircmaxell’s picture

@solardiz:

> This was an error in the original paper on bcrypt, which made its way into the recent paper on scrypt. The actual password length limit for bcrypt is 72 characters.

That's interesting, so that reduces (but not eliminates) the falloff. So then approximately 75 to 80 characters would be the breakeven point (which is still far less than we are talking about extending the password length to)...

> However, some Salsa hashes are in fact present in recent PHP (5.3+?)

Yes, 5.3.0 introduced Salsa10 and Salsa20. The reduced round variants are not available (perhaps that's something to petition core for 5.4).

> And if we do make such changes, we may deviate from using Salsa20/8 as well

Perhaps Salsa20 would be a fair replacement, considering that the only difference is the number of rounds performed of the internal mixing function (20 vs 8). So it would make it artificially slower... It should have the same parallelism, just more time would be spent in the internal iterations of the hash function (but I'm curious if the hash function isn't dominated by other factors, which would mean the difference is trivial)...

> I think that we'll need to make changes there, maybe after making sure our reimplementation matches scrypt test vectors first (before we break it).

If we go that far, would it be worth while making it a PECL extension with a PHP code backup (for servers without the extension installed)? That way we can implement scrypt directly as well as the custom implementation. And use the PHP implementation as a fallback (of the custom implementation).

solardiz’s picture

@ircmaxell:

> that reduces (but not eliminates) the falloff.

Correct.

> So then approximately 75 to 80 characters would be the breakeven point

I wouldn't say that. If our input has low guessing entropy, then those extra chars probably have little entropy in them as well. If our input has high guessing entropy, then there's probably more than enough of it in the first 72 chars.

Anyway, when using bcrypt, my recommendation is for the system to refuse to set passwords longer than 72 chars.

> still far less than we are talking about extending the password length to

You lost me here. What does bcrypt's length limit have to do with us using longer strings with PHP functions such as hash('sha512', ...) when we build our own hashing method? In that case, we don't use bcrypt at all.

> Perhaps Salsa20 would be a fair replacement

Perhaps. But I am not happy with a few other things in the way scrypt is defined, so if we deviate from it I'd likely want to make more changes. Of course, we will need peer review.

> If we go that far, would it be worth while making it a PECL extension with a PHP code backup (for servers without the extension installed)?

Yes, and we need to consider non-PHP apps as well. For PHP, I think it could even become a new hashing method supported by crypt().

> That way we can implement scrypt directly as well as the custom implementation.

While this enables us to implement scrypt efficiently as-is in the PECL extension, it is not obviously the best thing for us to do. Efficiency of the fallback code (written in PHP) matters too, and some other properties do as well.

ircmaxell’s picture

@solardiz:

>Perhaps. But I am not happy with a few other things in the way scrypt is defined, so if we deviate from it I'd likely want to make more changes. Of course, we will need peer review.

Speaking of such, I think I found an issue with the scrypt implementation (not a major one, but still an issue).

Right now, the bulk of the RAM usage is in the V variable. This is initialized to be 128 * r * N bytes of memory. It accounts for about 98% of the allocated memory of the entire crypt cycle.

In pseduo-code, smix looks like this currently:

X <-- B
For i = 0 ... N - 1
    V_i <-- X
    X <-- H(X)

For i = 0 ... N - 1
    j <-- Integerify(X) mod N
    X <-- H(X xor V_j)
B <-- X

Now, as written, it will call H(X) 2N times, and consume sizeof(X) * N bytes of RAM for V

If we tweak the algorithm to the following, we will reduce the memory usage from sizeof(X)*N to 2*sizeof(X) (which is quite small, a few KB on common settings):

X <-- B
T <-- B
For i = 0 ... N - 1
    X <-- H(X)

For i = 0 ... N - 1
    j <-- Integerify(X) mod N
    V <-- T
    For k = 0 ... j
        V <-- H(V)
    X <-- H(X xor V)
B <-- X

Now, there is a tradeoff. The above code will require on average 2N + N*N/2 calls to H(X). So for a N setting of 2^15 (32768), that means we go from 65,536 hash rounds to 536,936,448 hash rounds. That's an increase of of 4 orders of magnitude.

However, it also means the total memory used in computation is reduced from 128 * r * p + 256 * r + 128 * r * N to 128 * r * p + 256 * r + 2 * 128 * r. Considering common values of p=3,r=8,N=2^15, that means we dropped memory usage from 33,559,552 bytes to 7168 bytes.

Now, considering the new byte value should be able to fit inside the L1 cache with ease, that will save a significant amount of memory latency involved with making scrypt memory-hard. So that reduces the computational advantage significantly. So the 4 order of magnitude increase in hash cycles will be offset (at least partially) by never having to hit RAM. If we assume random distribution of J, we can estimate the number of fetches from RAM at about 2N*0.8 (assuming a 20% cache efficiency with a 4mb L2/3 cache). Therefore, for N=2^15, there will be somewhere around 54,000 RAM operations. At approximately 51ns each, that accounts for approximately 2.7ms of the functions runtime.

Now, salsa20/8 takes approximately 3000 cpu cycles to execute for a 8*128 byte hash value (r=8). On a 2ghz machine, that should take approximately 1.5 microseconds for each hash execution.

So if we draw that out, for the hash and memory usage, the original scrypt should take approximately 0.1 seconds to execute (this is only inside of smix, assuming that salsa + memory latency will greatly dominate the run time). And the low memory modification should take approximately 805 seconds for the same hash (with a lower bound of 0.1 seconds, and an upper bound of 1600 seconds). So there's still a significant difference in runtime. But it's now all CPU bound runtime (no main memory operations). So it can be run on memory constrained systems such as a GPU reasonably efficiently (it'll take a fair bit more CPU time). But since the memory dependency is drastically reduced, many operations can be run in parallel (hashing multiple differet guesses at the same time). On a single high end GPU, the average number of hashes per second could approach double digits. While this is still very slow, it at least allows existing GPU farms to be used to attack scrypt with reasonable results.

This "issue" could be eliminated by simply writing to V_j in the second loop, for example:

For i = 0 ... N - 1
    j <-- Integerify(X) mod N
    V_j <-- X xor V_j
    X <-- H(V_j)
B <-- X

It's not a truly significant vulnerability since we cannot parallelize smix due to the dependency on X being written on each iteration, and the "vector" greatly increases computational effort for reducing memory requirements significantly. But I think it's something worth noting at least...

Beware, all numbers here are back-of-envelope style calculations and should not be referenced.

solardiz’s picture

@ircmaxell:

It is interesting observations that you make. I did not verify your math, but here are some quick comments:

IIRC, scrypt did not attempt to exploit RAM latency nor bandwidth; on the contrary, I think it tried not to be bound by these, but at the same time to actually require this much RAM (and not something significantly slower) for implementation (including attacker's). If so, and if this was achieved in practice, I wouldn't expect the reduction in scrypt's memory needs (and thus fitting in a cache) to significantly speed it up (per Salsa20/8 computation).

Also, scrypt was designed with attackers armed with ASICs in mind, who would not have constraints of GPUs and thus would not need nor want to resort to trade-offs like this. In ASIC, per the estimates given in the scrypt paper, 1 bit of DRAM is 50 times cheaper than 1 logic gate. So this trade-off - reducing RAM needs but increasing logic needs (to implement more cores) - would actually work against the attacker.

When we consider GPUs, I think we need to use such settings of scrypt that the attack is impractical, and even when it is practical, scrypt still does slow the attacker down a lot. Alternatively, yes, a countermeasure like you describe could be implemented. BTW, with that change the loop becomes somewhat similar to that of Eksblowfish (part of bcrypt), which is also GPU-unfriendly.

> It's not a truly significant vulnerability since we cannot parallelize smix due to the dependency on X being written on each iteration

Mostly not for that reason. If we're an offline attacker, we always have lots of parallelism from being able to test different candidate passwords in parallel. Normally, scrypt requires lots of memory per instance, which prevents too many instances from being run in parallel like that. However, after the trade-off this becomes possible again (like it is with other typical password hashes), so we do have parallelism regardless of whether it is present in a single instance or not.

> greatly increases computational effort for reducing memory requirements significantly

Yes, this is what makes it almost a non-issue, for proper settings - but it may affect what settings are the proper ones.

Thank you for researching and posting this!

solardiz’s picture

Here's the response from Colin Percival: http://www.openwall.com/lists/crypt-dev/2011/07/01/2

The unexpected (to me) part is:

"I'm actually planning on adding this support to my scrypt code, in order to
avoid the "you don't have enough RAM to compute this hash" error; but I
haven't had the time yet."

Here "this support" refers to making use of the trade-off.

ircmaxell’s picture

@solardiz:

Well, that makes me happy that my analysis was correct (in principle at least). I figured it wasn't a big revelation, but it was worth pointing out explicitly.

I would assume that with an ASIC, the memory latency would not be nearly as bad as it is in a CPU (General Purpose CPU) or a GPU. Therefore it should be slightly more efficient (by only a few percent) at computing the scrypt hash than the CPU would be.

Oh, and I've subscribed to both the crypt-dev and the oss-security lists. So if you want to take the conversation there (as it seems more context appropriate than here), I'm game for that.

caston’s picture

@ircmaxell,

I found your blogpost very interesting. I am trying to find out if it is possible to mine a scrypt based crypto-currency with GPUs and I found your pseudocode to help reduce the memory usage.

I was wondering if you had attempted to introduce your code into the scrypt.c file yet?

I had a look but it is well over my head. The cpu miner we are using currently is:

https://github.com/ArtForz/cpuminer

I have also posted here:

http://www.gat3way.eu/forum/viewtopic.php?f=5&t=68

The website of the currency is here:

http://litecoin.org/

xqus’s picture

FileSize
2.07 KB

Since CRYPT_BLOWFISH is always available as of PHP 5.3.0, using bcrypt as default hashing method is easy to implement.

Mark Trapp’s picture

Status: Active » Needs review

Status: Needs review » Needs work

The last submitted patch, drupal-1201444.patch, failed testing.

tstoeckler’s picture

+++ b/core/includes/password.inc
@@ -193,23 +193,43 @@ function _password_get_count_log2($setting) {
+ * Generate a bcrypt hashed password.
...
+ * Hash a password using a secure hash.

These should start with verbs in the third person (i.e. "Generates" and "Hashes")

+++ b/core/includes/password.inc
@@ -193,23 +193,43 @@ function _password_get_count_log2($setting) {
+    return false;
...
+    return false;

We use "FALSE" rather than "false".

Also, is there any point in splitting this out into two functions? (I'm not saying there isn't, just asking.)

Owen Barton’s picture

Also note that Drupal 8 would need to require 5.3.7+ to avoid CVE-2011-2483.

webchick’s picture

Note: jkmickelson followed-up at #29706-287: More secure password hashing with a password.inc that does bCryptic hashing (as well as a nice compliment!). He has been directed to this issue.

webchick’s picture

Also while I'm at it, #1751100: Bump minimum version of php required to 5.3.5 aims to bump Drupal's minimum PHP version to 5.3.5, and there is some research there on the general availability of PHP in various distros.

jkmickelson’s picture

FileSize
11.95 KB

[revised excerpt from post on closed thread]
I scoured the Drupal-sphere for a D7 bcrypt enabled password.inc file and found none. Thus, I wrote and tested my own "bcryptic_password.inc" based on the scattered contributions of the top-most expert hash specimens available, including Drupal's own forward-thinking password.inc.

This bcryptic_password.inc file will upgrade existing hashes to bcrypt as users log in, while fully validating Drupal's existing sha512 hash. The included bCryptic PHP class can be used in other GPL/non-GPL systems, as it is presented into the public domain (just as I received the original specimen). [bCryptic on github.com]

This offering is not meant to stir up trouble, nor any feelings whatsoever. I am giving back in a quiet corner where the Drupal security experts dwell, and where keen, discerning eyes refine logic, technique, and execution. This is a vital thread where other like-minded professionals still come searching for answers.

[Note: I did not discover the earlier patch listed in #29 until I was directed here]

Owen Barton’s picture

Thanks for your offering!

An interesting historical summary is that the Drupal password.inc logic is based on phpass (http://www.openwall.com/phpass/), which does actually support bcrypt based hashes (only falling back to md5 based salted and stretched hashes if bcrypt is not available). The md5 hashes were replaced with the sha512 based hashes to simplify some security audits. The bcrypt hashes were dropped fairly early on, because a site could become unusable with no easy recourse if a site was moved from a server supporting bcrypt to one without (new users would be ok thanks to the fallback, but existing users passwords could not be checked).

I think the next step is to decide if we can require 5.3.7 for Drupal 8 on the other issue. If we do then we should investigate the phpass and bcrypic implementations.

killua99’s picture

What about use this lib https://gist.github.com/1070401 or add this code.

hash_hmac('snefru256', microtime() . $bytes, $key, true); 
$hash = crypt($input, $this->getSalt());

I guess this function can be changed.

  /**
   * Implements Drupal\Core\Password\PasswordInterface::checkPassword().
   */
  public function check($password, $account) {
    if (substr($account->pass, 0, 2) == 'U$') {
      // This may be an updated password from user_update_7000(). Such hashes
      // have 'U' added as the first character and need an extra md5() (see the
      // Drupal 7 documentation).
      $stored_hash = substr($account->pass, 1);
      $password = md5($password);
    }
    else {
      $stored_hash = $account->pass;
    }

    $type = substr($stored_hash, 0, 3);
    switch ($type) {
      case '$S$':
        // A normal Drupal 7 password using sha512.
        $hash = $this->crypt('sha512', $password, $stored_hash);
        break;
      case '$H$':
        // phpBB3 uses "$H$" for the same thing as "$P$".
      case '$P$':
        // A phpass password generated using md5.  This is an
        // imported password or from an earlier Drupal version.
        $hash = $this->crypt('md5', $password, $stored_hash);
        break;
      default:
        return FALSE;
    }
    return ($hash && $stored_hash == $hash);
  }

sourcer: http://stackoverflow.com/questions/401656/secure-hash-and-salt-for-php-p...

ircmaxell’s picture

I would recommend switching to the 5.5 API for upcoming 8 (if 5.3.7 can be required). If so, here's the compatibility layer (back to 5.3.7, which works transparently with 5.5): https://github.com/ircmaxell/password_compat

Pancho’s picture

Category: feature » task

#1800122: Bump minimum version of php required to 5.3.10 has finally landed, so feel free to move this forward!

Also, recategorizing as task, as this is about a security related replacement or refactoring of current functionality.
To qualify as a bug that might be fixed at any point, you'd need to substantiate a specific security flaw with the current code. Otherwise the time might be short for D8 Core.
Besides D9 Core, this could also be backported to a D8 contrib module, which is much less time-critical. If that approach should require any kind of enabler in D8 Core, that shouldn't be a problem at all.

pwolanin’s picture

looking at https://wiki.php.net/rfc/password_hash thnaks @ircmaxwell.

Given that this is in the 5.5 release officially, I'd support converting D8 to use this for new/re-hashed password while keeping the existing code in core so that we can simplify upgrade and import from phpass.

mgifford’s picture

So is there an opportunity to improve password hashing in D8 after July 1st?

slashrsm’s picture

Do we still have to do anything related to this in D8?

Crell’s picture

slashrsm: What we should do, it sounds like, is pull in ircmaxell's BC library for 5.4 users and start using password_hash(). Then setup a migration process per #41.

slashrsm’s picture

Is it realistic to expect that pulling BC library will be approved by the community?

Crell’s picture

Why wouldn't it be? It's way smaller than the other 3rd party libraries we've got in core now. :-) As long as the sec team is on board I can't imagine a great deal of pushback.

pwolanin’s picture

I'm a little confused looking at https://github.com/ircmaxell/password_compat and http://hu1.php.net/manual/en/function.crypt.php

If crypt() supports BCRYPT, they don't make that very clear in the docs. Or is that system-dependent?

I think maybe the latter, then this is the exact reason why we didn't use it for passwords in the past.

Owen Barton’s picture

Yes - the original phpass supported (and recommended) CRYPT_BLOWFISH - it just fell back to the md5 based salting/stretching if that was not available. Drupal dropped the CRYPT_BLOWFISH support however as that it was system dependent and hence CRYPT_BLOWFISH based passwords weren't portable at the time, in the sense that if you moved a site to a different hosting provider all your passwords may stop working.

Since php 5.3.0 however CRYPT_BLOWFISH has been built in to php (and hence portable), and since 5.3.7+ it has been secure (and hence a realistic option).

The advantage of using password_compat over raw crypt() in our own salting/stretching routines is that (a) it provides a pluggable mechanism written by an expert that is built into php 5.5+ as a standard API, (b) in php 5.5+ it uses C based salting/stretching which are slightly more efficient (hence allowing more iterations for the same cost and better defending against attackers running C code).

mgifford’s picture

Ok, so do we just need to re-roll @xqus' patch in #29?

Where are we in getting a patch to have D8 start using CRYPT_BLOWFISH?

killua99’s picture

Sound great D8 start using this kind of password hash

sun’s picture

As has been mentioned earlier, the ultimate outcome of the in-depth discussion between @ircmaxell + @solardiz seems to have been the incarnation of https://wiki.php.net/rfc/password_hash

That's natively available in PHP 5.5 now: http://php.net/manual/en/function.password-hash.php

1. We should use it natively on PHP 5.5+.
2. We should use @ircmaxell's backport on PHP 5.4.
3. We need to figure out how to migrate custom hashed passwords from D7.

1+2 are trivial. 3 is more complex. Let's use a dedicated issue for that:

#1845004: Replace custom password hashing library with PHP password_hash()


Regarding step 3. above, at least I want to deeply apologize to @solardiz in the name of the Drupal community for completely hi-jacking the initiative of phpass by not only Doing It Wrong™ but also setting an example for Doing It Wrong™, which apparently caused other projects to follow Drupal's "lead." — Despite claims in the original issue that manipulated the phpass library, our hack was definitely not endorsed by everyone and I can assure that there are many people who deeply regret/hate that this change ever happened.

Thank you, @solardiz, for sharing your in-depth expertise in this field with the Drupal community. :)

Of course, also thank you, @ircmaxell, for driving the effort to make PHP core more secure by default!

gaurav.goyal’s picture

+1. Subscribing.

Crell’s picture

Since this is security hardening I'm not sure if it's beta-safe. Given that password_hash() is available in every currently supported version of PHP, however, (Remember, 5.4 loses all support this year), I do believe we should go through with this switch. The only challenge is the migration path. Is anyone willing to work on that, committers-willing? sun, are you?

killua99’s picture

Now here we're 2y ago since @ircmaxell suggested to move D8 to PHP 5.5 and use the Core solutions [#39] .

Now is quite hard still not complicated to move to PHP 5.5. In a few weeks will be a Drupal Dev Day in France, there we can discuss about to move D8 to PHP 5.5 and use core solutions for hash the password. I guess we can do it.

We should start a discussion and propose sprint in Drupal Dev Days to make it happen.

claudiu.cristea’s picture

claudiu.cristea’s picture

Guys, there's a green patch waiting for your review in #1845004-16: Replace custom password hashing library with PHP password_hash(). Thank you!

claudiu.cristea’s picture

Status: Needs work » Closed (duplicate)

This issue contains valuable discussion about hashing. However the solution has moved in #1845004: Replace custom password hashing library with PHP password_hash(). I'm closing this in favor of the other.