Password vs. Passphrase redux

Today Jesper Johansson, a gentleman whom I have the pleasure of speaking with on occasion, posted his 2nd installment on the topic of passwords here.  I encourage you all to read it.  In this installment he goes deep into the math and science behind passwords and pass-phrases and attempts to measure the strength of XX character passwords and XX character pass-phrases, using experience from his own pen-testing efforts and studies from reputable sources that he's managed to find and cite.  (I'm not quite sure how he manages to do it - this man travels more and is busier than many people I know, yet he still finds time to do his homework and write these amazing articles.)

If you haven't read his article yet - here's a teaser:

If there are 5 characters per word, we have 25+4=29 characters, where 4 are spaces, in the pass phrase. How much entropy that pass phrase contains depends on whose estimates you use. Using Shannon’s estimates of 2.3 bits per letter in an 8-letter word nets a total entropy of 29*2.3=66.7 bits. The 66.7 bits calculation is probably a reasonable upper bound on the entropy of a pass phrase, and it compares favorably with a 9-character password with only 45 bits of entropy. For a lower bound, we can use Bruce Schneier’s estimate of 1.3 bits per letter, based on a study by Thomas Cover (B. Schneier, “Applied Cryptography, 2nd Edition,” Wiley, 1996). Shannon advanced 1.3 bits per letter for 16-letter words, though, so it is probably not entirely applicable to our 5-character words. In any case, using 1.3 as the entropy estimate computes to 29*1.3= 37.7, which is actually worse than the 9-character password. Based on that number, you would need a 6-word pass phrase to attain roughly the same entropy as a 9-character password.
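The arithmetic in that excerpt is easy to reproduce.  Here is a minimal Python sketch of it, assuming (as the excerpt does) equal-length words joined by single spaces and a flat bits-per-character rate; the function names are mine, not Jesper's.

```python
# Per-character entropy estimates taken from the excerpt above.
BITS_PER_CHAR_HIGH = 2.3     # upper estimate attributed to Shannon
BITS_PER_CHAR_LOW = 1.3      # lower estimate attributed to Schneier / Cover
BITS_PER_RANDOM_CHAR = 5.0   # implied by "9-character password with only 45 bits"


def phrase_length(words: int, chars_per_word: int = 5) -> int:
    """Characters in a phrase of equal-length words joined by single spaces."""
    return words * chars_per_word + (words - 1)


def entropy_bits(length: int, bits_per_char: float) -> float:
    """Flat estimate: every character contributes the same number of bits."""
    return length * bits_per_char


def words_needed(target_bits: float, bits_per_char: float,
                 chars_per_word: int = 5) -> int:
    """Smallest word count whose flat estimate reaches target_bits."""
    words = 1
    while entropy_bits(phrase_length(words, chars_per_word), bits_per_char) < target_bits:
        words += 1
    return words


password_bits = entropy_bits(9, BITS_PER_RANDOM_CHAR)        # 9 random characters
five_words = phrase_length(5)                                # 25 letters + 4 spaces

print(f"5-word phrase: {five_words} characters")
print(f"  high estimate: {entropy_bits(five_words, BITS_PER_CHAR_HIGH):.1f} bits")
print(f"  low estimate:  {entropy_bits(five_words, BITS_PER_CHAR_LOW):.1f} bits")
print(f"9-character password: {password_bits:.1f} bits")
print(f"words needed at the low estimate: {words_needed(password_bits, BITS_PER_CHAR_LOW)}")
```

At the low estimate the loop stops at 6 words (35 characters), which matches the "6-word pass phrase" figure in the excerpt.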

I'm really glad Jesper has taken the initiative to dive into the math and science behind our language and calculating entropy.  This was in fact a popular point made in many of the replies to my blog - people wanted me to address this, but I knew Jesper was working on it so I of course deferred to him . . . Jesper makes the assumption of 5 characters per word and a space between each word - but what about punctuation?  I believe Jesper is assuming that a pass-phrase will be composed of just 5 random words strung together with spaces (i.e. the user has swapped the 'token' in their password from a random character to a random word, so instead of using random letters / numbers / symbols you're using random words).

In my first blog on passwords, I was talking about pass-phrases (passwords composed of words) but I really wanted people to start using full sentences with proper punctuation and everything as their passwords.

It would be interesting to see some math behind using proper sentences as your pass-phrases (which is what I personally do as much as possible).  How many words are in the average proper sentence, and would a user be more likely to remember a longer sentence than a shorter pass-phrase composed of 5 random words?  I believe that a user is more likely to remember "I took my dog to the vet and he's got fleas!" than "cow moon cars women security" (wow, that should give you some great insight into how my brain works! <G>).  In addition to being longer (and thus, by a per-character estimate, containing more entropy), the former is easier to remember and thus more usable to boot!

Don't forget, sentences usually end with an extra character (like a period, question mark or exclamation point), so that adds one extra character that must be accounted for when doing calculations on the entropy of a pass-phrase (and what if the sentence is a quote and the user surrounds the quote with "double quote" characters?).  Looking back at some of my recent pass-phrases that are sentences, I have a couple that are composed of way more than 5 words (e.g. "If we weren't all crazy, we would go insane!") but I also have some that are shorter than 5 . . .
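To make the character counting concrete, here is a small Python sketch that measures the example phrases from this post, with spaces, punctuation and optional surrounding quotes all counted.  The helper name describe is just something I made up for illustration.

```python
def describe(phrase: str) -> dict:
    """Count characters, words and punctuation marks in a candidate pass-phrase."""
    return {
        "chars": len(phrase),                 # spaces and punctuation included
        "words": len(phrase.split()),
        # Anything that is neither a letter/digit nor whitespace.
        "punctuation": sum(1 for ch in phrase
                           if not ch.isalnum() and not ch.isspace()),
    }


sentence = "I took my dog to the vet and he's got fleas!"
random_words = "cow moon cars women security"
# The same idea with the sentence wrapped in double quotes, as a user might type it.
quoted = '"' + "If we weren't all crazy, we would go insane!" + '"'

for phrase in (sentence, random_words, quoted):
    print(describe(phrase), phrase)
```

The sentence comes out at 44 characters and 11 words against 28 characters and 5 words for the random-word phrase, and wrapping a sentence in quotes adds two more characters to type.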

Jesper does point out in his article that sentences may be weaker than pass-phrases composed of random words due to assumptions that can be made about word grouping and the English language (ever watch Wheel of Fortune?  If so you're probably good at this).  In addition he also points out that if the world moves to pass-phrases then cracking tools will merely adapt to this new paradigm by incorporating words instead of characters as the symbols to try in cracking . . .
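The words-as-symbols idea can be sketched in a few lines of Python.  This assumes each word is picked independently and uniformly from a word list, and the list sizes below are hypothetical numbers chosen for illustration, not figures from Jesper's article.  Words in a real sentence are not independent, so for sentences this is an optimistic estimate.

```python
import math


def word_token_bits(num_words: int, vocabulary_size: int) -> float:
    """Bits of entropy for num_words drawn independently and uniformly
    from a list of vocabulary_size words (the cracker's symbol set)."""
    return num_words * math.log2(vocabulary_size)


# Hypothetical word-list sizes a word-based cracking tool might use.
for vocabulary_size in (1_000, 10_000, 100_000):
    bits = word_token_bits(5, vocabulary_size)
    print(f"5 words from a {vocabulary_size:>7,}-word list: {bits:.1f} bits")
```

Under these assumptions the strength of a random-word pass-phrase depends on how many words there are and how big the list is, not on how many characters the words happen to contain.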

I say bring it on . . . I know how to make pass-phrases based on sentences that are both extremely long AND easy to remember, and that make the computational power needed to crack them (using lookup tables) mind-boggling and unattainable.  "If we weren't all crazy, we would go insane!" contains 44 characters with spaces, and it is easy to remember and easy to type for a touch typist like myself.  It contains anywhere from 44*1.3=57.2 bits of entropy (on the low end) to perhaps 44*2.3=101.2 bits (on the mid to high end).  Either way, this compares very favorably to a completely random mess of 9 characters with 5 bits of entropy per character (45 bits total).
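For a rough feel of what "lookup tables" would mean at these sizes, here is a back-of-the-envelope Python sketch.  It assumes a naive table that stores one 16-byte MD4 digest for each of 2^bits candidates and nothing else; real precomputation schemes compress this, so treat the numbers as an illustration only.

```python
DIGEST_BYTES = 16   # size of an MD4 digest
EIB = 2 ** 60       # bytes in one exbibyte

phrase = "If we weren't all crazy, we would go insane!"


def table_bytes(entropy_bits: float) -> float:
    """Bytes needed to store one digest per candidate in a naive lookup table."""
    return (2 ** entropy_bits) * DIGEST_BYTES


estimates = {
    "9 random characters (5 bits each)": 9 * 5.0,
    "sentence, low estimate (1.3 bits/char)": len(phrase) * 1.3,
    "sentence, high estimate (2.3 bits/char)": len(phrase) * 2.3,
}

for label, bits in estimates.items():
    print(f"{label}: {bits:.1f} bits, about {table_bytes(bits) / EIB:.3g} EiB of digests")
```

Even at the low estimate the sentence needs a table thousands of times larger than the one for the 9-character password.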

Finally - there have been some posts to certain, ahem, "security lists" calling into question the strength of Windows password hashes (of *any* length) - no doubt the work of some of those 'chainsaw consultants' that I referred to in my original post, with all of the standard misconceptions about how Windows stores password hashes.  Let me set the record straight.  The NT password hash is a straight MD4 of whatever the user types in (encoded as UTF-16LE Unicode) when they are asked for their password.  Microsoft did not invent the MD4 algorithm (I believe a Mr. Ron Rivest did, the 'R' in RSA) and I am not aware of any weaknesses in our implementation of the MD4 algorithm that would make it weaker than others.  The problems involving the LM hash do not apply in any way to the NT hash, and the two should not be confused with each other in discussions about password hashes.  They are two completely different hashes using two completely different algorithms.
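For the curious, "a straight MD4" can be sketched in pure Python.  This is an illustration following the round structure in RFC 1320, not Microsoft's code, and the function names are my own.  I wrote MD4 out by hand because Python's hashlib does not guarantee that an MD4 implementation is available.  The password is encoded as UTF-16LE before hashing.

```python
import struct

MASK = 0xFFFFFFFF


def _rotl(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def md4(data: bytes) -> bytes:
    """Pure-Python MD4 (RFC 1320). Illustrative, not optimized."""
    # Pad with 0x80, zeros up to 56 mod 64, then the bit length (64-bit LE).
    msg = data + b"\x80"
    msg += b"\x00" * ((56 - len(msg)) % 64)
    msg += struct.pack("<Q", (len(data) * 8) & 0xFFFFFFFFFFFFFFFF)

    state = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
    for offset in range(0, len(msg), 64):
        x = struct.unpack("<16I", msg[offset:offset + 64])
        a, b, c, d = state

        # Round 1: F(b, c, d) = (b AND c) OR (NOT b AND d)
        for i in range(16):
            f = (b & c) | (~b & d)
            a = _rotl((a + f + x[i]) & MASK, (3, 7, 11, 19)[i % 4])
            a, b, c, d = d, a, b, c   # rotate register roles

        # Round 2: majority function, words taken column by column
        for i in range(16):
            k = (i % 4) * 4 + i // 4
            g = (b & c) | (b & d) | (c & d)
            a = _rotl((a + g + x[k] + 0x5A827999) & MASK, (3, 5, 9, 13)[i % 4])
            a, b, c, d = d, a, b, c

        # Round 3: XOR function, words in bit-reversed index order
        order = (0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15)
        for i in range(16):
            h = b ^ c ^ d
            a = _rotl((a + h + x[order[i]] + 0x6ED9EBA1) & MASK, (3, 9, 11, 15)[i % 4])
            a, b, c, d = d, a, b, c

        state = [(s + v) & MASK for s, v in zip(state, (a, b, c, d))]

    return struct.pack("<4I", *state)


def nt_hash(password: str) -> str:
    """NT hash as hex: MD4 over the UTF-16LE encoding of the password."""
    return md4(password.encode("utf-16-le")).hex()


print(nt_hash("If we weren't all crazy, we would go insane!"))
```

Note that nothing in the function cares how long the input is, which is the point: a 44-character sentence is hashed the same way as a 9-character password.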

Regarding 'salt' - it is true that Windows does not use 'salt' when generating / storing the hashes on disk; we chose to take a different approach.  The password hashes that are stored on disk are encrypted with 128-bit encryption - to get at the password hash on disk from an offline system, you'd first have to crack the 128-bit encryption used to protect the hash (the key is called the SYSKEY).  This symmetric key is by default stored in the registry, but you can take that key, make it a pass-phrase and store it in your head if you're really worried about physical offline attacks.