This is a detour from the last couple of posts; I will pick up the blackjack thread again later. I was distracted recently by a desire to “unprotect” a password-protected Excel spreadsheet in order to view some hidden cell contents. It turns out this is not a new problem, but it is an interesting exercise, with an opportunity to improve on what seems to be the “standard” solution.

OpenOffice.org (PDF, see Section 4.18.4) provides a description of the password protection algorithm, which is not very strong. (This is not surprising, since it’s not intended to prevent *disclosure*. As the help documentation points out, it’s really just a safety net for *editing*, “to help prevent others from changing, moving, or deleting important data.”) When you enter a password to protect a worksheet, Excel does not actually store the password itself, only a 16-bit hash of the password. When you subsequently enter the password to attempt to unprotect the worksheet, Excel simply compares the hash of the password you just entered with the stored hash value.

There are about 10^30 different possible passwords with a maximum of 15 characters. But there are only 32,768 different possible hash values, so collisions should be easy to come by. That is, we don’t have to recover the *original* password to unprotect the worksheet, we only have to enter a password with the same hash value. We can brute-force attack the problem by trying passwords with all possible hash values.

[*Edit*: Unfortunately, it looks like Microsoft has gotten around to plugging this particular hole in the most recent versions of Excel (2016 where I have tested), now using SHA-512 instead of this home-grown hashing scheme. As a result, this brute-force approach is no longer feasible.]

The hash function is simple to describe: compute the bit-wise exclusive-OR of (1) the constant 0xCE4B, (2) the password length, and (3) the ASCII value of each character of the password, as a 15-bit value rotated left bit-wise by its position. That is, rotate the first character 1 bit left, the second character 2 bits left, etc. The following Python code implements this function:

def excel_hash(password): """Return hash of given string, in the range [0x8000, 0xffff].""" h = 0 for ch in password[::-1]: h = h ^ ord(ch) h = ((h << 1) & 0x7fff) | (h >> 14) return h ^ len(password) ^ 0xce4b

There is a solution that has been posted and re-posted many times, that essentially tries all 12-character passwords where each of the first 11 characters are A or B (and the 12th is any printable ASCII). This is guaranteed to yield all 32,768 possible hash values. It works. So that should be the end of it… except I’m not sure *why* this particular enumeration of passwords was chosen, since it isn’t clear to me that it “covers” all possible hash values in any structured way.

First, it is more exhaustive than it needs to be. That is, I thought perhaps this approach might have been the minimal result of a simple blind empirical search, trying longer and longer passwords of the form “[A|B]*.” (with the last printable character to “fill in the gaps”) until all possible hash values were covered. But this isn’t the case, since we can make the solution 4 times faster by using just 10-character passwords– i.e., where the first 9 are A or B, and the 10th is any printable character.

Second, why use the characters A and B? The ASCII values for A and B are 65 and 66, respectively, meaning that we are toggling *two* consecutive bits with each character, not just one, and those pairs of changing bits for consecutive password characters overlap in position, which doesn’t seem helpful.

My first thought was, “The dumbest thing that could possibly work is to use 15-character passwords, where each character is, say, either B or C.” This way, each character toggles exactly one distinct bit in the hash, so that we try exactly one password for each possible hash value. And indeed, this approach does work, and is about 6 times faster than the commonly-cited solution.

But instead of a 15-level nested for-loop, we can shorten the code somewhat by just varying *three* characters in the password, with each character exploring 5 bits of the hash. The only trick is (1) keeping the characters’ ASCII values in the printable range, and (2) “rotating” those three characters so that their blocks of 5 bits do not overlap, by inserting constant password characters between them.

The result is the following code, with instructions on how to use it:

- Open the worksheet that you want to unprotect.
- Press Alt-F11 to open Visual Basic.
- Press F7 to open an editor window.
- Copy/paste the code below into the editor window.

Sub FindPassword() Dim c1 As Integer, c2 As Integer, c3 As Integer Dim passwd As String On Error Resume Next For c1 = 64 To 95 For c2 = 64 To 95 For c3 = 64 To 95 passwd = Chr(c1) & "...." & Chr(c2) & "...." & Chr(c3) ActiveSheet.Unprotect passwd If ActiveSheet.ProtectContents = False Then MsgBox "Unprotected using password: '" & passwd & "'" Exit Sub End If Next Next Next End Sub

5. Finally, press F5 to run. A dialog is displayed indicating a valid password. The worksheet is now unprotected.

After playing around with this I got an idea: do you think there’s a closed-form solution, either to every hash or at least many/most?

If we assume there exists a password of 8 or fewer characters, we can determine if a password length for a given hash was even or odd based on the LSB of the hash. 1 means even, 0 means odd. For any odd-length-password hash, XOR the target hash with the magic number and an assumed length of 7. I choose 7 because that spreads over all the 15 hash bits without rotating any bits. The result is the target, called x, we want to find after hashing the password letters.

Call the letter at password position 1 a, at 2, b, etc. a0 refers to the LSB of a and a6 the MSB (7-bit ASCII). We already know “a0 = x1”, because the only contributor to x1 is a0. We also know “a1 = x2 ^ b0” and “a2 = x3 ^ b1 ^ c0”. Continuing along we can set up a system of 49 (7bits * 7chars) equations involving all 49 unkowns. If we add further contstraints to remove control characters (0-31 and 127), shouldn’t this be solvable?

If I understand correctly, you are asking whether we can “invert” the hash function? That is, if we actually knew the 16-bit hash value x (which might be easily found and extracted from the .xls file, I’m not sure), could we *directly* compute a password w that would work, i.e. h(w)=x?

I think we can, since the given code provides such a recipe for an 11-character password. (Given the hash value x, XOR with 11 (the length), the constant 0xce4b, and the “constant” rotated ASCII values for the 8 ‘.’ placeholders in the password template. Divide the result into 5-bit groups in positions 1-5, 6-10, and {11,12,13,14,0}, add 64 to each of these 0-31 values, and insert the resulting ASCII characters in the template “1….2….3”.)

So “closed form” is in the eye of the beholder :). It’s an interesting question whether we can do this with shorter passwords. (Note, for example, that 7 characters isn’t actually enough; we need at least 9 characters in the worst case to affect bit 0 of the hash.)

According to your description of the hash value generation algorithm, you say that “and (3) the ASCII value of each character of the password, as a 15-bit value rotated left bit-wise by its position. That is, rotate the first character 1 bit left, the second character 2 bits left, etc.”

But in your Python code (although I know your code is fully functional), why do you not adhere to the position of the characters, i.e. its index within the char array? You statically just rotate each character 1 (one) step to the left, using <> 14. Or did I miss something here?

You’re right, the code does not read quite like the description in the post. If you look at the pseudo-code in the OpenOffice requirements document (see link in the post, section 4.18.4), you’ll see a similar single-bit rotation per iteration there as well.

Think about what the code would have to look like to rotate each character individually by varying numbers of positions– the shift is easy, but wrapping the high-order bits back to the low-order positions gets messy. (And it would be even messier in, say, C++, where you have limited word width to worry about.) Try it, you’ll see what I mean.

So instead of rotating each input character by incrementally more positions to XOR into the output hash, the trick is to *rotate the output*… but for this to work, we have to visit the input characters *in reverse order* (note the [::-1] slice). This way, the first character gets rotated the least, and the last character gets rotated the most, as the text description of the algorithm suggests.

Hello,

Thanks for posting this code, I used it on protected worksheets that didn’t accept their normal password anymore and it worked wonderfully!

You saved me a lot of time 🙂

Thanks, this came in handy!

Here is a Python3 version of the code (I am using LibreOffice so no Excel-Basic) that prints a working password:

—-

—-

To find the password hash, look for “password=” after unzipping the excel file.

@Celos, I modified your comment to wrap your Python code with HTML “pre” tags.

Thank you!

I’m curious, why do you write this line: passwd = Chr(c1) & “….” & Chr(c2) & “….” & Chr(c3)? What is the purpose of those four periods and why does it still work?

https://msdn.microsoft.com/en-us/library/wfx50zyk.aspx

That means the first password tried is @….@….@ Each @ character has its 5 lower bits mutated by the three loops. Since the characters are bit-shifted by its position and then logically or’ed, the first two @ create all possible 15-bit-patterns for the ‘@….’ group. Changing the ‘.’ characters would only affect bits that are already being changed through all possible permutations.

The original solution does the same by making the first 10 characters being either A or B, thus permuting each bit position to 1 or 0. Since it seems to have an extra “bit” character, it would actually run through all 16-bit values where 15 bit suffice.

Correction:”5-bit-patterns”, not “15-bit-patterns”.

So the periods could just as well be space or any other character?

@silkfire– Yes. The periods are effectively just placeholders; replacing them with another character would still work, all possible passwords would be “visited,” just in a different order.

I tried your implementation now but Excel doesn’t seem to accept my password. I have an Excel 95 XLS workbook encrypted with password ‘password’. Its hash becomes 0x83AF (stored as 0xAF 0x83 in the BIFF record FILEPASS). So this means I can use alternative password ‘A….Z….@’, but Excel is not accepting it. What’s wrong?

There are a couple of possible issues; my first guess is that you are perhaps copy/pasting the ‘A….Z….@’ test into the password field? This doesn’t work, you must type the text of the password into the field.

The other issue is that this is only for password-protecting locked or hidden cells, as opposed to locking an entire workbook as “read-only.” (The latter is not useful anyway; just “save as” a copy of the file and edit the copy.)

Well actually I’ve protected the cells and opening the workbook. You can’t open it without entering a valid password. I’ve tried entering the alternative password manually and copy-paste. Does it work on your machine?

Protecting the entire workbook is another kettle of fish; from the OpenOffice format description document linked in the original post, in this case Excel “store[s] hash value *and* encryption key in FILEPASS record.” That is, the file itself is now encrypted, and decrypting requires more than just the hash, but also the (128-bit) encryption key created from the password.

I think you’re misunderstanding me. I don’t want to decrypt the document, I just want to be able to enter a password based on any hash, and make it work. As of now, I have not been successful. The manner I’m protecting the workbook in should nevertheless be irrelevant.

I think you’re misunderstanding me :). You can’t “make it work” since you can’t open the workbook *without* decrypting it, and unlike the *sheet* protection scheme, you have to provide a password that yields not only the same *hash*, but also the same derived 128-bit encryption/decryption key.

Lol. Then I don’t really see the point in this article 🙂

Really? You have never encountered a workbook where individual worksheets are protected? Where the workbook is viewable, or even usable/modifiable, but the content of formula cells are hidden, in an attempt to “protect the source code”? This was exactly the situation that motivated this post.

This worked for me a few months ago, but this time, I keep getting a “Not Responding” error in my title bar, and a crashed excel document. Any resolution for this?

Unfortunately, I was able to reproduce the behavior you describe. A quick web search suggests that more recent versions of Excel (I’m using 2016, but it looks like they cleaned this up in 2010 or maybe 2013) no longer use this hashing scheme, but instead use the more secure SHA-512. This means that our “space” of possible hash values is now much larger– 2^512 instead of just 2^15=32768, making this approach of trying the small number of pre-image passwords no longer feasible.