## Password protection of worksheets in Excel

This is a detour from the last couple of posts; I will pick up the blackjack thread again later.  I was distracted recently by a desire to “unprotect” a password-protected Excel spreadsheet in order to view some hidden cell contents.  It turns out this is not a new problem, but it is an interesting exercise, with an opportunity to improve on what seems to be the “standard” solution.

OpenOffice.org (PDF, see Section 4.18.4) provides a description of the password protection algorithm, which is not very strong.  (This is not surprising, since it’s not intended to prevent disclosure.  As the help documentation points out, it’s really just a safety net for editing, “to help prevent others from changing, moving, or deleting important data.”)  When you enter a password to protect a worksheet, Excel does not actually store the password itself, only a 16-bit hash of the password.  When you subsequently enter the password to attempt to unprotect the worksheet, Excel simply compares the hash of the password you just entered with the stored hash value.

There are about 10^30 different possible passwords with a maximum of 15 characters.  But there are only 32,768 different possible hash values, so collisions should be easy to come by.  That is, we don’t have to recover the original password to unprotect the worksheet, we only have to enter a password with the same hash value.  We can brute-force attack the problem by trying passwords with all possible hash values.

The hash function is simple to describe: compute the bit-wise exclusive-OR of (1) the constant 0xCE4B, (2) the password length, and (3) the ASCII value of each character of the password, as a 15-bit value rotated left bit-wise by its position.  That is, rotate the first character 1 bit left, the second character 2 bits left, etc.  The following Python code implements this function:

```def excel_hash(password):
"""Return hash of given string, in the range [0x8000, 0xffff]."""
h = 0
h = h ^ ord(ch)
h = ((h << 1) & 0x7fff) | (h >> 14)
return h ^ len(password) ^ 0xce4b
```

There is a solution that has been posted and re-posted many times, that essentially tries all 12-character passwords where each of the first 11 characters are A or B (and the 12th is any printable ASCII).  This is guaranteed to yield all 32,768 possible hash values.  It works.  So that should be the end of it… except I’m not sure why this particular enumeration of passwords was chosen, since it isn’t clear to me that it “covers” all possible hash values in any structured way.

First, it is more exhaustive than it needs to be.  That is, I thought perhaps this approach might have been the minimal result of a simple blind empirical search, trying longer and longer passwords of the form “[A|B]*.” (with the last printable character to “fill in the gaps”) until all possible hash values were covered.  But this isn’t the case, since we can make the solution 4 times faster by using just 10-character passwords– i.e., where the first 9 are A or B, and the 10th is any printable character.

Second, why use the characters A and B?  The ASCII values for A and B are 65 and 66, respectively, meaning that we are toggling two consecutive bits with each character, not just one, and those pairs of changing bits for consecutive password characters overlap in position, which doesn’t seem helpful.

My first thought was, “The dumbest thing that could possibly work is to use 15-character passwords, where each character is, say, either B or C.”  This way, each character toggles exactly one distinct bit in the hash, so that we try exactly one password for each possible hash value.  And indeed, this approach does work, and is about 6 times faster than the commonly-cited solution.

But instead of a 15-level nested for-loop, we can shorten the code somewhat by just varying three characters in the password, with each character exploring 5 bits of the hash.  The only trick is (1) keeping the characters’ ASCII values in the printable range, and (2) “rotating” those three characters so that their blocks of 5 bits do not overlap, by inserting constant password characters between them.

The result is the following code, with instructions on how to use it:

1. Open the worksheet that you want to unprotect.
2. Press Alt-F11 to open Visual Basic.
3. Press F7 to open an editor window.
4. Copy/paste the code below into the editor window.
```Sub FindPassword()
Dim c1 As Integer, c2 As Integer, c3 As Integer
Dim passwd As String
On Error Resume Next
For c1 = 64 To 95
For c2 = 64 To 95
For c3 = 64 To 95
passwd = Chr(c1) & "...." & Chr(c2) & "...." & Chr(c3)
ActiveSheet.Unprotect passwd
If ActiveSheet.ProtectContents = False Then
MsgBox "Unprotected using password: '" & passwd & "'"
Exit Sub
End If
Next
Next
Next
End Sub
```

5. Finally, press F5 to run.  A dialog is displayed indicating a valid password.  The worksheet is now unprotected.

This entry was posted in Uncategorized. Bookmark the permalink.

### 4 Responses to Password protection of worksheets in Excel

1. After playing around with this I got an idea: do you think there’s a closed-form solution, either to every hash or at least many/most?

If we assume there exists a password of 8 or fewer characters, we can determine if a password length for a given hash was even or odd based on the LSB of the hash. 1 means even, 0 means odd. For any odd-length-password hash, XOR the target hash with the magic number and an assumed length of 7. I choose 7 because that spreads over all the 15 hash bits without rotating any bits. The result is the target, called x, we want to find after hashing the password letters.

Call the letter at password position 1 a, at 2, b, etc. a0 refers to the LSB of a and a6 the MSB (7-bit ASCII). We already know “a0 = x1″, because the only contributor to x1 is a0. We also know “a1 = x2 ^ b0″ and “a2 = x3 ^ b1 ^ c0″. Continuing along we can set up a system of 49 (7bits * 7chars) equations involving all 49 unkowns. If we add further contstraints to remove control characters (0-31 and 127), shouldn’t this be solvable?

• If I understand correctly, you are asking whether we can “invert” the hash function? That is, if we actually knew the 16-bit hash value x (which might be easily found and extracted from the .xls file, I’m not sure), could we *directly* compute a password w that would work, i.e. h(w)=x?

I think we can, since the given code provides such a recipe for an 11-character password. (Given the hash value x, XOR with 11 (the length), the constant 0xce4b, and the “constant” rotated ASCII values for the 8 ‘.’ placeholders in the password template. Divide the result into 5-bit groups in positions 1-5, 6-10, and {11,12,13,14,0}, add 64 to each of these 0-31 values, and insert the resulting ASCII characters in the template “1….2….3″.)

So “closed form” is in the eye of the beholder :). It’s an interesting question whether we can do this with shorter passwords. (Note, for example, that 7 characters isn’t actually enough; we need at least 9 characters in the worst case to affect bit 0 of the hash.)

2. silkfire says:

According to your description of the hash value generation algorithm, you say that “and (3) the ASCII value of each character of the password, as a 15-bit value rotated left bit-wise by its position. That is, rotate the first character 1 bit left, the second character 2 bits left, etc.”
But in your Python code (although I know your code is fully functional), why do you not adhere to the position of the characters, i.e. its index within the char array? You statically just rotate each character 1 (one) step to the left, using <> 14. Or did I miss something here?

• You’re right, the code does not read quite like the description in the post. If you look at the pseudo-code in the OpenOffice requirements document (see link in the post, section 4.18.4), you’ll see a similar single-bit rotation per iteration there as well.

Think about what the code would have to look like to rotate each character individually by varying numbers of positions– the shift is easy, but wrapping the high-order bits back to the low-order positions gets messy. (And it would be even messier in, say, C++, where you have limited word width to worry about.) Try it, you’ll see what I mean.

So instead of rotating each input character by incrementally more positions to XOR into the output hash, the trick is to *rotate the output*… but for this to work, we have to visit the input characters *in reverse order* (note the [::-1] slice). This way, the first character gets rotated the least, and the last character gets rotated the most, as the text description of the algorithm suggests.