This is a detour from the last couple of posts; I will pick up the blackjack thread again later. I was distracted recently by a desire to “unprotect” a password-protected Excel spreadsheet in order to view some hidden cell contents. It turns out this is not a new problem, but it is an interesting exercise, with an opportunity to improve on what seems to be the “standard” solution.
OpenOffice.org (PDF, see Section 4.18.4) provides a description of the password protection algorithm, which is not very strong. (This is not surprising, since it’s not intended to prevent disclosure. As the help documentation points out, it’s really just a safety net for editing, “to help prevent others from changing, moving, or deleting important data.”) When you enter a password to protect a worksheet, Excel does not actually store the password itself, only a 16-bit hash of the password. When you subsequently enter the password to attempt to unprotect the worksheet, Excel simply compares the hash of the password you just entered with the stored hash value.
There are about 10^30 different possible passwords with a maximum of 15 characters. But there are only 32,768 different possible hash values, so collisions should be easy to come by. That is, we don’t have to recover the original password to unprotect the worksheet, we only have to enter a password with the same hash value. We can brute-force attack the problem by trying passwords with all possible hash values.
The hash function is simple to describe: compute the bit-wise exclusive-OR of (1) the constant 0xCE4B, (2) the password length, and (3) the ASCII value of each character of the password, as a 15-bit value rotated left bit-wise by its position. That is, rotate the first character 1 bit left, the second character 2 bits left, etc. The following Python code implements this function:
def excel_hash(password): """Return hash of given string, in the range [0x8000, 0xffff].""" h = 0 for ch in password[::-1]: h = h ^ ord(ch) h = ((h << 1) & 0x7fff) | (h >> 14) return h ^ len(password) ^ 0xce4b
There is a solution that has been posted and re-posted many times, that essentially tries all 12-character passwords where each of the first 11 characters are A or B (and the 12th is any printable ASCII). This is guaranteed to yield all 32,768 possible hash values. It works. So that should be the end of it… except I’m not sure why this particular enumeration of passwords was chosen, since it isn’t clear to me that it “covers” all possible hash values in any structured way.
First, it is more exhaustive than it needs to be. That is, I thought perhaps this approach might have been the minimal result of a simple blind empirical search, trying longer and longer passwords of the form “[A|B]*.” (with the last printable character to “fill in the gaps”) until all possible hash values were covered. But this isn’t the case, since we can make the solution 4 times faster by using just 10-character passwords– i.e., where the first 9 are A or B, and the 10th is any printable character.
Second, why use the characters A and B? The ASCII values for A and B are 65 and 66, respectively, meaning that we are toggling two consecutive bits with each character, not just one, and those pairs of changing bits for consecutive password characters overlap in position, which doesn’t seem helpful.
My first thought was, “The dumbest thing that could possibly work is to use 15-character passwords, where each character is, say, either B or C.” This way, each character toggles exactly one distinct bit in the hash, so that we try exactly one password for each possible hash value. And indeed, this approach does work, and is about 6 times faster than the commonly-cited solution.
But instead of a 15-level nested for-loop, we can shorten the code somewhat by just varying three characters in the password, with each character exploring 5 bits of the hash. The only trick is (1) keeping the characters’ ASCII values in the printable range, and (2) “rotating” those three characters so that their blocks of 5 bits do not overlap, by inserting constant password characters between them.
The result is the following code, with instructions on how to use it:
- Open the worksheet that you want to unprotect.
- Press Alt-F11 to open Visual Basic.
- Press F7 to open an editor window.
- Copy/paste the code below into the editor window.
Sub FindPassword() Dim c1 As Integer, c2 As Integer, c3 As Integer Dim passwd As String On Error Resume Next For c1 = 64 To 95 For c2 = 64 To 95 For c3 = 64 To 95 passwd = Chr(c1) & "...." & Chr(c2) & "...." & Chr(c3) ActiveSheet.Unprotect passwd If ActiveSheet.ProtectContents = False Then MsgBox "Unprotected using password: '" & passwd & "'" Exit Sub End If Next Next Next End Sub
5. Finally, press F5 to run. A dialog is displayed indicating a valid password. The worksheet is now unprotected.