## Recovering an HD wallet from a partial seed phrase

I am attempting to help recover a wallet where the owner only wrote down 11 of the 12 words in the seed phrase. Initially, I thought the task would be quick and well-defined, but it turned out to be more complex than I assumed, and reference material is scant. In case someone else runs into a similar issue, I want to leave this post detailing the steps I followed (with working code samples).

The wallet I'm dealing with is Breadwallet, which apparently uses a different (older) mnemonic-to-HD-master-private-key derivation strategy from most modern wallets. For the moment, I'm only going to focus on recovering partial Breadwallet phrases, but I plan to eventually expand the answer to cover newer derivation strategies (BIP44) as well.

This is really good stuff. How would it work if I was using a 13 word Electrum (2.x) wallet, where only the first seed word was missing? So we have position, but additional variability of the differences in entropy etc. with 13 versus 12... – east-end-aa – 2016-12-18T01:30:30.947

With one word missing, and a known position, regenerating the seed is easy. The checksum is built into the seed, so there's very few words that even have a possibility of working (likely only a handful of seeds you'd even need to try) – LivingInformation – 2016-12-18T01:31:55.263

You should not be asking questions in an answers section. – rny – 2016-12-22T23:18:23.507

(The language used in this post is Python)

Breadwallet uses BIP39 to encode a 128-bit master seed (which BIP39 itself calls the entropy) as the 12-word mnemonic. A set of wallets/accounts containing chains of addresses is then derived from it using BIP32.

First off, import hashlib and binascii; we're going to need them later.

import hashlib
from binascii import hexlify, unhexlify


Let's assume you have 11 of the 12 words in your seed phrase. For simplicity's sake, I'll use the first 11 words in the BIP39 wordlist:

partial_seed_phrase = [ 'abandon', 'ability', 'able', 'about', 'above', 'absent', 'absorb', 'abstract', 'absurd', 'abuse', 'access' ]


The wordlist contains 2048 entries, so each word encodes 11 bits (2^11 = 2048). The 12 words therefore encode 12*11 = 132 bits in total. The HD master seed is 128 bits in length, and there's a 4-bit checksum attached to the end, which brings the total up to 132 bits. So far so good.
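
As a quick sanity check of that arithmetic, here is a minimal sketch based on the BIP39 rule that the checksum is one bit per 32 bits of seed. The function name `mnemonic_layout` is made up for illustration and is not part of any library.

```python
def mnemonic_layout(entropy_bits):
    """Return (checksum_bits, word_count) for a BIP39 mnemonic.

    Hypothetical helper: BIP39 appends entropy_bits / 32 checksum bits
    and splits the result into 11-bit words.
    """
    checksum_bits = entropy_bits // 32
    word_count = (entropy_bits + checksum_bits) // 11
    return checksum_bits, word_count


# The five sizes that BIP39 allows.
for entropy_bits in (128, 160, 192, 224, 256):
    print(entropy_bits, mnemonic_layout(entropy_bits))
```

For the 128-bit case discussed here, this gives a 4-bit checksum and 12 words.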

If we assume that the wordlist is a 2048-element list (omitted due to space constraints), we can find the index (in decimal) of the elements in partial_seed_phrase:

mnemonic_in_decimal = list(map(wordlist.index, partial_seed_phrase))  # list() so this also works on Python 3
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
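
Since `wordlist` is omitted above, here is one possible way to build it, assuming you have saved the English BIP39 wordlist to a local text file with one word per line. The helper names `parse_wordlist` and `load_wordlist` are my own, and any file path you pass in is up to you.

```python
def parse_wordlist(lines):
    """Turn an iterable of lines (one word per line) into a 2048-entry list."""
    words = [line.strip() for line in lines if line.strip()]
    if len(words) != 2048:
        raise ValueError('expected 2048 words, got %d' % len(words))
    return words


def load_wordlist(path):
    """Read the wordlist from a local file (the path is an assumption)."""
    with open(path) as handle:
        return parse_wordlist(handle)
```

The length check is there because a truncated or padded file would silently shift every index and make all later results wrong.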


Let's convert mnemonic_in_decimal into an array of 11-bit wide binary numbers.

mnemonic_in_binary = list(map('{0:011b}'.format, mnemonic_in_decimal))  # list() so it can be sliced on Python 3
# ['00000000000', '00000000001', '00000000010', '00000000011', '00000000100', '00000000101', '00000000110', '00000000111', '00000001000', '00000001001', '00000001010']


We know that a single word (11 bits) is missing from some unknown location in this array. In the worst case, we have to try each of the 12 positions for the missing word against all 2048 possible words, for a total of 12*2048 = 24576 potential master seeds.

for missing_word_position in range(0, 12):
    # The missing word belongs at some index from 0-11 in the final 12-word phrase

    for wordlist_index in range(0, 2048):
        # Iterate over all possibilities for the missing word

        missing_word_binary = '{0:011b}'.format(wordlist_index)
        front_half          = ''.join(mnemonic_in_binary[0:missing_word_position])
        back_half           = ''.join(mnemonic_in_binary[missing_word_position:12])
        seed_and_checksum   = front_half + missing_word_binary + back_half

        seed     = seed_and_checksum[0:128]
        checksum = seed_and_checksum[-4:]


Thankfully, we have a 4-bit checksum, which means only about one in every 16 candidates (2^4 = 16) will be valid. We therefore end up with approximately 24576/16 = 1536 master seeds to check for funds. The checksum is the first bits (in this case, 4) of the SHA-256 hash of the seed, so the exact number of valid master seeds can vary, but it will average out at around 1/16th of the total.
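
The checksum filter described above could be sketched like this. It is a self-contained Python 3 version that works on wordlist indices rather than words, so it does not need the wordlist itself. The names `checksum_bits` and `valid_candidates` are my own, and the code assumes the standard BIP39 checksum rule (first bits of SHA-256 over the seed bytes).

```python
import hashlib


def checksum_bits(seed_bits):
    """Return the BIP39 checksum (as a bit string) for a seed bit string."""
    seed_bytes = int(seed_bits, 2).to_bytes(len(seed_bits) // 8, 'big')
    digest = hashlib.sha256(seed_bytes).digest()
    checksum_length = len(seed_bits) // 32   # 4 bits for a 128-bit seed
    # At most 8 checksum bits are ever needed, so the first byte is enough.
    return '{0:08b}'.format(digest[0])[:checksum_length]


def valid_candidates(known_indices):
    """Yield (position, word_index, seed_bits) for every checksum-valid guess.

    known_indices: wordlist indices of the known words, in their original order.
    """
    known_binary = ['{0:011b}'.format(i) for i in known_indices]
    for position in range(len(known_binary) + 1):
        for word_index in range(2048):
            guess = '{0:011b}'.format(word_index)
            bits = ''.join(known_binary[:position] + [guess] + known_binary[position:])
            seed_length = len(bits) * 32 // 33   # 128 of the 132 bits
            seed, checksum = bits[:seed_length], bits[seed_length:]
            if checksum_bits(seed) == checksum:
                yield position, word_index, seed
```

Calling `list(valid_candidates(mnemonic_in_decimal))` would give the surviving candidates for the example phrase. Note that inserting the same word next to an identical neighbour produces the same phrase twice, so you may want to deduplicate on `seed` before deriving keys.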

[ More to come later. If anyone wants to help write descriptions or code for any of the following steps, I would appreciate it! ]

To do:

1. Calculate the actual_checksum from the first 4 bits of sha256(seed)
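2. Compare checksum to actual_checksum. If they are equal, push the seed into an array of valid master seeds.
3. Calculate the master node, which is HMAC-SHA512 keyed with "Bitcoin seed" over the 512-bit BIP39 seed (itself PBKDF2-HMAC-SHA512 of the mnemonic sentence)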
4. Calculate an account from the master node
5. Calculate a wallet chain from the account
6. Calculate the first 5 private keys in the wallet chain
7. Calculate the first 5 public keys from those private keys
8. Write up a description and steps for recovering partial phrases for BIP44 wallets
9. Write a program to query either a blockchain locally, or a blockchain explorer API online. Pass it the combined list of generated public keys, and see if any have a balance.
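
Step 3 in the list above could be sketched as follows, assuming the wallet follows the standard BIP39 and BIP32 definitions (I have not verified that Breadwallet matches them in every detail). The function names `bip39_seed` and `bip32_master_node` are made up for illustration. Only the standard library is used, so account and key derivation (steps 4 to 7), which need elliptic-curve arithmetic, are not covered.

```python
import hashlib
import hmac
import unicodedata


def bip39_seed(mnemonic, passphrase=''):
    """Stretch a mnemonic sentence into the 64-byte BIP39 seed.

    Standard BIP39: PBKDF2-HMAC-SHA512, 2048 rounds, salt 'mnemonic' + passphrase.
    """
    sentence = unicodedata.normalize('NFKD', mnemonic).encode('utf-8')
    salt = unicodedata.normalize('NFKD', 'mnemonic' + passphrase).encode('utf-8')
    return hashlib.pbkdf2_hmac('sha512', sentence, salt, 2048)


def bip32_master_node(seed):
    """Return (master_private_key, chain_code), 32 bytes each, per BIP32."""
    digest = hmac.new(b'Bitcoin seed', seed, hashlib.sha512).digest()
    return digest[:32], digest[32:]


# Example with a well-known all-'abandon' test phrase and no passphrase.
example_phrase = ' '.join(['abandon'] * 11 + ['about'])
master_key, chain_code = bip32_master_node(bip39_seed(example_phrase))
print(master_key.hex(), chain_code.hex())
```

Because PBKDF2 runs 2048 rounds per candidate, this step is noticeably slower than the checksum filter, which is one more reason to discard invalid checksums first.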