## What is the Merkle root?

The Bitcoin wiki Vocabulary article explains why the Merkle root exists:

> Every transaction has a hash associated with it. In a block, all of the transaction hashes in the block are themselves hashed (sometimes several times -- the exact process is complex), and the result is the Merkle root. In other words, the Merkle root is the hash of all the hashes of all the transactions in the block. The Merkle root is included in the block header. With this scheme, it is possible to securely verify that a transaction has been accepted by the network (and get the number of confirmations) by downloading just the tiny block headers and Merkle tree -- downloading the entire block chain is unnecessary. This feature is currently not used in Bitcoin, but it will be in the future.

How can you check if a transaction has been verified only using Merkle roots? How does that mechanism work?

While I could grasp the definition of Merkle tree and Merkle root immediately, I struggled, like many people posting in this thread, to figure out the larger context and their use until I did a bit more research. I try to explain a scenario here. – RT Denver – 2018-03-27T22:46:29.370

59

The idea (as I understand it) is that the Merkle tree lets you verify transactions as needed without including the body of every transaction in the block header, while still tying every transaction back to the block chain (and therefore to its proof of work).

To understand this, first understand the concept of a tree. Consider an 8-transaction block. Imagine each of those 8 transactions at the base of a pyramid: these are called leaves. Put four "branches" on the second tier of the pyramid and draw two lines from each of them to the leaves, so that each branch has two leaves attached to it. Now join those four branches to two branches on level 3 of the pyramid, and those two to one branch at the top of the pyramid (what is called the root of the tree). (Our tree is growing upside down in this example.)
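As a quick illustration of how many nodes sit on each tier, here is a minimal sketch. It assumes Bitcoin's rule that a tier with an odd number of nodes pairs its last node with itself; the function name `level_sizes` is just for illustration.

```python
def level_sizes(n_leaves: int) -> list[int]:
    """Number of nodes on each tier, from the leaves up to the root."""
    if n_leaves < 1:
        raise ValueError("need at least one leaf")
    sizes = [n_leaves]
    while sizes[-1] > 1:
        # Each parent covers two children; an odd leftover is paired with itself,
        # so the next tier has ceil(size / 2) nodes.
        sizes.append((sizes[-1] + 1) // 2)
    return sizes


print(level_sizes(8))  # [8, 4, 2, 1]
```

For the 8-transaction pyramid that is 8 leaves, 4 branches, 2 branches, and 1 root.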

Now we can start to understand the hashing process. For each pair of leaves, concatenate their hashes, hash the result, and store that as the value of the second-level branch those leaves are attached to (the leaves are the child nodes, the branch is their parent node). Then do the same with each pair of second-level hashes to get the third-level branches, and so on. (If you had more than 8 transactions, all you would need is more levels in the pyramid. In Bitcoin, a level with an odd number of hashes pairs its last hash with itself.)
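A minimal Python sketch of that process, assuming Bitcoin's conventions: each parent is the double SHA-256 of its two children's 32-byte hashes concatenated (left first), and a tier with an odd count duplicates its last hash. The leaf values are made-up placeholder hashes, not real txids, and the function names are only illustrative.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, the hash Bitcoin uses for txids and Merkle nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold a list of 32-byte leaf hashes up to a single root hash."""
    if not leaves:
        raise ValueError("a block always has at least one transaction")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd tier: pair the last hash with itself
        # Each parent is the hash of its left and right child, concatenated.
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Eight placeholder "transaction hashes" (not real Bitcoin data).
leaves = [sha256d(f"tx{i}".encode()) for i in range(8)]
print(merkle_root(leaves).hex())
```

One practical caveat: block explorers usually display txids byte-reversed, so real txids copied from an explorer would have to be reversed back before being used as leaves here.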

So now you have a root node whose hash effectively vouches for the integrity of all of the transactions. If one transaction is added, removed, or changed, the hash of its parent changes, which changes the hash of that parent's parent, and so on, so the root node's hash (the Merkle root) changes as well.
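The ripple can be traced with plain index arithmetic. Here is a minimal sketch (the helper name is only for illustration, and it tracks positions only, with no hashing) that lists the node on each tier that has to be recomputed when one leaf changes. It assumes a Bitcoin-style tree where a tier with an odd count pairs its last node with itself.

```python
def affected_nodes(leaf_index: int, n_leaves: int) -> list[tuple[int, int]]:
    """(tier, position) of every node whose hash changes when one leaf changes."""
    if not 0 <= leaf_index < n_leaves:
        raise ValueError("leaf_index out of range")
    path = [(0, leaf_index)]
    index, size = leaf_index, n_leaves
    while size > 1:
        index //= 2              # a parent sits at half its child's position
        size = (size + 1) // 2   # an odd tier still yields ceil(size / 2) parents
        path.append((len(path), index))
    return path


print(affected_nodes(5, 8))  # [(0, 5), (1, 2), (2, 1), (3, 0)]
```

Exactly one node per tier changes, ending at the root, which is why a single altered transaction always shows up as a different Merkle root.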

So how does this help us avoid needing the entire blockchain? Because we can verify transactions as needed. If we have a transaction that claims to be from block #234133, we can get the transactions for that block, rebuild the Merkle tree, check that it produces the Merkle root in that block's header, and know that the transaction was included in that block. We can do that without necessarily knowing all of the transactions in #234132 or #234134, because we know that the blocks are tamper-proof.

Even better, if we know where our transaction sits in the Merkle tree and we know the hashes of the branches, we don't even need all of the transactions from #234133. (There were 868 in that block.) We start with just our transaction and its sibling (if it has one), calculate the hash of the two, and check that it matches the expected value. Then we ask for the sibling of that branch, calculate the hash of that pair, and check it. We continue this process up the tree. That takes only ten verifications for 868 transactions, since 2^10 = 1024 ≥ 868. (That's one of the great things about trees: they can hold a lot of values in a relatively small number of layers.)
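The walk up the tree can be sketched as follows. This assumes Bitcoin-style conventions (double SHA-256, left‖right concatenation, last hash duplicated on odd tiers). The function names and the proof format, a list of (sibling hash, sibling-is-on-the-right) pairs, are hypothetical. Real SPV messages such as `merkleblock` encode the same information differently. The leaves are placeholder hashes.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, as Bitcoin uses for Merkle nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def build_proof(leaves: list[bytes], index: int) -> tuple[list[tuple[bytes, bool]], bytes]:
    """Full-node side: collect one sibling hash per tier; also return the root."""
    proof, level = [], list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd tier: last hash is paired with itself
        sibling = index ^ 1  # flip the lowest bit to get the left/right neighbour
        proof.append((level[sibling], sibling > index))
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2  # our node's position in the next tier up
    return proof, level[0]


def verify_proof(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Lightweight-client side: hash upward and compare with the header's root."""
    node = leaf
    for sibling, sibling_on_right in proof:
        node = sha256d(node + sibling) if sibling_on_right else sha256d(sibling + node)
    return node == root


leaves = [sha256d(f"tx{i}".encode()) for i in range(868)]  # placeholder hashes
proof, root = build_proof(leaves, 123)
print(len(proof))                                      # 10
print(verify_proof(leaves[123], proof, root))          # True
print(verify_proof(sha256d(b"forged"), proof, root))   # False
```

In a real client, `root` would come from a block header the client has already checked, not from the server that supplies the proof.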

How do we know that the source of this data isn't lying to us about the hash values? Because the hash function is one-way, there is no practical way for a deceptive party to find a value that would hash with our second-to-last value to produce the Merkle root (which we know from our verified block chain). The same reasoning holds further down the tree: there's no practical way to create a fake value that hashes to our expected value. Another way to think about it is that even a single alteration of a transaction at the base of the tree results in a rippling change to the hash values of every node on its branch, all the way up to the root's hash value.

In short, the Merkle tree creates a single value that proves the integrity of all of the transactions under it. Satoshi could have just included the hash of one big list of all the transactions in the block header. But then you would have had to download and hash the entire list of transactions to verify its integrity. With a Merkle tree, even if there is an extremely large number of transactions, the work you need to do (and the number of hashes you need to request and download) to verify a transaction is only O(log n).
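To put numbers on that, assume Bitcoin's layout (32-byte hashes, with a tier that has an odd count pairing its last hash with itself). A block with $n$ transactions then has $\lceil \log_2 n \rceil$ tiers above the leaves. A Merkle proof carries one sibling hash per tier, while hashing a flat list means fetching every txid. For the 868-transaction block mentioned above:

$$
\begin{aligned}
\text{proof size} &= \lceil \log_2 n \rceil \text{ hashes} = O(\log n) \\
n = 868:\quad \lceil \log_2 868 \rceil &= \lceil 9.76 \rceil = 10 \text{ hashes} = 10 \times 32 = 320 \text{ bytes} \\
\text{flat list of txids} &= 868 \times 32 = 27{,}776 \text{ bytes}
\end{aligned}
$$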

[As always, feel free to edit this. This is primarily just inference on my part from looking at the spec.]

A block header does not include the transaction IDs of the transactions in the block, does it? So the idea in the last part of the quote would only work if txids were included in the block headers. – Steven Roose – 2013-12-10T19:26:09.363

It reads "block header and merkle tree". That makes more sense. Does the original protocol allow for requesting merkle trees and/or headers including them? – Steven Roose – 2013-12-10T19:27:22.667

What if we do not know the block number of the transaction? In that case, are we required to iterate through all blocks on the blockchain? @David Ogren – alper – 2017-12-05T14:07:18.013

Maybe this is a bad question, but what if I use a birthday attack to find two transactions with equal hashes, make one of those transactions, and later claim that I made the other one? How can I be proved wrong? – tgwtdt – 2018-07-08T17:43:34.470

It's too long to answer your question here @tgwtdt. In short: first, you can't execute a birthday attack because you don't have arbitrary control over the inputs. Second, even a birthday attack on SHA-256 isn't realistically possible. But, in general, yes, if you can find a way to exploit SHA-256 then you can do all kinds of nasty things within Bitcoin: the difficulty of reversing the hash algorithm is a founding principle. On the other hand, hash algorithm security is a very well-researched field. – David Ogren – 2018-09-09T01:49:11.397

26

"Figure 7-2. Calculating the nodes in a merkle tree" from Mastering Bitcoin shows the Merkle root (H_ABCD) of a list of four transactions: Tx A, Tx B, Tx C, and Tx D.

To verify that a transaction (for example, the one with hash H_K) is a valid transaction (i.e., part of a list of, in this example, 16 transactions with hashes H_A, H_B, …, H_P), one need only perform at most 2*log2(N) < N hashes, shown in the Merkle path here.

If H_K leads to the correct Merkle root, then Tx K was in the transaction list.

The Merkle path needed to verify that H_K corresponds to the Merkle root contains only 4 hashes in this example. The Merkle path takes up much less space than storing all the transactions in a block (here, 4 hashes versus 16). This is why SPV (simplified payment verification) is lighter-weight.

In this case N = 16, and 2*log2(16) = 2 × 4 = 8, which is indeed less than 16.
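Which hashes make up that 4-hash path follows from H_K's position alone. Here is a minimal sketch, assuming the 16 leaves are labelled A through P from left to right (so K is index 10) and that a tier with an odd count pairs its last node with itself. The helper name is only for illustration. It returns, tier by tier, the label of the sibling node a full node would have to supply.

```python
import string


def sibling_labels(leaf_index: int, labels: str) -> list[str]:
    """Label of the sibling subtree at each tier on the way from a leaf to the root."""
    nodes = list(labels)  # tier 0: one label per leaf, e.g. "A".."P"
    index, path = leaf_index, []
    while len(nodes) > 1:
        if len(nodes) % 2 == 1:
            nodes.append(nodes[-1])  # odd tier: last node paired with itself
        path.append(nodes[index ^ 1])  # the neighbour we hash with at this tier
        # A parent is labelled by concatenating its children's labels (AB, ABCD, ...).
        nodes = [nodes[i] + nodes[i + 1] for i in range(0, len(nodes), 2)]
        index //= 2
    return path


leaves = string.ascii_uppercase[:16]               # "ABCDEFGHIJKLMNOP"
print(sibling_labels(leaves.index("K"), leaves))   # ['L', 'IJ', 'MNOP', 'ABCDEFGH']
```

So the path for H_K is H_L, H_IJ, H_MNOP, and H_ABCDEFGH: four hashes, one per tier. This is also why knowing the transaction's index in the block is enough to know which way to go at each step.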

The 2nd diagram is the only thing you need to understand everything. The "green" hash H_K is the /claim/ given to the PAYEE (who also has the Merkle root). The VERIFIER (full node) sends the "blue" hashes as /proof/, because they can be used to calculate all of the "blue dashed" hashes all the way up to the Merkle root. If the calculated Merkle root matches the known Merkle root, H_K is in the block. Obviously, the "blue" hashes are a small subset of ALL the hashes, i.e. 2*log_2(N) is less than the whole set N for N > 4. Graph it for yourself at desmos.com, with y=2log(N)/log(2) vs y=N. – Paul Parker – 2019-11-06T01:17:35.383

Re "To verify that a transaction": how do we know the exact location of H_K in the Merkle tree? @Geremia – alper – 2017-12-05T14:19:28.090

@Avatar Constructing Merkle paths from scratch requires knowing all the transactions. Also, forging a fake Merkle path that corresponds to a given Merkle root would be even more difficult than cracking SHA-256. – Geremia – 2017-12-05T15:53:36.927

For example, when we start from the root and follow right, left, right, left, we reach H_K, which we want to verify. But how could we know that we should follow that path? @Geremia – alper – 2017-12-05T17:02:50.263

@Avator Verification of a Merkle path proceeds from the leaf node to the Merkle root. – Geremia – 2017-12-05T17:41:13.543

That's much clearer now. But again, how does the algorithm know which leaf it should start from? In your example there are 16 leaves, so how could we find the position of H_K among the tree's leaves? @Geremia – alper – 2017-12-05T18:03:42.390

@Avatar Sorry, I meant to say that one simply needs to search for the transaction in the "list of transactions." – Geremia – 2017-12-05T18:56:14.187

As I understand it, we know the index of every transaction, i.e., which leaf it is located at. @Geremia – alper – 2017-12-05T19:16:21.687

– Geremia – 2017-12-05T19:29:03.270

The discussion link seems dead, and I still haven't found the explanation: if the transaction is searched for in the "list of transactions" and its position is found, doesn't that already mean it is a valid transaction? – limboy – 2018-01-14T08:51:01.977

@limboy An SPV server sends a transaction's Merkle path to the lightweight client; this is enough for the lightweight client to verify that the transaction is valid, without the lightweight client needing a list of all the transactions. – Geremia – 2018-01-15T17:35:53.347

3

It's not true that you use just the merkle root (nor does the article say that). Rather, you use just the parts of the merkle tree that relate to your transaction. That includes the root.

1

This may be a good introduction (Ripple may be using a Merkle tree, but I am not sure): http://en.m.wikipedia.org/wiki/Hash_tree Also check this: https://stackoverflow.com/questions/5486304/explain-merkle-trees-for-use-in-eventual-consistency

-1

The Merkle Root, as I understand it, is basically a hash of many hashes (Good example here) - to create a Merkle Root you must start by taking a double SHA-256 hash of the byte streams of the transactions in the block. However, what this data is (the byte streams), what it looks like, and where it comes from remains a mystery to me.
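For what it's worth, the "byte stream" is each transaction's serialized form as it appears in the block, and its double SHA-256 is the txid used as a Merkle leaf. For SegWit transactions, the txid is computed over the serialization without the witness data. Here is a minimal sketch, assuming `raw_tx` holds such serialized bytes. The value below is a made-up placeholder, not a valid transaction, and the function name is only illustrative. It also shows that Bitcoin hashes in internal byte order, while explorers display the txid byte-reversed.

```python
import hashlib


def txid(raw_tx: bytes) -> tuple[bytes, str]:
    """Return the txid in internal byte order and in the usual display (hex) form."""
    internal = hashlib.sha256(hashlib.sha256(raw_tx).digest()).digest()
    return internal, internal[::-1].hex()  # displayed txids are byte-reversed


raw_tx = bytes.fromhex("0100000000")  # placeholder bytes only, NOT a real transaction
internal, display = txid(raw_tx)
print(display)
```

The `internal` form is what gets concatenated and hashed when building the Merkle tree.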

-1

BE AWARE! The Merkle root is important for mining. Since the Merkle root is derived from ALL transaction hashes in the block and is part of the block header, its value is taken into account when miners do their work. See: https://en.bitcoin.it/wiki/Block_hashing_algorithm. Previous hash:

81cd02ab7e569e8bcd9317e2fe99f2de44d49ab2b8851ba4a308000000000000


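Here is an excerpt of the algorithm from that page, updated to run on Python 3:

    import hashlib

    # The 80-byte block header; every field is little-endian hex.
    header_hex = ("01000000" +  # version (missing from the excerpt; version 1)
        "81cd02ab7e569e8bcd9317e2fe99f2de44d49ab2b8851ba4a308000000000000" +  # previous block hash
        "e320b6c2fffc8d750423db8b1eb942ae710e951ed797f7affc8892b0f1fc122b" +  # Merkle root
        "c7f5d74d" +  # timestamp
        "f2b9441a" +  # bits (encoded difficulty target)
        "42a14695")  # nonce
    header_bin = bytes.fromhex(header_hex)
    block_hash = hashlib.sha256(hashlib.sha256(header_bin).digest()).digest()
    print(block_hash[::-1].hex())  # byte-reversed into the usual display order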