How does a node find a transaction in the blockchain?



If a (light) SPV client asks a full node for the Merkle branch containing a specific transaction (or the value of an address), how does the full node find this transaction (or the UTXOs) in the blockchain? Does the full node scan linearly through the entire blockchain or is there a faster way to link transactions and blocks?


Posted 2016-10-25T17:03:31.490

Reputation: 83

Love your question; I never really thought about that in detail. As I'm not sure at all and am just speculating, I'm posting this as a comment (perhaps someone can use it as a starting point for an answer): SPV clients mostly ask nodes about patterns, such as a Bloom filter that matches payments made to their addresses. Since SPV clients are almost exclusively interested in coins that still exist (or are about to exist), all of this information could be found by searching either the UTXO set or the memory pool. -End of speculation-. – Murch – 2016-10-25T22:59:58.207

@Murch The UTXO set doesn't contain Merkle roots to prove that an output exists, so in that model the node could just make up nonsense. You're right that the UTXO set can be searched in milliseconds, but this wouldn't give the connecting client any history. – Anonymous – 2016-10-29T09:56:26.580



You can read the BIP 37 specification for all of the gritty details.

how does the full node find this transaction (or the UTXOs) in the blockchain?

The client builds a Bloom filter containing what it is interested in, be it output scripts (addresses), public keys, or TXIDs, and sends it to the node it is connected to. When the client requests data for a particular block, the node filters that block down to the transactions that match the client's Bloom filter and sends only those. To find all of its UTXOs, the client requests every single block in the chain in turn and looks for things that could be its own. As you mostly worked out, this means that for every client the node loads 80 GB of blocks from disk and performs extremely expensive filtering operations on them.

Modern nodes recognise how terrible an idea this is and allow users to disable the functionality completely because of the massive load and denial-of-service risk involved. With specially crafted transactions, a remote peer can use a single command of less than 50 bytes to make a node load a 1 MB block from disk, hash it over a million times, and then return nothing.
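To make the per-block filtering step concrete, here is a minimal sketch in Python. It is not the BIP 37 implementation: BIP 37 specifies MurmurHash3 with its own seeding scheme, while this toy swaps in SHA-256 for simplicity, and the names `SimpleBloomFilter`, `filter_block`, and the `txid` / `script_elements` dictionary keys are all hypothetical.

```python
import hashlib


class SimpleBloomFilter:
    """Toy Bloom filter. Assumption: SHA-256 replaces BIP 37's MurmurHash3."""

    def __init__(self, size_bits: int, num_hashes: int):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive one bit position per hash function by prefixing a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(seed.to_bytes(4, "little") + item).digest()
            yield int.from_bytes(digest[:4], "little") % self.size_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # True for every inserted item; occasionally True for others too
        # (false positives), never False for an inserted item.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def filter_block(block, bloom):
    """Return the transactions in `block` that match the client's filter.

    `block` is a hypothetical list of dicts with a raw `txid` and a list of
    `script_elements` (data pushes found in the output scripts).
    """
    matches = []
    for tx in block:
        candidates = [tx["txid"], *tx["script_elements"]]
        if any(bloom.might_contain(c) for c in candidates):
            matches.append(tx)
    return matches
```

A client would call `bloom.add(...)` once per script element or TXID it cares about, and the node would run `filter_block` on each requested block. Note that the node has to test every transaction in every requested block, which is where the load described above comes from.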


Posted 2016-10-25T17:03:31.490

Reputation: 12 846

Don't they just ask for the data on blocks for which they don't have information yet? I.e. if a wallet was created at block 423,000, it would only ask for blocks later than that. And then, since it'll know that it caught up to 430,000 last time only newer blocks after that? – Murch – 2016-10-29T10:14:01.647

Yes. They can also skip blocks they know happened before the wallet's "birth date", as there can't possibly be transactions from that time which they can spend. This doesn't help much in practice because the bulk of the size of the chain is within the last few months. – Anonymous – 2016-10-29T10:41:17.577


Usually, the light client asks "is this transaction in this block?", not "which block is this transaction in?". The latter can be answered without too much trouble if the node keeps a txindex (transaction index), which bitcoind offers as an option.
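The two questions can be sketched in Python as follows. This is an illustration under stated assumptions, not bitcoind's code: `build_tx_index` is a hypothetical in-memory stand-in for a transaction index, blocks are modelled as plain lists of raw 32-byte transaction hashes, and the proof is a single Merkle branch rather than the partial Merkle tree that BIP 37 sends in a `merkleblock` message.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Double SHA-256, the hash Bitcoin uses for Merkle trees."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def build_tx_index(chain):
    """Hypothetical txindex: map each txid to (block height, position)."""
    index = {}
    for height, block in enumerate(chain):
        for position, txid in enumerate(block):
            index[txid] = (height, position)
    return index


def merkle_root_and_branch(txids, index):
    """Return (root, branch) for txids[index]; branch is listed bottom-up."""
    level = list(txids)
    branch = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd levels duplicate their last hash
        branch.append(level[index ^ 1])  # sibling of the current node
        level = [
            dsha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
        index //= 2
    return level[0], branch


def verify_branch(txid, index, branch, root):
    """Client-side check: does the branch connect txid to the Merkle root?"""
    h = txid
    for sibling in branch:
        # An odd index means the current node is the right-hand child.
        h = dsha256(sibling + h) if index & 1 else dsha256(h + sibling)
        index >>= 1
    return h == root
```

With an index, "which block is this transaction in?" becomes a single dictionary lookup instead of a scan over the chain. The node can then build the branch from that one block, and the client checks it against the Merkle root in the block header it already has.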

Jimmy Song

Posted 2016-10-25T17:03:31.490

Reputation: 7 330

Thanks for your answer! Still, a couple of things are not clear to me: (1) How does the SPV client determine X in the question "Is this transaction in block X"? By probing all blocks (let's say that I just installed a new SPV client and it should determine how much bitcoin I own based on my public addresses)? (2) How does it work in the case of a Bloom filter? To my understanding, the full node has to calculate the hashes of all transactions and compare them to the filter? This seems too time-consuming for more than 80 GB of data. – Pold – 2016-10-25T21:00:32.173

Nodes do not maintain a txindex by default because it would be forever growing and prohibitively expensive (enabling this option significantly slows down block insertion speed and increases disk writes). – Anonymous – 2016-10-29T09:54:45.263