2022 Nov 19
See all posts
Special thanks to Balaji Srinivasan, and Coinbase, Kraken and Binance staff for discussion.
Every time a major centralized exchange blows up, a common question that comes up is whether or not we can use cryptographic techniques to solve the problem. Rather than relying solely on “fiat” methods like government licenses, auditors and examining the corporate governance and the backgrounds of the individuals running the exchange, exchanges could create cryptographic proofs that show that the funds they hold on-chain are enough to cover their liabilities to their users.
Even more ambitiously, an exchange could build a system where it can’t withdraw a depositor’s funds at all without their consent. Potentially, we could explore the entire spectrum between the “don’t be evil” aspiring-good-guy CEX and the “can’t be evil”, but for-now inefficient and privacy-leaking, on-chain DEX. This post will get into the history of attempts to move exchanges one or two steps closer to trustlessness, the limitations of these techniques, and some newer and more powerful ideas that rely on ZK-SNARKs and other advanced technologies.
Balance lists and Merkle trees: old-school proof-of-solvency
The earliest attempts by exchanges to try to cryptographically prove that they are not cheating their users go back quite far. In 2011, then-largest Bitcoin exchange MtGox proved that they had funds by sending a transaction that moved 424242 BTC to a pre-announced address. In 2013, discussions started on how to solve the other side of the problem: proving the total size of customers’ deposits. If you prove that customers’ deposits equal X (“proof of liabilities“), and prove ownership of the private keys of X coins (“proof of assets“), then you have a proof of solvency: you’ve proven the exchange has the funds to pay back all of its depositors.
The simplest way to prove deposits is to simply publish a list of (username, balance)
pairs. Each user can check that their balance is included in the list, and anyone can check the full list to see that (i) every balance is non-negative, and (ii) the total sum is the claimed amount. Of course, this breaks privacy, so we can change the scheme a little bit: publish a list of (hash(username, salt), balance)
pairs, and send each user privately their salt
value. But even this leaks balances, and it leaks the pattern of changes in balances. The desire to preserve privacy brings us to the next invention: the Merkle tree technique.
Green: Charlie’s node. Blue: nodes Charlie will receive as part of his proof. Yellow: root node, publicly shown to everyone.
The Merkle tree technique consists of putting the table of customers’ balances into a Merkle sum tree. In a Merkle sum tree, each node is a (balance, hash)
pair. The bottom-layer leaf nodes represent the balances and salted username hashes of individual customers. In each higher-layer node, the balance is the sum of the two balances below, and the hash is the hash of the two nodes below. A Merkle sum proof, like a Merkle proof, is a “branch” of the tree, consisting of the sister nodes along the path from a leaf to the root.
The exchange would send each user a Merkle sum proof of their balance. The user would then have a guarantee that their balance is correctly included as part of the total. A simple example code implementation can be found here.
# The function for computing a parent node given two child nodes
def combine_tree_nodes(L, R):
= L
L_hash, L_balance = R
R_hash, R_balance assert L_balance >= 0 and R_balance >= 0
= hash(
new_node_hash + L_balance.to_bytes(32, 'big') +
L_hash + R_balance.to_bytes(32, 'big')
R_hash
)return (new_node_hash, L_balance + R_balance)
# Builds a full Merkle tree. Stored in flattened form where
# node i is the parent of nodes 2i and 2i+1
def build_merkle_sum_tree(user_table: "List[(username, salt, balance)]"):
= get_next_power_of_2(len(user_table))
tree_size = (
tree None] * tree_size +
[*user) for user in user_table] +
[userdata_to_leaf(for _ in range(tree_size - len(user_table))]
[EMPTY_LEAF
)for i in range(tree_size - 1, 0, -1):
= combine_tree_nodes(tree[i*2], tree[i*2+1])
tree[i] return tree
# Root of a tree is stored at index 1 in the flattened form
def get_root(tree):
return tree[1]
# Gets a proof for a node at a particular index
def get_proof(tree, index):
= log2(len(tree)) - 1
branch_length # ^ = bitwise xor, x ^ 1 = sister node of x
= index + len(tree) // 2
index_in_tree return [tree[(index_in_tree // 2**i) ^ 1] for i in range(branch_length)]
# Verifies a proof (duh)
def verify_proof(username, salt, balance, index, user_table_size, root, proof):
= userdata_to_leaf(username, salt, balance)
leaf = log2(get_next_power_of_2(user_table_size)) - 1
branch_length for i in range(branch_length):
if index & (2**i):
= combine_tree_nodes(proof[i], leaf)
leaf else:
= combine_tree_nodes(leaf, proof[i])
leaf return leaf == root
Privacy leakage in this design is much lower than with a fully public list, and it can be decreased further by shuffling the branches each time a root is published, but some privacy leakage is still there: Charlie learns that someone has a balance of 164 ETH, some two users have balances that add up to 70 ETH, etc. An attacker that controls many accounts could still potentially learn a significant amount about the exchange’s users.
One important subtlety of the scheme is the possibility of negative balances: what if an exchange that has 1390 ETH of customer balances but only 890 ETH in reserves tries to make up the difference by adding a -500 ETH balance under a fake account somewhere in the tree? It turns out that this possibility does not break the scheme, though this is the reason why we specifically need a Merkle sum tree and not a regular Merkle tree. Suppose that Henry is the fake account controlled by the exchange, and the exchange puts -500 ETH there:
Greta’s proof verification would fail: the exchange would have to give her Henry’s -500 ETH node, which she would reject as invalid. Eve and Fred’s proof verification would also fail, because the intermediate node above Henry has -230 total ETH, and so is also invalid! To get away with the theft, the exchange would have to hope that nobody in the entire right half of the tree checks their balance proof.
If the exchange can identify 500 ETH worth of users that they are confident will either not bother to check the proof, or will not be believed when they complain that they never received a proof, they could get away with the theft. But then the exchange could also just exclude those users from the tree and have the same effect. Hence, the Merkle tree technique is basically as good as a proof-of-liabilities scheme can be, if only achieving a proof of liabilities is the goal. But its privacy properties are still not ideal. You can go a little bit further by using Merkle trees in more clever ways, like making each satoshi or wei a separate leaf, but ultimately with more modern tech there are even better ways to do it.
Improving privacy and robustness with ZK-SNARKs
ZK-SNARKs are a powerful technology. ZK-SNARKs may be to cryptography what transformers are to AI: a general-purpose technology that is so powerful that it will completely steamroll a whole bunch of application-specific techniques for a whole bunch of problems that were develope