david wong

Hey! I'm David, a security consultant at Cryptography Services, the crypto team of NCC Group . This is my blog about cryptography and security and other related topics that I find interesting.

[paper review] A faster, more efficient cryptocurrency

posted 5 weeks ago

MIT Computer Science & Artificial Intelligence Lab just released some new research. The paper is called Vault: Fast Bootstrapping for Cryptocurrencies.

From a high level, it looks like a new cryptocurrency. But it is written as a set of techniques to improve Algorand and other cryptocurrencies are encouraged to adopt them, which leads me to think this will probably not become another cryptocurrency. It is very well written and I thought I would write a summary here.

After installing a cryptocurrency client, you usually sync it with the network in order to be able to participate or see the latest state of the blockchain. This whole concept of joining a cryptocurrency as a client is costly. For example, to join Bitcoin you need to download the whole blockchain which is more than 150 GB at the moment. It can take days and lot of bandwidth.

The whole point of Vault (the paper) is about reducing this bootstrapping process, and the space needed after to keep using the blockchain. It includes three techniques which I will briefly summarize below:

First technique: balance pruning.

Ethereum got rid of Bitcoin's UTXOs and keeps track of all the account balances instead. This method already saves a lot of space. Once you do that though, you are required to keep track of a nonce per account. This is to prevent transactions replay.

How does the Ethereum nonce works: when you sign a transaction, you also sign the nonce. Once it gets accepted in a block the nonce of your account gets incremented, and the next transaction will need to sign the new nonce to get accepted.

Vault doesn't keep a nonce, instead it tracks the hashes of all the transactions that were applied. This way, If you try to replay a transaction, it will be detected.

That's too costly though, we don't want to store ALL of the transactions ever. So the first technique is to force transactions to expire. Once a transaction is expired, a client can get rid of it in its storage, as it cannot be replayed (it is not valid anymore).

Another issue with Ethereum are zero-balance accounts.

Our analysis of the Ethereum state indicates that around 38% of all accounts have no funds and no storage/contract data (i.e., only an address and a nonce)

These zero-balance accounts still need to be tracked because of the nonce that is still useful to prevent future transaction replays.

In Vault, once all transactions applied on a zero-balance account have expired, you can prune the account.

This first technique allows a lot of useless data to be cleaned out. A client only stores accounts that have a balance greater than zero, zero-balance accounts if they still have transactions that could be replayed, and recent committed/pending transactions that haven't expired yet.

Second technique: balance sharding.

Keeping track of all the accounts is costly for one client... We can do better.

Vault uses a sparse Merkle tree where the leaves represent all the possible accounts.

It's quite a large tree... so we can split it in shards! The first $X$ bits of someone's public key could indicate what shard they should store. For example if we want to have 4 shards, the first 2 bits would indicate the path in the tree (0 is left, 1 is right).

Instead of storing all the leaves, you would store all the leaves that belong to your "subroot" (specified by the first bits of your public key).

By doing this we can't verify transactions anymore though...

To solve this, every transaction now ships with a proof of membership (a witness) for both the sender and recipient addresses. This proof of membership is a typical merkle tree concept, it contains the neighbor nodes of the leaf's path up to the root as well as the leaf itself (account balance). This way anyone can verify a transaction!

Furthermore, once a transaction is committed to a block, the merkle tree is updated and its root are updated as well (because an account balance (a leaf) has changed). So pending transactions in the mempool (the pool of pending transactions) need to have their membership proofs updated. Anybody has enough information to do this so it is not a problem. (By the way this is why transactions carry the recipient address' witness (the membership proof)l: because that leaf gets affected by a transaction as well.)

These two proofs are quite long though, and they accompany every transactions being broadcasted on the network!

One way to reduce that is to decrease the length of the membership proof. To do this, every client stores a level of subroots of the balance merkle tree, this way a proof doesn't need to go up to the root, but up to the level that the client has stored. The rest can be calculated from what is on disk.

Thanks to this second technique, one only needs to keep track of his own leaf (account balance) as well as the balance merkle root. And one can still verify new blocks. If one wants to help storing part of the network, one can keep track of a shard instead of the whole list of accounts.

I think the sharding thing here is actually less important than the transactions shipping with the balance proofs. But it is still a neat way of making sure that all nodes can participate in conserving the full blockchain and its balances.

Third technique: stamping certificates.

Vault is built on top of Algorand. The Algorand protocol is so elegant, I can see why they would want to make use of it.

Algorand's byzantime agreement BA* produces a "certificate" for each block committed to the blockchain. A certificate is a bundle containing more than 2/3rd signatures of the block from a randomly selected committee. Such a certificate is quite large (lots of signatures, public keys, and proof that the public key could participate in BA*). In addition, Algorand "mines" a new block every 2-3 seconds, so it has quite a large blockchain and a new client needs to download and verify a huge number of blocks to join the network.

To speed this up, Vault has nodes create additional "stamped certificates" at larger intervals (one every thousand blocks or so). From one stamped certificate, you can fast forward thousands of blocks ahead (~2 days of history) to the next stamped certificate. To create these stamped certificates, they use the same cryptographic sortition technique that Algorand uses but with a smaller number of participants to reduce the size of the stamped cert. (No, BLS and aggregate signatures won't help because as I've said a certificate contains a lot of distinct objects).

This way, to join the network, a client can validate blocks and certificates until it finds its first stamped certificate, then the client can jump thousands of blocks ahead from stamped certificate to stamped certificate until it reaches the most up-to-date block. There are more optimizations and details to it addressed in the paper.

I the end these stamped certificates are some kind of checkpoints. I find that it is not as elegant as Coda's protocol solution which only contains one zero-knowledge proof of the whole chain, but ZK-SNARKS being quite complicated I can see how one would want to avoid them.

Well done! You've reached the end of my post. Now you can leave me a comment :)