david wong

Hey! I'm David, the author of the Real-World Cryptography book. I was previously a crypto architect at O(1) Labs (working on the Mina cryptocurrency), before that I was the security lead for Diem (formerly Libra) at Novi (Facebook), and a security consultant for the Cryptography Services of NCC Group. This is my blog about cryptography and security and other related topics that I find interesting.

ZK HACK: 1st puzzle write up posted November 2021

Last Tuesday was the start of ZK HACK, a "7-week virtual event featuring weekly workshops and advanced puzzle solving competitions". All related to zero-knowledge proofs, as the name suggests. The talks of the first day were really good, and you can rewatch them here. At the end of the first day, a puzzle called Let's hash it out was released. This post about solving this puzzle.

The puzzle

The puzzle is a Github repo containing a Rust program. If you run it, it displays the following message:

Alice designed an authentication system in which users gain access by presenting it a signature on a username, which Alice provided. One day, Alice discovered 256 of these signatures were leaked publicly, but the secret key wasn't. Phew. The next day, she found out someone accessed her system with a username she doesn't know! This shouldn't be possible due to existential unforgeability, as she never signed such a message. Can you find out how it happend and produce a signature on your username?

Looking at the code, it looks like there's indeed 256 signatures over 256 messages (that are just hexadecimal strings though, not usernames).

The signature verification

The signatures are BLS signatures (a signature scheme that makes use of pairing and that I've talked about it here).

Looking at the code, there's nothing fancy in there. There's a verification function:

pub fn verify(pk: G2Affine, msg: &[u8], sig: G1Affine) {
    let (_, h) = hash_to_curve(msg);
        (h.into(), pk.into()),

which pretty much implements the BLS signature verification algorithm to check that

$$ e(\text{sig}, -G_2) \cdot e(\text{h}, \text{pk}) = 1 $$

Note: if you read one of the linked resource, "BLS for the rest of us", this should make sense. If anything is confusing in this section, spend a bit of time reading that article.

We know that the signature is simply the secret key $sk$ multiplied with the message:

$$\text{sig} = [\text{sk}]h$$

The public key is simply the secret key $\text{sk}$ hidden in the second group:

$$\text{pk} = [\text{sk}]G_2$$

So the check gives us:

$$ \begin{align} & \;e([sk]h, -G_2) \cdot e(h, [sk]G2) \ =& \;e(h, G_2)^{-\text{sk}} \cdot e(h, G_2)^\text{sk} \ =& \;1 \end{align} $$

The hash to curve

Actually, the username is not signed directly. Since a signature is the first argument in the pairing it needs to be an element of the first group (so $[k]G_1$ for some value $k$).

To transform some bytes into a field element $k$, we use what's called a hash-to-curve algorithm. Here's what the code implements:

pub fn hash_to_curve(msg: &[u8]) -> (Vec<u8>, G1Affine) {
    let rng_pedersen = &mut ChaCha20Rng::from_seed([
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1,
    let parameters = CRH::<G1Projective, ZkHackPedersenWindow>::setup(rng_pedersen).unwrap();
    let b2hash = blake2s_simd::blake2s(msg);
        CRH::<G1Projective, ZkHackPedersenWindow>::evaluate(¶meters, b2hash.as_bytes())

What I'm reading is that it:

  1. initializes some collision-resistant hash function (CRH) from a hardcoded seed
  2. uses the BLAKE2 hash function to hash the username (msg) into 256 bits.
  3. Uses the CRH to hash the BLAKE2 digest into a group element

Looking at CRH, it's simply a Pedersen hashing (I've talked about that hash function here) that converts a series of bits $b_1, b_2, b_3, \cdots$ into

$$ [b_1]G_{1,1} + [b_2]G_{1, 2} + [b_3]G_{1, 3} + \cdots $$

where the $G_{1,i}$ are curve points that belongs to the first group (generated by $G_1$) and derived randomly via the hardcoded seed (in a way that prevents anyone from guessing their discrete logarithm).

What are we doing?

What are we looking for? We're trying to create a valid signature (maybe a signature forgery attack then?) on our own nickname (so more than just an existantial forgery, a chosen-message attack).

We can't change the public key (so no rogue key attack), and the message is fixed. This leaves us with the signature as the only thing that can be changed. So indeed, a signature forgery attack.

To recap, we have 256 valid signatures:

  • $e(\text{sig}_1, -G_2) \cdot e(h(m_1), \text{pk}) = 1$
  • $\vdots$
  • $e(\text{sig}_{256}, -G_2) \cdot e(h(m_{256}), \text{pk}) = 1$

and we want to forge a new one such that:

$$ e(\text{bad_sig}, -G_2) \cdot e(h(\text{"my nickname"}), \text{pk}) = 1 $$

Forging a signature

Reading on the aggregation capabilities of BLS, it seems like the whole point of that signature scheme is that we can just add things with one another. So let's try to think about adding signatures shall we?

What happens if I add two signatures?

$$ \begin{align} &\; \text{sig}_1 + \text{sig}_2 \ =&\; [\text{sk}]h_1 + [\text{sk}]h_2 \end{align} $$

if only we could factor $sk$ out... but wait, we know that $h_1$ and $h_2$ are additions of the same curve points (by definition of the Pedersen hashing):

$$ \begin{align} &\; \text{sig}_1 + \text{sig}_2 \ =&\; [\text{sk}]h_1 + [\text{sk}]h_2 \ =&\; \text{sk}\ &\; + \text{sk} \end{align} $$

where the $b_{i}$ (resp. $b'_{i}$) are the bits of $h_1$ (resp. $h_2$). So the added signature are equal to the signature of the added bitstrings:

$$ [b_{1} + b'{1},\; b{2} + b'{2},\; b{3} + b'_{3},\; \cdots] $$

We just forged a signature! Now, that shouldn't mean much, because remember, these bits represent the output of a hash function and a hash function is resistant against pre-image attacks.

Wait a minute...

Our hash function is a collision-resistant hash function, but that's it.

Linear combinations

OK, so we forged a signature by adding two signatures. But we probably didn't get what we wanted, what we wanted is to obtain the bits $\tilde{b}_1, \tilde{b}_2, \tilde{b}_3, \cdots$ that represent the hashing of my own username.

Maybe if we add more signatures together we can get? Actually, we can use all the signatures and combine them. And not just by adding them, we can take any linear combination (we're in a field, not constrained by 0 and 1).

So here's the system of equations that we have to solve:

  • $\tilde{b}_1 = x_1 b_1 + x_2 b'_1 + x_3 b''_1$
  • $\vdots$
  • $\tilde{b}_{256} = x_1 b_{256} + x_2 b'_{256} + x_3 b''_{256}$

Note that we can see that as solving $xA = b$ where each row of the matrix $A$ represents the bits of a digest, and $b$ is the bitvector of my digest (the hash of my username).

Once we find that linear combinations, we just have to apply it to the signatures to obtain a signature that should work on my username :)

$$ \text{bad_sig} = x_1 \text{sig}_1 + x_2 \text{sig}_2 + x_3 \text{sig}_3 + \cdots $$

Coding the answer

Because I couldn't find a way to solve a system of equations in Rust, I simply extracted what I needed and used Sage to do the complicated parts. Here's the Rust code that creates the matrix $A$ and the vector $b$:

// get puzzle data
let (_pk, ms, _sigs) = puzzle_data();

// here's my name pedersen hashed
let (digest_hash, _digest_point) = hash_to_curve("mimoo".as_bytes());

// what's that in terms of bits?
use ark_crypto_primitives::crh::pedersen::bytes_to_bits;
let bits = bytes_to_bits(&digest_hash);
let bits: Vec<u8> = bits.into_iter().map(|b| b as u8).collect();

// order of the subgroup of G1
println!("R = GF(0x73eda753299d7d483339d80809a1d80553bda402fffe5bfeffffffff00000001)");

// xA = b
print!("A = matrix(R, [");
for m in ms {
    let (digest, _point) = hash_to_curve(&m);
    let bits = bytes_to_bits(&digest);
    let bits: Vec<u8> = bits.into_iter().map(|x| x as u8).collect();
    print!("{:?}, ", bits);
println!("b = vector(R, {:?})", bits);

Note: the system of equation is over some field $R$. Why? Because eventually, the linear combination happens between curve points that are elements of the first group, generated by $G_1$, and as such are scalars that live in the field created by the order of the group generated by $G_1$.

In sage, I simply have to write the following:

x = A.solve_left(b)

and back in Rust we can use these coefficients to add the different signatures with one another:

let mut bad_sig = G1Projective::zero();
for (coeff, sig) in coeffs.into_iter().zip(sigs) {
    let coeff = string_int_to_field_el(coeff);
    bad_sig += sig.mul(coeff);

// the solution
verify(pk, "mimoo".as_bytes(), bad_sig.into());

It works!

comment on this story

David Wong's 7 rules of programming posted September 2021

Don't worry about the pretentious title, I was just inspired by Rob Pike's 5 Rules of Programming and by the Risk Manifesto of the The Security Engineer Handbook (which is explained in more details in the book):

Threat modeling over mindless improvements. Secure early rather than later. Forcing software over forcing people. Role play over complacency.

So I made my own! Without further ado:

  1. Put it in writing. If you leave for a month-long spiritual trip and someone needs to fix a bug in your code, will they have to wait for you? If an expert wants to assess the algorithm you're implementing, but have no knowledge in the programming language you used, will they manage to do it? These are questions you should ask yourself when writing a complex piece of code. It happens often enough that a protocol is implemented differently from the paper. Write specifications!
  2. The concorde is fast but dangerous. When writing optimized, faster code, or perhaps over-engineered or overly-clever code, you're trading off maintainability, clarity, and thus security of your code. I often hear programming language features claiming zero-overhead, but they always forget to talk about the cost paid in ambiguity and complexity.
  3. Plant types and data structures, then water. As Rob Pike says, use the right data structures, from there everything should flow naturally. Well documented structures are the best abstraction you'll find. Furthermore, functions that make too many assumptions on their inputs will have trouble down the line when someone changes the code underneath them. Encoding invariants in your types is a great way to address this issue. The bottom line is that you can get most of your clarity and security through creating the right types and objects!
  4. Don't let your dependencies grow on you. Dependencies easily grow like cancer cells. Beware of them. Analyse your dependencies, monitor and review the ones you use, rewrite or copy code to avoid importing large dependencies, and add friction to your process of adding new dependencies. Supply-chain attacks are being more and more of a thing, be aware.
  5. Make the robots enforce it. Enforce good code patterns and prevent bad code patterns through continuous integration where you can. This is pretty much to the "forcing software over forcing people" of the Risk Manifesto.
  6. Design lego cities, not real ones. The less connected your code is, the clearer it is to follow as the different parts will be self-contained, and the easier it will be to update some part of the code. Think about parts of your code as external libraries that are facing more than just your codebase.
1 comment

Project Memento: An NFT art project posted September 2021

Believe it or not, even though I was one of the participant in the standardization of ERC-721, the Ethereum standard for NFTs, I had never read the final standard, yet alone written an ERC-721 smart contract. At the time I was against the standard, at least in its form, due to its collision with the ERC-20 standard (you can't create a smart contract that is both a compliant ERC-20 token and a compliant ERC-721 NFT). My arguments ended up losing, ERC-721 was standardized with the same function names as ERC-20, and I heard that today another standard is surfacing (ERC 1155) to fix this flaw (among others).

Anyway, I recently decided to bury the hatchet and read the damn standard. And after a few weekends of hacking with my friends Eric Khun and Simon Patole, we wrote an ERC-721 compliant dapp as an art project. It's called project memento and it boasts a grid of tiles each containing a letter, a-la million dollar page. People can acquire letters, alter them, and form lines (or diagonals) of words together. I'm curious as to what will become of this, but this was fun!

project memento

You can check out project memento here

1 comment

The Real-World Cryptography book is done and shipping! posted September 2021

Today I received the first copy of my book Real-World Cryptography! You can't imagine how much sweat and love I put into this work in the last two years and a half. It's the book I've always wanted to read, and it's the book I thought the field was missing. It's the book I wanted to have as a student when I was learning about hash functions and ciphers, and it's the book I wish I could have referred to my fellow pentesters when they had questions about TLS or end-to-end encryption. It's the book I'd use as a developer looking for cryptographic libraries and best practice. It's the first book with a cryptocurrency chapter, and it's the book cryptographers will read to learn about password-authenticated key exchanges, sponge functions, the noise protocol framework, post-quantum cryptography, and zero-knowledge proofs. Real-world cryptography is what the field of applied cryptography really looks like today. It's all there.

Get the book here!

real-world cryptography book

real-world cryptography book

real-world cryptography book

real-world cryptography book

real-world cryptography book

real-world cryptography book

1 comment

Looking for a cryptography engineer job? Interested in zero-knowledge proofs? posted August 2021

Hey you!

If you've been following my blog recently, you must have seen that I joined O(1) Labs to work on the Mina cryptocurrency. Why? Simply because I felt like it was the most interesting project in the space, and now that I'm deep in OCaml and Rust code trying to understand PLONK and other state-of-the-art zero-knowledge proof systems work I can tell you that it is indeed the most interesting project in the space :)

My two zero-knowledge friends are too busy doing a PhD. They like self-inflicted pain. (Mathias and Michael, I'm looking at you.) So if you're free, ping me! And if you're not free, ping me anyway because we might be able to work something out.

Because people need a job description, check that one out: https://boards.greenhouse.io/o1labs/jobs/4023646004

But realistically, just contact me on twitter or here if you have any question about what the culture of the company is, what your day-to-day would look like, and if you can work remotely (yes).

comment on this story

How does PLONK work? Part 11: Our final protocol! (Without the copy constraints) posted August 2021

In this eleventh video, I go back to the PLONK protocol and finally explain how it works with polynomial commitments. This version of the protocol is not finished, as it doesn't have zero-knowledgeness (the polynomial evaluations leak information about the polynomials) and the wiring (or copy constraint) has not been enforced (e.g. the output wire of this gate should be the left wire of this other gate). In the next video, I will explain how copy constraints can be enforced via the PLONK permutation argument.

Stay tuned for part 12... Check the full series here.

comment on this story