david wong

Hey! I'm David, the author of the Real-World Cryptography book. I'm a crypto engineer at O(1) Labs on the Mina cryptocurrency, previously I was the security lead for Diem (formerly Libra) at Novi (Facebook), and a security consultant for the Cryptography Services of NCC Group. This is my blog about cryptography and security and other related topics that I find interesting.

Linearization in Plonk and Kimchi. Why? posted April 2022

This is a short note on the linearization step in Plonk (the zero-knowledge proof system), which is in my opinion the hardest step to understand.

Essentially, the linearization in Plonk exists due to two reasons:

  1. The polynomials we're dealing with are too big for the SRS (in the case of a polynomial commitment scheme, or PCS, like KZG) or URS (in the case of a PCS like bulletproof).
  2. During a proof verification, the verifier reconstructs the commitment to the polynomial that is aggregating all the constraints. To construct that commitment, the verifier adds commitments together, but must avoid multiplications as the commitment scheme used is only additively homomorphic.

Let's look at each of these in order.

The Reference String has a maximum size limit

The polynomials we're dealing with are too big for the SRS (in the case of a polynomial commitment scheme, or PCS, like KZG) or URS (in the case of a PCS like bulletproof).

Imagine that you have a polynomial with 4 coefficients:

$$ f = f_0 + f_1 x + f_2 x^2 + f_3 x^3 $$

Now imagine that your reference string is of size 2. This means that you can commit polynomials that have 2 coefficients max. The only way to commit to $f$ is to commit to the two polynomials with 2 or less coefficients separately:

  • $com_1 = f_0 + f_1 x$
  • $com_2 = f_2 + f_3 x$

Then, as part of the protocol, the prover evaluates this polynomial at some challenge point $\zeta$ and produces an evaluation proof for that. The verifier will have to first reconstruct the commitment to that polynomial, which can be constructed as:

$$ com = com_1 + \zeta^2 com_2 $$

Take a few minutes to understand why this works if needed.

Commitments are not homomorphic for the multiplication

During a proof verification, the verifier reconstructs the commitment to the polynomial that is aggregating all the constraints. To construct that commitment, the verifier adds commitments together, but must avoid multiplications as the commitment scheme used is only additively homomorphic.

Both the KZG and bulletproof PCS's use the Pedersen commitment, which is an homomorphic commitment scheme for the addition, but not for the multiplication. In other words, this means we can add commitments together (or add a commitment to itself many times), but not multiply commitments together.

So if $com_1$ is the commitment of the polynomial $g_1$ and $com_2$ is the commitment of the polynomial $g_2$. Then the following is fine:

$$ com_1 + com_2 = com(g_1 + g_2) $$

While the following is not possible:

$$ com_1 \cdot com_2 $$

In this last case, the solution is to simply have the prover evaluate one of the commitments. For example, if we want to evaluate the polynomial (behind this commitment) to a challenge point $\zeta$, the prover could provide $g_1(\zeta)$ and the verfier would then compute the commitment of $g_1 \cdot g_2$ as

$$ g_1(\zeta) \cdot com_2 $$

This works, as long as the protocol uses this commitment to later verify an evaluation proof of $g_1(\zeta) \cdot g_2(\zeta)$.

Note that if $g_1$ is a polynomial that needs to remain private (part of the witness), then it should be blinded if you care about zero-knowledge.

As such, the linearization step of Plonk is a way to have the verifier construct a commitment as a linear combination of commitments. These commitments are generally preprocessed commitments contained in the verifier index (also called verifier key) or commitments that the verifier can generate themselves (the commitment to the public input polynomial for example).

3 comments

My friends always ask me what the heck is blockchain. It’s simple really! posted April 2022

I posted this on twitter initially, although it's short I think it's worthy of being reshared here.

The simplest abstraction is to see cryptocurrency / blockchain / distributed ledger technology as a database running on a single computer. Everybody can access this database and there’s some simple logic that allows you to debit your account and credit someone else’s account. The computer has a queue to make sure transactions are processed in order.

Blockchains that support smart contracts allow for people to install programs to that computer, which will add a bit more logic than simply debiting/crediting accounts. Others can then just send transactions to this computer to make a function call to any program (smart contract) that was installed on the computer.

The cool thing really is that we’re all using the same computer with the same database and the same programs.

Now in practice, nobody would trust a single point of failure like that. What if the computer crashes, or burns in a fire? Distributed systems to the rescue! We use distributed system protocols to run the same database on many computers distributed around the world. These distributed system protocols effectively simulate a single database/computer so that a few computers failing doesn’t mean the end of the blockchain.

On top of that, we refuse to trust the computers that participate in this protocol. They could be lying about the balance in your account. We want the computers to police one another and agree on the database they are simulating. That’s where consensus protocols are used: to make the distributed database secure even when some of the participants are malicious.

And that’s it. That’s blockchain tech for you. The obvious application is money, as the secure and simulated single computer is useful to simplify a payment system, but really any distributed database that cannot trust some of its participants can benefit from the advances there.

If you have questions about these analogies or other blockchain concepts, ask them in the comment section and I'll try to update this post :)

1 comment

Kimchi: The latest update to Mina's proof system posted March 2022

I initially posted this on https://minaprotocol.com/blog/kimchi-the-latest-update-to-minas-proof-system

kimchi

(photo taken from https://unsplash.com/photos/M_mDgb8guhA)

We recently released an update to our proof system for Mina called Kimchi. Kimchi is the main machinery we use to generate the recursive proofs that allow the Mina blockchain to remain of a fixed size of 22KB. It will also soon enable privacy and local execution of smart contracts with our snapps update. In this post, we'll go through what Kimchi is and what's different about it.

First, it is good to see where in the stack we are. Looking at the edge of the network, what we can see is a big blackbox: Mina.

If you're curious enough, you can open it and see what's inside. At some point you'll end up with pickles. Pickles is the recursion layer, it is the protocol that we use to create proofs of proofs of proofs of ... and reduce the blockchain to a fixed-size of under 22KB.

Pickles need something to create proofs though, and this is what Kimchi is:

In this post, we'll attempt to create some abstractions and simplifications to teach intuitions about what Kimchi is. We will try to keep explanations at a relatively high level, so it will be up to you to open the blackboxes that we will lay before you.

One of these black boxes, for example, are the pasta curves.

There's clearly a theme here. All that's left is to change Mina's name to another pickled condiment.

A bit of context

Kimchi is based on PLONK, a proof system released in 2019 by Ariel Gabizon, Zachary J. Williamson, and Oana Ciobotaru. Since then, many ameliorations and extensions have been proposed. There's been fflonk, turbo PLONK, ultra PLONK, plonkup, and recently plonky2. It's hard to follow, but essentially all these protocols implement variants of PLONK. Thus, we call them plonkish protocols. Kimchi is such a plonkish protocol.

Today, PLONK is regarded as one of the most ambitious general-purpose zero-knowledge proof constructions. Many projects like Zcash, Polygon Zero (formerly known as Mir Protocol), Aztec network, Dusk, MatterLabs (zksync), Astar, and anoma, have their own implementation of the proof system.

But first, what is a proof system?

Let's look at a scenario that should shed some light on the construction's interface. If you understand this section, you're halfway there, as you'll be able to think for yourself on how general-purpose zero knowledge proofs can be used without having to figure out what's going on inside of it.

Let's imagine that Alice has a sudoku puzzle. She sends it to Bob, and Bob finds a solution to the puzzle. To prove it to her, he sends her the solution.

Alice can then run the solution, with the sudoku, in a program that will return true if the solution is correct.

The problem is that Bob doesn't really want to share his solution... So instead, he runs the program himself, on his laptop, and tells Alice "I know a solution, trust me, I just ran the verify solution program on my laptop and it returned true".

Obviously Alice has no reason to trust Bob. We're in a bit of a pickle. This is where protocols like PLONK can be useful. The first step is to take our verify solution program in a special format (I'll come back to that later) and compile it down to two distinct blobs of data:

  • the prover index (sometimes called the prover key, although it's not a real key)
  • the verifier index (sometimes called the verifier key, not a key either)

Then the prover (Bob) can use a prove algorithm with the prover index, the sudoku puzzle, as well as his solution to execute the program and produce a proof of correct execution (a proof that the program returned true in this case).

We call the sudoku the public input, as it is known by others (like Alice), and the solution the private input, as Bob wants it to remain secret.

Note: here we don't really care about the output (we just want the program to correctly run to completion, which is equivalent to returning true). Sometimes though, we do care about having a public output (perhaps the result of the execution, which Alice can use). In these cases, the public output is part of the public input. (This detail doesn't really matter in practice.)

Now, Bob can simply give the proof to Alice, and Alice can verify it with another algorithm called verify. If the verify function returns true, she knows that Bob correctly ran the program verify solution to completion (using her sudoku puzzle and his solution).

What about Snapps?

In Mina, snapps (zero-knowledge smart contracts) can be written in typescript using the snarkyjs library, and then compiled down to some intermediary representation with snarky. A Kimchi compiler can then be used to compile the program into the prover and verifier indexes, and both sides can use Kimchi provided functionalities to produce proofs as well as verify them.

The verifier index is uploaded on chain, which allows anyone to verify proofs contained in transactions that claim that they executed a snapp correctly.

Arithmetic circuits

Let's now address the elephant in the room: cryptography is about math and numbers, but programs like our sudoku solution verifier are about instructions and bits. That's not going to work. What can we do about it? The solution is to use arithmetic circuits!

Arithmetic circuits are circuits made out of arithmetic gates:

I'll argue that with an addition gate (that adds two numbers together) and a multiplication gate (that multiplies two numbers together) we can rewrite most programs.

Let's look at a simple example. Let's imagine that I want to use a bit x in a computation. To do that, I first need to make sure that x is indeed a bit (that it is 0 or 1).

Look at the following equation: x(x-1) = 0. This should be true, only if x is 0 or 1. We call that a constraint (we're constraining x to some values). Writing circuits for our proof system is about writing such constraints. Many of them in fact.

Let's see how that works with our arithmetic gates.

First we add 1 to -x (I'll explain how to get -x from x later). Then we use the multiplication gate with the output:

The output of the multiplication gate must be 0 (that's the constraint we want to write).

This is it, our circuit for now only has two gates. In PLONK, you would write this down as a list of gates acting on registers (the two inputs L and R, and the output O):

Now, we aren't really doing an addition of L and R here, we're rather adding 1 to -R. How do we do this? Perhaps we could have a "add with constant" gate, and a "subtract gate"? Instead, we "tweak" our addition gate:

Now let's add the multiplication gate to the list. But remember, the output register is not used, as it must be 0. So we tweak that gate as well:

More on tweaking these gates later.

There's one last thing that we're missing: the output register of the first gate is the left register of the second gate. We can simply wire the two registers to encode this correspondance:

And this is how we represent a circuit in PLONK! We just list all the gates (and their parameters) as well as the wiring between the registers they act on.

Now, when a prover wants to produce a proof, they will run the program and record the values of each registers in an execution trace. For example, here is one that takes x = 0.

Note: the values in the left register of the first gate, and the output register of the second gate, can be anything as they are not used by the gates.

One simplification I made, is that we don't know where x comes from here. In a real circuit, the right registers of this execution trace are wired to another register containing the value x (or perhaps it is given as private or public input to the circuit, but I will avoid explaining how this works here).

Another simplification I made, is that in reality the addition and multiplication gates are implemented as a single tweakable gate that we call the generic gate in Kimchi:

Tweaking the parameters of the generic gate essentially turns it into different gates (addition, multiplication, addition with constant, subtraction, etc.)

From PLONK to Kimchi

Kimchi is a collection of improvements, optimizations, and alterations made on top of PLONK. For example, it overcomes the trusted setup limitation of PLONK by using a bulletproof-style polynomial commitment inside of the protocol. This way, there is no need to trust that the participants of the trusted setup were honest (if they were not, they could break the protocol). Talking about circuits, since we're talking about circuits here, Kimchi adds 12 registers to the 3 registers PLONK already had:

These registers are split into two types of registers: the IO registers, which can be wired to one another, and temporary registers (sometimes called advice wires) that can be used only by the associated gate.

More registers means that we now can have gates that take multiple inputs instead of just one:

This opens new possibilities. For example, a scalar multiplication gate would require at least three inputs (a scalar, and two coordinates for the curve point). As some operations happen more often than others, they can be rewritten more efficiently as new gates. Kimchi offers 9 new gates at the moment of this writing:

Another concept in Kimchi is that a gate can directly write its output on the registers used by the next gate. This is useful in gates like "poseidon", which need to be used several times in a row (11 times, specifically) to represent the poseidon hash function:

Another performance improvement implemented in Kimchi are lookups. Sometimes, some operations can be written as a table. For example, an XOR table:

An XOR table for values of 4 bits is of size 28. Implementing this with generic gates would be hard and lengthy, so instead Kimchi builds the table and allows gates (so far only Chacha uses it) to simply perform a lookup into the table to fetch for the result of the operation.

There's much more to Kimchi, but this is for another time. You can check out the implementation here and if you're curious opening the other blackboxes you can check our our in-depth explanations here.

To summarize

Pickles now uses an upgraded proof system: Kimchi. Kimchi brings several optimizations and quality-of-life improvements to circuit builders. This should allow for faster provers, larger circuits, and potentially shorter proof sizes!

comment on this story

The code is the specification? Introducing cargo spec posted March 2022

Today, I want to introduce a tool called cargo-spec. I've been using it at work to specify kimchi, the general-purpose zero-knowledge proof system that is used in production for the Mina blockchain.

Before I introduce the tool, let me give some motivation behind why I created it, as well as why it is designed the way it is.

Specifications are important

First, let's talk about specifications. Most of the cryptographic schemes that are used in the wild tend to be specified. This is not necessarily a crypto thing, but this is where I have experience. These specifications often look like RFCs, but it is not the only standard.

More importantly, specifications are multi-purpose. For example, they are used to help others implement an algorithm or protocol. This is usually the reason for these RFCs, but not necessarily what my tool targets.

Specifications are also used to help others understand how a protocol works. Indeed, if you want to understand a protocol, and it only exists as code, you'll have to reverse engineer the code. Not everyone is good at reverse engineering. I would even argue that most people are bad at it. Not everyone can read the language you implemented your protocol in. Not everyone has the time to do it.

I used to work in a team where researchers wanted to formally analyze a protocol, but had no clue how it worked. And of course, they didn't want to read the massive Rust codebase to do that. Security engineers would want to review it for bugs, but what is a bug without a spec? How can you understand the logic without a higher level document describing the protocol?

This is where specifications can also be really useful: to let security engineers audit your code. With a spec, they can simply match it to the code, and any divergence is a bug.

Your code is a book

I want to take a short detour to talk about writing. Writing code is like writing a book. It will be read again and again, changed, and maintained by others.

A book has sections, chapters, intros, outros, callouts, etc. Why shouldn't code have the same things? Code sorts of has that already: files, modules, packages, namespaces, function names, variable names, comments, etc. are all tricks a developer can use to make their code readable.

But this doesn't mean you can't add actual sections in your code! There's probably a reference to Knuth's literal programming, but it's a bit old, so I'll give you a reference I really enjoyed recently: Literate Programming in the Large by Timothy Daly.

In this talk, Timothy makes the point that we're the first user of our documentation: as we will all forget the code we wrote at some point, documentation (or a specification) might help drastically. Without it, changing the code could become a herculean task.

So Timothy's point is that you should write a book about your code. That there's no reason not to write and write and write. Perhaps there'll be too much stuff? But we live in the future and we don't look at real books, but at pages that you can grep and index. Even outdated stuff might help, as it will give some insight or rational.

How to specify?

Back to specs! The way I've worked in the past with specifications, was to write them as documents that lived outside of the codebase. But when you work on a project with a home-made protocol, you always have a reference implementation. That reference implementation is a living thing, it changes over time. This means that specifications of living projects tend to diverge from the implementation(s), unless they are maintained by rigorous developers.

The other solution is to write your specification in the code, where updates can be made by developers more easily as they adjust the code. But a specification is a structured document, with intros, outros, overviews, and other things that aren't really a good fit for being split apart in multiple files.

One day, I asked myself the question, why not both?

This is where cargo-spec comes in.

Cargo-spec

cargo-spec is a tool written in Rust, although it works with codebases in any languages, to implement these ideas.

The tool expects two things:

  • a template, which contains the organization of your spec in markdown. You can write content there, but also use placeholders when you want parts to be filled by your code.
  • a specification file, that helps you list the places in your code that you want to use in the specification.

The tool then extract parts of your code, replace the placeholders of your spec with that content, and produces the final specification (for now two formats are available: markdown and respec).

In the diagram above that I made at work, I show how the kimchi specification is created. I then use mdbook to serve it, as it contains LaTeX equations. I could have used hugo (which I did, initially), or really any other tool. You might also just want to have your spec in markdown file and leave it at that.

What is extracted from your code? Comments starting with a special prefix: //~ (or #~ in python, or (*~ ... *) in OCaml, etc.)

Rustdoc vs spec doc

You can ignore this section if you're not interested in specifying Rust code, although I'll give some insights that might be useful for other languages that also support special comments for code documentation.

By default, Rust has two types of comments: the normal ones (//), but also documentation comments (///, //!, and /** ... */). When you run cargo doc, the Rust documentation comments from your code get parsed and an HTML documentation is generated from them.

Since you can't use both spec comments and doc comments, how can you reconcile the two? The philosophy of cargo-spec, is that a language doc comment should be used to specify the interface of the code only; not the internal logic. As such, documentation should treat its library like a blackbox. Because who uses documentation? Developers who want to work with the library without having to understand its inners.

On the other hand maintainers, contributors, reviewers, etc. will mostly look at what's inside, and this is what you should specify using spec comments.

Examples

You're curious to see it in action? Interestingly, The specification of cargo spec is written using cargo spec itself. It's a pretty simple respec specification mostly here to showcase the tool.

During the last weeks I've been working on the Kimchi specification at work, where I've been using this tool as well. is written using it as well. You can check it out here, but keep in mind that it is still work in progress.

I'm excited to see if this will change the game, and if it will push more people to specify their code. Let me know if you find it useful :)

comment on this story

Contributing to open source while learning Rust and zero-knowledge proofs posted March 2022

I made a tweet that caught a lot of attention:

So I made a video to introduce the kimchi repository to newcomers who want to contribute: If you're interested, ping me on twitter :) there's a lot of opportunities to learn in there! comment on this story

Libra/Diem's second life as Aptos posted February 2022

As some of the readers of this blog know, I worked for two years at Novi (Facebook) on the Diem (formerly known as Libra) cryptocurrency. The project was recently dismantled, most likely due to regulator unwillingness to help a non-government backed cryptocurrency reach millions of people at scale.

Today, a group of 17 ex Novi/Diem engineers and researchers announced Aptos, presumably a fork of Diem. From what I understand, this means that a state-of-the-art blockchain that has been pushing the envelop in terms of innovation, security, and performance, and that's been in development for the last 4 years, now has a path to launch.

I'm guessing that we're going to see the typical proof-of-stake approach being implemented (unlike Diem's proof of authority) as well as a revamped set of smart contracts (written in Move) to govern the new blockchain. Besides that, the blockchain was already in a solid shape a year ago so I don't foresee any major changes.

Congratulation to the new team :) this is exciting.

comment on this story

Supply chain attacks are the new big thing posted January 2022

Over 90 WordPress themes, plugins backdoored in supply chain attack

(source: bleepingcomputer.com)

A product can be seen as a production line. That's what The Phoenix Project novel argues. It makes sense to me. Things gets assembled and passed around, work stations can become bottlenecks, and at the end of the line you deliver the product to the user. In that production line, pieces come from your own warehouse, or from external vendors. That distinction my friend, is what we call "insider threat vs supply chain attacks" in security.

Insider threat is a problem that's mostly ignored today. Large corporations have spies in them, that's a given. They have poor ways to deal with them, but that's fine, because most of these spies are working from other governments and they're tolerated in the grand scheme of things. They might inject backdoors here and there (see the juniper incident) but most of the times they remain unnoticed and unproblematic.

Supply chain attacks, on the other hand, are from less sophisticated attackers. They're from script kiddies, blackhats, "hackers", and others that we generally don't label much as advanced (like in advanced persistent threat). But they are getting more and more advanced.

To counter supply chain attacks, the first thing you can do is inventory (some people might call this threat modeling). The idea, really, is to list all of the pieces and the flows that make up your product. An acyclic directed graph of machines and humans. Once you have that, it's much clearer what the different attack vectors are.

I'm a software person, so I mostly focus on software security. When you write software, you worry about a few things:

  1. your dependencies
  2. the flows around your code evolution
  3. what goes into a release
  4. how that release gets into production

The fourth point is sort of an infra problem, so it's not the most urgent problem you should think about as a developer. I'd say the third point could be an infra problem as well, unless you're lucky enough to have release engineers or dev infra people working in your company. The second point is probably Github, and all the corner cases you have around it. For the most part, you can choose to trust Github. The first point, the dependencies you're using, that is your problem.

I learned a few things, back at Facebook when I was working on libra/diem, and I think that could be useful to others. First, not languages are created equal. Languages like Golang are fantastic because they provide a rich standard library. What some would call a battery-included stdlib. Thanks to that, you rarely see Golang projects depend on more than 100 transitive dependencies (this means that I'm including dependencies that are used by your direct dependencies, and so on). Other languages, like javascript and Rust can have their dependency graph just blow up due to poor standard libraries and great package managers.

When I looked at what we were doing at the time with libra/diem, I got obsessed with the large amount of dependencies that our software used. A year in, I got inspired by Feymann's story about investigating the challenger disaster, and the way he managed to pinpoint the problem by using the wisdom of the crowd.

The wisdom of the crowd is this concept where the average of all people's guesses to a question can be extremely close to the real answer. The best demonstration to this concept is when a bunch of people are asked to estimate the number of jelly beans contained in a jar. Check that video on youtube. If you're french, fouloscopie is a great youtuber who does research about crowds and their behavior.

So I used the same methodology. I created a form and sent it to all the engineers at the company. The form asked the following: "for each pieces of our big puzzle, what is your estimation of the risk from 0 to 5? If you don't know leave it blank." The result was eye opening. The trivial parts were deemed not risky, as expected. The confusing parts (like consensus and HSM code) had estimations that were all over the place. Dependencies were ranked as the highest risk. Finally, some other parts that I had overlooked got ranked as high-risk which made a lot of sense and changed our strategy for the rest of the year.

So dependencies were definitely a problem, people agreed. What could we do? First, I developed a tool called dephell that would allow us to get some quick stats on the dependencies we were using in our Rust project. The tool was a bit buggy, and incorrect at times, but it was good enough to show us outliers. We used it on every libraries in our project to get a sense of what dependencies they were relying on. We had hundreds and hundreds of dependencies, so that was quite a hard task, but a few patterns stood out.

dephell

Sometimes, we used many different dependencies to solve the same problem. The fix was easy: decide which one was the most solid dependency and use that one. Other times, we would use a large dependency, that would import more dependencies, and notice that if we would rewrite it ourselves we could live with a much smaller amount of code and no dependencies. This is what I did when I wrote libra's noise implementation which removed a large amount of code and a few dozens of dependencies to a few hundreds of lines of code. Sometimes, we realized that the dependencies we were using were not great: they were not maintained, had unknown maintainers and contributors, lots of unsafe code, and so on. Finding a better alternative was usually the way to go.

That analysis got us to remove a lot of dependencies. But dependencies continue to creep in, and so we needed a flow to prevent that growth. The obvious solution was to add friction to the process of adding a dependency. Making sure that I was involved in accepting any PR that added a new dependency (via continuous integration) was enough. Most of the times it would go like this:

  • hey! Can I get a review from you for this PR, it looks like you need to look at it since I'm adding a new dependency.
  • for sure! I looked at the dependency and it's adding X lines of code, as well as Y additional dependencies. Is this something that's really necessary? Are there better alternatives?
  • hum... actually I could write that in a few lines of code, let me try that instead.

At that point, you're having a good sense of what your dependencies are, you sorted them out and even did some cleaning, and you stopped the growth. You still have problems: dependencies evolve. We know, from experience, that the risk of not updating dependencies is higher than updating. That's because a lot of your updates will be bug fixes. So you need to update. Updating adds a few non-negligible risks though:

  1. a backdoor could have been introduced in a (transitive) dependency
  2. new dependencies could be added by the update
  3. new bugs could be introduced by the update

The cost of 3 is negligible compared to the bug fixes. Remember: you can't review every dependencies you're using. You wouldn't be using third-party dependencies in the first place if you had all the time in the world. 2 has to be taken into account in your dependency growth control strategy. 1 is the subject of this post.

To tackle 1, John and I created whackadep. Whackadep is a service, with a web UI, that monitors your repository. Originally architected to be language-agnostic, it was for obvious reasons built primarily for Rust codebases.

whackadep

Whackadep's role was to periodically check if new updates were available, and to quickly show you two things:

  • what was the risk of not updating
  • what was the risk of updating

The risk of not updating is generally calculated via the help of RUSTSEC, the RustSec Advisory Database, as well as a calculation of the difference between the semantic versions. If the update is X breaking versions on top of your version, then it'll be harder to update if they're a serious bug that needs urgent fixing. All of this was taken into account to calculate a priority score that would help a human user figure out what to do.

The risk of updating was calculated from a few heuristics, but was meant to be extendable. The presence of an updated build.rs file, for example, was a big red flag. The presence of a new, never-seen-before, contributor in a release was also seen as a red flag. A number of these customizable heuristics would be used to calculate a risk score, which could be used by the user of the tool to decide if that warranted a human review.

Of course, in Rust-land, you get dependency updates every single day. So you need more reassurance that a review is not needed. For this, we came up with the idea of a trusted set of libraries. As Linus said one day (although I can't find that quote anymore): security works via a network, and if you don't do security via a network you're dumb. He said it sort of like that I believe. The idea is that you can't do everything yourself, and the only way to scale is to trust others. Thus, by whitelisting a number of libraries as being part of a club of well-maintained libraries with well-known authors, we could reduce the burden of reviewing dependencies.

Anyway, the whole point of this article is to show you that dependencies are a huge problem in most software stacks, and that solutions don't really exist to deal with this. Human processes are doomed to fail, and thus you need an automated and machine-built component to help you there. Interestingly, I just finished reading Zero to One a few days ago, and in the book Peter Thiel makes the point that machines shouldn't try to replace humans but instead provide services to help make them superheroes. This is what whackadep was aiming to do: give developers super powers. I hope I inspired some of you to continue pursuing this idea, because we're going to need some help.

comment on this story