posted last month
A paper on (attacking) database encryption was just released by Paul Grubbs, Marie-Sarah Lacharité, Brice Minaud, and Kenneth G. Paterson, and Matthew Green just posted an article about it. I don't know much about database encryption, but from my experience there are a lot of solutions and it is quite hard to understand what you're getting from them. So I really like this kind of article. I figured I would add my 2 cents to encourage people to write about database encryption.
I want to mention this first, because it is one of the most stupid solutions I've seen used to either anonymize a database or protect it. You keep track of a mapping between real words and random words. For example, "pasta" means "female" and "guitar" means "male". Then you replace all these words with their counterparts (so every mention of "female" in your database gets replaced by "pasta"). This supposedly allows you to have a database that doesn't hold any information. In practice, it leaks way too much (the same word always maps to the same token, so equalities and frequencies are preserved) and it is in no way a good idea. It is called tokenization.
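To make the leak concrete, here is a minimal sketch of the pasta/guitar mapping and a frequency-matching guess against it. The column contents and the "public statistics" are made up for illustration, and the attacker is assumed to know roughly how common each real value is.

```python
from collections import Counter

# Hypothetical token table, mirroring the example in the text.
TOKENS = {"female": "pasta", "male": "guitar"}

def tokenize(rows):
    """Replace each sensitive value with its fixed token."""
    return [TOKENS[value] for value in rows]

def guess_mapping(tokenized, known_frequencies):
    """Pair tokens with plaintexts by rank: most common token -> most common value."""
    tokens_by_count = [tok for tok, _ in Counter(tokenized).most_common()]
    values_by_freq = sorted(known_frequencies, key=known_frequencies.get, reverse=True)
    return dict(zip(tokens_by_count, values_by_freq))

# A made-up column where one value is clearly more common than the other.
column = ["male"] * 7 + ["female"] * 3
leaked = tokenize(column)

# The attacker never sees TOKENS, only the tokenized column and rough statistics.
public_stats = {"male": 0.7, "female": 0.3}
recovered = guess_mapping(leaked, public_stats)
print(recovered)
```

With only two values this is trivial, but the same rank-matching idea applies to any column whose distribution is roughly known.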
Databases contain user information, like credit card numbers and passwords. They are often the target of attacks, as you can read in the news. For example, you can check on haveibeenpwned.com if your email has been part of a database breach. A friend of mine has been part of 19 database breaches apparently.
To prevent such things from happening, on the server side, you can encrypt your database.
Why not, after all.
The easiest way to do this is to use Transparent Data Encryption (TDE), which a bunch of database engines ship by default. It's really simple: some columns in tables are encrypted with a key. The id and the table name are probably used to generate the nonce, and there's probably some authentication of the row or something so that rows can't be swapped. If your database engine doesn't support TDE, people will use home-made solutions like SQLCipher. The problem with TDE or SQLCipher is that the key is just lying around next to your database. In practice, this seems to be good enough to comply with regulations (perhaps like FIPS and PCI-DSS).
This is also often enough to defend against a lot of attacks, which are often just dumb smash-and-grab attacks where the attacker dumps the database and flees the crime scene right away. If the attacker is not in a hurry though, he can look around and find the key.
It's not that hard.
EDIT: one of the not-so-great things about this kind of solution is that the database is not really searchable unless you leave some fields unencrypted. If you don't do searches, it's fine, but if you do, it's not so fine. The other solutions I talk about in this post also try to provide searchable encryption. They belong to the field of Searchable Symmetric Encryption (SSE).
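Going back to my guess about how cells are protected (a nonce derived from the table name and the id, plus some authentication so rows can't be swapped), here is a rough sketch of what that could look like. This is not how any particular database engine does it. The cipher is a toy built from the standard library so the example runs anywhere, and all function names and keys are hypothetical.

```python
import hashlib
import hmac

def _context(table: str, row_id: int, column: str) -> bytes:
    """Encode where a cell lives; this is what binds a ciphertext to its row."""
    return repr((table, row_id, column)).encode()

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream from HMAC-SHA256 in counter mode. Illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]

def encrypt_cell(enc_key, mac_key, table, row_id, column, plaintext: bytes) -> bytes:
    ctx = _context(table, row_id, column)
    # Nonce derived from the cell's location. Caveat: updating a cell in place
    # would reuse this nonce, so a real design needs something extra here.
    nonce = hashlib.sha256(ctx).digest()[:16]
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    # Encrypt-then-MAC over the location and the ciphertext.
    tag = hmac.new(mac_key, ctx + ciphertext, hashlib.sha256).digest()
    return ciphertext + tag

def decrypt_cell(enc_key, mac_key, table, row_id, column, blob: bytes) -> bytes:
    ciphertext, tag = blob[:-32], blob[-32:]
    ctx = _context(table, row_id, column)
    expected = hmac.new(mac_key, ctx + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("cell failed authentication (moved or modified?)")
    nonce = hashlib.sha256(ctx).digest()[:16]
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))

# Hypothetical keys; with TDE these would sit somewhere near the database.
ENC_KEY = hashlib.sha256(b"demo encryption key").digest()
MAC_KEY = hashlib.sha256(b"demo mac key").digest()

alice = encrypt_cell(ENC_KEY, MAC_KEY, "users", 1, "card", b"4111-1111")
bob = encrypt_cell(ENC_KEY, MAC_KEY, "users", 2, "card", b"5500-0000")

print(decrypt_cell(ENC_KEY, MAC_KEY, "users", 1, "card", alice))
try:
    # Pretend an attacker copied row 2's ciphertext into row 1.
    decrypt_cell(ENC_KEY, MAC_KEY, "users", 1, "card", bob)
except ValueError as err:
    print("swap detected:", err)
```

Note that none of this helps once the attacker finds ENC_KEY and MAC_KEY, which is exactly the weakness of keeping the key next to the data.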
The solution here is to segment your architecture into two parts, a client who holds the key and an untrusted database. If the database is breached, then the attacker learns nothing.
I hope you are starting to realize that this solution is really about defense in depth: we're just moving the security somewhere else.
If both sides are breached, it's game over. If only the client is breached, an attacker can also start making whatever requests they want. The database might be rate-limited, but that won't stop slow attacks. For password hashing, there have been hilarious proposals to solve this with security through obesity, and more serious proposals to solve this with password hashing delegation (see PASS or makwa).
But yeah, remember, this is defense in depth. Not that defense in depth doesn't work, but you have to frame your research in that perspective, and you also have to consider that whatever you do, the more annoying it is for the attacker, the more effective it will be at preventing most attacks.
Now that we got the setup out of the way, how do we implement that?
The naive approach is to encrypt everything client-side. But this is awful when you want to do complicated queries, because the server pretty much has to send you the entire database for you to decrypt it and query it yourself. There are some known improvements, but there is probably only so much you can do.
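Here is a small sketch of why the naive approach hurts. The server only ever sees opaque blobs, so a simple filter forces the client to download and decrypt every row. The classes, the record layout and the toy cipher are all assumptions made up for this example (the cipher has no authentication and is not something to reuse).

```python
import hashlib
import json
import os

def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). Illustration only."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

class UntrustedServer:
    """Stores opaque blobs; it cannot filter on their contents."""
    def __init__(self):
        self.rows = []
        self.bytes_sent = 0

    def insert(self, blob: bytes):
        self.rows.append(blob)

    def fetch_all(self):
        self.bytes_sent += sum(len(blob) for blob in self.rows)
        return list(self.rows)

class Client:
    """Holds the key and does all the query work itself."""
    def __init__(self, server: UntrustedServer):
        self.key = os.urandom(32)
        self.server = server

    def insert(self, record: dict):
        nonce = os.urandom(16)
        plaintext = json.dumps(record).encode()
        self.server.insert(nonce + _xor_stream(self.key, nonce, plaintext))

    def query(self, predicate):
        results = []
        for blob in self.server.fetch_all():  # the whole table, every time
            nonce, ciphertext = blob[:16], blob[16:]
            record = json.loads(_xor_stream(self.key, nonce, ciphertext))
            if predicate(record):
                results.append(record)
        return results

server = UntrustedServer()
client = Client(server)
cities = ["Paris", "Lyon", "Nice", "Lille"]
for i in range(1000):
    client.insert({"id": i, "city": cities[i % 4]})

matches = client.query(lambda r: r["city"] == "Paris")
print(len(matches), "matches,", server.bytes_sent, "bytes transferred for one query")
```

The cost of every query grows with the size of the table rather than the size of the answer, which is what the fancier schemes try to avoid.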
Other solutions (CryptDB, Mylar, Opaque, PreVeil, Verena, etc.) use a mix of cryptography (layers of encryption, homomorphic encryption, order-preserving encryption, order-revealing encryption, etc.) and perhaps even hardware solutions (Intel SGX) to solve the efficiency issue.
Unfortunately, most of these schemes are not clear about their threat model. As they do not use the naive approach, they have to leak some amount of information, and it is not obvious what an attacker can do with it. And actually, if you read the research done on the subject, it seems like they could in practice leak a lot more than what we would expect. Some researchers are very angry, and claim that the situation is worse than having a plaintext database due to the illusion of security when using such database encryption solutions.
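As one example of this kind of leakage, here is a toy sketch of order-preserving encryption and a simple sorting attack on it. The "scheme" is just a secret random increasing table, not any real construction, and the attack assumes the attacker knows the lowest possible value and that every value in the range shows up at least once in the column.

```python
import random

def make_ope_table(domain_size: int, rng: random.Random):
    """Toy order-preserving 'encryption': a secret, random, strictly increasing table."""
    return sorted(rng.sample(range(10**9), domain_size))

rng = random.Random(0)
LOW, HIGH = 18, 65  # hypothetical range of an "age" column
table = make_ope_table(HIGH - LOW + 1, rng)

def encrypt(age: int) -> int:
    return table[age - LOW]

# A made-up column in which every age appears at least once (a "dense" column).
ages = list(range(LOW, HIGH + 1)) + [rng.randint(LOW, HIGH) for _ in range(500)]
rng.shuffle(ages)
column = [encrypt(a) for a in ages]

def sorting_attack(ciphertexts, low):
    """Rank the distinct ciphertexts; in a dense column, rank maps straight to plaintext."""
    distinct = sorted(set(ciphertexts))
    rank = {c: i for i, c in enumerate(distinct)}
    return [low + rank[c] for c in ciphertexts]

guessed = sorting_attack(column, LOW)
print("recovered every age:", guessed == ages)
```

The attacker never touches the key or the table here; the order of the ciphertexts alone is enough.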
Is there no good solution?
In theory, fully homomorphic encryption (FHE) could work to prevent the bad leakage we talked about. I don't know enough about it, but I know that there is no CCA2-secure FHE algorithm (homomorphic ciphertexts are malleable by design), which might be an issue. The other, more glaring, issue is that FHE is slow as hell. It could be practical in some situations, but probably not at scale and for this particular problem. For more constrained types of queries, somewhat homomorphic encryption (SHE) should be faster, but is it practical? Partially homomorphic encryption (PHE) is fast but limited. The Paillier cryptosystem, a PHE algorithm, is actually used in CryptDB to do addition on ciphertexts.
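To show what "addition on ciphertexts" means, here is a minimal textbook Paillier sketch. The primes are tiny and hard-coded purely for illustration, the function names are mine, and this is not how CryptDB implements it.

```python
import math
import secrets

def keygen(p: int, q: int):
    """Textbook Paillier key generation with the common choice g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because g = n + 1
    return (n, n + 1), (lam, mu)

def encrypt(pub, m: int) -> int:
    n, g = pub
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pub, priv, c: int) -> int:
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add(pub, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    n, _ = pub
    return (c1 * c2) % (n * n)

# Toy primes, far too small for real use.
pub, priv = keygen(1009, 1013)

c1 = encrypt(pub, 20)
c2 = encrypt(pub, 22)
total = add(pub, c1, c2)  # computed without ever decrypting c1 or c2
print(decrypt(pub, priv, total))  # prints 42
```

The server can compute a sum over a column this way without the key, but that is all it can do: no multiplication of two encrypted values, no comparisons, which is the "fast but limited" trade-off.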
The takeaway is that database encryption is about defense in depth. Which means it is useful, but it is not bulletproof. Just be aware of that.