How can you tell if a cipher is secure?
I was asked that question during an interview a while ago. Back then it troubled me because it seemed so basic and yet and I had no idea how to answer it. I became vivid and didn't know what to say and later I didn't get the job. I thought it would be interesting to write down how I would answer this question now.
We're in the dark
If we have no idea how the algorithm works and can only do some playing with it (or worse, we can only listen to people using it) then we can safely call it a Black box. This is pretty hard to attack and also pretty hard to talk about security here because it might be not secure at all (like Enigma during the war) but we would still have a hard time telling.
We're almost in the dark
If I talk about Blackboxes then I should talk about Whiteboxes. If you have no idea what your cipher is doing, but you can analyze it, then you call that a Whitebox. For example your phone, your credit card, and so on, they all use cryptographic tools that you can take a look at since it is in your own hands. Cryptography was not thought for that kind of model. It was primarily born to hide private conversations from other curious parties and ciphers were supposed to be used by good people only. In our modern world, clever people found a new use for cryptography in our everyday-devices. And it quickly became problematic since a lot of new attacks rose up.
It doesn't mean that attacking a whitebox is easy though. People use reverse engineering, heuristics, side-channel attacks (fault injection, power analysis, sound analysis...) and a lot of other techniques to break them. There are a lot of researches, especially in the domain of smartcards.
We're in the clear
In cryptography, Kerckhoffs's principle was stated by Auguste Kerckhoffs in the 19th century: A cryptosystem should be secure even if everything about the system, except the key, is public knowledge.
OK. Now let's talk about a cipher we can see and touch and play with anyway we want.
We will not talk about the implementation of the cipher because it will have the same problems the whitebox have: it will be subject to new attacks depending where and how you implement it. It is a whole new level of research and precautions you have to take. Think about side-channel attacks, bugs (buffer overflow), human errors, padding attacks (if we chose a bad mode of operation for example), etc...
So we'll place ourselves in a more restrictive model.
There is also a lot of different kind of ciphers (and it would be an history class to write about it) so I'll just talk about the most used type of ciphers nowadays: Stream ciphers and Block ciphers (both symmetric ciphers) and Asymmetric ciphers
Although let's just say that it was proven by Shannon in the 40s that the One Time Pad is the only cipher having perfect secrecy.
What kind of secrecy?
There are multiple kinds of secrecy in the literature. Perfect Secrecy that we just talked about. The kind of secrecy that doesn't leak any information about the plaintext and the key (although it might leak the maximum length of it). And if you read more about it you will see that it is not practical thus almost never used.
So we defined a weaker type of secrecy: Semantic secrecy.
There are multiple definitions but they are equivalent to IND-CPA or IND-CCA or IND-CCA2 depending on what you want from your cipher.
- IND here means Indistinguishable
- CPA here means "under Chosen Plaintext Attack".
- CCA means "under Chosen Ciphertext Attack" and it is a stronger definition of secrecy.
- CCA2 means "under Adaptive Chosen Ciphertext Attack"
Note that Non-Malleability has nothing to do with secrecy. For example the One Time Pad is perfectly secure, yet it is Malleable. Although you can prove that a cipher is non-malleable under chosen ciphertext attack and it would be the same thing as ind-cca. Also, some kind of ciphers are malleable on purpose, see Homomorphic encryption.
There are also talks about Provable secrecy where we reduce the whole cipher/cryptosystem to the solidity of a problem difficult to compute. It's done more for Asymmetric encryption that generally relies on math more than symmetric encryption.
For example RSA is provably secure because its entire security relies on the Factoring Problem, which states that it is increasingly very difficult to compute
n = p * q n a large number and p, q primes.
A good cipher
So if we want to prove that our cipher is secure. We would have to prove that an Adversary would have no advantage in a guessing game where he would send us two plaintexts and we would send him back one of the encrypted plaintext (chosen at random) expecting him to guess which one it is (see image above).
This is difficult to prove. For asymmetric encryption we'd rather reduce that to other assumptions. For symmetric encryption we would have to make sure its encrypted ciphertexts look random every time. Like an ideal cipher.
for Stream Ciphers the randomness of the ciphertexts depends a lot on the randomness of the Pseudo Random Number Generator you are building it with (PRNG).
for Block Ciphers there are two things that are important: Confusion and Diffusion. Confusion states that a small change in the key has to change the whole ciphertext, Diffusion states that a small change in the plaintext has to change the whole ciphertext. In AES for example, Confusion is done by using a Key Derivation Function to create several subkeys and XOR them to the internal state on multiple rounds. Diffusion is done during the different rounds with a mix of Permutations and Substitution (that's why we call AES a substitution-permutation network).
A cryptanalyst is the black beast of cryptographers. His job is to attack our lovely ciphers. But this is useful in knowing how secure is a cipher. AES has been considered a solid cipher because it is build on solid principles like the avalanche principle (confusion and diffusion) but not only because of that. Because it has resisted to known attacks since it has been born. A good cipher should resist multiple years of attacks. A good cipher should withstand the efforts of cryptanalyst in time.
What does a cryptanalyst do to break a cipher?
The first good answer is bruteforce or exhaustive search. If we can simply bruteforce a cipher then it is obviously not good. When you do this you have to consider the birthday paradox and time-memory trade off attacks. tl;dr: it doesn't take as long as you think to bruteforce something because of probabilities.
And here is why AES is considered a solid cipher again. There are no known attacks better than bruteforcing.
For cryptographers, a cryptographic "break" is anything faster than a brute force
It is known (thanks to Shannon in 1949) that in a known plaintext attack we can build an equation linking the plaintext with the ciphertext and where the unknowns are the bits of the key. Here the algebraic/linear attack would be to solve the system we now have. If it is more complicated than that, we often talk about second order attack, third order attack, etc...
In a Differential cryptanalysis what we do is we try to notice a correlation in the differences between the internal state of several messages getting encrypted. Often to find a subkey and then recover the key Total Break.
So the answer could be summed up like this: After researching how to make a good cipher (reducing it to a known hard math problem if it's an asymmetric cipher; making sure of it having nice criterion (confusion & diffusion) in the case of a block cipher; making sure of the non-correlation of its PNRG and the randomness of its PNRG if it's a stream cipher), you would try to break it with known attacks. Then you would enlarge the model you are working with to see if your cipher is still as secure, and if not how you would modify it or use it to make it secure. For example with textbook RSA you can forge your own packets from it (if you have
c = m^e you can create
2^e * c which would successfully decrypt as
2*m) and its deterministic properties (that makes it not semantically secure) leak information (
(c/n) = (m^e/p) (m^e/q) = (m/p)(m/q) = (m/n) with
(a/b) being the symbol of Legendre/Jacobi here). Without a padding system like OAEP RSA leaks the Jacobi Symbol of m.
I'm still a student. Those are information I've gathered and this is my understanding of it.
I've stumbled on Dan Boneh Number Theory Cheat sheets. Number 1 and Number 2. Quick to read, I'm going to print them and display them somewhere on my walls :)
I also ran into the homepage of Vitaly Shmatikov. He uploaded a lot of slides, presentations and resources on a lot of different courses related to security and cryptography. He also lists a lot of interesting papers. I want to read everything but right now I have to focus on my exams (and interviews for my internship...).
EDIT: Oh but one last link. Orange Labs publications. There are some interesting papers in there too. I'm mostly writing this post to bookmark all those great links somewhere.
EDIT2: How to do a litterature search. That might be useful.
EDIT3: I have also returned to the Rss readers that I had banished from my life something like 7 years ago. I have a tendency to get addicted to things pretty quickly and back then I had subscribed to way too many feeds (I think one post would pop every 2 minutes) and I was constantly reading something. But I figured, what if I filled it with all those blogs on cryptography/security. That would be working and not slacking. So that's what I did. I'm using Digg on desktop and Feedly on my cellphone. And of course I'll be posting here the articles I find interesting :)
In one of my class the teacher advise against
I wondered why. I remembered reading some articles about random vs urandom and urandom being better, but that was years ago and my memory is not fresh. Wikipedia does advise against it, as does the manpage if you want to generate a long term key.
I also stumbled on one of Thomas Pornin's answer on the security SO also pointing to a blogpost from Thomas Hühn
Fact: /dev/urandom is the preferred source of cryptographic randomness on UNIX-like systems.
An interesting 0day on paypal was discolsed by Yasser Ali.
We have found out that an Attacker can obtain the CSRF Auth which can be valid for ALL users, by intercepting the POST request from a page that provide an Auth Token before the Logging-in process, check this page for the magical CSRF Auth “https://www.paypal.com/eg/cgi-bin/webscr?cmd=_send-money”. At this point the attacker Can CSRF “almost” any request on behave of this user.
A CSRF attacks (Cross-Site Request Forgery) happens when you can send a link to someone (or embed it into an iframe on your website) and it makes the user do something on a particular website (like paypal) that he didn't intend to do. Or as the name of the attack says, it makes him send a request you forged from outside the website.
A CSRF token is used to cancel this attack. It's usually a random value that is send along the request and verified server side. This value is difficult to predict and thus you usually can't forge it along the request.
I wrote about Differential Power Analysis (DPA) but haven't said that there were way more efficient attacks (although that might be more costy to setup). Differential Fault Analysis is a kind of differential cryptanalysis: you analyse the difference between blocks of the internal state and try to extract a subkey or a key. Here we do a fault injection on the internal state of the smartcard during an encryption operation (usually with lasers (photons have the property of igniting a curant in a circuit), or by quickly changing the temperature).
The attack presented in http://eprint.iacr.org/2010/440.pdf and https://eprint.iacr.org/2003/010.pdf is targeting the last subkey.
We inject a fault on 1 byte of AES (in the picture we consider the internal state of AES to be a 4x4 matrix of bytes) at a particular spot (before the last round) and we see that at one point it creates a diagonal of errors. We can XOR the internal state without fault with the faulty one to display only the propagation of the fault.
Here, by doing an hypothesis on keys and seeing how the Addkey operation is modifying this difference we can compute the last subkey.
On AES-128, it is sufficient to know K10 to find the cipher key, but on AES-256, you must know K13 and K14
Although this is only my understanding of the DFA. It also seems to be easier to produce on RSA (and it was originally found by Shamir on RSA).
Google is introducing a new Captcha, instead of trying to read a distorted word and write it down (because robots have troubles reading distorted words) you will just have to click on one button. Google will analyze small cues that prove that you are not a robot (like the movement of the cursor before clicking on the button). Some of those cues will only happen when the mouse will hover the widget, as google is not in control of the entire document when used outside of google's domain.
see more info on wired
edit: more info on google blog as well
someone on hackernews:
It's definitely not only relying on cursor movements. A simple $('iframe').contents().find('.recaptcha-checkbox-checkmark').click() proved that I'm not a robot, without me touching the mouse.
It seems to rely heavily on your cookies as well.
I'm overwhelmed with interviews those last days. I just spent more than 8 hours on one (remote location, late trains, low battery, no keys... long story).
And like this is not enough, Hack Summit just started
Right now Ed Roman is introducing the hack.summit(). talking about a lot of good stuff and giving some book recommendations.
EDIT: Some notes on what he just said:
- pair coding (when you code and someone is watching, or the inverse)
- pomodoro technique (I'm already doing this, even made this)
- remove distractions (mail, phone...)
- pretend to talk and explain what you're doing/coding (close to my theory about writing down stuff to organize your thoughts, and pretend to write stuff on the table with your fingers to memorize it)
- iterate quickly, fail quickly and often.
- personalize your IDE
- use git, use the command line...
Now Scott Hanselman is talking but I have to go to the Champs Elysées drink some Glühwein at the Marché de noël some pardon me ;)
Now Tom Chi
I'm studying the internals of hash functions and MACs right now. One-way Compression Functions, Sponge functions, CBC-MAC and... the Merkle–Damgård construction. Trying to find a youtube video about it I run into... The Cryptography course of Dan Boneh I already took 3 years ago. I have a feeling I will forever return to that course during my career as a cryptographer.
The whole playlist is here on youtube and since his course is awesome I just watched again the whole part about MACs. And I thought I should post this explanation of the birthday paradox since as he says:
Everybody should see a proof of the birthday paradox at least once in their life
Something that always bugged me though is that he says the formula for the birthday is
1.2 sqrt(365) whereas it should be square root of 366 since there are indeed 366 different birthdays possible.
This morning I had a course on Return Oriented Programming given by Jonathan Salwan, a classmate of mine also famous inventor of RopGadget.
The slides are here.
A lot of interesting things there. Apparently it's still kind of impossible to completely protect your C code against that kind of attack. Even with all the ASLR, PIE, NX bit and other protections... There is also an awesome lecture about ROP on Coursera I linked to in the previous post here.
Basically, since you can't execute code in the stack, and since the addresses of libraries are randomized because of ASLR, you can find bits of codes ending with a return (called gadgets) and chain them since you control the stack (thus the saved EIPs). What I learned by doing was that it gets complicated if it's 64bits (since a lot of address will have a lot of 0x00 and you can't point to those doing a buffer overflow through a strcpy or something similar) and you won't get a lot of those gadgets if you have dynamically loaded libraries. Static libraries are loaded in the .text section (which is executable of course), so that's all good. Also a good way to store strings of data are in the .data section since it is untouched by the randomization contrarily to the stack.
A lot of researches is done on the subject and new tools like RopGadget are coming, using an old concept (but still actively researched): the SAT solvers. There seems to be a problem though, those SAT solvers yield a set of gadgets to be used for some action you want to accomplish with your shellcode, but you have to do the work of putting them in the right order.
This is what I took from that talk, you can question the guy if that interests you!
I've already talked about Coursera before, and how much I liked it.
The Cryptography course by Dan Boneh is amazing and I often come back to it when I need a reminder. For example, even today I rewatched his video on AES because I was studying Differential Fault Analysis on AES (which is changing bits of the state during one round of AES to leak information about the last round subkey).
So if I could give you another course recommendation, it would be Software Security by Michael Hicks. It looks ultra complete and the few videos I've watched (to complete the security course I'm taking at the University of Bordeaux by Emmanuel Fleury) are top notch.
Communication Theory of Secrecy Systems is a paper published in 1949 by Claude Shannon discussing cryptography from the viewpoint of information theory. It is one of the foundational treatments (arguably the foundational treatment) of modern cryptography. It is also a proof that all theoretically unbreakable ciphers must have the same requirements as the one-time pad.
Crypt, already omitted by most linux distributions, is being retired as well by OpenBSD: http://www.tedunangst.com/flak/post/retiring-crypt
The crypt function is a unix classic. Unfortunately, its age is showing. It’s an interface from another time, out of place on modern systems, and it’s time for OpenBSD to move on.
I found an old Matthew Green's post where he wrote a really useful list of cryptography blogs and resources
I'll get back here after reading everything.
Studying about smartcard there seem to be a lot about whitboxes to learn, since it is indeed a whitebox: the encryption/decryption that are done inside the cards can be analyzed since you own the card. Analysis are separated in different categories like non-intrusive and intrusive. Intrusive because for efficient analysis you would have to remove some part of the plastic covering the interesting parts and directly plug yourself on the chip. This is what Differential Power Analysis (DPA) do, it's a stronger kind of Simple Power Analaysis (SPA).
Kocher & al found out about this in 1998 and released a paper that is still very useful today: http://www.cryptography.com/public/pdf/DPA.pdf
The idea is to record the power consumption of the chip along multiple encryptions. You then obtain curves with pics that you can correlate to XORs operations being performed. You can guess what cipher is used, and where are the known rounds/operations of the cipher from the intensities of some peaks, and the periodicity of some patterns. In the paper they study DES which is still the state of the art for block ciphers then.
Looking at a big number of such curves, along with the messages (or ciphertexts) they encrypted, you can focus on one operation and one bit of the internal state to find out one bit of one of the subkey. One bit should affect the number of XORs being performed thus you should find a correlation between the bit you're looking for and the power consumption at one point. Repeat and find all the other ones. It's powerful because you only need to find one bit of the subkey, one after the other.
It's pretty hard to explain it without pictures (and a video would be even better, that's always something I have been wanting to do, if I dig deeper into it maybe I'll try that). But the basic idea is here, if you want more info check the original paper
It was already pretty amazing when Cloudflare introduced Universal SSL (and this blog uses cloudflare ssl by the way).
Today the EFF has launched Let's Encrypt that aims to simplify the setup of SSL. They claim it takes 20-30 seconds to deploy SSL to your server. And this for free.