Someone asked that question on Reddit, so I replied with a high-level answer that should provide a clear enough view of the algorithm:
From a high level, here's what a PRNG is supposed to look like: you start with a seed (if you reuse the same seed you will obtain the same random numbers) and you initialize it into a state. Then, every time you want to obtain a random number, you transform that state with a one-way function \(g\). This is because you don't want people to find out the state from the random output.

You want another random number? You first transform the state with a one-way function \(f\): this is because you don't want people who found out the state to be able to retrieve past states (forward secrecy). And then you use your function \(g\) again to output a random number.
Mersenne Twister (MT) is like that, except:

- your first state is not used to output any random numbers
- a state allows you to output not only one, but 624 random numbers (although this could be thought of as one big random number)
- the \(g\) function is reversible: it's not a one-way function, so MT is not a cryptographically secure PRNG.
With more details, here's what MT looks like:
the \(f\) function is called "twist", the \(g\) function is called "temper". You can find out how each function works by looking at the working code on the Wikipedia page of MT.
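The high-level structure above can be sketched in a few lines of Python. This is purely illustrative: SHA-256 stands in for both one-way functions (MT's actual twist and temper are different, and its temper is reversible).

```python
import hashlib

def f(state):
    # one-way state update: knowing the state doesn't reveal past states
    return hashlib.sha256(b"next" + state).digest()

def g(state):
    # one-way output function: outputs don't reveal the state
    return hashlib.sha256(b"out" + state).digest()

def prng(seed, n):
    # initialize the seed into a state, then alternate output (g) and update (f)
    state = hashlib.sha256(seed).digest()
    out = []
    for _ in range(n):
        out.append(g(state))  # produce a "random" number
        state = f(state)      # advance the state
    return out

# same seed => same stream of random numbers
assert prng(b"seed", 3) == prng(b"seed", 3)
```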
The socat thingy created some interest in my brain and I'm now wondering how to build a NOBUS (Nobody But Us) backdoor inside Diffie-Hellman and how to reverse it if it's not a proper NOBUS.
One way to do that is to imagine how the DH non-prime modulus could have been generated to allow for a backdoor. For it to be a NOBUS it should not be easily factorable, but for it to allow a Pohlig-Hellman attack it must have a B-smooth order, with B small enough for the adversary to compute the discrete log in a subgroup of order B.
I'm currently summing up my research in the open on a GitHub repo: How to backdoor Diffie-Hellman, lessons learned from the Socat non-prime prime. If anyone is interested in any parts of this research (factorizing the modulus, thinking of ways to build the backdoored modulus, ...) please shoot me a message :)
If you go on the GitHub repository you will see an already working proof of concept that explains each of the steps (generation, attack).
This proof of concept underlines one of the ways the malicious committer could have generated the non-prime modulus \(p = p_1 p_2\), with both \(p_i\) primes such that \(p_i - 1\) are smooth. The attack works, but I'm thinking about ways of reversing such a non-prime modulus that would disable the NOBUS property of the backdoor. Spoiler alert: Pollard's p-1 factorization algorithm.
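As a teaser for that spoiler, here is a minimal sketch of Pollard's p-1 in Python, run against a toy non-prime modulus whose factor has a smooth p-1 (the toy primes are made up for illustration; the real socat modulus is of course far bigger):

```python
from math import gcd

def pollard_p_minus_1(n, bound=10000):
    # Pollard's p-1: if a prime factor p of n has a smooth p-1, then once
    # k! covers all the factors of p-1 we get a^(k!) = 1 (mod p), and
    # gcd(a^(k!) - 1, n) reveals p.
    a = 2
    for k in range(2, bound):
        a = pow(a, k, n)          # a becomes 2^(k!) mod n
        d = gcd(a - 1, n)
        if 1 < d < n:
            return d
    return None

# toy backdoored modulus: 2003 and 2011 are prime, and 2002 = 2*7*11*13 is smooth
p, q = 2003, 2011
print(pollard_p_minus_1(p * q))   # finds 2003 after a handful of iterations
```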
Anyway, if you're interested in contributing to that research, or if you have any comments that could be useful, please shoot me a message =)
On February 1st 2016, a security advisory was posted to Openwall by a Socat developer: Socat security advisory 7 - Created new 2048bit DH modulus
In the OpenSSL address implementation the hard coded 1024 bit DH p parameter was not prime. The effective cryptographic strength of a key exchange using these parameters was weaker than the one one could get by using a prime p. Moreover, since there is no indication of how these parameters were chosen, the existence of a trapdoor that makes possible for an eavesdropper to recover the shared secret from a key exchange that uses them cannot be ruled out.
A new prime modulus p parameter has been generated by Socat developer using OpenSSL dhparam command.
In addition the new parameter is 2048 bit long.
This is a pretty weird message with a Juniper feeling to it.
Socat's README tells us that you can use this free software to set up an encrypted tunnel for data transfer between two peers.
Looking at the commit logs, you can see that they used a 512-bit Diffie-Hellman modulus until January 2015, when it was replaced with a 1024-bit one.
Socat did not work in FIPS mode because 1024 instead of 512 bit DH prime is required. Thanks to Zhigang Wang for reporting and sending a patch.
The person who pushed the commit is Gerhard Rieger who is the same person who fixed it a year later. In the comment he refers to Zhigang Wang, an Oracle employee at the time who has yet to comment on his mistake.
The new DH modulus
There are a lot of interesting things to dig into now. One of them is to check if the new parameter was generated properly.
It is a prime. Hooray! But is that enough?
It usually isn't. The developer claims to have generated the new prime with OpenSSL's dhparam command (openssl dhparam 2048 -C), but is that enough? Or even, is it true?
To get the order of the DH group, a simple \(p - 1\) suffices (\(p\) is the new modulus here). This is because \(p\) is prime; if it were not prime, you would need to know its factorization. This is why the research on the previous non-prime modulus is slow... See Thai Duong's blogpost here, the StackExchange question here, or Reddit's thread.
Now the order is important, because if it's smooth (factorable into "small" primes) then active attacks (small subgroup attacks) and passive attacks (Pohlig-Hellman) become possible.
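To see why a smooth order is fatal, here is a toy Pohlig-Hellman in Python on a small prime \(p = 211\), whose group order \(210 = 2 \cdot 3 \cdot 5 \cdot 7\) is smooth (all the parameters here are made up for illustration):

```python
from math import prod  # Python 3.8+

def dlog_bruteforce(g, h, p, order):
    # brute-force discrete log inside a small subgroup of the given order
    x = 1
    for k in range(order):
        if x == h:
            return k
        x = (x * g) % p
    raise ValueError("no solution")

def pohlig_hellman(g, h, p, factors):
    # for a squarefree smooth group order: solve the discrete log in each
    # small prime-order subgroup, then recombine the residues with CRT
    n = prod(factors)  # the group order
    residues = []
    for q in factors:
        gq = pow(g, n // q, p)   # generator of the order-q subgroup
        hq = pow(h, n // q, p)   # h projected into that subgroup
        residues.append(dlog_bruteforce(gq, hq, p, q))
    # Chinese Remainder Theorem recombination
    x = 0
    for q, r in zip(factors, residues):
        m = n // q
        x = (x + r * m * pow(m, -1, q)) % n
    return x

p = 211                      # prime, so the group order is p - 1 = 210
factors = [2, 3, 5, 7]       # 210 is smooth
g = 3                        # 3 generates the full group mod 211
secret = 123
h = pow(g, secret, p)
assert pohlig_hellman(g, h, p, factors) == secret
```

Each subgroup search costs at most \(q\) steps, so the attacker's work is bounded by the largest prime factor of the order, not by the order itself.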
So what we can do, is to try to factor the order of this new prime.
Here's a small script I wrote that tries all the primes until... you stop it:
# the old "fake prime" dh params
dh1024_p = 0xCC17F2DC96DF59A446C53E0EB826550CE388C1CEA7BCB3BF1694D8A945A2CEA95B22255F9259941C22BFCBC8C857CBBFBC0EE840F98703BF609B08C68E99C605FC00D66D90A8F5F8D38D43C88F7ABDBB28AC04694A0B867337F06D4F04F6F5AFBFAB8ECE75534D7F7D17780E12464AAF9599EFBCA6C54177437AB9EC8E073C6D
dh1024_g = 2

# the new dh params
dh2048_p = 0x00dc216456bd9cb2acbec998ef953e26fab557bcd9e675c043a21c7a85df34ab57a8f6bcf6847d056904834cd556d385090a08ffb537a1a38a370446d2933196f4e40d9fbd3e7f9e4daf08e2e8039473c4dc0687bb6dae662d181fd847065ccf8ab50051579bea1ed8db8e3c1fd32fba1f5f3d15c13b2c8242c88c87795b38863aebfd81a9baf7265b93c53e03304b005cb6233eea94c3b471c76e643bf89265ad606cd47ba9672604a80ab206ebe07d90ddddf5cfb4117cabc1a384be2777c7de20576647a735fe0d6a1c52b858bf2633815eb7a9c0ee581174861908891c370d524770758ba88b3011713662f07341ee349d0a2b674e6aa3e299921bf5327363
dh2048_g = 2

# is_prime(dh2048_p) -> True

order = dh2048_p - 1
factors = [2]
print "2 divides the order"

# let's try to factorize the order by trial divisions
def find_factors(number):
    factors = []
    index = 0
    # use different techniques to get primes, dunno which is faster
    for prime in Primes():
        if Mod(number, prime) == 0:
            print prime, "divides the order"
            factors.append(prime)
        if index == 10000:
            print "tested up to prime", prime, "so far"
            index = 0
        else:
            index += 1
    return factors

factors += find_factors(order / 2)
It has been running for a while now (up to 82018837, a 27-bit number) and nothing has been found so far...

The thing is, a Pohlig-Hellman attack is doable as long as you can compute the discrete log modulo each factor. There is no notion of a "small enough factor" without a threat model. This backdoor is obviously not going to be usable by small players, but by bigger players? By state-sized attackers? Who knows...
EDIT: I forgot order/2 could be a prime as well. But nope.
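If you want to check that kind of thing without Sage, a standard Miller-Rabin probabilistic primality test does the job; a minimal sketch:

```python
import random

def is_probable_prime(n, rounds=40):
    # Miller-Rabin probabilistic primality test
    if n < 2:
        return False
    small = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
    for p in small:
        if n % p == 0:
            return n == p
    # write n - 1 = d * 2^s with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True
```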
I was watching this excellent video on the birth of elliptic curves by Dan Boneh, and I felt like the explanation of Diffie-Hellman (DH) fell short. In the video, Dan goes on to explain the typical DH key exchange:

Alice and Bob agree on a public number \(g\) and a public modulus \(N\).

By the way: if you ever heard of "DH-1024" or some big number associated with Diffie-Hellman, that was probably the bit size of this public modulus \(N\).
The exchange then goes like this:
- Alice generates her private key \(a\) and sends her public key \(g^a \pmod{N}\) to Bob.
- Bob generates his own private key \(b\) and sends his public key \(g^b \pmod{N}\) to Alice.
- They can both generate their shared key by computing either \((g^b)^a \pmod{N}\) for Alice, or \((g^a)^b \pmod{N}\) for Bob.
Dan then explains why this is secure: given \((g, g^a, g^b)\) (the only values an eavesdropper can observe from this exchange) it's very hard to compute \(g^{ab}\), and this is called the Computational Diffie-Hellman problem (CDH).
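The exchange above can be written in a few lines of Python, with deliberately tiny, insecure parameters just to show the mechanics:

```python
import random

# toy Diffie-Hellman parameters (real deployments use 2048-bit prime moduli)
p = 23            # public modulus
g = 5             # public generator

a = random.randrange(2, p - 1)   # Alice's private key
b = random.randrange(2, p - 1)   # Bob's private key

A = pow(g, a, p)  # Alice's public key: g^a mod p
B = pow(g, b, p)  # Bob's public key:   g^b mod p

alice_shared = pow(B, a, p)      # (g^b)^a mod p
bob_shared = pow(A, b, p)        # (g^a)^b mod p
assert alice_shared == bob_shared  # both equal g^(ab) mod p
```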
But this doesn't really explain how the scheme works. You could wonder: why doesn't the attacker just do the exact same thing Alice and Bob did? He could iterate the powers of \(g\) until \(g^a\) or \(g^b\) is found, right?
A key exchange with hash functions
Let's replace the exponentiation with a hash function. Don't worry, I'll explain: \(g\) will be our public input and \(h\) will be our hash function (e.g. SHA-256). One more thing: \(h^{3}(g)\) means \(h(h(h(g)))\).
So now our key exchange looks like this:
- Alice picks a large enough integer \(a\), computes \(a\) iterations of the hash function \(h\) over \(g\), and sends the result \(h^a(g)\) to Bob.
- Bob does the same with an integer \(b\) and sends \(h^b(g)\) to Alice (the exact same thing Alice did, different phrasing).
- They both compute the shared private key by doing either \(h^a(h^b(g))\) for Alice, or \(h^b(h^a(g))\) for Bob.

So if you understood the last part: Alice and Bob both iterated the hash function on the starting input \(g\) a total of \(a+b\) times. If Alice's public key was \(h(h(g))\) and Bob's public key was \(h(h(h(g)))\), then they both end up computing \(h(h(h(h(h(g)))))\).
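Here is that hash-based exchange in Python, using SHA-256 as \(h\) (with toy values for \(a\) and \(b\)):

```python
import hashlib

def h_iter(x, n):
    # apply h (sha256) n times to x, i.e. compute h^n(x)
    for _ in range(n):
        x = hashlib.sha256(x).digest()
    return x

g = b"public starting input"
a, b = 5, 8                     # Alice's and Bob's secret iteration counts

A = h_iter(g, a)                # Alice's public key: h^a(g)
B = h_iter(g, b)                # Bob's public key:   h^b(g)

alice_shared = h_iter(B, a)     # h^a(h^b(g)) = h^(a+b)(g)
bob_shared = h_iter(A, b)       # h^b(h^a(g)) = h^(a+b)(g)
assert alice_shared == bob_shared == h_iter(g, a + b)
```

Note that an eavesdropper can do the exact same amount of work: starting from \(g\), he just keeps hashing until he hits \(A\) or \(B\). That's the flaw discussed next.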
That seems to work. But how is this scheme secure?
You're right, it is not. The attacker can just hash \(g\) over and over until he finds either Alice's or Bob's public key.
So let's ask ourselves this question: how could we make it secure?
If Bob or Alice had a way to compute \(h^c(x)\) without computing every single hash (\(c\) hash computations) then he or she would take way less time to compute their public key than an attacker would take to retrieve it.
Back to our discrete logarithm in finite groups
This makes it easier to understand how the normal DH exchange in finite groups is secure.
The usual assumptions we need for DH to work were nicely summed up in Boneh's talk.
The point of view here is that the discrete log problem is hard AND CDH holds.
Another way to see this is that we have algorithms to quickly compute \(g^c \pmod{N}\) without having to iterate through every integer smaller than \(c\).
To be more accurate: the algorithms we have to quickly exponentiate numbers in finite groups are way faster than the ones we have to compute the discrete logarithm of elements of finite groups. Thanks to these shortcuts the good folks can quickly compute their public keys while the bad folks have to do all the work.
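The shortcut in question is square-and-multiply: roughly \(\log_2(c)\) multiplications instead of \(c\). A minimal sketch (Python's built-in three-argument pow does the same thing):

```python
def fast_pow(g, c, n):
    # square-and-multiply: scan the bits of c, squaring at each step,
    # multiplying in g whenever the current bit is set
    result = 1
    g %= n
    while c > 0:
        if c & 1:
            result = (result * g) % n
        g = (g * g) % n
        c >>= 1
    return result

# ~27 squarings/multiplications instead of 123456789 multiplications
assert fast_pow(5, 123456789, 101) == pow(5, 123456789, 101)
```

No comparable shortcut is known for the inverse direction (recovering \(c\) from \(g^c \bmod N\)), which is exactly the gap between the good folks and the bad folks.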
Taken from the SLOTH paper, here are the current estimated complexities of the best known attacks against MD5 and SHA-1:

|            | Common-prefix collision | Chosen-prefix collision |
|------------|-------------------------|-------------------------|
| MD5        | 2^16                    | 2^39                    |
| SHA-1      | 2^61                    | 2^77                    |
| MD5\|SHA-1 | 2^67                    | 2^77                    |
MD5\|SHA-1 is a concatenation of the outputs of both hashes on the same input. It is a technique aimed at making these attacks less efficient, but as you can see, it is not that effective.
I never understood why Firefox doesn't display a warning when visiting non-HTTPS websites. Maybe it's too soon, there are too many no-TLS servers out there, and users would learn to ignore the warning after a while?
I don't know, so I wrote a few lines and made the addon here.
Just drag and drop the .xpi into your Firefox. You can also review the ultra-minimal code in index.js and build the .xpi yourself with Mozilla's Add-on SDK.
A few weeks ago I wrote about testing RSA public keys from the most recent Alexa top 1 million domains handshake log that you can get on scans.io.
Most public exponents \(e\) were small, so no small private key attack (Boneh and Durfee) should have happened. But I didn't explain why.
Why
The private exponent \(d\) is the inverse of \(e\), meaning that \(e \cdot d = 1 \pmod{\varphi(N)}\).
\(\varphi(N)\) is a number almost as big as \(N\), since \(\varphi(N) = (p-1)(q-1)\) in our case. For our public exponent \(e\) multiplied by something to end up equal to \(1\), the product has to wrap around the group: it must exceed \(\varphi(N)\) at least once.
Put differently: \(e \cdot d = 1 + k \varphi(N)\) for some \(k \geq 1\), so \(d\) has to be at least \(\varphi(N) / e\). A small \(e\) therefore forces a large \(d\).
l = 1024
p = random_prime(2^(l/2), lbound=2^(l/2 - 1))
q = random_prime(2^(l/2), lbound=2^(l/2 - 1))
N = p * q
phiN = (p-1) * (q-1)

print len(bin(int(phiN / 3))) - 2  # 1024
print len(bin(int(phiN / 10000000))) - 2  # 1002
This quick test with Sage shows us that with a small public exponent (like 3, or even 10,000,000), you need to multiply it with a number greater than 1000 bits to reach the end of the group and possibly ending up with a \(1\).
All of this is interesting because in 2000, Boneh and Durfee found out that if the private exponent \(d\) was smaller than a fraction of the modulus \(N\) (the exact bound is \(d < N^{0.292}\)), then the private exponent could be recovered in polynomial time via a lattice attack. What does it mean for the private exponent to be "small" compared to the modulus? Let's get some numbers to get an idea:
print len(bin(N)) - 2  # 1024
print len(bin(int(N^(0.292)))) - 2  # 299
That's right: for a 1024-bit modulus, the private exponent \(d\) has to be smaller than 300 bits. This is never going to happen if the public exponent used is too small (note that this doesn't necessarily mean that you should use a small public exponent).
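You can reproduce the same observation in plain Python with toy 196-bit parameters (Mersenne primes are used here only because they are cheap, certainly-prime values; real keys are generated differently):

```python
# with a small public exponent e, the private exponent d satisfying
# e*d = 1 (mod phi(N)) is forced to be large: e*d = 1 + k*phi with k >= 1,
# so d > phi/e, which puts d far above the Boneh-Durfee bound N^0.292
p = 2**89 - 1   # Mersenne prime
q = 2**107 - 1  # Mersenne prime
N = p * q
phi = (p - 1) * (q - 1)

e = 65537
d = pow(e, -1, phi)  # modular inverse (Python 3.8+)

print(N.bit_length())               # modulus size in bits
print(d.bit_length())               # d is within ~17 bits of phi(N)
print(int(0.292 * N.bit_length()))  # the Boneh-Durfee danger zone for d
assert d.bit_length() > int(0.292 * N.bit_length())
```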
Moar testing
So after testing the University of Michigan · Alexa Top 1 Million HTTPS Handshakes, I decided to tackle a much, much larger log file: the University of Michigan · Full IPv4 HTTPS Handshakes. The first one is 6.3GB uncompressed, the second is 279.93GB. Quite a difference! So the first thing to do was to parse all the public keys in search of exponents greater than 1,000,000 (an arbitrary bound that I could have set higher but, as the results showed, was enough).
I only got 10 public exponents with higher values than this bound! And they were all still relatively small (633951833, 16777259, 1065315695, 2102467769, 41777459, 1073741953, 4294967297, 297612713, 603394037, 171529867).
Here's the code I used to parse the log file:
import sys, json, base64

with open(sys.argv[1]) as ff:
    for line in ff:
        lined = json.loads(line)
        if 'tls' not in lined["data"] or 'server_certificates' not in lined["data"]["tls"] or 'parsed' not in lined["data"]["tls"]["server_certificates"]["certificate"]:
            continue
        server_certificate = lined["data"]["tls"]["server_certificates"]["certificate"]["parsed"]
        public_key = server_certificate["subject_key_info"]
        signature_algorithm = public_key["key_algorithm"]["name"]
        if signature_algorithm == "RSA":
            modulus = base64.b64decode(public_key["rsa_public_key"]["modulus"])
            e = public_key["rsa_public_key"]["exponent"]
            # ignore small exponents
            if e < 1000000:
                continue
            N = int(modulus.encode('hex'), 16)
            print "[", N, ",", e, "]"
There is no day 4, this is over... And I've got a ton to work on/read about/catch up with.
But first! I'm spending the weekend in San Francisco before flying to Austin, so if anyone wants to hang out in SF feel free to contact me on Twitter =)
(and if you work for Dropbox, feel free to invite me to eat at your one-Michelin-star cafeteria)
Take-home message
- Tor's security seems a bit shaky to me
- QUIC crypto will die. Just look at TLS 1.3
- TLS 1.3 is still a clusterfuck
- Lots of stuff to break in SSE and PPE
- Intel is doing something really cool with SGX
- The Juniper paper is going to be a big deal
- The BREACH improvement is going to be a big deal
Papers to read
First, a bunch of slides are already available through the real world crypto webpage. And I've been taking notes every day: day1, day2, day3.
Now here's my to read list from the important talks:
And as a bonus, here are some papers that have nothing to do with RWC but that I still want to read right now:
Next conventions to attend
I actually have no idea about that. You?