Quick access to articles on this page:
more on the next page...
First: happy new year dear readers !
As I'm packing my bags to leave the temporary comfort of my parent's place, I'm taking the time to write a bit about my life (this is a blog after all).
I started this blog more than 2 years ago right before moving to Bordeaux to start a master in Cryptography. I had just finished a long bachelor of Mathematics between the universities of Lyon (France) and McMaster (Canada) and had decided to merge my major with an old passion of mine (Computer Science).
Hey guys, I'm David Wong, a 24 years old french dude who's going to start a Master of Cryptology in the university of Bordeaux 1.
I had been blogging for decades and it's naturally that I decided to start something that I could look at and be proud at the end of my master. Sort of a journal of a 2 years project. I was also counting on it to give me some motivation in time of adversity, and it started taking shape with tutorial videos on classes I couldn't understand (here's my first on Differential Power Analysis) and long articles about failed interviews (here's the one I made after interviewing with Cloudflare)
I still have no clue what my future job will be, that's why I had the idea of making this small blog where I could post about my ventures into this new world and, hopefully, being able to take a step back and see what I did, what I liked, what happened in two years of Master (and maybe more).
Fast forward and I was interning at Cryptography Services, the crypto team of NCC Group. An amazing internship of around 5 months spent in Chicago in the Matasano office: working on public audits (OpenSSL, Let's Encrypt) and private ones, giving presentations at the company, publishing a research paper, training a class in crypto at Blackhat, hanging out at Defcon and writing several articles for our crypto bulletin.
I'll also post some thoughts about the new city I'll be moving to : Bordeaux. This is for at least 2 years, or less if I change my mind. Anyway, this is going to be exciting!
That was 2 years ago, and indeed those years are now filled with memories and achievements that I will forever cherish. If you're passing by France and you didn't plan a visit in Bordeaux, you're missing out.
But anyway, as you probably know since you don't miss any of my blogpost, I've been since hired by the same people and will be back in the office in two weeks. In two weeks because before then I will be at the Real World Crypto convention in Stanford university, and after that at NCC Con in Austin. A lot is going to happen in just a few weeks, plus I'll have to find a new place to live and re-calibrate with the desk I left behind...
Now, here we go again.
With all the current crypto talks out there you get the idea that crypto has problems. crypto has massive usability problems, has performance problems, has pitfalls for implementers, has crazy complexity in implementation, stupid standards, millions of lines of unauditable code, and then all of these problems are combined into a grand unified clusterfuck called Transport Layer Security.
Check that 32c3 video of djb and tanja
It explains a few of the primitives backed by PQCrypto for a post-quantum world. I did myself a blog series on hash-based signatures which I think is clearer than the above video.
I wanted to check for weak private exponents in RSA public keys of big website's certificates. I went on scans.io and downloaded the Alex Top 1 Million domains handshake of the day. The file is called zgrab-results
and weighs 6.38GB uncompressed (you need google's lz4 to uncompress it, get it with brew install lz4
).
Then the code to parse it in python:
with open('rro2asqbnwy45jrm-443-https-tls-alexa_top1mil-20151223T095854-zgrab-results.json') as ff:
for line in ff:
lined = json.loads(line)
if 'tls' not in lined["data"] or 'server_certificates' not in lined["data"]["tls"].keys() or 'parsed' not in lined["data"]["tls"]["server_certificates"]["certificate"]:
continue
server_certificate = lined["data"]["tls"]["server_certificates"]["certificate"]["parsed"]
public_key = server_certificate["subject_key_info"]
signature_algorithm = public_key["key_algorithm"]["name"]
if signature_algorithm == "RSA":
modulus = base64.b64decode(public_key["rsa_public_key"]["modulus"])
e = public_key["rsa_public_key"]["exponent"]
N = int(modulus.encode('hex'), 16)
print "modulus:", N
print "exponent:", e
I figured if the public exponent was too small (e.g. smaller than 1000000, an arbitrary lower bound), it would not work. Unfortunately it seemed like every single one of these RSA public keys were using the public exponent 65537
.
PS: to parse other .csv files, just open sqlite and write .import the_file.csv tab
, then .schema tab
or any SQL query on tab
will work ;)

A few days ago, Juniper made an announcement on 2 backdoors in their ScreenOS application.

But no details were to be found in this advisory. Researchers from the twitter-sphere started digging, and finally the two flaws were found. The first vulnerability is rather crypto-y and this is what I will explain here.
First, some people realized by diffing strings
of the patched and vulnerable binaries that some numbers were changed

Then they realized that these numbers were next to the parameters of the P-256 NIST ECC curve. Worse, they realized that the modified values were these of the Dual EC PRNG: from a Juniper's product information page you could read that Dual EC had been removed from most of their products except ScreenOS. Why's that? No one knows, but they assured that the implementation was not visible from the outside, and thus the NSA's backdoor would be unusable.

Actually, reading the values in their clean binaries, it looks like they had changed the NSA's values introducing their own \(Q\) point and thus canceling NSA's backdoor. But at the same time, maybe, introducing their own backdoor. Below the NSA's values for the point \(P\) and \(Q\) from the cached NIST publications:

Reading the previous blog post, you can see how they could have easily modified \(Q\) to introduce their own backdoor. This doesn't mean that it is what they did. But at the time of the implementation, it was not really known that Dual EC was a backdoor, and thus there was no real reason to change these values.

According to them, and the code, a second PRNG was used and Dual EC's only purpose was to help seeding it. Thus no real Dual EC output would see the surface of the program. The second PRNG was a FIPS approved one based on 3DES and is -- as far as I know -- deemed secure.

Another development came along and some others noticed that the call for the second PRNG was never made, this was because a global variable pnrg_output_index
was always set to 32
through the prng_reseed()
function.
Excerpt of the full code:

This advance was made because of Juniper's initial announcement that there were indeed two vulnerabilities. It seems like they were aware of the fact that Dual EC was the only PRNG being used in their code.

Now, how is the Dual EC backdoor controlled by the hackers? You could stop reading this post right now and just watch the video I made about Dual EC, but here are some more explanations anyway:

This above is the basis of a PRNG. You start it with a seed \(s_0\) and every time you need a random number you first create a new state from the current one (here with the function \(f\)), then you output a transformation of the state (here with the function \(g\)).
If the function \(g\) is one-way, the output doesn't allow you to retrieve the internal state and thus you can't predict future random numbers, neither retrieve past ones.
If the function \(f\) is one-way as well, retrieving the internal state doesn't allow you to retrieve past state and thus past random numbers generated by the PRNG. This makes the PRNG forward-secure.

This is Dual EC. Iterating the state is done by multiplying the current state with the point \(P\) and then taking it's \(x\)-th coordinate. The point \(P\) is a point on a curve, with \(x\) and \(y\) coordinates, multiplying it with an integer gives us a new point on the curve. This is a one-way function because of the elliptic curve discrete logarithm problem and thus our PRNG is forward-secure (the ECDLP states that if you know \(P\) and \(Q\) in \(P = dQ\), it's really hard to find \(d\)).

The interesting thing is that, in the attacker knows the secret integer \(d\) he can recover the next internal state of the PRNG. First, as seen above, the attacker recovers one random output, and then tries to get the full output: the real random output is done by truncating the first 16 bits of the full output. This is done in \(2^16\) iterations. Easy.
With our random number \(r_1\) (in our example), which is the \(x\) coordinate of our point \(s_1 Q\), we can easily recover the \(y\) coordinate and thus the entire point \(s_1 Q\). This is because of how elliptic curves are shaped.
Multiplying this point with our secret value \(d\) we obtain the next internal state as highlighted at the top of this picture:

This attack is pretty destructive and in the order of mere minutes according to Dan Bernstein et al

For completeness, it is important to know that there were two other constructions of the Dual EC PRNG with additional inputs, that allowed to add entropy to the internal state and thus provide backward secrecy: retrieving the internal state doesn't allow you to retrieve future states.
The first construction in 2006 broke the backdoor, the second in 2007 re-introduced it. Go figure...

tl;dr:
this is what you should type:
strings your_binary | grep -C5 -i "c97445f45cdef9f0d3e05e1e585fc297235b82b5be8ff3efca67c59852018192\|8e722de3125bddb05580164bfe20b8b432216a62926c57502ceede31c47816edd1e89769124179d0b695106428815065\|1b9fa3e518d683c6b65763694ac8efbaec6fab44f2276171a42726507dd08add4c3b3f4c1ebc5b1222ddba077f72943b24c3edfa0f85fe24d0c8c01591f0be6f63"
After all the Jupiner fiasco, I wondered how people could look if a binary contained an implementation of Dual EC, and worse, if it contained Dual EC with the NSA's values for P and Q.
The easier thing I could think of is the use of strings
to check if the binary contains the hex values of some Dual EC parameters:
strings your_binary | grep -C5 -i `python -c "print '%x' % 115792089210356248762697446949407573530086143415290314195533631308867097853951"`
This is the value of the prime p
of the curve P-256. Other curves can be used for Dual EC though, so you should also check for the curve P-384:
strings your_binary | grep -C5 -i `python -c "print '%x' % 39402006196394479212279040100143613805079739270465446667948293404245721771496870329047266088258938001861606973112319"`
and the curve P-521:
strings your_binary | grep -C5 -i `python -c "print '%x' % 6864797660130609714981900799081393217269435300143305409394463459185543183397656052122559640661454554977296311391480858037121987999716643812574028291115057151 "`
I checked the binaries of ScreenOS (taken from here) and they contained these three curves parameters. But this doesn't mean anything, just that these curves are stored, maybe used, maybe used for Dual EC...
To check if it uses the NSA's P and Q, you should grep for P and Q x coordinates from the same NIST paper.

This looks for all the x coordinates of the different P for each curves. This is not that informative since it is the standard point P as a generator of P-256
strings your_binary | grep -C5 -i "6b17d1f2e12c4247f8bce6e563a440f277037d812deb33a0f4a13945d898c296\|aa87ca22be8b05378eb1c71ef320ad746e1d3b628ba79b9859f741e082542a385502f25dbf55296c3a545e3872760ab7\|c6858e06b70404e9cd9e3ecb662395b4429c648139053fb521f828af606b4d3dbaa14b5e77efe75928fe1dc127a2ffa8de3348b3c1856a429bf97e7e31c2e5bd66"
Testing the ScreenOS binaries, I get all the matches. This means that the parameters for P-256 and maybe Dual EC are indeed stored in the binaries.

weirdly, testing for Qs I don't get any match. So Dual EC or not?
strings your_binary | grep -C5 -i "c97445f45cdef9f0d3e05e1e585fc297235b82b5be8ff3efca67c59852018192\|8e722de3125bddb05580164bfe20b8b432216a62926c57502ceede31c47816edd1e89769124179d0b695106428815065\|1b9fa3e518d683c6b65763694ac8efbaec6fab44f2276171a42726507dd08add4c3b3f4c1ebc5b1222ddba077f72943b24c3edfa0f85fe24d0c8c01591f0be6f63"
Re-reading CVE-2015-7765:
The Dual_EC_DRBG 'Q' parameter was replaced with 9585320EEAF81044F20D55030A035B11BECE81C785E6C933E4A8A131F6578107 and the secondary ANSI X.9.31 PRNG was broken, allowing raw Dual_EC output to be exposed to the network. Please see this blog post for more information.
Diffing the vulnerable (and patched binaries. I see that only the P-256 curve \(Q\) was modified from Juniper's values, other curves were left intact. I guess this means that only the P-256 curve was being used in Dual EC.
If you know how Dual EC works (if you don't check my video), you know that to establish a backdoor into it you need to generate \(P\) and \(Q\) accordingly. So changing the value \(Q\) with no correlation to \(P\) is not going to work, worse it could be that \(Q\) is too "close" to P and thus the secret \(d\) linking them could be easily found ( \(P = dQ \)).
Now one clever way to generate a secure \(Q\) with a strong value \(d\) that only you would know is to choose a large and random \(d\) and calculate its inverse \(d^{-1} \pmod{ord_{E}} \). You have your \(Q\) and your \(d\)!
\[ d^{-1} P = Q \]
bonus: here's a small script that attempts to find \(d\) in the hypothesis \(d\) would be small (the fastest way to compute an elliptic curve discrete log is to use Pollard Rho's algorithm)
p256 = 115792089210356248762697446949407573530086143415290314195533631308867097853951
a256 = p256 - 3
b256 = 41058363725152142129326129780047268409114441015993725554835256314039467401291
## base point values
gx = 48439561293906451759052585252797914202762949526041747995844080717082404635286
gy = 36134250956749795798585127919587881956611106672985015071877198253568414405109
## order of the curve
qq = 115792089210356248762697446949407573529996955224135760342422259061068512044369
# init curve
FF = GF(p256)
EE = EllipticCurve([FF(a256), FF(b256)])
# define the base point
G = EE(FF(gx), FF(gy))
# P and Q
P = EE(FF(0x6b17d1f2e12c4247f8bce6e563a440f277037d812deb33a0f4a13945d898c296), FF(0x4fe342e2fe1a7f9b8ee7eb4a7c0f9e162bce33576b315ececbb6406837bf51f5))
# What is Q_y ?
fakeQ_x = FF(0x9585320EEAF81044F20D55030A035B11BECE81C785E6C933E4A8A131F6578107)
fakeQ = EE.lift_x(fakeQ_x)
print discrete_log(P, fakeQ, fakeQ.order(), operation='+')
The lift_x
function allows me to get back the \(y\) coordinate of the new \(Q\):
EE.lift_x(fakeQ_x)
(67629950588023933528541229604710117449302072530149437760903126201748084457735 : 36302909024827150197051335911751453929694646558289630356143881318153389358554 : 1)
Short blogpost on a quick way to analyze a TLS handshake:
In one terminal, setup the server:
openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -nodes
openssl s_server -cert cert.pem -key key.pem
The first command use the req
toolkit of openssl. It is usually used to create certificate request (and the request is then later approved by a CA), but since we're in a rush, let's just use it with the -x509
option to directly generate a certificate. rsa:2048
generates a key with the algorithm RSA and with a modulus of size 2048 bits. -nodes
disable the use of a passphrase to protect the key (the default protect the key by encrypting it with DES).
In a second terminal, start capturing packets:
tcpdump -i lo0 -s 65535 -w exchange.cap
65535 is the maximum length of a packet.
Start a handshake in a third terminal:
openssl s_client -connect localhost:4433
Now open the .cap with Wireshark!
Heard a bit late about the factorable research results and how they used batch gcd to recover a bunch of servers' private keys.
The question one could think of is how to efficiently do a batch gcd on a big set of public keys?
From this utility:
- Actual pairwise GCD
This performs n*(n-1)/2 GCD operations on the moduli. This is slow. Don't use this.
- Accumulating Product
This iterates over all input moduli, performing a GCD of each one against the product of all previous.
Once it finds a candidate, it scans all previous moduli to find out which ones it shared a factor with
(either GCD or division, depending on whether one or both were found).
The main scan cannot be done in parallel, and even though it seems like this is O(n), the increasing size
of the accumulated product results it lots of long multiplication and long divison so it's still painfully
slow for large numbers of moduli.
Looks like the most efficient ways come from Dan Bernstein (again!), in a 7 pages paper
the Threat Model for BGP Path Security document lists, as RFCs usually do, relevant terms with their respective definitions. It can be a quick way to get an understanding of these abbreviations you often come across but never dare to google:
-
Autonomous System (AS): An AS is a set of one or more IP networks operated by a single administrative entity.
-
AS Number (ASN): An ASN is a 2- or 4-byte number issued by a registry to identify an AS in BGP.
-
Border Gateway Protocol (BGP): A path vector protocol used to convey "reachability" information among ASes in support of inter-domain routing.
-
False (Route) Origination: If a network operator originates a route for a prefix that the operator does not hold (and that has not been authorized to originate by the prefix holder), this is termed false route origination.
-
Internet Service Provider (ISP): An organization managing (and typically selling) Internet services to other organizations or individuals.
-
Internet Number Resources (INRs): IPv4 or IPv6 address space and ASNs.
-
Internet Registry: An organization that manages the allocation or distribution of INRs. This encompasses the Internet Assigned Number Authority (IANA), Regional Internet Registries (RIRs), National Internet Registries (NIRs), and Local Internet Registries (LIRs) (network operators).
-
Network Operator: An entity that manages an AS and thus emits (E)BGP updates, e.g., an ISP.
-
Network Operations Center (NOC): A network operator employs a set of equipment and a staff to manage a network, typically on a 24/7 basis. The equipment and staff are often referred to as the NOC for the network.
-
Prefix: A prefix is an IP address and a mask used to specify a set of addresses that are grouped together for purposes of routing.
-
Public Key Infrastructure (PKI): A PKI is a collection of hardware, software, people, policies, and procedures used to create, manage, distribute, store, and revoke digital certificates.
-
Relying Parties (RPs): An RP is an entity that makes use of signed products from a PKI, i.e., it relies on signed data that is verified using certificates and Certificate Revocation Lists (CRLs) from a PKI.
-
RPKI Repository System: The RPKI repository system consists of a distributed set of loosely synchronized databases.
-
Resource PKI (RPKI): A PKI operated by the entities that manage INRs and that issue X.509 certificates (and CRLs) that attest to the holdings of INRs.
-
RPKI Signed Object: An RPKI signed object is a data object encapsulated with Cryptographic Message Syntax (CMS) that complies with the format and semantics defined in [RFC6488].
-
Route: In the Internet, a route is a prefix and an associated sequence of ASNs that indicates a path via which traffic destined for the prefix can be directed. (The route includes the origin AS.)
- Route Leak: A route leak is said to occur when AS-A advertises routes that it has received from AS-B to the neighbors of AS-A, but AS-A is not viewed as a transit provider for the prefixes in the route.