A great FAQ, written by Marc Joye (Thomson R&D), on Whitebox Cryptography.
Thansk Tancrède for the link!
Q1: What is white-box cryptography?
A major issue when dealing with security programs is the protection of "sensitive" (secret, confidential or private) data embedded in the code. The usual solution consists in encrypting the data but the legitimate user needs to get access to the decryption key, which also needs to be protected. This is even more challenging in a software-only solution, running on a non-trusted host.
White-box cryptography is aimed at protecting secret keys from being disclosed in a software implementation. In such a context, it is assumed that the attacker (usually a "legitimate" user or malicious software) may also control the execution environment. This is in contrast with the more traditional security model where the attacker is only given a black-box access (i.e., inputs/outputs) to the cryptographic algorithm under consideration.
Q2: What is the difference with code obfuscation?
Related and complementary techniques for protecting software implementations but with different security goals include code obfuscation and software tamper-resistance. Code obfuscation is aimed at protecting against the reverse engineering of a (cryptographic) algorithm while
software tamper-resistance is aimed at protecting against modifications of the code.
All these techniques have however in common that the resulting implementation must remain directly executable.
Or as Francis Gabriel writes here
Code obfuscation means code protection. A piece of code which is obfuscated is modified in order to be harder to understand. As example, it is often used in DRM (Digital Rights Management) software to protect multimedia content by hiding secrets informations like algorithms and encryption keys.
And I just passed the last exam of this semester, which should be the last exam of my life =)
Now is time to take a few days to relax and eat nice food because it will soon be christmas ^^ (or holidays, as I heard some american say to avoid saying christmas).
A few interesting things I had to do during my exams these last few days:
- Simple Power Analysis (SPA). Guess what algorithm is used from smartcards' traces and calculate the exponent if it's a binary exponentiation
In the picture you can see two patterns, "1" is represented by two operations in the algorithm, and one of them is squaring which happens also when you have a "0" in your exponent's binary representation. So following the computations revealed by the power trace you can guess the binary representation of the exponent.
I had to read this article explaining two malloc implementations and their vulnerabilities. GNU Lib C (used in Linux) and System V AT&T (used in Solaris, IRIX). I knew the double chained list system but System V uses a different approach: binary tree and also a
realfree function that completes the
Airbus made a "private" challenge called « Trust the future » and accessible only by some selected schools (epitech, insa, and others). I wasn't invited to participate but there was a "crypto" challenge I thought was interesting. Since the challenge just finished I'm posting the write up.
Crypto challenge #1
We have 4 certificates and a
challenge1 file that seems to be a s/mime file of a pkcs#7 enveloped-data object.
2.4.3 EnvelopedData Content Type
This content type is used to apply privacy protection to a message. A
sender needs to have access to a public key for each intended message
recipient to use this service. This content type does not provide
3.2 The application/pkcs7-mime Type
The application/pkcs7-mime type is used to carry CMS objects of several types including envelopedData and signedData. The details of constructing these entities is described in subsequent sections. This section describes the general characteristics of the application/pkcs7-mime type.
We dump the info of each certificates in human readable format, openssl has commands for that (I think certtool does as well, but I'm on windows using cmder and openssl is the one included).
openssl x509 -in alice.crt -text -noout -out alice.crt.txt
We see that alice, bob and charly use the same rsa exponent (3).
If you're familiar with RSA (and it's highly probable you are if you read this blog) you can skip this section.
RSA is an asymmetric encryption scheme (also used as a signature). It works by generating a set of private key/public key, the private key is of course kept private and the public key is publicly disclosed. If someone wants to send us a private message he can encrypt it with our public key and we will be able to decrypt it with the private key. The public key is the pair of number (n, e) where n is called the modulus and e is called the exponent. If we want to encrypt a message m with the public key we "basically" do
c = m^e modulo n and send c. To decrypt it we use our private key d like this:
m = c^d modulo n.
The math behind this is that n is generated from two secret primes p and q (big enough)
n = p x q and
d = e^-1 modulo (p-1)(q-1),
phi(n) being the order of the multiplicative group
Z/nZ. The security comes from the fact that it's Computationally Hard to find the inverse of e if we don't know p and q. By the way, Heartbleed (a recent attack on openssl) led to finding one of the prime, thus the entire decomposition of n.
Textbook RSA vs real life RSA
This is all theory. And in practice we have to go through several steps to encrypt an ascii message, make sure it is of length lesser than the modulus, make sure the modulus is big enough, etc...
Textbook RSA is also deterministic and thus not semantically secure (see my previous post) + it is malleable: imagine you intercept c, and of course you know (n, e) (the public key). You could compute
c' = 2^e * c = 2^e * m^e = (2m)^e modulo n, this would correctly decrypt as
Thus, to counter those in practice, RSA Encrytion uses padding (usually OAEP) to make it probabilist and not malleable.
Let's go back to our challenge
We open our
Content-Disposition: attachment; filename="smime.p7m"
Content-Type: application/x-pkcs7-mime; smime-type=enveloped-data; name="smime.p7m"
To read that we need to extract the pkcs7 object and parse it. Openssl allows us to do this:
openssl smime -in challenge1 -pk7out -out challenge1.p7m
openssl asn1parse -text -in challenge1.p7m
We get an annoying dump of info to read. With three of those things:
95:d=6 hl=2 l= 16 prim: INTEGER :6384E2B2184BCBF58ECCF10CA7A6563C
113:d=5 hl=2 l= 13 cons: SEQUENCE
115:d=6 hl=2 l= 9 prim: OBJECT :rsaEncryption
126:d=6 hl=2 l= 0 prim: NULL
128:d=5 hl=4 l= 256 prim: OCTET STRING [HEX DUMP]:C1E2357C6FC53F1CC5E0E76EB1224BE8F24E8839251CF954A98090C4549F1BAFB7BCB1006DD2A982D56C1D2D3E6B422122D01F78DA0099776B789162E8CE94EE3D1E7B88AA8175DBB86F2F4AC65361BC949B3B90460741CAEE6F1ABDC5BD6C5296FBC8F2DAFF77F7110EA32D330D38DD2CA2FE13E785C86FE2210B58074C2DA5F440794BA023FC98B3D1E7DC979DBAC6672B5C19ABF4A91E21D5E474475BC09B78910D1F8E0290B38AE8D756E04D7F5EFBA64BFB5A0E96CD3DE1D82F609544A423F666D08B63262229687E1982BC8E424C7B5266B11A59036625F8E92C06740A3C9D8F3CE87FEB1F4444BC2039C8C6FF0AB9457D8AA63851ECF3C4AF1A2328FD
Which means the same message was sent to three recipients, identified by their serial number which we recognize as being our alice, bob and charly.
We also get this at the end:
1110:d=4 hl=2 l= 9 prim: OBJECT :pkcs7-data
1121:d=4 hl=2 l= 20 cons: SEQUENCE
1123:d=5 hl=2 l= 8 prim: OBJECT :des-ede3-cbc
1133:d=5 hl=2 l= 8 prim: OCTET STRING [HEX DUMP]:01D4CE3AF4D17ABB
Which means that the data sent (after this dump) is encrypted by 3DES version 3 (three different keys) in CBC mode with an IV
I like to put reminders like this so you don't have to switch to Wikipedia if you don't remember what are those letters.
DES (Data Encryption Standard) is the famous no-longer-used block cipher (because it was broken ages ago).
EDE3 short for the third version of the Triple DES block cipher (that is still considered secure today, it was a response to DES no longer being secure) which uses 3 different keys. Encrypting is done like this:
- we encrypt with key1
- then we decrypt with key2
- then we encrypt again with key3
E(k3, D(k2, E(k1, M)))
Hence the triple DES.
CBC is a mode of operation. A block cipher can only encrypt/decrypt blocks of a certain size (64bits with DES). If you want to do more (or less) you have to use a mode of operation (and a padding system).
Chinese Remainder Theorem
Here the interesting thing is that the same message was sent to three different recipients, encrypted with the same exponent (3). Let's write down the informations we have:
c1 = m^3 modulo n1
c2 = m^3 modulo n2
c3 = m^3 modulo n3
c1 being the encrypted message sent to Alice, n1 being Alice's modulus, and so on...
We have a system with one unknown: the message. The Chinese Remainder Theorem works in a similar fashion to Lagrange Interpolation (anecdote time: it is used in Shamir's Secret Sharing).
So that we have:
m^3 = c1 * n2 * n3 * ((n2 * n3)^-1 [n1]) +
c2 * n1 * n3 * ((n1 * n3)^-1 [n2]) +
c3 * n1 * n2 * ((n1 * n2)^-1 [n3])
modulo n1 * n2 * n3
A brief explanation:
`c1 = m^3 modulo n1, to place it in a formula modulo
n1 * n2 * n3 we have to cancel it when it's modulo n2 or modulo n3. How to make something congruent to zero when its modulo n2 or n3 ? Make it a multiple of n2 or n3. So we multiply c1 with n2 and n3. But then when it will be modulo n1 we will have the value
c1 * n2 * n3 which is not correct (
c1 = m^3 modulo n1 !). So let's cancel the n2 and n3 with their inverse modulo n1. We then have
c1 * n2 * n3 * ((n2 * n3)^-1 [n1]). We do this with all the equations to find the bigger equation. This is the Chinese Remainder Theorem. Simple no?
And this result is even more useful since we know that:
m < n1
m < n2
m < n3
m^3 < n1*n2*n3
Of course if m was greater than one of the modulus then it would decrypt incorrectly.
So what we have is:
m^3 = something modulo n1*n2*n3
m^3 = something
That's right, we can get rid of the modulo. We then do a normal cubic root and we find m.
Here's the quick python code I hacked together for this:
(by the way we can quickly get the modulus of each recipients with openssl:
openssl x509 -in alice.crt -modulus)
## 6384E2B2184BCBF58ECCF10CA7A6563C (Alice)
c1 = "C1E2357C6FC53F1CC5E0E76EB1224BE8F24E8839251CF954A98090C4549F1BAFB7BCB1006DD2A982D56C1D2D3E6B422122D01F78DA0099776B789162E8CE94EE3D1E7B88AA8175DBB86F2F4AC65361BC949B3B90460741CAEE6F1ABDC5BD6C5296FBC8F2DAFF77F7110EA32D330D38DD2CA2FE13E785C86FE2210B58074C2DA5F440794BA023FC98B3D1E7DC979DBAC6672B5C19ABF4A91E21D5E474475BC09B78910D1F8E0290B38AE8D756E04D7F5EFBA64BFB5A0E96CD3DE1D82F609544A423F666D08B63262229687E1982BC8E424C7B5266B11A59036625F8E92C06740A3C9D8F3CE87FEB1F4444BC2039C8C6FF0AB9457D8AA63851ECF3C4AF1A2328FD"
n1 = "EFBA9C442084759DC9770021B03C2E2913053E770779316F92C5DBFCAE4D3682E64006E38FA6A3AC24CC13AD2E747A50E5735064549F590294E36F2A1B23DB29567B49C007F8F8C224D3CD19B81D3F198C540291C135965E549881B775EEE29684F0E6CD4C2A017BE38F2E78E070D503BE9EE3EA2C491E53DE9C705FEB973918A168F275D90D055778289598BD2377D79ACC1BA493F570C5C8301913CEF12CD513321F8F320D8EC8172182D03F33721F02DFCE24463AE7A6CAB7C3A0CBB7D2AB149D347A2C9ABDB81BE4B60CAECBF31CF79C4BA0081FC00BB0939A950CBACA5B7B79FF92AF273B0D01A7E183FF30C90F27705D18F70EBB32281C5A873ED0A90D"
## 9F9D51BC70EF21CA5C14F307980A29D8 (Bob)
c2 = "27EB5A62A311DFAECF09318BEF7D60B98EA151AF09BDBF2A89A884617B8A8A14FF6F8045A8FD5D8956F5768C32A7E47AB17FA08D5F7D2EB590C4FC8296A1F70069C338CFA3C131A58FE05A75E36D7457ECC7B1BB403C1FF31FA66FB478B1F4548325B57961191AE1BFDDD7F5AF6FEC6FD94F66B1BC482337B579AD790466D1F33EDB09AA388085053D3C383F91A8EC40DB150365735B7E2D01F172E23717D31ABE0350FAC6730673C3C70EE593E8008A222DE40CF8A62615D119CD119FE8C30DA49E7A1D3596279B659E72C584D6D8262B1126B8C8BCDC09A31761EE746A14ADA7EC387AF2C52CBABDD8F443DF1F4C5E70C83B82A12F3BBCB21494689D13791F"
n2 = "C0028301D5645483592E2117463A804454BD1AE33B1CBEF33D9CDFF86EFD46CB24BA1984874DAE3A4A3D3609D9607276F91DFAD38E9ED6B6BCE15F29C49E1C7E0B532AE2A1343AF3F8064B4A093F23F981624D4E08F901787B761847B4F0963512EAC13B5D47DA4295FF614501A3D7D7EDA6B4E6197B974A70BF11FDC5D619D50974415E209DE76C68F190DBCA5BB6F1963C1E0E987BE105FF9082EAE003C2051A4C95E3299D3C1BC64D5F8E95D40C7B27EEEA5EBCD9B54CD6B5655B0AB9AEAA15E976AE37EA228A151D2AFD5417D4BCBC3BFB396E6B10AF561CBE0FD0374081B7034585C54096849FF82B79A117E44AE8FDB5B304BC0D9CA297C2F4A57FF91F"
## A6D4EF4DD38B1BB016D250C16A680470 (Charly)
c3 = "04991C5BA4882F329B03B18E2B317F4A54905ED4EB832B084A42AD700A0D3136A14BB57D61D4A1982E2CAB0FF773356759EE4AD77C1982E642CF574332AB32D109952FDE6221D77C35E4D0B69E559392DBE602E5336BD09239E85F21A70F4A824907AF75C9C372D4BE4C15E45431C35FE678E2646017D74186B3B084A41F217655A2ED262AA5C300BA737AB0DF270BD0B38A2FF215A3B5DB3CBB79350DDFEF1A08E40CB253B506D92002BBF4AD112AC1DDDB96CD4539A01035E76B1CC5C43427F46C83DBAA318387FE2C8C7FAA75FC0099050CF98671015A568CFFC56DFF6F8CB80A6A55B4CCB0D825AA9D99098DDA5D2EEC7D40D0BCCDA42D9E618A09AEC50A"
n3 = "CFA7854352FF9DF5E84AB10AB8F034F8106811D973BAB528BFCBD3DCBCFDF9FB5C398E23B58BA883F7C78F47C6694B4F042CDB8E54E856040F8A8A9ADBCA4C6D0813894C43352EB3EE19C1F76DF46DFD1B6BB38349BCE811036B0ED7ACFE2E5045FE4232F11DA3F113189A176964C206155342FD9E2E8AD11CBBCB85DFDF30E62AEA068F2DD7CEC6CF818D1E312BBA5FA6385461CA5ADCA0F95B6299FA366EEF8856416D72A42A93FD979E269D8FEA143870985FD353C85850FB4A11B6E4BA483CDC97F7E1717C34DF7D9E34DF83F67A9DA97ACA69926167D44C2CB3BB858EC041A244A6197D7F3B9AFD02A0562F13EACE6494F289184DAD16D2D995ED1ADC13"
## base16 -> base10
c1 = int(c1, 16)
c2 = int(c2, 16)
c3 = int(c3, 16)
n1 = int(n1, 16)
n2 = int(n2, 16)
n3 = int(n3, 16)
## extended euclide algorithm
Returns (gcd, x, y) where gcd is the greatest common divisor of a and b
with the sign of b if b is nonzero, and with the sign of a if b is 0.
The numbers x,y are such that gcd = ax+by."""
prevx, x = 1, 0; prevy, y = 0, 1
q, r = divmod(a,b)
x, prevx = prevx - q*x, x
y, prevy = prevy - q*y, y
a, b = b, r
return a, prevx, prevy
## chinese remainder formula
n2n3 = n2 * n3
n1n3 = n1 * n3
n1n2 = n1 * n2
n2n3_ = xgcd(n2n3, n1)
n1n3_ = xgcd(n1n3, n2)
n1n2_ = xgcd(n1n2, n3)
m3 = c1 * n2n3 * n2n3_ + c2 * n1n3 * n1n3_ + c3 * n1n2 * n1n2_
m3 = m3 % (n1n2 * n3)
from decimal import *
getcontext().prec = len(str(m3))
x = Decimal(m3)
power = Decimal(1)/Decimal(3)
answer = x**power
ranswer = answer.quantize(Decimal('1.'), rounding=ROUND_UP)
diff = x - ranswer**Decimal(3)
if diff == Decimal(0):
print("x is the cubic number of", ranswer)
print("x has a cubic root of ", answer)
- The xgcd function is included in sage but here I use Python so I included it in the code.
- We need to use the decimal package to calculate the cubic root because our number is too big.
We then get this big ass number that we convert to hexadecimal (
hex(number) in python). This yields:
We refer once more to the RFCs
8.1 Encryption-block formatting
A block type BT, a padding string PS, and the data D shall be
formatted into an octet string EB, the encryption block.
EB = 00 || BT || PS || 00 || D . (1)
The block type BT shall be a single octet indicating the structure of
the encryption block. For this version of the document it shall have
value 00, 01, or 02. For a private- key operation, the block type
shall be 00 or 01. For a public-key operation, it shall be 02.
The padding string PS shall consist of k-3-||D|| octets. For block
type 00, the octets shall have value 00; for block type 01, they
shall have value FF; and for block type 02, they shall be
pseudorandomly generated and nonzero. This makes the length of the
encryption block EB equal to k.
We have our 3DES key:
4f8957408f0ea202c785b95e206b3ba8da3dba7aea08dca1 to use.
Let's get the hexdump the end of the file (you can use commandline utilities like base64, hexdump, dd and xdd):
openssl smime -in challenge1 -pk7out > b64file`
base64 -d b64file > hexfile
hexdump -s 1135 hexfile
And finally decrypt our
encrypted file with openssl since it provides a command for that:
openssl des-ede3-cbc -d -iv 01D4CE3AF4D17ABB -K 4f8957408f0ea202c785b95e206b3ba8da3dba7aea08dca1 -in encrypted
Voila ! That was really fun :)
I was wondering why RSA was used in the SSL handshake, and why Diffie-Hellman was used instead in a Perfect Forward Secrecy scheme.
There is, however, an advantage of DH over RSA for generating ephemeral keys: producing a new DH key pair is extremely fast (provided that some "DH parameters", i.e. the group into which DH is computed, are reused, which does not entail extra risks, as far as we know). This is not a really strong issue for big servers, because a very busy SSL server could generate a new "ephemeral" RSA key pair every ten seconds for a very small fraction of his computing power, and keep it in RAM only, and for only ten seconds, which would be PFSish enough.
Some funny slides from Vitaly Shmatikov on side channels attacks: http://www.cs.utexas.edu/~shmat/courses/cs361s/sidechannels.pdf
So you can tell what someone is typing just by analyzing the sound of the fingers on the keyboard, from a certain distance.
If you observe someone typing at his computer from an outside window, you can analyze the reflections in many objects (glass teapots, plastic bottles, spoons!!! and even eyes).
Like we weren't worried enough.
I was trying to find some info about the Heap and malloc (for the level 14 of microcorruption) when I ran into some very good videos from the Infosec Institute. I cannot find the name of the speaker but damn he's so good I just lost 2 hours of my life just watching his videos about nmap, pentesting, metaspoilt, and so on...
Here is his video on Heap Overflow:
And his talks are on several different youtube channels. I don't know how legit this is, and if someone can find the name of that guy I would love to know more about him.
More about him: Advanced Recon, Advanced Exploitation, and so on...
I just turned in my cryptanalysis project: A Linear Cryptanalysis of A5/2 with Sage. Had to write a rapport in LaTeX.
Now I have to finish the challenges over at Microcorruption and produce a write up in LaTeX as well.
Well LaTeX is awful as a writing syntax. I'd rather focus on writing with John Gruber's excellent markdown syntax and then later convert it to LaTeX. And Pandoc does just that! It's magic. Now that I have all my
.md files I concat them with a quick python script
output = ""
for ii in range(1,15):
markdown = open(str(ii) + ".md", "r")
output += markdown.read()
output += "\n\n"
output_file = open("rapport.md", "w")
A bit of Pandoc magic and voilà ! I have a beautiful .tex
Now let's finish Microcorruption (or at least try :D it's getting pretty hard).
PS: I use Markdown for everything. This blog is written in Markdown and then converted to HTML. I'm also writing a book in Markdown. Well Markdown is awesome.
In my Software Security class that looks like a continuous game in assembly, we're now learning format string and heap overflow through Protostar a set of challenges on those attacks. It's a nice addition to crackmes and microcorruption.
I was asked during an interview how to build a system where Alice could send encrypted messages to Bob. And I was asked to think outloud.
So I imagined a system where both Alice and Bob would have a set of (public key, private key). I thought of RSA as they all use RSA but ECIES (Elliptic Curve Integrated Encryption Scheme) would be more secure for smaller keys. Although here ECIES is not a "pure" asymmetric encryption scheme and Elgamal with ECs might be better.
Once one wants to communicate he could send a "hello" request and a handshake protocol could take place (to generate a symmetric encryption key (called a session key in this case)).
I imagined that two session keys would be generated by each end. Different set of keys depending on the direction. One for encrypting the messages and one for making the MAC (that would be then appended to the encrypted message. So we EtM (Encrypt-then-Mac)).
Then those keys would be encrypted with the public signature of the other one and sent over the wire like this. And Let's add a signature so we can get authentication and they also won't get tampered. Let's use ECDSA (Elliptic Curve Digital Signature Algorithm) for that.
Although I'm wondering if two symmetric keys for encrypting messages according to the direction is really useful.
I was then asked to think about renewal of keys. Well, the public keys of Alice and Bob are long term keys so I didn't bother with that. About the symmetric keys? What about their TTL (Time To Live)?
My interviewer was nice enough to give me some clues: "It depends on the quantity of messages you encrypt in that time also."
So I thought. Why not using AES in CTR mode. So that after a certain number of iteration we would just regenerate the symmetric keys.
I was then asked to think about Forward Secrecy.
Well I knew the problem it was solving but didn't know it was solving it.
Mark Stamp (a professor of cryptography in California (how many awesome cryptography professors are in California? Seriously?)) kindly uploaded this video of one of his class on "Perfect Forward Secrecy".
So here the trick would be to do an Ephemeral Diffie-Hellman to use a different session key everytime we send something.
This EDH would of course be encrypted as well by our public key system so the attacker would first need to get the private keys to see that EDH.
How can you tell if a cipher is secure?
I was asked that question during an interview a while ago. Back then it troubled me because it seemed so basic and yet and I had no idea how to answer it. I became vivid and didn't know what to say and later I didn't get the job. I thought it would be interesting to write down how I would answer this question now.
We're in the dark
If we have no idea how the algorithm works and can only do some playing with it (or worse, we can only listen to people using it) then we can safely call it a Black box. This is pretty hard to attack and also pretty hard to talk about security here because it might be not secure at all (like Enigma during the war) but we would still have a hard time telling.
We're almost in the dark
If I talk about Blackboxes then I should talk about Whiteboxes. If you have no idea what your cipher is doing, but you can analyze it, then you call that a Whitebox. For example your phone, your credit card, and so on, they all use cryptographic tools that you can take a look at since it is in your own hands. Cryptography was not thought for that kind of model. It was primarily born to hide private conversations from other curious parties and ciphers were supposed to be used by good people only. In our modern world, clever people found a new use for cryptography in our everyday-devices. And it quickly became problematic since a lot of new attacks rose up.
It doesn't mean that attacking a whitebox is easy though. People use reverse engineering, heuristics, side-channel attacks (fault injection, power analysis, sound analysis...) and a lot of other techniques to break them. There are a lot of researches, especially in the domain of smartcards.
We're in the clear
In cryptography, Kerckhoffs's principle was stated by Auguste Kerckhoffs in the 19th century: A cryptosystem should be secure even if everything about the system, except the key, is public knowledge.
OK. Now let's talk about a cipher we can see and touch and play with anyway we want.
We will not talk about the implementation of the cipher because it will have the same problems the whitebox have: it will be subject to new attacks depending where and how you implement it. It is a whole new level of research and precautions you have to take. Think about side-channel attacks, bugs (buffer overflow), human errors, padding attacks (if we chose a bad mode of operation for example), etc...
So we'll place ourselves in a more restrictive model.
There is also a lot of different kind of ciphers (and it would be an history class to write about it) so I'll just talk about the most used type of ciphers nowadays: Stream ciphers and Block ciphers (both symmetric ciphers) and Asymmetric ciphers
Although let's just say that it was proven by Shannon in the 40s that the One Time Pad is the only cipher having perfect secrecy.
What kind of secrecy?
There are multiple kinds of secrecy in the literature. Perfect Secrecy that we just talked about. The kind of secrecy that doesn't leak any information about the plaintext and the key (although it might leak the maximum length of it). And if you read more about it you will see that it is not practical thus almost never used.
So we defined a weaker type of secrecy: Semantic secrecy.
There are multiple definitions but they are equivalent to IND-CPA or IND-CCA or IND-CCA2 depending on what you want from your cipher.
- IND here means Indistinguishable
- CPA here means "under Chosen Plaintext Attack".
- CCA means "under Chosen Ciphertext Attack" and it is a stronger definition of secrecy.
- CCA2 means "under Adaptive Chosen Ciphertext Attack"
Note that Non-Malleability has nothing to do with secrecy. For example the One Time Pad is perfectly secure, yet it is Malleable. Although you can prove that a cipher is non-malleable under chosen ciphertext attack and it would be the same thing as ind-cca. Also, some kind of ciphers are malleable on purpose, see Homomorphic encryption.
There are also talks about Provable secrecy where we reduce the whole cipher/cryptosystem to the solidity of a problem difficult to compute. It's done more for Asymmetric encryption that generally relies on math more than symmetric encryption.
For example RSA is provably secure because its entire security relies on the Factoring Problem, which states that it is increasingly very difficult to compute
n = p * q n a large number and p, q primes.
A good cipher
So if we want to prove that our cipher is secure. We would have to prove that an Adversary would have no advantage in a guessing game where he would send us two plaintexts and we would send him back one of the encrypted plaintext (chosen at random) expecting him to guess which one it is (see image above).
This is difficult to prove. For asymmetric encryption we'd rather reduce that to other assumptions. For symmetric encryption we would have to make sure its encrypted ciphertexts look random every time. Like an ideal cipher.
for Stream Ciphers the randomness of the ciphertexts depends a lot on the randomness of the Pseudo Random Number Generator you are building it with (PRNG).
for Block Ciphers there are two things that are important: Confusion and Diffusion. Confusion states that a small change in the key has to change the whole ciphertext, Diffusion states that a small change in the plaintext has to change the whole ciphertext. In AES for example, Confusion is done by using a Key Derivation Function to create several subkeys and XOR them to the internal state on multiple rounds. Diffusion is done during the different rounds with a mix of Permutations and Substitution (that's why we call AES a substitution-permutation network).
A cryptanalyst is the black beast of cryptographers. His job is to attack our lovely ciphers. But this is useful in knowing how secure is a cipher. AES has been considered a solid cipher because it is build on solid principles like the avalanche principle (confusion and diffusion) but not only because of that. Because it has resisted to known attacks since it has been born. A good cipher should resist multiple years of attacks. A good cipher should withstand the efforts of cryptanalyst in time.
What does a cryptanalyst do to break a cipher?
The first good answer is bruteforce or exhaustive search. If we can simply bruteforce a cipher then it is obviously not good. When you do this you have to consider the birthday paradox and time-memory trade off attacks. tl;dr: it doesn't take as long as you think to bruteforce something because of probabilities.
And here is why AES is considered a solid cipher again. There are no known attacks better than bruteforcing.
For cryptographers, a cryptographic "break" is anything faster than a brute force
It is known (thanks to Shannon in 1949) that in a known plaintext attack we can build an equation linking the plaintext with the ciphertext and where the unknowns are the bits of the key. Here the algebraic/linear attack would be to solve the system we now have. If it is more complicated than that, we often talk about second order attack, third order attack, etc...
In a Differential cryptanalysis what we do is we try to notice a correlation in the differences between the internal state of several messages getting encrypted. Often to find a subkey and then recover the key Total Break.
So the answer could be summed up like this: After researching how to make a good cipher (reducing it to a known hard math problem if it's an asymmetric cipher; making sure of it having nice criterion (confusion & diffusion) in the case of a block cipher; making sure of the non-correlation of its PNRG and the randomness of its PNRG if it's a stream cipher), you would try to break it with known attacks. Then you would enlarge the model you are working with to see if your cipher is still as secure, and if not how you would modify it or use it to make it secure. For example with textbook RSA you can forge your own packets from it (if you have
c = m^e you can create
2^e * c which would successfully decrypt as
2*m) and its deterministic properties (that makes it not semantically secure) leak information (
(c/n) = (m^e/p) (m^e/q) = (m/p)(m/q) = (m/n) with
(a/b) being the symbol of Legendre/Jacobi here). Without a padding system like OAEP RSA leaks the Jacobi Symbol of m.
I'm still a student. Those are information I've gathered and this is my understanding of it.
I've stumbled on Dan Boneh Number Theory Cheat sheets. Number 1 and Number 2. Quick to read, I'm going to print them and display them somewhere on my walls :)
I also ran into the homepage of Vitaly Shmatikov. He uploaded a lot of slides, presentations and resources on a lot of different courses related to security and cryptography. He also lists a lot of interesting papers. I want to read everything but right now I have to focus on my exams (and interviews for my internship...).
EDIT: Oh but one last link. Orange Labs publications. There are some interesting papers in there too. I'm mostly writing this post to bookmark all those great links somewhere.
EDIT2: How to do a litterature search. That might be useful.
EDIT3: I have also returned to the Rss readers that I had banished from my life something like 7 years ago. I have a tendency to get addicted to things pretty quickly and back then I had subscribed to way too many feeds (I think one post would pop every 2 minutes) and I was constantly reading something. But I figured, what if I filled it with all those blogs on cryptography/security. That would be working and not slacking. So that's what I did. I'm using Digg on desktop and Feedly on my cellphone. And of course I'll be posting here the articles I find interesting :)
In one of my class the teacher advise against
I wondered why. I remembered reading some articles about random vs urandom and urandom being better, but that was years ago and my memory is not fresh. Wikipedia does advise against it, as does the manpage if you want to generate a long term key.
I also stumbled on one of Thomas Pornin's answer on the security SO also pointing to a blogpost from Thomas Hühn
Fact: /dev/urandom is the preferred source of cryptographic randomness on UNIX-like systems.
I wrote about Differential Power Analysis (DPA) but haven't said that there were way more efficient attacks (although that might be more costy to setup). Differential Fault Analysis is a kind of differential cryptanalysis: you analyse the difference between blocks of the internal state and try to extract a subkey or a key. Here we do a fault injection on the internal state of the smartcard during an encryption operation (usually with lasers (photons have the property of igniting a curant in a circuit), or by quickly changing the temperature).
The attack presented in http://eprint.iacr.org/2010/440.pdf and https://eprint.iacr.org/2003/010.pdf is targeting the last subkey.
We inject a fault on 1 byte of AES (in the picture we consider the internal state of AES to be a 4x4 matrix of bytes) at a particular spot (before the last round) and we see that at one point it creates a diagonal of errors. We can XOR the internal state without fault with the faulty one to display only the propagation of the fault.
Here, by doing an hypothesis on keys and seeing how the Addkey operation is modifying this difference we can compute the last subkey.
On AES-128, it is sufficient to know K10 to find the cipher key, but on AES-256, you must know K13 and K14
Although this is only my understanding of the DFA. It also seems to be easier to produce on RSA (and it was originally found by Shamir on RSA).
I'm studying the internals of hash functions and MACs right now. One-way Compression Functions, Sponge functions, CBC-MAC and... the Merkle–Damgård construction. Trying to find a youtube video about it I run into... The Cryptography course of Dan Boneh I already took 3 years ago. I have a feeling I will forever return to that course during my career as a cryptographer.
The whole playlist is here on youtube and since his course is awesome I just watched again the whole part about MACs. And I thought I should post this explanation of the birthday paradox since as he says:
Everybody should see a proof of the birthday paradox at least once in their life
Something that always bugged me though is that he says the formula for the birthday is
1.2 sqrt(365) whereas it should be square root of 366 since there are indeed 366 different birthdays possible.
This morning I had a course on Return Oriented Programming given by Jonathan Salwan, a classmate of mine also famous inventor of RopGadget.
The slides are here.
A lot of interesting things there. Apparently it's still kind of impossible to completely protect your C code against that kind of attack. Even with all the ASLR, PIE, NX bit and other protections... There is also an awesome lecture about ROP on Coursera I linked to in the previous post here.
Basically, since you can't execute code in the stack, and since the addresses of libraries are randomized because of ASLR, you can find bits of codes ending with a return (called gadgets) and chain them since you control the stack (thus the saved EIPs). What I learned by doing was that it gets complicated if it's 64bits (since a lot of address will have a lot of 0x00 and you can't point to those doing a buffer overflow through a strcpy or something similar) and you won't get a lot of those gadgets if you have dynamically loaded libraries. Static libraries are loaded in the .text section (which is executable of course), so that's all good. Also a good way to store strings of data are in the .data section since it is untouched by the randomization contrarily to the stack.
A lot of researches is done on the subject and new tools like RopGadget are coming, using an old concept (but still actively researched): the SAT solvers. There seems to be a problem though, those SAT solvers yield a set of gadgets to be used for some action you want to accomplish with your shellcode, but you have to do the work of putting them in the right order.
This is what I took from that talk, you can question the guy if that interests you!