david wong

Hey! I'm David, cofounder of zkSecurity and the author of the Real-World Cryptography book. I was previously a crypto architect at O(1) Labs (working on the Mina cryptocurrency), before that I was the security lead for Diem (formerly Libra) at Novi (Facebook), and a security consultant for the Cryptography Services of NCC Group. This is my blog about cryptography and security and other related topics that I find interesting.

What are x509 certificates? RFC? ASN.1? DER? posted April 2015

RFC

RFC stands for Request For Comments. RFCs are a bunch of text files that describe different protocols. If you want to understand how SSL, TLS (the new SSL) and x509 certificates (the certificates used for SSL and TLS) all work, for example because you want to code your own OpenSSL, then you will have to read the corresponding RFCs: rfc5280 for x509 certificates and rfc5246 for the latest version of TLS (1.2).

rfc ex

x509

x509 is the name for certificates which are defined for:

informal internet electronic mail, IPsec, and WWW applications

There used to be a version 1, and then a version 2, but now we use version 3. Reading the corresponding RFC, you will come across structures like this one:

Certificate  ::=  SEQUENCE  {
    tbsCertificate       TBSCertificate,
    signatureAlgorithm   AlgorithmIdentifier,
    signatureValue       BIT STRING  }

Those are ASN.1 structures. This is what a certificate looks like: it's a SEQUENCE of three objects.

  • The first object contains everything of interest that will be signed, that's why we call it a To Be Signed Certificate
  • The second object identifies the signature algorithm the CA used to sign this certificate (ex: sha256WithRSAEncryption)
  • The last object is not a structured object, it's just the bits of the signature computed over the TBSCertificate after it has been encoded with DER

ASN.1

It looks small, but each object has some depth to it.

The TBSCertificate is the biggest one. It contains a bunch of information about the subject (the client), the issuer (the CA), the public key of the subject, etc.

TBSCertificate  ::=  SEQUENCE  {
    version         [0]  EXPLICIT Version DEFAULT v1,
    serialNumber         CertificateSerialNumber,
    signature            AlgorithmIdentifier,
    issuer               Name,
    validity             Validity,
    subject              Name,
    subjectPublicKeyInfo SubjectPublicKeyInfo,
    issuerUniqueID  [1]  IMPLICIT UniqueIdentifier OPTIONAL,
                         -- If present, version MUST be v2 or v3
    subjectUniqueID [2]  IMPLICIT UniqueIdentifier OPTIONAL,
                         -- If present, version MUST be v2 or v3
    extensions      [3]  EXPLICIT Extensions OPTIONAL
                         -- If present, version MUST be v3
}

DER

A certificate is of course not sent like this. We use DER to encode it in a binary format.

Field names are not encoded at all, meaning that if we don't know which structure the certificate follows, it is impossible for us to understand what each value means.

Every value is encoded as a TLV triplet: [TAG, LENGTH, VALUE]
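
To make the TLV idea concrete, here is a minimal sketch in Python that builds such triplets by hand. The helper name encode_tlv is made up for this illustration, and it is deliberately limited to single-byte tags and to values of at most 127 bytes, where the length fits in one byte.

```python
def encode_tlv(tag: int, value: bytes) -> bytes:
    """Build one DER TLV triplet (hypothetical helper, short lengths only)."""
    if not 0 <= tag <= 0xFF:
        raise ValueError("this sketch only supports single-byte tags")
    if len(value) > 127:
        raise ValueError("this sketch only supports lengths up to 127 bytes")
    # TAG, then LENGTH, then the VALUE bytes themselves
    return bytes([tag, len(value)]) + value


# An INTEGER (tag 0x02) holding the value 2
integer = encode_tlv(0x02, b"\x02")
print(integer.hex(" "))   # 02 01 02

# A SEQUENCE (tag 0x30) is itself a TLV whose value is a run of inner TLVs
sequence = encode_tlv(0x30, integer + integer)
print(sequence.hex(" "))  # 30 06 02 01 02 02 01 02
```

Notice that nothing in the output says what the integers are for: only the tag, the length and the value survive the encoding.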

For example, you can check the GitHub certificate here:

github cert

On the right is the hexdump of the DER encoded certificate, on the left is its translation in ASN.1 format.

As you can see, without the RFC nearby we don't really know what each value corresponds to. For completeness, here's the same certificate parsed by the openssl x509 command-line tool:

x509 openssl parsed

How to read the DER encoded certificate

So go back and check the hexdump of the GITHUB certificate, here is the beginning:

30 82 05 E0 30 82 04 C8 A0 03 02 01 02

As we saw in the RFC for x509 certificates, we start with a SEQUENCE.

Certificate  ::=  SEQUENCE  {

Microsoft has documentation that explains pretty well how each ASN.1 TAG is encoded in DER, here's the page on SEQUENCE

30 82 05 E0

So 30 means SEQUENCE. Since we have a huge sequence (more than 127 bytes), we can't encode the length in the single byte that follows:

If it is more than 127 bytes, bit 7 of the Length field is set to 1 and bits 6 through 0 specify the number of additional bytes used to identify the content length.

(in their documentation the least significant bit on the far right is bit zero)

So the following byte 82, which is 1000 0010 in binary, tells us that the length of the SEQUENCE is written in the following 2 bytes: 05 E0 (1504 bytes).
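
The length rule quoted above can be sketched in a few lines of Python. The function name read_length and its (length, next offset) return value are choices made for this example, not something taken from a library.

```python
from typing import Tuple


def read_length(data: bytes, offset: int) -> Tuple[int, int]:
    """Decode a DER length starting at data[offset].

    Returns (content length, offset of the first content byte).
    """
    first = data[offset]
    if first < 0x80:
        # Short form: bit 7 is 0, the byte itself is the length
        return first, offset + 1
    count = first & 0x7F  # bits 6..0: how many length bytes follow
    if count == 0:
        raise ValueError("indefinite length is not allowed in DER")
    end = offset + 1 + count
    if end > len(data):
        raise ValueError("truncated length field")
    return int.from_bytes(data[offset + 1:end], "big"), end


header = bytes.fromhex("308205E0")      # start of the GitHub certificate
length, start = read_length(header, 1)  # offset 1 skips the 30 tag
print(length, start)                    # 1504 4
```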

We can keep reading:

30 82 04 C8 A0 03 02 01 02

Another SEQUENCE embedded in the first one: the TBSCertificate SEQUENCE.

TBSCertificate  ::=  SEQUENCE  {
    version         [0]  EXPLICIT Version DEFAULT v1,

The first value should be the version of the certificate:

A0 03

Now this is a different kind of TAG. There are 4 classes of TAGs in ASN.1: UNIVERSAL, APPLICATION, PRIVATE, and context-specific. Most of what we use are UNIVERSAL tags, which can be understood by any application that knows ASN.1. The A0 is the [0] (and the following 03 is the length). [0] is a context-specific TAG and is used as an index when you have a series of objects. The GitHub certificate is a good example of this, because you can see that the next index used is [3], the extensions object:

TBSCertificate  ::=  SEQUENCE  {
    version         [0]  EXPLICIT Version DEFAULT v1,
    serialNumber         CertificateSerialNumber,
    signature            AlgorithmIdentifier,
    issuer               Name,
    validity             Validity,
    subject              Name,
    subjectPublicKeyInfo SubjectPublicKeyInfo,
    issuerUniqueID  [1]  IMPLICIT UniqueIdentifier OPTIONAL,
                         -- If present, version MUST be v2 or v3
    subjectUniqueID [2]  IMPLICIT UniqueIdentifier OPTIONAL,
                         -- If present, version MUST be v2 or v3
    extensions      [3]  EXPLICIT Extensions OPTIONAL
                         -- If present, version MUST be v3
}

Since those objects are all optional, skipping some without properly indexing them would have caused trouble parsing the certificate.
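
Here is a small Python sketch of how a tag byte splits into its parts. With bit zero on the far right, bits 7 and 6 give the class, bit 5 says whether the value contains other TLVs (constructed), and bits 4 through 0 give the tag number. The name describe_tag is invented for this example, and tag numbers above 30, which use a multi-byte form, are not handled.

```python
from typing import Tuple

# Index is the value of bits 7..6 of the tag byte
CLASSES = ["UNIVERSAL", "APPLICATION", "context-specific", "PRIVATE"]


def describe_tag(octet: int) -> Tuple[str, bool, int]:
    """Split a single tag byte into (class, constructed, tag number)."""
    number = octet & 0x1F
    if number == 0x1F:
        raise ValueError("multi-byte tag numbers are not handled in this sketch")
    return CLASSES[octet >> 6], bool(octet & 0x20), number


print(describe_tag(0x30))  # ('UNIVERSAL', True, 16)        SEQUENCE
print(describe_tag(0x02))  # ('UNIVERSAL', False, 2)        INTEGER
print(describe_tag(0xA0))  # ('context-specific', True, 0)  the [0] of version
print(describe_tag(0xA3))  # ('context-specific', True, 3)  the [3] of extensions
```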

Next comes:

02 01 02

Here's how it reads:

  _______ tag:      integer
 |   ____ length: 1 byte
 |  |   _ value:  2
 |  |  |
 |  |  |
 v  v  v
02 01 02 
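
As a quick check, here is a sketch of the same decoding in Python. The content bytes of an INTEGER are a big-endian two's complement number, and rfc5280 defines Version as INTEGER { v1(0), v2(1), v3(2) }, so the value 2 means a version 3 certificate. The names decode_integer and VERSION_NAMES are made up for this example.

```python
# Mapping taken from the Version definition in rfc5280
VERSION_NAMES = {0: "v1", 1: "v2", 2: "v3"}


def decode_integer(content: bytes) -> int:
    """Decode the content bytes of a DER INTEGER (two's complement, big-endian)."""
    if not content:
        raise ValueError("an INTEGER needs at least one content byte")
    return int.from_bytes(content, "big", signed=True)


tlv = bytes.fromhex("020102")
tag, length, content = tlv[0], tlv[1], tlv[2:]
assert tag == 0x02 and length == len(content)

version = decode_integer(content)
print(version, VERSION_NAMES[version])  # 2 v3
```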

The rest is pretty straightforward except for OID: Object Identifier.

Object Identifiers

They are basically strings of integers that read from left to right like a tree.

So in our GitHub certificate example, we can see that the first OID is 1.2.840.113549.1.1.11, and it is supposed to represent the signature algorithm.

So go to http://www.alvestrand.no/objectid/top.html and click on 1, then 1.2, then 1.2.840, etc. until you get down to the last branch of our tree, and you will end up on sha256WithRSAEncryption.

Here's a more detailed explanation of OIDs and here's the Microsoft doc on how to encode an OID in DER.
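
For the curious, here is a sketch of the decoding side in Python. In DER, the first two integers of an OID are packed into one number (40 times the first plus the second), and every number is written in base 128, with bit 7 set on every byte except the last one of that number. The function name decode_oid is invented for this example.

```python
def decode_oid(content: bytes) -> str:
    """Decode the content bytes of a DER OBJECT IDENTIFIER into dotted form."""
    if not content or content[-1] & 0x80:
        raise ValueError("empty or truncated OID content")
    subids = []
    value = 0
    for byte in content:
        value = (value << 7) | (byte & 0x7F)  # append 7 more bits
        if not byte & 0x80:                   # bit 7 clear: last byte of this number
            subids.append(value)
            value = 0
    # The first number packs the first two integers as 40 * X + Y, with X in 0..2
    first = min(subids[0] // 40, 2)
    second = subids[0] - 40 * first
    return ".".join(str(n) for n in [first, second] + subids[1:])


# Content bytes that DER uses for the signature algorithm OID discussed above
print(decode_oid(bytes.fromhex("2A864886F70D01010B")))  # 1.2.840.113549.1.1.11
```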


Comments

Jide Akinyemi

Thanks

Mamoon Ahmed

David, This is the best article on the internet to further explain the missing concepts of ASN.1 encoding. You saved alot of my time. Thank you so much and keep up the good work bro ....

Mikaz

This was useful for me today. Good job !

Nagarjuna

You saved alot of my time. Thank you so much and keep up the good work bro

You Qi

Read

Traian

Thanks, nice description and referenced resources

Hanif

awesome explanation.

ananda

This is very helpful. You made it easy to understand. Thanks David.

Tai Sun

Great explanation! Thanks!

Geric

Short and concise! Great Article!

Tsun

Thanks!

lxjsailor

?????
thanks?

me

cool

electron

Thanks!

Binh Thanh Nguyen

Thanks, nice post

Irad

Hi, I wish to parse some non-x509 DER structure and be able to extract field values out of it. I'm working on the macOS platform, and unfortunately I can only use native code (no openssl).

So I turned to the notorious SecAsn1Decode and realised that it also expects a template of the DER format (SecAsn1Template), so I pretty much need to have the formatted layout before I can decode an instance formatted in this way.

Conceptually, I'm not sure I understand, because the DER format explains the format by itself. I've tested my assumption using an ASN.1 online decoder and it indeed revealed the format by itself.

you can look at the following example :
https://lapo.it/asn1js/#MIGOMQswCQYDVQQGEwJJTDEPMA0GA1UECAwGaXNyYWVsMQwwCgYDVQQHDANUTFYxCzAJBgNVBAoMAlRTMR4wHAYDVQQLDBVDQV9jZXJ0aWZpY2F0ZV9zZXJ2ZXIxGzAZBgNVBAMMEnpvaGFyc19NYWNCb29rX1BybzEWMBQGCSqGSIb3DQEJARYHekB6LmNvbQ

So, perhaps you can think of a good reason why the template is needed?

Thanks !
Irad

david

@Irad: simple!

without a template, all you'll see is something like "here is an integer, 5, here is a bytestring, [5, 4], etc."

with a template, you'll see something like "the version is 5, the date is [5, 4], etc."

Etiksan

Thank you for this article this is a very nice article about cybersecurity which is helpful for me.

Yogesh Shahane

Very good article that explains concisely the relation between X.509 and ASN.1
It also has very good references to other good links. Overall seems like a very good website. I will be coming back frequently!!
Thanks again. Keep up this great work.

Celia Zou

It's an excellent article for Certificate Decoding. Thanks~
