Published: 2025/10/23 | Updated: 2025/10/23
… and it’s still nowhere near complete.
In this post I'll just be rambling about ASN.1: parts of the compiler implementation, and some of the tool's output rather than the tool itself, as it's still too WIP to really advertise on its own yet.
This post is unstructured, so you can just pick somewhere random and start reading from there with minimal context lost.
Note: the name of the tool is dasn1.
Motivation
I’m currently writing Juptune – a toy async I/O framework that attempts to implement as much of its stack as possible in pure D.
I’m really interested in writing an implementation of TLS, which means I need to be able to handle x.509 certificates (i.e. TLS/SSL certs), which means I need to be able to handle their underlying data encoding: ASN.1’s DER encoding.
So basically I just wanted to do this for fun at the end of the day, nothing much deeper than that. I’ve never written or worked on a proper compiler project before that wasn’t toy-sized so I saw a ton of growth potential… the main thing that’s grown however is the mental scar ASN.1’s left on me.
I’ve successfully generated code that can parse a couple of x.509 certificates I’ve thrown at it, and I’ve started work on an almost-D-native (excluding crypto primitives) implementation of TLS 1.3.
I’m constantly amazed by how much of modern life relies on these ancient, overly complicated specs from the 90s. ASN.1 is used everywhere in some form or another, and yet I bet you’ve never even heard of it before – just have a look on Wikipedia.
Very briefly – what is ASN.1?
ASN.1 is the result of a bunch of graybeards from the late 80s+ trying to design an overengineered data specification language. In other words, it’s protobuf on steroids.
There are two parts to ASN.1: the notation itself (defined by x.680, x.681, x.682, and x.683), and then the various encodings (BER, CER, DER, PER, XER, JER…). In this post I’ll mainly be focusing on the notation + DER.
Similarly to protobuf you use the notation to define a structured way to represent data, and then use tooling that can generate encoders/decoders for a specific encoding, in a specific programming language.
Here’s a choice snippet of the ASN.1 notation for RFC 5280 (which defines what’s commonly known as TLS certificates):
-- Modules are strongly versioned - something I'll talk about later!
PKIX1Implicit88 {
    iso(1) identified-organization(3) dod(6) internet(1)
    security(5) mechanisms(5) pkix(7) id-mod(0) id-pkix1-implicit(19)
} DEFINITIONS IMPLICIT TAGS ::=
-- Aliases to "built-in" types.
KeyIdentifier ::= OCTET STRING
KeyUsage ::= BIT STRING {
    digitalSignature (0),
    nonRepudiation   (1), -- recent editions of X.509 have
                          -- renamed this bit to contentCommitment
    -- ...and several more bits
}
-- A struct/class equivalent
PolicyInformation ::= SEQUENCE {
    policyIdentifier CertPolicyId,
    -- Types can be given constraints - something I'll also talk about later!
    policyQualifiers SEQUENCE (SIZE (1..MAX)) OF PolicyQualifierInfo OPTIONAL
}
-- A pretty clean way of defining unique identifiers
id-pkix OBJECT IDENTIFIER ::=
{ iso(1) identified-organization(3) dod(6) internet(1)
security(5) mechanisms(5) pkix(7) }
id-kp OBJECT IDENTIFIER ::= { id-pkix 3 }
-- arc for extended key purpose OIDS
id-kp-serverAuth OBJECT IDENTIFIER ::= { id-kp 1 }
id-kp-clientAuth OBJECT IDENTIFIER ::= { id-kp 2 }
id-kp-codeSigning OBJECT IDENTIFIER ::= { id-kp 3 }
id-kp-emailProtection OBJECT IDENTIFIER ::= { id-kp 4 }
id-kp-timeStamping OBJECT IDENTIFIER ::= { id-kp 8 }
id-kp-OCSPSigning OBJECT IDENTIFIER ::= { id-kp 9 }
Encoding wise, here’s a quick overview of some of the more well known ones:
- BER – Basic Encoding Rules. A binary Tag-Length-Value (TLV) format that supports functionally infinite lengths of data.
- CER – Canonical Encoding Rules. A limited subset of BER where each value can only have one possible encoding. It makes some odd design choices (such as always using the indefinite-length forms of encoded data), so no one really uses it.
- DER – Distinguished Encoding Rules. A limited subset of BER where each value can only have one possible encoding, but with a saner set of design choices compared to CER. This is used extensively for cryptographic purposes.
- PER – Packed Encoding Rules. A binary encoding that uses constraint information to encode data in the absolute minimum number of bits possible. There’s also like 4 variations of this one, btw.
- OER – Octet Encoding Rules. Similar to PER, except it keeps the bytes of values separate, whereas PER can pack the bits of different values together tightly.
- XER – An XML based encoding. The ASN.1 notation grammar actually has XML-specific parts to it just for this encoding (because of course it does).
- JER – A JSON based encoding.
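To make the TLV format concrete, here’s a tiny hand-worked example following x.690 (not the output of any tool):

```asn1
five INTEGER ::= 5
-- DER encodes this value as the three bytes: 02 01 05
--   02 = tag (UNIVERSAL 2, INTEGER)
--   01 = length (1 content byte)
--   05 = value (5)
```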
Did I ever mention that ASN.1 is complicated? On the one hand the sheer amount of possible encodings is daunting, but on the other hand it shows a certain flexibility that ASN.1 provides – you could even invent your own domain-specific encoding if needed.
ASN.1’s notation can be really complex
Loosely speaking you can define ASN.1’s notation as being the “base” notation defined in x.680, with the sometimes-optional addon specifications defined in x.681, x.682, x.683.
These specifications are also written in academicese so for mere uneducated mortals such as myself, simply trying to read and understand what the specifications are saying in the first place is already a large hurdle. I think I’ve started to get the hang of it though.
Fortunately for my use case of handling x.509 certificates, there’s no hard requirement for anything beyond x.680 and so x.680 is the only spec I’ve attempted to implement so far (outside of x.690 which describes how BER/CER/DER works – which is actually a joy to read compared to the x.68x specs).
x.680 isn’t the worst thing in the world to implement, it’s just the fact that there’s a lot more to it than you’d think from a quick glance at a code example, as well as some relatively annoying “transformation” (semantic) rules you have to account for.
Generally though I’d say the really difficult parts seem to come from its extensions.
x.680 woes – historical deprecations
One of the more annoying parts of implementing a parser for ASN.1’s notation is that x.680 has been revised several times over the years, which includes the deprecation + removal of certain features.
And so some other specifications you read through will either:
- Use older forms of syntax that are no longer recommended/supported.
- Replace the older forms of syntax with newer variants which can be much more complicated to implement.
Meaning that if you want to write a compiler for ASN.1 for a specific use case, but want it to also be an implementation of the more modern specs… then you’ll have to partially implement/hack around some of the older stuff that’s no longer defined in the up to date spec documentation.
An example would be the `ANY DEFINED BY` syntax, which I have a separate section on.
x.681
This is essentially the academic equivalent of an Elder Scroll – you will go insane attempting to read let alone mentally parse this damn thing.
x.681 describes the Information Object Class system. I’d love to talk about it more in depth, but I haven’t put in enough effort to confidently state much about how it works.
One of the few parts I sort of understand and can talk about is that x.681 has a really cool feature where Information Classes can be given a custom initialisation syntax:
-- Given this information class
PERSON ::= CLASS {
    &name UTF8String,
    &age  INTEGER OPTIONAL
} WITH SYNTAX {
    CALLED &name [WHO IS &age YEARS OLD]
}
-- Can be initialised as either one of these
bradley1 PERSON ::= { CALLED "Bradley" }
bradley2 PERSON ::= { CALLED "Bradley" WHO IS 26 YEARS OLD }
I’d absolutely love to attempt to implement x.681 for the challenge of this feature alone, however I only have so much energy (and sanity), so it’ll likely be a while until I even properly consider it.
x.682
x.682 describes the Table Constraint feature. I’m going to be honest I don’t understand a single thing about this feature – I took one look at the specification and was like “absolutely not”.
x.683
x.683 describes the ability to create templated (sorry, “parameterised”) types. Similar to the other ASN.1 extensions I haven’t looked much into this feature, but it appears to be a lot simpler to implement than the others.
In essence, one of the things you can do is this:
MyTemplatedThingy{ValueT} ::= SEQUENCE {
    value ValueT
}

MyStringThingy ::= MyTemplatedThingy{UTF8String}
It supports values as well as types within its template parameters (similarly to D!) so there’s a few cool things you can do with it I guess.
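For illustration, here’s a hypothetical sketch (the type names are mine, not from any real spec) of a value parameter in action, constraining a string’s size via a template argument:

```asn1
-- A hypothetical parameterised type taking a value parameter.
MyBoundedString{INTEGER : maxLen} ::= UTF8String (SIZE (1..maxLen))

ShortString ::= MyBoundedString{64}
```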
ASN.1’s notation is also pretty cool!
Despite the many, many, many pains of this god forsaken technology, it’s actually really interesting and powerful at the same time.
ASN.1’s constraint system
ASN.1’s notation contains a pretty neat feature where you can add special constraints onto types + fields. So rather than having a stray “ProtocolPacket.field1.field2.xyz MUST be between 0 and 2” buried in a spec that’s super easy to miss, you can instead describe this constraint within ASN.1 itself, which (good) tooling will then take into account for you.
Here’s some examples of the simpler constraints available:
UInt8 ::= INTEGER (0..255) -- Constrain to a specific range of values
LegacyFlag ::= INTEGER (0) -- Constrain to a single value
LegacyFlags ::= INTEGER (0 | 2 | 4 | 8) -- You can combine constraints via the UNION (shorthand '|') operator.
LegacyFlags2 ::= INTEGER (0 | 2 ^ 4..8) -- The INTERSECTION (shorthand '^') operator instead requires a value to satisfy both of its constraints.
-- You can limit the size of some types
Password ::= UTF8String (SIZE (8..32)) -- Must be between 8 and 32 chars.
NumberList ::= SET SIZE (2..MAX) OF INTEGER -- Must have at least 2 elements, but is otherwise unbounded.
There’s a few more constraints available but… they’re mostly pretty complex ones that I don’t want to have to think about.
It’s really cool to see that ASN.1 has a feature like this though, considering the only other language I’ve personally encountered that has a similar feature is Ada.
ASN.1’s versioning system
ASN.1 generally uses the `OBJECT IDENTIFIER` type in order to, well, identify specific things, e.g. extensions found within x.509 certificates.

`OBJECT IDENTIFIER`s are also used to provide versions to modules, for example:
PKIX1Implicit88 {
    iso(1) identified-organization(3) dod(6) internet(1)
    security(5) mechanisms(5) pkix(7) id-mod(0) id-pkix1-implicit(19)
} DEFINITIONS IMPLICIT TAGS ::= BEGIN -- .. -- END
Everything between the curly brackets is an OBJECT IDENTIFIER for this exact module – technically no other ASN.1 module in existence should ever use this specific OBJECT IDENTIFIER. The optional labels (e.g. `iso`) have no meaning beyond aiding human comprehension; it’s the values (e.g. the `(1)` in `iso(1)`) that are actually used to create the identifier.
As a great example of this versioning system, it just so happens that this specific module has a more modern version that has this specific OBJECT IDENTIFIER instead:
PKIX1Implicit-2009 {
    iso(1) identified-organization(3) dod(6) internet(1) security(5)
    mechanisms(5) pkix(7) id-mod(0) id-mod-pkix1-implicit-02(59)
} DEFINITIONS IMPLICIT TAGS ::= BEGIN -- .. -- END
This updated version doesn’t change how data is encoded to/from DER but instead it simply uses more modern syntax and features.
This is important because older specifications will be using `PKIX1Implicit88`, whereas newer ones will likely be using `PKIX1Implicit-2009` instead, and so there needs to be a clear-cut way to distinguish between these two versions of the `PKIX1Implicit` module other than going by name – and this is where OBJECT IDENTIFIERs come in handy.
When importing modules within ASN.1 notation you can (and should) specify an OBJECT IDENTIFIER as well:
-- There's 0 room for ambiguity or naming clashes when OBJECT IDENTIFIERs come into play
IMPORTS
    id-pkix FROM PKIX1Implicit88 {
        iso(1) identified-organization(3) dod(6) internet(1)
        security(5) mechanisms(5) pkix(7) id-mod(0) id-pkix1-implicit(19)
    }

    SignatureAlgs FROM PKIX1Implicit-2009 {
        iso(1) identified-organization(3) dod(6) internet(1) security(5)
        mechanisms(5) pkix(7) id-mod(0) id-mod-pkix1-implicit-02(59)
    }
;
Maybe I’m just a nerd, but I find this to almost be a thing of beauty with how simple yet effective it is.
D is easy to generate code for
D has several quality of life features that make it surprisingly easy to generate code for – features whose absence would definitely make the compiler more annoying to work with when targeting other languages.
These features on their own aren’t exactly rare to see, but the specific combination is what makes everything work together so well.
Static imports & fully qualified names
`static import` in D means “import this module, but ONLY allow it to be used via its fully qualified name”:

static import std.stdio;

std.stdio.writeln("Hello, world!"); // Fine.
writeln("Hello, world!"); // Error - the unqualified name isn't available.
You can even override the module name, as strange as that sounds!
static import io = std.stdio;
io.writeln("Hello, world!");
This feature is a godsend for preserving the original names of ASN.1 types. For example, Juptune provides an error type called `Result`, which comes from the `juptune.core.util.result` module.

Without static imports I’d have to be careful of ASN.1 code that defines a `Result` type, as it’d otherwise come into conflict with Juptune’s own `Result` type.
However, with static imports, I can basically just generate code that looks like this:
static import jres = juptune.core.util.result;
// From ASN.1 definition: Result ::= SEQUENCE { -- yada yada -- }
struct Result
{
    jres.Result set(/*...*/) @nogc nothrow
    {
        return jres.Result.noError;
    }
}
This completely removes the need to worry about symbol name conflicts.
Module-local lookups
In a similar vein, D allows you to specify that, instead of looking up a symbol from any available symbol table (e.g. local variables, non-static imports, etc.), the compiler should perform a lookup using the current module’s top-level symbols.
For example:
/* Given this ASN.1 notation:
    Type1 ::= SEQUENCE { -- yada yada -- }
    Type2 ::= SEQUENCE { type1 Type1 }
*/

// The following types are generated
struct Type1 { /* yada yada */ }
struct Type2 { .Type1 type1; }
The leading `.` in `.Type1` is what causes the module-local lookup.
Essentially, this feature complements the static import feature to help make it much harder for ASN.1 types to accidentally refer to the wrong symbol when converted into D code.
typeof()
In short: this feature allowed me to be really really lazy with certain parts of the compiler 😀
As the name suggests, `typeof()` allows you to retrieve the type of any particular symbol you pass into it – this is great when dealing with code generation, since it can be kind of annoying to structure your code in a way where you can easily preserve the type name of some symbol you’re working with.
In other words, “this lets me write bad code and make it still work”.
The first example is around how some getters and setters for SEQUENCE fields are generated. Instead of doing the correct thing and preserving the type name for each field, I got lazy and just used `typeof(_field)`:
// Heavily omitted example
jres.Result setP(typeof(_p) value) @nogc nothrow { /* .. */ }
typeof(_p) getP() @nogc nothrow { /* .. */ }
The second example is around error messages. Instead of needing to keep track of the current type’s name when generating error messages… I could just use `typeof(this)` to get the type instead:
// Heavily omitted example
jres.Result fromDecoding(/* .. */) @nogc
{
    result = asn1.asn1DecodeComponentHeader!ruleset(memory, componentHeader);
    /* .. on error, the message is built from: .. */
    //     "when decoding header of field 'p' in type "
    //     ~ __traits(identifier, typeof(this))
}
What’s even better is that because the entire string is composed of compile-time constants, it doesn’t actually require an allocation + concat at runtime, since the compiler will constant fold it for you. This allows `fromDecoding` to still be marked as `@nogc`!
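If that sounds surprising, here’s a minimal standalone sketch (hypothetical names, not dasn1 code) of the same trick – assigning the concatenation to an `enum` forces it to happen at compile time, so the method stays allocation-free at runtime:

```d
// Hypothetical sketch - not dasn1 code.
struct Widget
{
    string describe() @nogc nothrow
    {
        // The `enum` forces the concatenation to run at compile time,
        // so at runtime we just return a pointer to static data.
        enum msg = "this type is called " ~ __traits(identifier, typeof(this));
        return msg;
    }
}
static assert(Widget().describe() == "this type is called Widget");
```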
D allows trailing commas in almost every context
Generating a parameter list and don’t want to have to care about whether there’s an extra comma or not?
// Trailing commas are allowed!
void func(int param1, int param2,) {}

Enum options?

enum Flags { a, b, c, } // Also allowed!

Array values?

static immutable ubyte[] mainValue__value = [
    /* .. */
];
D’s got your back! (Except for specifying multiple modules in a single import statement – for some reason that’s not allowed, but shh about that.)
Utilise metaprogramming so your compiler can stay dumb/poorly made
For a while, a lot of the types being generated (and some of the core decoding types) didn’t have a `toString` implementation. This’d normally mean that I couldn’t just use `.toString` willy-nilly; instead the compiler would need knowledge about which types had a `toString` or not.
However, as is the common theme by now, D allows us to be very lazy – instead of keeping track of this ourselves in dasn1, we can just generate code where it’s the D compiler’s concern instead of ours:
void toString(SinkT)(scope SinkT sink, int depth = 0,)
{
    /* .. */
    static if(__traits(hasMember, typeof(_p), "toString"))
        _p.toString(sink, depth+1);
}
Job sorted (and future proofed!).
You could definitely utilise D’s metaprogramming for more complicated stuff, but it’s also good for silly little things like this.
Interesting D-specific parts of the implementation
Naturally I’ve tried to use whatever D features that I could in order to implement dasn1, so I thought I’d pick a few parts of the code that rely on D’s features quite heavily as a small showcase.
Mixin templates for AST nodes
Mixin templates are a fairly quirky feature of D – they let you define a normal template (essentially a compile-time collection of symbols) and then copy-paste its members wherever you like, whether that’s inside a class, a struct, the top-level of a module, etc.
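As a tiny illustration before diving into the real thing, here’s a standalone sketch (hypothetical names, not from dasn1) of the copy-paste behaviour:

```d
// A minimal mixin template demo - hypothetical, not dasn1 code.
// Whatever symbols the template declares get copy-pasted into the mixing-in scope.
mixin template Named(string name)
{
    string getName() const { return name; }
}

struct Foo
{
    mixin Named!"foo"; // Foo now has a getName() member.
}

unittest
{
    assert(Foo().getName() == "foo");
}
```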
Since the ASN.1 grammar only had a handful of node “types”, I decided to use mixin templates to model each specific “type”:
// I've included the mixin template for the `List` type in its entirety, plus a few nodes that reference other mixin templates not shown.
// This is just to give a general idea on how it all works, without diving into many details.

private mixin template List(Asn1NodeType MyType, ItemT)
{
    import juptune.core.ds : Array;

    enum _MustBeDtored = true; // A compile-time flag that some other metaprogramming in the compiler uses to handle memory management!

    private Array!ItemT _items;

    ref typeof(_items) items() => this._items;
}

/++
DefinitiveObjIdComponentList ::=
    DefinitiveObjIdComponent
    | DefinitiveObjIdComponent DefinitiveObjIdComponentList
+/
final class Asn1DefinitiveObjIdComponentListNode : Asn1BaseNode
{
    mixin List!(Asn1NodeType.DefinitiveObjIdComponentList, Asn1DefinitiveObjIdComponentNode);
}

final class Asn1ModuleDefinitionNode : Asn1BaseNode
{
    mixin Container!(Asn1NodeType.ModuleDefinition,
        Asn1ModuleIdentifierNode,
        Asn1ExtensionDefaultNode,
        /* .. */
    );
}

/++
DefinitiveIdentifier ::= "{" DefinitiveObjIdComponentList "}"
+/
final class Asn1DefinitiveIdentifierNode : Asn1BaseNode
{
    mixin OneOf!(Asn1NodeType.DefinitiveIdentifier,
        Asn1DefinitiveObjIdComponentListNode,
        /* .. */
    );
}
I probably could’ve gotten away with just using templated base classes instead, but there’s a few differences that actually make that kind of annoying. Namely it’d create some bloated symbol names which would make reading compiler errors even more painful than it already ended up being.
Templates can provide really natural APIs while still catching errors at compile time
Let’s look at one of the AST nodes again:
final class Asn1ModuleDefinitionNode : Asn1BaseNode
{
    mixin Container!(Asn1NodeType.ModuleDefinition,
        Asn1ModuleIdentifierNode,
        Asn1ExtensionDefaultNode,
        /* .. */
    );
}
This is a node that contains several other nodes. `Container` itself supports an unbounded number of node types it can store, since D supports variadic template parameters. You may be asking what the API for this even looks like, and I’ll be glad to show you a quick snippet:
Asn1ModuleDefinitionNode node = /* parse from somewhere */;
// We don't have to work with named functions when we can just work with types!
auto tagDefault = node.getNode!Asn1TagDefaultNode;
auto modReference = node.getNode!Asn1ModuleIdentifierNode
.getNode!Asn1ModuleReferenceTokenNode;
// Since each `Container` node knows what types are available, it can catch errors at compile time still.
node.getNode!Asn1EmptyNode; // Error: "Invalid node type: Asn1EmptyNode"
Now let’s have a look at a `OneOf` node instead:
final class Asn1TagDefaultNode : Asn1BaseNode
{
    mixin OneOf!(Asn1NodeType.TagDefault,
        Asn1ExplicitTagsNode,
        Asn1ImplicitTagsNode,
        /* .. */
    );
}
This node has a similar template-based API for most of its operations:
// It generates a constructor for each possible type.
// Pretend the `cast(xyz)null`s are actually constructed objects.
auto node = new Asn1TagDefaultNode(cast(Asn1ExplicitTagsNode)null); // Node is for EXPLICIT TAGS
node = new Asn1TagDefaultNode(cast(Asn1ImplicitTagsNode)null); // Node is for IMPLICIT TAGS
// General getter/checker functions.
bool _ = node.isNode!Asn1ImplicitTagsNode;
Asn1ImplicitTagsNode _ = node.asNode!Asn1ImplicitTagsNode; // Runtime error if the node isn't storing an `Asn1ImplicitTagsNode`
Asn1ImplicitTagsNode _ = node.maybeNode!Asn1ImplicitTagsNode; // Null if the node isn't storing an `Asn1ImplicitTagsNode`
However, the main feature of the `OneOf` node is its `match` function. This function requires the user to pass in a handler function for each possible node type that the `OneOf` can store, and this requirement is enforced at compile time so that changes to the node type list will immediately require all appropriate `match` calls to be updated (i.e. no silent breakage).
This is surprisingly easy to implement with D due to its first-class metaprogramming features, I’ll try my best to be brief with how this all works:
// Relatively well omitted
private mixin template OneOf(NodeTypes...)
{
    private int _oneOfIndex = -1;

    private template oneOfHandlerFuncTuple()
    {
        import std.meta : staticMap;
        alias ToFuncHandler(alias NodeT) = Result delegate(NodeT) @nogc nothrow;
        alias oneOfHandlerFuncTuple = staticMap!(ToFuncHandler, NodeTypes);
    }

    Result match(scope oneOfHandlerFuncTuple!() handlers)
    {
        switch(this._oneOfIndex)
        {
            static foreach(i, NodeT; NodeTypes)
            {
                case i:
                    return handlers[i](this.asNode!NodeT);
            }
            default:
                assert(false, "bug: oneOfIndex isn't a valid value?");
        }
    }
}
In essence:

- `NodeTypes...` is the template parameter containing a compile-time tuple of all possible types that this `OneOf` can store.
- `oneOfHandlerFuncTuple` is a template that generates a new compile-time tuple, where each of the `NodeTypes` is mapped into a delegate type.
- `match` uses the result of `oneOfHandlerFuncTuple` as its main parameter. Since this is a compile-time tuple of types, it automagically gets expanded into multiple parameters under the hood.
- `static foreach` within `match`’s body allows us to iterate over a compile-time collection (in this case, `NodeTypes`) and duplicate the foreach’s body for each item – in this case, so we can make a `case` statement per item in `NodeTypes`.
So, if `NodeTypes...` is `(Node1, Node2)`:

- `oneOfHandlerFuncTuple` results in `(Result delegate(Node1), Result delegate(Node2))`
- And `match`’s parameters expand into `match(scope Result delegate(Node1) handler_0, scope Result delegate(Node2) handler_1)`
Which means that we could use this example match function like so:
node.match(
    (Node1 child){ return Result.noError; },
    (Node2 child){ return Result.noError; },
);
I know that’s a lot to take in especially since I have to be briefer than usual, but TL;DR D makes the hard stuff easy while still being relatively easy on the eyes. I would make a snarky comparison with C++ but literally no one expects C++ metaprogramming to be readable at this point.
D Snark: The forever-experimental allocator package
10 years ago (October 2015) D’s standard library was given an experimental package called std.experimental.allocator
. It has a pretty neat but kind of janky way of composing a bunch of allocation building blocks together, in order to “easily” make custom allocators.
I use it for the ASN.1 stuff since it makes it easy to construct and dispose classes within @nogc
code, and it looks kind of cool to boot:
import std.experimental.allocator.mallocator : Mallocator;
import std.experimental.allocator.building_blocks.allocator_list : AllocatorList;
import std.experimental.allocator.building_blocks.region : Region;
import std.experimental.allocator.building_blocks.stats_collector : StatsCollector, Stats = Options;
private alias NodeAllocator = StatsCollector!(
    AllocatorList!(
        (n) => Region!Mallocator(1024 * 1024),
    ),
    Stats.all,
);
The issue is that this package is still experimental 10 years later, and I wouldn’t be surprised if it gets removed sooner or later, especially with the Phobos v2 work that’ll hopefully exist in some form before I retire (I’m 26).
😀 The sign of someone who loves this damn language is that they can’t help but provide some level of historical snark. I have no further comments, I just miss the days I had hope for D’s future xD
alias this – a very occasionally useful feature
Situation: I need to store IR nodes using a base class rather than a specific concrete implementation class, but I’d still like to limit the potential options without having to go down the SumType route.
Solution: this short but sweet struct (note: this is a different `OneOf` struct for IR purposes, not AST purposes):
private struct OneOf(BaseIrT : Asn1BaseIr, IrTypes...) // @suppress(dscanner.suspicious.incomplete_operator_overloading)
{
    import std.meta : anySatisfy;

    BaseIrT ir;
    alias ir this;

    this(IrT : BaseIrT)(IrT ir)
    {
        enum ErrorMsg = "Invalid IR node was passed in. Is not one of: "~IrTypes.stringof;

        static if(is(IrT == BaseIrT))
        {
            // Given the base class - must check the concrete type at runtime.
            static foreach(TargetIrT; IrTypes)
                if(auto casted = cast(TargetIrT) ir)
                {
                    this.ir = casted;
                    return;
                }
            assert(false, ErrorMsg);
        }
        else
        {
            // Given a concrete type - can check at compile time.
            enum isInputT(T) = is(T == IrT);
            static assert(anySatisfy!(isInputT, IrTypes), ErrorMsg);
            this.ir = ir;
        }
    }
}
We can initialise this struct like so:
alias ItemT = OneOf!(Asn1BaseIr, Asn1ValueReferenceIr, Asn1TypeReferenceIr);
auto item = ItemT(cast(Asn1ValueReferenceIr)null); // Fine (if we ignore it's null for this example)
auto item = ItemT(cast(Asn1ImportsIr)null); // Not fine - compile-time error since we know the original type already
Asn1BaseIr ir = cast(Asn1ValueReferenceIr)null; // Pretend it's not null
auto item = ItemT(ir); // Fine - it looks like an Asn1BaseIr so we have to dynamically cast it at runtime to perform the type check, which passes.
Asn1BaseIr ir = cast(Asn1ImportsIr)null;
auto item = ItemT(ir); // Runtime error - dynamic casting failed.
Now the fun part comes from that weird `alias ir this;` line. Normally when working with a wrapper struct like this, you’d have to do something like:
if(Asn1ValueReferenceIr casted = cast(Asn1ValueReferenceIr)item.getWrappedIrNode()) { /* .. */ }
With `alias ir this;`, anytime we try to perform an operation (e.g. casting, function calls, etc.) that the `OneOf` struct itself does not support, the compiler will instead try to apply it to the `OneOf.ir` field:
alias ItemT = OneOf!(Asn1BaseIr, Asn1ValueReferenceIr, Asn1TypeReferenceIr);
// Since `OneOf` doesn't overload `opCast`, the following would normally fail...
cast(Asn1ValueReferenceIr)item; // -> item.opCast!Asn1ValueReferenceIr doesn't exist.

// ...but the compiler sees the `alias ir this`, and so tries casting the `ir` field instead:
cast(Asn1ValueReferenceIr)item.ir; // This works!
It’s a very weird, niche feature which might even get removed or at least deprecated in the future, but it allows for some mild syntax cleanup as shown above.
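If the feature still feels abstract, here’s a minimal standalone demo (hypothetical names, not dasn1 code) of the forwarding behaviour:

```d
// Minimal `alias this` demo - hypothetical, not dasn1 code.
// Operations the wrapper doesn't support get retried on the aliased field.
struct Wrapper
{
    int value;
    alias value this;
}

unittest
{
    Wrapper w = Wrapper(42);
    int x = w;           // Implicit conversion via `alias value this`.
    assert(x == 42);
    assert(w + 1 == 43); // Arithmetic forwards to `value` too.
}
```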
version(unittest)
Some of the IR types try to strictly limit the way that user code can query and interact with their data, mainly to help prevent potential memory corruption… at least that was my original, flawed reasoning.
This can be awkward when writing unittests, as sometimes you just need to query a very particular part of a type’s data without having to go through all of its hurdles.
And so by simply slapping `version(unittest)` onto a function definition, you now have an escape hatch that won’t make its way out into real code:
// Only compiles when unittests are also compiled.
version(unittest) IrT getByName(IrT : Asn1ValueIr)(const(char)[] name)
{
    return cast(IrT)this._namedBits[name];
}
Templates + with() = terse-ish test harnesses
There’s a few examples of this within the codebase. Sometimes unittests are for the most part identical except:
- They need to tweak a few types here and there.
- They need an “initialiser” function that returns a different type from other unittests.
- They need to change what the function-to-be-tested is (and thus what some of the types being used are).
- Sometimes the test case type itself needs to have a few types changed.
- Everything else is the same though between unittests – it’s mainly just types getting in the way.
It’s one of those things where you kind of just have to use it and do it before you “get it”, so I apologise for the really poor explanation, but this is essentially something you can do with templates.
Here’s one of the templated “test harnesses” I used – this one in particular is for testing the AST -> IR converter functions.
private template GenericTestHarness(NodeToIrT, ActualIrT, alias ParseFunc, alias Converter = asn1AstToIr)
{
    struct T
    {
        string input;
        Asn1SemanticError expectedError;
    }

    void run(T[string] cases)
    {
        import std.traits : EnumMembers;
        foreach(name, test; cases)
        {
            try
            {
                /* .. lex & parse test.input .. */
                auto node = ParseFunc(parser);
                auto result = Converter(node, irFromNode, context, Asn1NullErrorHandler.instance);
                /* .. compare result against test.expectedError .. */
            }
            catch(Throwable err) // @suppress(dscanner.suspicious.catch_em_all)
                assert(false, "\n["~name~"]:\n"~err.msg);
        }
    }
}
It can be used like so:
@("Asn1Ir - one off edge cases")
unittest
{
    alias Harness = GenericTestHarness!(Asn1ModuleIr, Asn1ModuleIr, (ref parser){
        Asn1ModuleDefinitionNode node;
        parser.ModuleDefinition(node).resultAssert;
        return node;
    });

    Harness.run([
        "ensure that default values can lookup type-scoped references": Harness.T(`
            Unittest DEFINITIONS ::= BEGIN
                -- ..
            END
        `, Asn1SemanticError.none),
    ]);
}
One main issue, especially for the larger tests, is that specifying `Harness.T` (and more minorly `Harness.run`) can start to make the code look chunky and a bit harder to read.

So by using the magical `with()` statement, instead of writing `Harness.run` and `Harness.T` we can just write `run` and `T`, and the compiler will know how to look up these otherwise missing/undefined symbols:
@("Constraints - ensuring value references are handled")
unittest
{
    alias Harness = GenericTestHarness!(Asn1ModuleIr, Asn1ModuleIr, (ref parser){
        Asn1ModuleDefinitionNode node;
        parser.ModuleDefinition(node).resultAssert;
        return node;
    });

    with(Harness) run([
        "BIT STRING - SingleValue": T(`
            Unittest DEFINITIONS ::= BEGIN
                -- ..
            END
        `, Asn1SemanticError.none),

        "BIT STRING - Size - SingleValue": T(`
            Unittest DEFINITIONS ::= BEGIN
                B ::= BIT STRING (SIZE (a))
            END
        `, Asn1SemanticError.none),

        "BIT STRING - Size - ValueRange": T(/* .. */),
        "BOOLEAN - SingleValue": T(/* .. */),
    ]);
}
Again this is one of those things that on paper sounds really stupid (and impossible to easily describe), but grows on you really fast when you give it a try.
Pain points
While ASN.1’s basic syntax looks pretty easy from an initial glance, that illusion shatters once you start getting into it more deeply.
Value sequence syntax
ASN.1 has various separate value forms that start with a left curly bracket (`{`). A lot of these forms are ambiguous due to a variety of factors, and can only be distinguished with semantic context.
Given that dasn1 has a clean split between syntax and semantic analysis, “this does not spark joy” as the kids would say.
I’ll let this comment from the parser code explain itself:
// If a left parenthesis shows up directly after any identifier, then it's an OBJECT IDENTIFIER sequence,
// as no other sequence-looking value syntax allows for NameAndNumberForm.
//      e.g. { iso-yada-123 asn1(123) }
//
// If no commas show up and there's only 1 value, then it's ambiguous, so will default to a NamedValueList.
// (Values in the form of `a { yada }` are ambiguous between a named Sequence value and a
// DefinedValue such as a ParameterizedValue.)
// If no commas show up and there's 1 ambiguous value, then assume it's a NamedValueList.
//
// If a comma is found; multiple non-named values exist, and any number
// of ambiguous values exist, then it's a ValueList.
//
// If a comma is found, and only ambiguous values exist, assume it's a NamedValueList.
//      e.g. { ambiguous {}, twobiguous {} }
//
// If a comma is found, and any amount of non-ambiguous named values exist, it's a NamedValueList.
//      e.g. { ambiguous {}, except this }
//
// DefinedValue allows for a ParameterizedValue, which uses `{}` to define parameters,
// so we need to keep track of whether we're in a parameter list or not, and ignore everything inside one.
//      e.g. { some { template, params }, here }
//
// This loop also keeps track of how many identifiers show up side-by-side, but it's
// currently (and probably never) needed as a way to sort out ambiguity.
//
// Semantic Analysis will perform the rest of the validation, e.g. sometimes what looks like a
// NamedValueList is also a valid OBJECT IDENTIFIER sequence, so type information will be used to
// sort out the ambiguity.
😀 Fun times.
It’s hard to find important info in the specs
Example: I can’t even remember the exact conditions, but I remember having to debug some generated decoder code because it was failing to decode a specific field. It turned out that this field met certain “exact conditions” that meant its tag was supposed to be treated as `EXPLICIT` instead of the module-default `IMPLICIT`.
I still have no idea where in the spec this behaviour gets mentioned and so I basically had to wing a fix and hope it works going forward.
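For anyone hitting the same wall: one rule of this flavour that does exist (buried in X.680’s tagged-type clause) is that a tag applied to an untagged CHOICE is always treated as EXPLICIT, even under `IMPLICIT TAGS` – implicitly replacing the tag would erase the only thing identifying which alternative was encoded. A made-up illustration:

```asn1
Example DEFINITIONS IMPLICIT TAGS ::= BEGIN
    Name ::= CHOICE {
        short UTF8String,
        full  [0] UTF8String
    }

    Wrapper ::= SEQUENCE {
        -- Despite the module default, this tag must act as EXPLICIT;
        -- an implicit tag would overwrite the CHOICE alternative's own tag.
        name [1] Name
    }
END
```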
More generally, this scenario has played out quite a few times – the information is scattered (sometimes across different specs) and is hard to keep track of.
Another example is around module versions. The spec makes absolutely zero mention (that I can see) of how to version modules for non ISO/ITU purposes, and I would greatly appreciate it if anyone could help me find this information.
I’d be amazed if there’s a 100% spec compliant implementation out there, even commercially.
You need to implement constraints 3 separate times
- The first implementation is to type-check constraints, e.g. `UTF8String (SIZE ("yagababa"))` doesn’t make sense.
- The second implementation is to confirm that ASN.1 notation values are correct, e.g. `myInt INTEGER (1) ::= 2` needs to trigger an error.
- The third implementation is to generate runtime checks when you generate code from the ASN.1 notation.
It’s tedious and not very fun, but there’s no real way around it.
For a newbie to compiler programming like me, I also found it really hard to produce useful error messages. I ended up running the checks twice: once to see if there’s even an error at all, and a second time to build up the error string. This is mainly complicated by the existence of UNION and (especially) INTERSECTION constraints.
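To sketch the two-pass idea in Python (this is not dasn1’s actual code; the class names and structure are invented for illustration): pass one walks the constraint tree for a plain yes/no verdict, and only on failure does pass two walk it again to build the message.

```python
# Invented sketch of the "run the checks twice" approach: check() gives the
# boolean verdict, and describe() is only walked when a message is needed.
from dataclasses import dataclass

class Constraint:
    def check(self, value) -> bool: ...
    def describe(self) -> str: ...

@dataclass
class SingleValue(Constraint):
    allowed: int
    def check(self, value): return value == self.allowed
    def describe(self): return f"value == {self.allowed}"

@dataclass
class ValueRange(Constraint):
    lo: int
    hi: int
    def check(self, value): return self.lo <= value <= self.hi
    def describe(self): return f"{self.lo} .. {self.hi}"

@dataclass
class Union(Constraint):  # UNION: at least one branch must hold
    parts: list
    def check(self, value): return any(p.check(value) for p in self.parts)
    def describe(self): return " | ".join(p.describe() for p in self.parts)

@dataclass
class Intersection(Constraint):  # INTERSECTION: every branch must hold
    parts: list
    def check(self, value): return all(p.check(value) for p in self.parts)
    def describe(self): return " & ".join(p.describe() for p in self.parts)

def validate(constraint, value):
    if constraint.check(value):  # pass 1: is there even an error?
        return None
    # pass 2: walk again, purely to build a useful error string
    return f"value {value} does not satisfy constraint ({constraint.describe()})"

# e.g. INTEGER ((1..10) | 42)
c = Union([ValueRange(1, 10), SingleValue(42)])
```

The downside, as noted above, is that the tree-walking logic effectively exists twice and has to be kept in sync.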
The dream of immutable IR nodes
I foolishly made the mistake of believing that once I converted the generic AST nodes into the more specific IR nodes, I wouldn’t have to make any major changes to the underlying data (beyond setting up things like symbol tables).
😀 Unfortunately that wonderfully naive thought was quickly crushed, as ASN.1 requires the semantic stage to perform certain transformations (e.g. `AUTOMATIC TAGS`) that ended up being/are going to be really annoying due to the way I’ve structured the code.
But that’s future Brad’s problem.
ASN.1 has an all-or-nothing level of complexity
I am extremely thankful that x.509 is an old enough specification that the ASN.1 notation only uses the older syntax of x.680.
The alternative is that you’d need an implementation of the x.681, x.682, and x.683 specs to use any of the newer stuff – this is absolutely non-trivial to implement, and I imagine this is one of the many reasons ASN.1 hasn’t ever really taken off outside of historical and commercialised spaces.
ANY DEFINED BY
There is one exception to the above, however, and that is `ANY DEFINED BY`. It’d basically be used to define a type whose contents could be any other type, selected by some other field:
```asn1
    extension-type  OBJECT IDENTIFIER,
    extension-value ANY DEFINED BY extension-type
```
You then have to piece together which identifier matches which type. Dasn1 doesn’t actually implement `ANY DEFINED BY` as-is, since it was already deprecated by the 2003 revision. Instead, for better or for worse, dasn1 has a hacked-together intrinsic called `Dasn1-Any`:
```asn1
-- Small snippet from https://github.com/Juptune/juptune/blob/master/data/asn1/rfc5280-explicit.asn1
-- (yeah the module version is super messed up, I'll fix it eventually)
Dasn1-Any FROM Dasn1-Intrinsics { iso(0) custom(0) dasn1(1) intrinsics(0) }

AttributeTypeAndValue ::= SEQUENCE {
    type    AttributeType,
    value   Dasn1-Any
}
```
This essentially gets lowered into the decoding code for `OCTET STRING`, but without any sort of tag validation enabled. Unfortunately, until/unless I want to implement Information Object Classes, I’m stuck with having to manually call into the decoding code whenever I want to turn `Dasn1-Any` fields into their actual types.
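If you squint, what an “any” field conceptually lowers to is “read one TLV, keep the bytes, decode later”. A toy Python sketch of that idea (not dasn1’s generated output; `read_tlv` is invented and only handles low-tag-number form with definite lengths):

```python
# Read one DER TLV without validating the tag, keep the raw value bytes,
# and defer real decoding until something else (e.g. an OID field) tells
# you the actual type.

def read_tlv(data: bytes, offset: int = 0):
    """Return (tag, value_bytes, next_offset) for a single DER TLV."""
    tag = data[offset]
    length = data[offset + 1]
    offset += 2
    if length & 0x80:  # long form: low 7 bits = number of length octets
        n = length & 0x7F
        length = int.from_bytes(data[offset:offset + n], "big")
        offset += n
    return tag, data[offset:offset + length], offset + length

# An "any" field just stores the raw bytes...
tag, raw, end = read_tlv(bytes([0x02, 0x01, 0x2A]))  # DER for INTEGER 42
# ...and only later, once we know it's really an INTEGER (tag 0x02),
# do we manually invoke the real decoder:
value = int.from_bytes(raw, "big", signed=True)
```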
I can’t retain all the information I need to know
This is more of a personal one.
Between the various different aspects of the ASN.1 compiler, the x.68x specs, the x.690 spec, and all of the other projects building off of this ASN.1 work (x.509 certificate handling, TLS 1.3) I started to feel like a stranger in my own codebase, even just a week after I had last touched it.
It’ll definitely be interesting making future improvements/changes as my at-hand knowledge is constantly dwindling.
Writing a compiler is tedious work
From having to write 20,000 different node visitors for various reasons; to hand-rolling a syntax parser for a boring, drawn-out grammar; to needing to write code that looks 95% the same as the last piece, where that last 5% of difference ranges from drudgery to mentally taxing, repeated 9000 times.
I think I can finally say I have some proper compiler experience under my belt ;(
But lord knows that each and every milestone has been so extremely rewarding (as long as I try not to think about the fact that almost no one will be using this code).
p.s. Don’t try to make a template-based parser combinator for the entire grammar of a language you don’t personally control unless you want to see symbol names that are 10Mb+ long and explode the binary size by over 100Mb. Don’t ask me how I know.
(I even hard crashed the D compiler I use once, since I guess the error message was literally too long. That endlessly scrolling console…)
Conclusion
A probably wasted year of my life later and there’s still an insane amount of work left on everything relating to this project (and Juptune) in general, but I think it’s making me a better programmer. Maybe.
The dream is that one day I can put “made an ASN.1 compiler + x.509 certificate handler + TLS 1.3 implementation” on my CV and still get told “sorry, you’re a good match except you don’t have 6 months of production experience in Ansible, we can’t hire you” by a recruiter. God I love this industry.
Don’t do ASN.1 kids, you’ll never be the same.