Working on the compiler

High-level architecture

Hacspec compiler architecture

The Rustspec compiler intervenes after the regular Rust typechecking, by translating the Rust AST into a stricter hacspec AST, yielding error messages if you're not in the subset.

The hacspec AST then undergoes a typechecking phase that replicates the formal typechecking judgment of hacspec, before being compiled to the proof backends like F* or Coq.

Code organization

The source code for the compiler is located in the language/ folder. main.rs is the file containing the driver for the different compiler passes.

Hacspec AST

The main file of the compiler is rustspec.rs and it contains the AST structure.

Types are usually enclosed into Spanned<...> blocks that attach a location information to an AST node, thereby providing a way to display beautiful error message.

Several nodes also contain a Fillable<...> node standing for information that is filled by the typechecking phase but that can be left to None when building the AST.

Translation from Rust AST

This phase is contained in ast_to_rustspec.rs. The trickyness of this translation is that it needs to be aware of certain special names contained in the structure SpecialNames. Indeed, while the Rust AST treats the application enum constructors like function applications, the hacspec AST considers them as proper injection so we need to distinguish them in the Rust AST. For that, we need to know all declared enums at this point of the program.

Enums and other SpecialNames are also defined in the ExternalData that contains the signatures and types imported in crates used by the hacspec program being compiled.

Name resolution

When the translation from Rust AST is finished, the identifiers for all variables inside function bodies are of the following type:

pub enum Ident {
    Unresolved(String),
    Local(LocalIdent),
    TopLevel(TopLevelIdent),
}

More precisely, they are still in the Ident::Unresolved case. The compiler pass in name_resolution.rs resolves the identifiers by linking them to local or global identifiers, each one having a unique ID. hacspec does not feature De Bruijn variable handling, instead relying on unique fresh IDs for differentiating local and global variables from each other.

External data

A hacspec file can never (in principal) be considered alone, as it usually imports at least several other crates like the hacspec standard library. These external crates must pre-populate the typechecking context with the types and function signatures that they define.

It's the job of hir_to_rustspec.rs to retrieve this data. The critical piece of code in this file is the following:

let num_def_ids = crate_store.num_def_ids_untracked(*krate_num);
let def_ids = (0..num_def_ids).into_iter().map(|id| DefId {
    krate: *krate_num,
    index: DefIndex::from_usize(id),
});

First, we retrieve the number of exported symbols by an external crate using num_def_ids_untracked, a function that is specifically labeled as critical to the hacspec compiler in the Rust compiler codebase. Then, we manufacture definition IDs for all these exported symbols, relying on the invariant that they are numbered from 0 to the number of exported symbols in Rust's compiled crate metadata format.

Then, we use those definition IDs (DefId) to query the Rust compiler via the central TyCtxt data structure. If the DefId corresponds to a type definition, we examine the type definition structurally and check whether it corresponds to a hacspec-compatible type definition. Notably, the type definitions generated by macros like array! or nat_mod! are only seen here in their expanded version, so we have to retro-engineer which expanded version corresponds to which macro expansion. This is a vulnerability of the compiler since it's possible to break the abstraction of the language by smuggling in a type not defined via a hacspec macro this way. That's why hacspec developers should be very careful about which dependencies they import in order to have a 100% safety guarantee.

For DefIds corresponding to functions, the signature of the function is analysed and if it fits the subset of types expected by hacspec, the function is imported along with its type in a pre-populated typechecking context.

Note that it is not possible any more at this point to retrieve the #[in_hacspec], #[unsafe_hacspec], etc. attributes that would tag the external definitions, since these attributes get erased by the Rust compiler before reaching the compiled crates metadata.

Typechecking

The typechecking is done in typechecker.rs and follows a very regular structure, making heavy use of immutable data structures to not mess up the various context manipulations.

Note that we need to perform a full typechecking complete with method resolution because the proof backends need very fine-grained typechecking information to generate correct code.

Be careful: types often need to be de-aliased with dealias_type before being matched on structurally. Forgetting to dealias will lead to bugs with type aliases.

Proof backends

The different proof backends (rustspec_to_fstar.rs, etc) all enjoy a similar structure that ought to be refactored to showcase their commonality. The backends don't use an intermediate AST to generate the code in the proof assistant but rather directly print code as string using the pretty pretty-printing library. If you want to start a new proof backend, the easiest solution is probably to copy an existing proof backend and tweak it until you get the right result.

The code generation has to be fine-tuned to interface with a replica of the hacspec standard library in the host proof assistant, whose correspondence with the original hacspec library in Rust is part of the trusted code base. More specially, clever solutions to encode sequences and array, as well as all the different types of public and secret machine integers, and the interaction between the two (seeing a double as a string of bytes) have to be implemented through proof assistant libraries.

Unit tests

The compiler has various unit tests that are controlled trough the language/tests files. Please enrich the unit tests bases in language-tests, negative-language-tests and test-crate as you implement new features for the compiler. The compiler can also be tested against all the hacspec cryptographic specifications by running examples/typecheck_examples.sh from the root of the repository.