Introduction

hacspec is a specification language for crypto primitives, protocols, and more. It is a subset of the Rust programming language, focused on functional programming and it avoids the use of mutable state as much as possible.

This book gives an overview of:

For a quick introduction you can also check out these slides from April 2021. An in-depth technical report is also available, and serves as a reference for the language formalization.

The hacspec language

This section gives an informal description of the hacspec language, whose goal is to provide hands-on documentation for users of the language. For a formal specification of the semantics of the language, please refer to the technical report.

hacspec is a domain-specific language embedded inside Rust. This means that all correct hacspec programs are correct Rust programs: you can use the usual Rust tooling for working on your hacspec developments. However, hacspec is a strict subset of Rust: this means that some features of Rust are forbidden inside hacspec. The goal of restricting the expressiveness of the language is twofold: first, help domain experts such as cryptographers convey their specifications in a fashion that should help them avoid mistakes; second, provide a way for Rust programmers to interact with theorem provers such as F* or Coq.

Syntax

P ::= [i]\*                     Program items
i ::= array!(t, μ, n)           Array type declaration where n is a natural number
    | nat_mod!(t, n)            Modular integer
    | fn f([d]+) -> μ b         Function declaration
    | type t = enum { [c(μ)]+ } Enum declaration (with constructors)
    | type t = struct([μ]+)     Struct declaration (without named fields)
    | const x: μ = e            Constant declaration
    | use crate::*              External crate import
    | use mod::*                Internal module import
d ::= x: τ                      Function argument
    | mut x: τ                  Mutable function argument
μ ::= unit|bool|u8|U8|i8|I8...  Base types
    | Seq<μ>                    Sequence
    | String                    Strings
    | ([μ]+)                    Tuple
    | unit                      Unit type
    | t                         Named type
τ ::= μ                         Plain type
    | &μ                        Immutable reference
b ::= {[s;]+}                   Block
p ::= x                         Variable pattern
    | ([p]+)                    Tuple pattern
    | t (p)                     Struct constructor pattern
    | _                         Wildcard pattern
s ::= let p: τ = e              Let binding
    | let mut p: τ = e          Mutable let binding
    | x = e                     Variable reassignment
    | x = e?                    Variable reassignment with result/option catching
    | let p: τ = e?             Let binding with result/option catching
    | if e then b (else b)      Conditional statements
    | c(e)                      Enum injection
    | match e { [c(p) => e]+ }  Pattern matching
    | for x in e..e b           For loop (integers only)
    | e                         Return expression
    | b                         Statement block
e ::= ()|true|false             Unit and boolean literals
    | n                         Integer literal
    | U8(e)|I8(e)|...           Secret integer classification
    | "..."                     String literal
    | t([e]+)                   Array literal
    | e as μ                    Integer casting
    | x                         Variable
    | ()                        Unit
    | f([a]+)                   Function call
    | if e then e else e        Conditional expression
    | e ⊙ e                     Binary operations
    | ⊘ e                       Unary operations
    | ([e]+)                    Tuple constructor
    | x[e]                      Array or sequence index
a ::= e                         Linear argument
    | &e                        Call-site borrowing
⊙ ::= + | - | * | / | &&
    | || | == | != | > | <
⊘ ::= - | ~

The core of hacspec

Crates and modules

hacspec only supports single-module crates, due to a technical limitation of the Rust compiler. Inside this single file, a hacspec program shall always start with:

use hacspec_lib::*;

// Optional: dependencies on other crates containing hacspec programs
use other_hacpsec_crate::*;

No other form of use is allowed in hacspec, because allowing Rust's complex import patterns would increase the complexity of the hacspec compiler and conflict with the module systems of most proof assistants.

Functions

hacspec is a functional language, and only supports the declaration of top-level functions:

fn hacspec_function(x: bool) -> () {
    ...
}

The functions can take any number of arguments, and may return a value (or not). Note that recursive functions are forbidden in hacspec.

The control flow inside hacspec functions is limited, as return statements are forbidden.

Basic Types

hacspec supports all the Rust primitive types: integers (signed and unsigned), booleans, unit, tuples. hacspec possesses some support for generic types, but only for primitive types defined by the language creators, and not for user-defined types.

Type aliases are allowed in hacspec:

type OneTypeAlias = u32;

Borrows

hacspec forbids mutable borrows in all places. Immutable borrows are allowed in hacspec, but only for function arguments. Indeed, you can declare a function argument as immutably borrowed:

fn hacspec_function(arg: &Seq<u8>) {
    ...
}

You can also immutably borrow a value at the call site of a function:

hacspec_function(&Seq::<u8>::new(64))

In particular, return types cannot contain references, and the same is true for types inside tuples or any data structure.

Constants

hacspec allows the declaration of constants:

const ONE_CONST : bool = false;

Assignments

Inside a function body, hacspec allows regular Rust let-bindings:

let x = ...;

hacspec also allows mutable let bindings, and subsequent reassignments:

let mut x = ...;
...
x = ...;

This allowing of mutable variable might come as a contradiction to hacspec's philosophy of forbidding mutable state. But in fact, mutable local variables in hacspec can be translated to a completely side-effect free form with a state-passing like monadic structure.

Left-hand sides of assignments support destructuring, which is currently the only way to access the members of a tuple:

let (x, y) = z;

Loops

Looping is severely restricted in hacspec compared to Rust, as the only accepted form is for looping with a counter over an integer range:

for i in low..hi {
    ... // The block can use i and reassign mutable variables
}

The motivation for this restriction is to ease the proof of termination of loops. break or continue statements are forbidden.

Conditionals

hacspec allows statement-like conditionals as well as expression-like conditionals:

if cond1 {
    ... // the block can modify mutable variables
} else { // else block is optional here
    ...
}
let x = if cond2 { ... } else { ... };

Statements and expressions

In regular Rust, statements and expressions can be mixed together freely. This not the case in hacspec that imposes a strict precedence of statements over expressions. For instance, the following code is not allowed in hacspec:

let x = if cond {
    y = true; // ERROR: the reassignment is a statement, which cannot
              // be contained in the expression to which x is assigned.
    3
} else {
    4
};

Visibility

hacspec allows for both pub and non-pub versions of item declarations (pub, non-pub, etc). You simply have to respect the Rust visibility rules. Note that these visibility distinctions might not be translated in the proof backends.

Sequences and arrays

Sequences

The staple Vec type is forbidden in hacspec. Instead, you have to use the type Seq, which is implemented as a wrapper around Vec.

The most notable differences between Seq and Vec is that Seq is not resizable, and does not support push and pop operations. Instead, the final length of the seq has to be provided at creation time. See the hacspec standard library documentation for more details.

Seq is a built-in generic type that always has to be indexed by the content of the cells: Seq<u8>, etc.

Arrays

The native Rust array types [<type of contents>, <size>] is forbidden in hacspec. Instead, you have to declare nominally at the top-level new array types for a specific cell content and size with:

array!(FooArray, u8, 64);
// This declares type FooArray as an array of u8 of size 64
bytes!(BarArray, 64);
// bytes! is a specialized version of array! with secret bytes
array!(BazArray, u8, 64, type_for_indexes:BazIndex);
// The additional argument type_for_indexes defines an alias of usize
// intended to spot which usizes are used to index BazArray (useful for
// verification)

Structs and enums

hacspec also supports user-defined structs and enums with some restrictions.

Structs

The only form of struct declaration currently allowed in hacspec is:

struct Foo(u32, Seq<u32>);

The struct thus declared can have one or more components. This form of struct declaration effectively corresponds to a single-case enum, and is implemented as such. Struct components can be accessed through let-binding destructuring:

let Foo(x, y) = z;

Note that you can't store borrowed types inside hacspec structs, hence there is no need for lifetime variables.

Enums

hacspec supports very restricted enum declarations:

enum Foo {
    CaseA,
    CaseB(u16),
    CaseC(Seq<bool>, u64)
}

These declaration don't support the basic Rust features such as C-style union declarations with assignments for each case.

Enumeration values can be pattern-matched in an expression:

match x {
    Foo::CaseA => ...,
    Foo::CaseB(y) => ...,
    Foo::CaseC(y,z) => ...
}

Note that you can't store borrowed types inside hacspec enums, hence there is no need for lifetime variables.

Option and Result

User-defined structs and enums presented above don't support generic type parameters yet. However, the built-in enums Option<T> and Result<T,U> support type parameters. Those type parameters have to be explicitly declared each time, as hacspec does not currently support type inference:

match x {
    Result::<Seq<u8>, bool>::Ok(y) => ...,
    Result::<Seq<u8>, bool>::Err(err) => ...
}

Such type parameter declaration is cumbersome; as a workaround we advise to declare a type alias as such:

type MyResult = Result::<Seq<u8>, bool>;

Error handling

Error handling in Rust is done via the Result type (see the structs and enums section). But on top of explicit pattern-matching, hacspec also supports the popular ? operator to quickly perform an early return and propagate the error case upwards.

? is only allowed at the very end of an expression in a let-binding or reassignment statement:

let x = foo(true)?; // GOOD
let y = foo(bar(0)?); // ERROR: the ? is not at the end of the statement

Currently, ? is the only way to return early in a hacspec function.

The hacspec std library

The hacspec standard library contains a host of functions, type generators and methods that define the base objects manipulated in classic cryptographic primitives.

Methods in the standard library can be divided into three categories:

  1. The not_hacspec methods whose signature and body does not belong to the hacspec fragment of Rust. They should not be used in hacspec code, but can be used as helpers for e.g. testing.
  2. The unsafe_hacspec methods whose signature belongs to hacspec but not the body. These methods can be used in hacspec programs but their body is part of the trusted codebase.
  3. The in_hacspec methods whose signatures and bodies belong to the hacspec fragment of Rust. These can be used safely in hacspec programs.

Arithmetic

hacspec overloads the arithmetic operators for a wide variety of types corresponding to mathematical values mentioned in cryptographic specifications.

The Numeric trait

All of these types implement the Numeric trait defined by the hacspec standard library. The arithmetic operators work for all kinds of integers, but also arrays and sequences (point-wise operations).

Note that the list of types implementing Numeric is hardcoded in the hacspec compiler, and as of this day cannot be extended by the user.

While the Rust compiler can infer the type of integer literals automatically, this feature is not implemented by the hacspec compiler:

let w: u32 = 0; // ERROR: an integer without a suffix will have type usize
let x: u64 = 0x64265u64; // GOOD
let y: u64 = 4u64; // GOOD

Public and secret integers

One of hacspec's goal is to enable users to quickly check whether their application does not obviously break the constant-time guarantee. Certain processor instructions take more or less time to complete depending on their inputs, which can cause secret leakage and break the security of an application. Hence, hacspec offers for each type of integer (u8, u32, etc.) a mirror secret integer type (U8, U32, etc.) for which operations that break constant-timedness are forbidden.

This public/private distinction can be found at a lot of places in the standard library, and is made to forbid functions and data structures from leaking secrets.

Conversions between public and secret integers are restricted to two functions: classify and declassify.

Abstract integers

Some cryptographic specifications talk about modular arithmetic in large fields, whose size overflows even u128. To ease the expression of such specifications, hacspec offers wrapper types around BigInt that can be declared using the following API:

abstract_nat_mod!(
    NameOfModularInts,
    NameOfUnderlyingByteRepresentation,
    256, // Number of bits for the representation of the big integer,
    "ffffffffffffffff00000000000065442", // Hex representation of the modulo value as hex
)

abstract_public_nat_mod!(
    ... // Public version of above
)

Integers as bytes

It is often useful to view an integer as a sequence of bytes that can be manipulated individually. The hacspec standard library provides a number of function to translate back and forth from integer to sequence of bytes:

pub fn u16_to_le_bytes(x: u16) -> u16Word;

pub fn u16_from_le_bytes(s: u16Word) -> u16;

pub fn U64_to_be_bytes(x: U64) -> U64Word;

pub fn U64_from_be_bytes(s: U64Word) -> U64;

...

Sequence and array operations

Operations common to sequences and arrays

Base traits

Some operations are common to both sequences and arrays by design, and can be used as the interoperability base between the two types of collections. These operations are the following:

  • len: gives the length of an array or sequence;
  • iter: iterates over the content of the array or sequence (unsafe in hacspec but can be used for implementing primitives)
  • create: creates a sequence or array and initializes the elements to the default value (0 for arithmetic types);
  • update_slice, update and update_start: produce a new sequence or array with modified contents.

Both sequences and arrays implement indexing with any type of unsigned public integer.

Chunking

Both arrays and sequences support chunking with methods like:

  • num_chunks and num_exact_chunks (whole or partial blocks);
  • get_chunk, get_exact_chunk and get_remainder_chunk;
  • set_chunk and set_exact_chunk.

The read operations borrow the sequence or array, but the write operations create a new sequence or array.

Conversions

Sequences and arrays can be created from other types via methods like:

  • from_public_slice and from_slice;
  • from_vec and from_native_slice;
  • from_public_seq and from_seq (to convert a seq into an array of the correct size);
  • from_string and from_hex for byte or hex strings (hex only for u8 sequences and arrays).

Secrecy

The methods prefixed by public performs an element-wise classification of the data under the hood.

Ownage

Some methods have two versions: an owned and a non-owned version, depending on whether the self argument is consumed or not by the method. This distinction is useful to avoid unnecessary copies and thus be more performant.

Array-specific operations

Since array length is known statically, new does not take any argument, same as length. slices of arrays become Seq.

Sequence-specific operations

Sequences can be extended (by creating a new sequence under the hood) with push or concat. Sequences can also be sliced with slice.

Examples

The main hacspec repository contains a set of example specifications. In this section we pull out some interesting bits to demonstrate the hacspec language.

There's also a provider that bundles the different cryptographic primitives into a single library. The provider implements the RustCrypto traits in order to facilitate interoperability.

Usage

Specifications

Verification

Coq

QuickCheck / QuickChick

You can test your hacspec code using QuickCheck (a Rust library for randomized property-based testing), by simply implementing quickcheck::Arbitrary for the type you want to generate tests for. For example:

impl Arbitrary for Fp {
    fn arbitrary(g: &mut Gen) -> Fp {
        let mut a: [u64; 6] = [0; 6];
        for i in 0..6 {
            a[i] = u64::arbitrary(g);
        }
        let mut b: [u8; 48] = [0; 48];
        for i in 0..6 {
            let val: u64 = a[i];
            b[(i*8)..((i+1)*8)].copy_from_slice(&(val.to_le_bytes()));
        }
        Fp::from_byte_seq_le(Seq::<U8>::from_public_slice(&b))
    }
}

then you can use the quickcheck attribute, to make QuickCheck do property based testing for this function:

#[cfg(test)]
#[quickcheck] //Using the fp arbitraty implementation from above to generate fp2 elements.
fn test_fp2_prop_add_neg(a: Fp2) -> bool {
    let b = fp2neg(a);
    fp2fromfp(Fp::ZERO()) == fp2add(a, b)
}

which will run when you do cargo test. If you then add the tag #[cfg(proof)] and export to Coq,

cargo hacspec -o coq/src/<output_file_name>.v <input_crate_name>

then you get corresponding QuickChick test,

Definition test_fp2_prop_add_neg (a_320 : fp2) : bool :=
  let b_321 :=
    fp2neg (a_320) in
  (fp2fromfp (nat_mod_zero )) =.? (fp2add (a_320) (b_321)).

QuickChick (forAll g_fp2 (fun a_320 : fp2 =>test_fp2_prop_add_neg a_320)).

and generators will be constructed for the types automatically as,

Instance show_fp : Show (fp) := Build_Show (fp) (fun x => show (GZnZ.val _ x)).
Definition g_fp : G (fp) := @bindGen Z (fp) (arbitrary) (fun x => returnGen (@Z_in_nat_mod _ x)).
Instance gen_fp : Gen (fp) := Build_Gen fp g_fp.

which you can then run through coq in the folder coq/

coqc -R src/ Hacspec src/<output_file_name>.v

Make sure you run:

coqc -R src/ Hacspec src/MachineIntegers.v
coqc -R src/ Hacspec src/Lib.v

or make to generate the .vo files used by <output_file_name>.v.

For more information:

Test Vectors

For Developers

This section contains a handy guide for hacspec developers working on the standard library or the compiler.

Working on the compiler

High-level architecture

Hacspec compiler architecture

The Rustspec compiler intervenes after the regular Rust typechecking, by translating the Rust AST into a stricter hacspec AST, yielding error messages if you're not in the subset.

The hacspec AST then undergoes a typechecking phase that replicates the formal typechecking judgment of hacspec, before being compiled to the proof backends like F* or Coq.

Code organization

The source code for the compiler is located in the language/ folder. main.rs is the file containing the driver for the different compiler passes.

Hacspec AST

The main file of the compiler is rustspec.rs and it contains the AST structure.

Types are usually enclosed into Spanned<...> blocks that attach a location information to an AST node, thereby providing a way to display beautiful error message.

Several nodes also contain a Fillable<...> node standing for information that is filled by the typechecking phase but that can be left to None when building the AST.

Translation from Rust AST

This phase is contained in ast_to_rustspec.rs. The trickyness of this translation is that it needs to be aware of certain special names contained in the structure SpecialNames. Indeed, while the Rust AST treats the application enum constructors like function applications, the hacspec AST considers them as proper injection so we need to distinguish them in the Rust AST. For that, we need to know all declared enums at this point of the program.

Enums and other SpecialNames are also defined in the ExternalData that contains the signatures and types imported in crates used by the hacspec program being compiled.

Name resolution

When the translation from Rust AST is finished, the identifiers for all variables inside function bodies are of the following type:

pub enum Ident {
    Unresolved(String),
    Local(LocalIdent),
    TopLevel(TopLevelIdent),
}

More precisely, they are still in the Ident::Unresolved case. The compiler pass in name_resolution.rs resolves the identifiers by linking them to local or global identifiers, each one having a unique ID. hacspec does not feature De Bruijn variable handling, instead relying on unique fresh IDs for differentiating local and global variables from each other.

External data

A hacspec file can never (in principal) be considered alone, as it usually imports at least several other crates like the hacspec standard library. These external crates must pre-populate the typechecking context with the types and function signatures that they define.

It's the job of hir_to_rustspec.rs to retrieve this data. The critical piece of code in this file is the following:

let num_def_ids = crate_store.num_def_ids_untracked(*krate_num);
let def_ids = (0..num_def_ids).into_iter().map(|id| DefId {
    krate: *krate_num,
    index: DefIndex::from_usize(id),
});

First, we retrieve the number of exported symbols by an external crate using num_def_ids_untracked, a function that is specifically labeled as critical to the hacspec compiler in the Rust compiler codebase. Then, we manufacture definition IDs for all these exported symbols, relying on the invariant that they are numbered from 0 to the number of exported symbols in Rust's compiled crate metadata format.

Then, we use those definition IDs (DefId) to query the Rust compiler via the central TyCtxt data structure. If the DefId corresponds to a type definition, we examine the type definition structurally and check whether it corresponds to a hacspec-compatible type definition. Notably, the type definitions generated by macros like array! or nat_mod! are only seen here in their expanded version, so we have to retro-engineer which expanded version corresponds to which macro expansion. This is a vulnerability of the compiler since it's possible to break the abstraction of the language by smuggling in a type not defined via a hacspec macro this way. That's why hacspec developers should be very careful about which dependencies they import in order to have a 100% safety guarantee.

For DefIds corresponding to functions, the signature of the function is analysed and if it fits the subset of types expected by hacspec, the function is imported along with its type in a pre-populated typechecking context.

Note that it is not possible any more at this point to retrieve the #[in_hacspec], #[unsafe_hacspec], etc. attributes that would tag the external definitions, since these attributes get erased by the Rust compiler before reaching the compiled crates metadata.

Typechecking

The typechecking is done in typechecker.rs and follows a very regular structure, making heavy use of immutable data structures to not mess up the various context manipulations.

Note that we need to perform a full typechecking complete with method resolution because the proof backends need very fine-grained typechecking information to generate correct code.

Be careful: types often need to be de-aliased with dealias_type before being matched on structurally. Forgetting to dealias will lead to bugs with type aliases.

Proof backends

The different proof backends (rustspec_to_fstar.rs, etc) all enjoy a similar structure that ought to be refactored to showcase their commonality. The backends don't use an intermediate AST to generate the code in the proof assistant but rather directly print code as string using the pretty pretty-printing library. If you want to start a new proof backend, the easiest solution is probably to copy an existing proof backend and tweak it until you get the right result.

The code generation has to be fine-tuned to interface with a replica of the hacspec standard library in the host proof assistant, whose correspondence with the original hacspec library in Rust is part of the trusted code base. More specially, clever solutions to encode sequences and array, as well as all the different types of public and secret machine integers, and the interaction between the two (seeing a double as a string of bytes) have to be implemented through proof assistant libraries.

Unit tests

The compiler has various unit tests that are controlled trough the language/tests files. Please enrich the unit tests bases in language-tests, negative-language-tests and test-crate as you implement new features for the compiler. The compiler can also be tested against all the hacspec cryptographic specifications by running examples/typecheck_examples.sh from the root of the repository.

Contributors

A list of contributors to the hacspec project.

If you feel you're missing from this list, feel free to add yourself in a PR.