Introduction
hacspec is a specification language for crypto primitives, protocols, and more. It is a subset of the Rust programming language, focused on functional programming and it avoids the use of mutable state as much as possible.
This book gives an overview of:
- The hacspec language;
- The different parts of the project:
- How to use hacspec to write specifications for standards and verification;
- How to use hacspec to generate test vectors;
- Work on the hacspec compiler and tooling itself.
For a quick introduction you can also check out these slides from April 2021. An in-depth technical report is also available, and serves as a reference for the language formalization.
The hacspec language
This section gives an informal description of the hacspec language, whose goal is to provide hands-on documentation for users of the language. For a formal specification of the semantics of the language, please refer to the technical report.
hacspec is a domain-specific language embedded inside Rust. This means that all correct hacspec programs are correct Rust programs: you can use the usual Rust tooling for working on your hacspec developments. However, hacspec is a strict subset of Rust: this means that some features of Rust are forbidden inside hacspec. The goal of restricting the expressiveness of the language is twofold: first, help domain experts such as cryptographers convey their specifications in a fashion that should help them avoid mistakes; second, provide a way for Rust programmers to interact with theorem provers such as F* or Coq.
Syntax
P ::= [i]\* Program items
i ::= array!(t, μ, n) Array type declaration where n is a natural number
| nat_mod!(t, n) Modular integer
| fn f([d]+) -> μ b Function declaration
| type t = enum { [c(μ)]+ } Enum declaration (with constructors)
| type t = struct([μ]+) Struct declaration (without named fields)
| const x: μ = e Constant declaration
| use crate::* External crate import
| use mod::* Internal module import
d ::= x: τ Function argument
| mut x: τ Mutable function argument
μ ::= unit|bool|u8|U8|i8|I8... Base types
| Seq<μ> Sequence
| String Strings
| ([μ]+) Tuple
| unit Unit type
| t Named type
τ ::= μ Plain type
| &μ Immutable reference
b ::= {[s;]+} Block
p ::= x Variable pattern
| ([p]+) Tuple pattern
| t (p) Struct constructor pattern
| _ Wildcard pattern
s ::= let p: τ = e Let binding
| let mut p: τ = e Mutable let binding
| x = e Variable reassignment
| x = e? Variable reassignment with result/option catching
| let p: τ = e? Let binding with result/option catching
| if e then b (else b) Conditional statements
| c(e) Enum injection
| match e { [c(p) => e]+ } Pattern matching
| for x in e..e b For loop (integers only)
| e Return expression
| b Statement block
e ::= ()|true|false Unit and boolean literals
| n Integer literal
| U8(e)|I8(e)|... Secret integer classification
| "..." String literal
| t([e]+) Array literal
| e as μ Integer casting
| x Variable
| () Unit
| f([a]+) Function call
| if e then e else e Conditional expression
| e ⊙ e Binary operations
| ⊘ e Unary operations
| ([e]+) Tuple constructor
| x[e] Array or sequence index
a ::= e Linear argument
| &e Call-site borrowing
⊙ ::= + | - | * | / | &&
| || | == | != | > | <
⊘ ::= - | ~
The core of hacspec
Crates and modules
hacspec only supports single-module crates, due to a technical limitation of the Rust compiler. Inside this single file, a hacspec program shall always start with:
use hacspec_lib::*;
// Optional: dependencies on other crates containing hacspec programs
use other_hacpsec_crate::*;
No other form of use
is allowed in hacspec, because allowing Rust's
complex import patterns would increase the complexity of the hacspec compiler
and conflict with the module systems of most proof assistants.
Functions
hacspec is a functional language, and only supports the declaration of top-level functions:
fn hacspec_function(x: bool) -> () {
...
}
The functions can take any number of arguments, and may return a value (or not). Note that recursive functions are forbidden in hacspec.
The control flow inside hacspec functions is limited, as return
statements
are forbidden.
Basic Types
hacspec supports all the Rust primitive types: integers (signed and unsigned), booleans, unit, tuples. hacspec possesses some support for generic types, but only for primitive types defined by the language creators, and not for user-defined types.
Type aliases are allowed in hacspec:
type OneTypeAlias = u32;
Borrows
hacspec forbids mutable borrows in all places. Immutable borrows are allowed in hacspec, but only for function arguments. Indeed, you can declare a function argument as immutably borrowed:
fn hacspec_function(arg: &Seq<u8>) {
...
}
You can also immutably borrow a value at the call site of a function:
hacspec_function(&Seq::<u8>::new(64))
In particular, return types cannot contain references, and the same is true for types inside tuples or any data structure.
Constants
hacspec allows the declaration of constants:
const ONE_CONST : bool = false;
Assignments
Inside a function body, hacspec allows regular Rust let-bindings:
let x = ...;
hacspec also allows mutable let bindings, and subsequent reassignments:
let mut x = ...;
...
x = ...;
This allowing of mutable variable might come as a contradiction to hacspec's philosophy of forbidding mutable state. But in fact, mutable local variables in hacspec can be translated to a completely side-effect free form with a state-passing like monadic structure.
Left-hand sides of assignments support destructuring, which is currently the only way to access the members of a tuple:
let (x, y) = z;
Loops
Looping is severely restricted in hacspec compared to Rust, as the only accepted form is
for
looping with a counter over an integer range:
for i in low..hi {
... // The block can use i and reassign mutable variables
}
The motivation for this restriction is to ease the proof of termination of
loops. break
or continue
statements are forbidden.
Conditionals
hacspec allows statement-like conditionals as well as expression-like conditionals:
if cond1 {
... // the block can modify mutable variables
} else { // else block is optional here
...
}
let x = if cond2 { ... } else { ... };
Statements and expressions
In regular Rust, statements and expressions can be mixed together freely. This not the case in hacspec that imposes a strict precedence of statements over expressions. For instance, the following code is not allowed in hacspec:
let x = if cond {
y = true; // ERROR: the reassignment is a statement, which cannot
// be contained in the expression to which x is assigned.
3
} else {
4
};
Visibility
hacspec allows for both pub
and non-pub
versions of item declarations
(pub, non-pub, etc). You simply have to respect the Rust visibility rules. Note
that these visibility distinctions might not be translated in the proof
backends.
Sequences and arrays
Sequences
The staple Vec
type is forbidden in hacspec. Instead, you have to use the
type Seq
, which is implemented as a wrapper around Vec
.
The most notable differences between Seq
and Vec
is that Seq
is not
resizable, and does not support push
and pop
operations. Instead, the
final length of the seq has to be provided at creation time. See the
hacspec standard library documentation for more details.
Seq
is a built-in generic type that always has to be indexed by the content of the
cells: Seq<u8>
, etc.
Arrays
The native Rust array types [<type of contents>, <size>]
is forbidden in
hacspec. Instead, you have to declare nominally at the top-level new array types
for a specific cell content and size with:
array!(FooArray, u8, 64);
// This declares type FooArray as an array of u8 of size 64
bytes!(BarArray, 64);
// bytes! is a specialized version of array! with secret bytes
array!(BazArray, u8, 64, type_for_indexes:BazIndex);
// The additional argument type_for_indexes defines an alias of usize
// intended to spot which usizes are used to index BazArray (useful for
// verification)
Structs and enums
hacspec also supports user-defined structs and enums with some restrictions.
Structs
The only form of struct declaration currently allowed in hacspec is:
struct Foo(u32, Seq<u32>);
The struct thus declared can have one or more components. This form of struct declaration effectively corresponds to a single-case enum, and is implemented as such. Struct components can be accessed through let-binding destructuring:
let Foo(x, y) = z;
Note that you can't store borrowed types inside hacspec structs, hence there is no need for lifetime variables.
Enums
hacspec supports very restricted enum
declarations:
enum Foo {
CaseA,
CaseB(u16),
CaseC(Seq<bool>, u64)
}
These declaration don't support the basic Rust features such as C-style union declarations with assignments for each case.
Enumeration values can be pattern-matched in an expression:
match x {
Foo::CaseA => ...,
Foo::CaseB(y) => ...,
Foo::CaseC(y,z) => ...
}
Note that you can't store borrowed types inside hacspec enums, hence there is no need for lifetime variables.
Option and Result
User-defined structs and enums presented above don't support generic type
parameters yet. However, the built-in enums Option<T>
and Result<T,U>
support type parameters. Those type parameters have to be explicitly declared
each time, as hacspec does not currently support type inference:
match x {
Result::<Seq<u8>, bool>::Ok(y) => ...,
Result::<Seq<u8>, bool>::Err(err) => ...
}
Such type parameter declaration is cumbersome; as a workaround we advise to declare a type alias as such:
type MyResult = Result::<Seq<u8>, bool>;
Error handling
Error handling in Rust is done via the Result
type (see the structs and
enums section). But on top of explicit pattern-matching, hacspec also
supports the popular ?
operator to quickly perform an early return and
propagate the error case upwards.
?
is only allowed at the very end of an expression in a let-binding or
reassignment statement:
let x = foo(true)?; // GOOD
let y = foo(bar(0)?); // ERROR: the ? is not at the end of the statement
Currently, ?
is the only way to return early in a hacspec function.
The hacspec std library
The hacspec standard library contains a host of functions, type generators and methods that define the base objects manipulated in classic cryptographic primitives.
Methods in the standard library can be divided into three categories:
- The
not_hacspec
methods whose signature and body does not belong to the hacspec fragment of Rust. They should not be used in hacspec code, but can be used as helpers for e.g. testing. - The
unsafe_hacspec
methods whose signature belongs to hacspec but not the body. These methods can be used in hacspec programs but their body is part of the trusted codebase. - The
in_hacspec
methods whose signatures and bodies belong to the hacspec fragment of Rust. These can be used safely in hacspec programs.
Arithmetic
hacspec overloads the arithmetic operators for a wide variety of types corresponding to mathematical values mentioned in cryptographic specifications.
The Numeric
trait
All of these types implement the Numeric
trait defined by the hacspec standard
library. The arithmetic operators work for all kinds of integers, but also
arrays and sequences (point-wise operations).
Note that the list of types implementing Numeric
is hardcoded in the
hacspec compiler, and as of this day cannot be extended by the user.
While the Rust compiler can infer the type of integer literals automatically, this feature is not implemented by the hacspec compiler:
let w: u32 = 0; // ERROR: an integer without a suffix will have type usize
let x: u64 = 0x64265u64; // GOOD
let y: u64 = 4u64; // GOOD
Public and secret integers
One of hacspec's goal is to enable users to quickly check whether their
application does not obviously break the constant-time guarantee.
Certain processor instructions take more or less time to complete depending
on their inputs, which can cause secret leakage and break the security of
an application. Hence, hacspec offers for each type of integer (u8
, u32
, etc.)
a mirror secret integer type (U8
, U32
, etc.) for which operations
that break constant-timedness are forbidden.
This public/private distinction can be found at a lot of places in the standard library, and is made to forbid functions and data structures from leaking secrets.
Conversions between public and secret integers are restricted to two functions:
classify
and declassify
.
Abstract integers
Some cryptographic specifications talk about modular arithmetic in large
fields, whose size overflows even u128
. To ease the expression of such
specifications, hacspec offers wrapper types around BigInt
that can be
declared using the following API:
abstract_nat_mod!(
NameOfModularInts,
NameOfUnderlyingByteRepresentation,
256, // Number of bits for the representation of the big integer,
"ffffffffffffffff00000000000065442", // Hex representation of the modulo value as hex
)
abstract_public_nat_mod!(
... // Public version of above
)
Integers as bytes
It is often useful to view an integer as a sequence of bytes that can be manipulated individually. The hacspec standard library provides a number of function to translate back and forth from integer to sequence of bytes:
pub fn u16_to_le_bytes(x: u16) -> u16Word;
pub fn u16_from_le_bytes(s: u16Word) -> u16;
pub fn U64_to_be_bytes(x: U64) -> U64Word;
pub fn U64_from_be_bytes(s: U64Word) -> U64;
...
Sequence and array operations
Operations common to sequences and arrays
Base traits
Some operations are common to both sequences and arrays by design, and can be used as the interoperability base between the two types of collections. These operations are the following:
len
: gives the length of an array or sequence;iter
: iterates over the content of the array or sequence (unsafe in hacspec but can be used for implementing primitives)create
: creates a sequence or array and initializes the elements to the default value (0 for arithmetic types);update_slice
,update
andupdate_start
: produce a new sequence or array with modified contents.
Both sequences and arrays implement indexing with any type of unsigned public integer.
Chunking
Both arrays and sequences support chunking with methods like:
num_chunks
andnum_exact_chunks
(whole or partial blocks);get_chunk
,get_exact_chunk
andget_remainder_chunk
;set_chunk
andset_exact_chunk
.
The read operations borrow the sequence or array, but the write operations create a new sequence or array.
Conversions
Sequences and arrays can be created from other types via methods like:
from_public_slice
andfrom_slice
;from_vec
andfrom_native_slice
;from_public_seq
andfrom_seq
(to convert a seq into an array of the correct size);from_string
andfrom_hex
for byte or hex strings (hex only foru8
sequences and arrays).
Secrecy
The methods prefixed by public
performs an element-wise classification of the
data under the hood.
Ownage
Some methods have two versions: an owned
and a non-owned
version, depending
on whether the self
argument is consumed or not by the method. This distinction
is useful to avoid unnecessary copies and thus be more performant.
Array-specific operations
Since array length is known statically, new
does not take any argument,
same as length
. slice
s of arrays become Seq
.
Sequence-specific operations
Sequences can be extended (by creating a new sequence under the hood) with
push
or concat
. Sequences can also be sliced with slice
.
Examples
The main hacspec repository contains a set of example specifications. In this section we pull out some interesting bits to demonstrate the hacspec language.
There's also a provider that bundles the different cryptographic primitives into a single library. The provider implements the RustCrypto traits in order to facilitate interoperability.
Usage
Specifications
Verification
Coq
QuickCheck / QuickChick
You can test your hacspec code using QuickCheck (a Rust library for randomized property-based testing), by simply implementing quickcheck::Arbitrary
for the type you want to generate tests for. For example:
impl Arbitrary for Fp {
fn arbitrary(g: &mut Gen) -> Fp {
let mut a: [u64; 6] = [0; 6];
for i in 0..6 {
a[i] = u64::arbitrary(g);
}
let mut b: [u8; 48] = [0; 48];
for i in 0..6 {
let val: u64 = a[i];
b[(i*8)..((i+1)*8)].copy_from_slice(&(val.to_le_bytes()));
}
Fp::from_byte_seq_le(Seq::<U8>::from_public_slice(&b))
}
}
then you can use the quickcheck
attribute, to make QuickCheck do property based testing for this function:
#[cfg(test)]
#[quickcheck] //Using the fp arbitraty implementation from above to generate fp2 elements.
fn test_fp2_prop_add_neg(a: Fp2) -> bool {
let b = fp2neg(a);
fp2fromfp(Fp::ZERO()) == fp2add(a, b)
}
which will run when you do cargo test
. If you then add the tag #[cfg(proof)]
and export to Coq,
cargo hacspec -o coq/src/<output_file_name>.v <input_crate_name>
then you get corresponding QuickChick test,
Definition test_fp2_prop_add_neg (a_320 : fp2) : bool :=
let b_321 :=
fp2neg (a_320) in
(fp2fromfp (nat_mod_zero )) =.? (fp2add (a_320) (b_321)).
QuickChick (forAll g_fp2 (fun a_320 : fp2 =>test_fp2_prop_add_neg a_320)).
and generators will be constructed for the types automatically as,
Instance show_fp : Show (fp) := Build_Show (fp) (fun x => show (GZnZ.val _ x)).
Definition g_fp : G (fp) := @bindGen Z (fp) (arbitrary) (fun x => returnGen (@Z_in_nat_mod _ x)).
Instance gen_fp : Gen (fp) := Build_Gen fp g_fp.
which you can then run through coq in the folder coq/
coqc -R src/ Hacspec src/<output_file_name>.v
Make sure you run:
coqc -R src/ Hacspec src/MachineIntegers.v
coqc -R src/ Hacspec src/Lib.v
or make
to generate the .vo
files used by <output_file_name>.v
.
For more information:
- on QuickCheck (in rust): BurntSushi/quickcheck
- on QuickChick: Software foundations book on QuickChick
Test Vectors
For Developers
This section contains a handy guide for hacspec developers working on the standard library or the compiler.
Working on the compiler
High-level architecture
The Rustspec compiler intervenes after the regular Rust typechecking, by translating the Rust AST into a stricter hacspec AST, yielding error messages if you're not in the subset.
The hacspec AST then undergoes a typechecking phase that replicates the formal typechecking judgment of hacspec, before being compiled to the proof backends like F* or Coq.
Code organization
The source code for the compiler is located in the language/
folder.
main.rs
is the file containing the driver for the different compiler
passes.
Hacspec AST
The main file of the compiler is rustspec.rs
and it contains the AST
structure.
Types are usually enclosed into Spanned<...>
blocks that attach a location
information to an AST node, thereby providing a way to display beautiful error
message.
Several nodes also contain a Fillable<...>
node standing for information
that is filled by the typechecking phase but that can be left to None
when
building the AST.
Translation from Rust AST
This phase is contained in ast_to_rustspec.rs
. The trickyness of this
translation is that it needs to be aware of certain special names contained
in the structure SpecialNames
. Indeed, while the Rust AST treats the application
enum constructors like function applications, the hacspec AST considers them as
proper injection so we need to distinguish them in the Rust AST. For that, we
need to know all declared enums at this point of the program.
Enums and other SpecialNames
are also defined in the ExternalData
that
contains the signatures and types imported in crates used by the hacspec
program being compiled.
Name resolution
When the translation from Rust AST is finished, the identifiers for all variables inside function bodies are of the following type:
pub enum Ident {
Unresolved(String),
Local(LocalIdent),
TopLevel(TopLevelIdent),
}
More precisely, they are still in the Ident::Unresolved
case. The compiler
pass in name_resolution.rs
resolves the identifiers by linking them to local or global identifiers,
each one having a unique ID. hacspec does not feature De Bruijn variable
handling, instead relying on unique fresh IDs for differentiating local
and global variables from each other.
External data
A hacspec file can never (in principal) be considered alone, as it usually imports at least several other crates like the hacspec standard library. These external crates must pre-populate the typechecking context with the types and function signatures that they define.
It's the job of hir_to_rustspec.rs
to retrieve this data. The critical
piece of code in this file is the following:
let num_def_ids = crate_store.num_def_ids_untracked(*krate_num);
let def_ids = (0..num_def_ids).into_iter().map(|id| DefId {
krate: *krate_num,
index: DefIndex::from_usize(id),
});
First, we retrieve the number of exported symbols by an external crate using
num_def_ids_untracked
, a function that is specifically labeled
as critical to the hacspec compiler in the Rust compiler codebase. Then,
we manufacture definition IDs for all these exported symbols, relying on the
invariant that they are numbered from 0 to the number of exported symbols
in Rust's compiled crate metadata format.
Then, we use those definition IDs (DefId
) to query the Rust compiler
via the central TyCtxt
data structure. If the DefId
corresponds to a type definition, we examine the
type definition structurally and check whether it corresponds to a hacspec-compatible
type definition. Notably, the type definitions generated by macros like array!
or nat_mod!
are only seen here in their expanded version, so we have to retro-engineer
which expanded version corresponds to which macro expansion. This is a vulnerability
of the compiler since it's possible to break the abstraction of the language
by smuggling in a type not defined via a hacspec macro this way. That's why hacspec
developers should be very careful about which dependencies they import in order
to have a 100% safety guarantee.
For DefId
s corresponding to functions, the signature of the function is analysed
and if it fits the subset of types expected by hacspec, the function is imported
along with its type in a pre-populated typechecking context.
Note that it is not possible any more at this point to retrieve the #[in_hacspec]
,
#[unsafe_hacspec]
, etc. attributes that would tag the external definitions,
since these attributes get erased by the Rust compiler before reaching the
compiled crates metadata.
Typechecking
The typechecking is done in typechecker.rs
and follows a very regular structure,
making heavy use of immutable data structures to not mess up the various
context manipulations.
Note that we need to perform a full typechecking complete with method resolution because the proof backends need very fine-grained typechecking information to generate correct code.
Be careful: types often need to be de-aliased with dealias_type
before
being matched on structurally. Forgetting to dealias will lead to bugs with
type aliases.
Proof backends
The different proof backends (rustspec_to_fstar.rs
, etc) all enjoy a similar
structure that ought to be refactored to showcase their commonality. The backends
don't use an intermediate AST to generate the code in the proof assistant but
rather directly print code as string using the pretty
pretty-printing library. If you want to start a new proof backend, the easiest
solution is probably to copy an existing proof backend and tweak it until
you get the right result.
The code generation has to be fine-tuned to interface with a replica of the hacspec standard library in the host proof assistant, whose correspondence with the original hacspec library in Rust is part of the trusted code base. More specially, clever solutions to encode sequences and array, as well as all the different types of public and secret machine integers, and the interaction between the two (seeing a double as a string of bytes) have to be implemented through proof assistant libraries.
Unit tests
The compiler has various unit tests that are controlled trough the language/tests
files. Please enrich the unit tests bases in language-tests
,
negative-language-tests
and test-crate
as you implement new features for
the compiler. The compiler can also be tested against all the hacspec cryptographic
specifications by running examples/typecheck_examples.sh
from the root of
the repository.
Contributors
A list of contributors to the hacspec project.
- Franziskus Kiefer (franziskuskiefer)
- Denis Merigoux (denismerigoux)
- Karthik Bhargavan
- Tomer Yavor (TomerHawk)
- Tanmay Garg (tanmay2004)
- Peter Schwabe (cryptojedi)
- Kaspar Schleiser (kaspar030)
- Kasserne
- Tony Arcieri (tarcieri)
If you feel you're missing from this list, feel free to add yourself in a PR.