
Introduction

This book contains our training material, divided up into individual slide decks. Each deck is a lesson. Those lessons combine to form a module, which is taught during a series of sessions within a training. See the glossary for more details.

This is the book version of our material. You can also see the lessons in slide form at https://rust-training.ferrous-systems.com/latest/slides.

We have a standard grouping of lessons into modules, but this can be customised according to customer needs. The modules have dependencies - that is, pre-requisite knowledge required to get the most out of a particular module. The dependencies are shown in the following graphic.

Most of our modules are available now (shown in green), but some are still in development and will be available in the future (shown in grey). We also have stand-alone courses (shown in blue).

Ferrous Systems' Rust Training Modules
[Figure: module dependency graph. Rust Fundamentals -> Applied Rust; Applied Rust -> Advanced Rust, Bare-Metal Rust, No-Std Rust, Rust and WebAssembly, Async Rust; Bare-Metal Rust -> Using Embassy; Async Rust -> Using Embassy; No-Std Rust -> Using Ferrocene. Why Rust? and Why Ferrocene? are stand-alone.]

  • Why Rust?: A (stand-alone) half-day tour of Rust for decision-makers, technical leads and managers.
  • Why Ferrocene?: A (stand-alone) 60 minute introduction to Ferrocene.
  • Rust Fundamentals: Covers the basics - types, writing functions, using iterators.
  • Applied Rust: Using Rust with Windows, macOS or Linux.
  • Advanced Rust: Deep-dives into specific topics.
  • No-Std Rust: Rust without the Standard Library.
  • Bare-Metal Rust: Rust on a microcontroller.
  • Async Rust: Futures, Polling, Tokio, and all that jazz.
  • Rust and WebAssembly: Using Rust to build WASM binaries, run in a sandbox or in an HTML page
  • Ferrocene: Working with our qualified toolchain.
  • Using Embassy: Async-Rust on a microcontroller.

Glossary

These are some of the terms we will be using throughout our training.

| Term | Definition |
|------|------------|
| Training half-day | 4 hour block of training |
| Training day | 8 hour block of training (only for non-remote trainings) |
| Lesson | One set of slides on a particular topic |
| Session | A block of content between breaks |
| Exercises | Mini Rust projects to be completed during the training |
| Module | Block of consecutive sessions on a fixed set of subject(s), can have different lengths |
| Training | Consists of different modules over a series of days or half-days |
| Wash-up | Last 15 minutes of a training day or half-day for recap, open questions, outlook for next day |
| Opening | First 15 minutes of a training day or half-day, with an ice-breaker and recaps, day's plan |
| Quizzes | Mini-tests of the training material |
| Ice Breakers | Brief warm-up activities to get the training started, usually short questions |
| Training Material | These training materials |

Overview


fn main() {
    let random_number = generate_random_number();
    let mut my_choice = 10;
    my_choice += random_number;
    println!("{my_choice}");
}

fn generate_random_number() -> i32 {
    4 // chosen by dice roll, guaranteed to be random
}

What is Rust?

Rust is an empathic systems programming language that is determined to not let you shoot yourself in the foot.

A Little Bit of History

  • Rust began around 2006
  • An experimental project by Graydon Hoare
  • Adopted by Mozilla
  • Presented to the general public as version 0.4 in 2012
  • Looked a bit Go-like back then

Focus

  • Rust lost many features leading up to 1.0:
    • Garbage collector
    • evented runtime
    • complex error handling
    • ~T syntax
  • Orientation towards a usable systems programming language

Development

  • Always together with a larger project (e.g. Servo)
  • Early adoption of regular releases, deprecations and an RFC process

Release Method

  • Nightly releases
  • experimental features are only present on nightly releases
  • Every 6 weeks, the current nightly is promoted to beta
  • After 6 weeks of testing, beta becomes stable
  • Guaranteed backwards-compatibility
  • Makes small iterations easier

Note:

  • Cargo's "stabilization" section https://doc.crates.io/contrib/process/unstable.html#stabilization
  • Crater tool
  • Editions

Goals

  • Explicit over implicit
  • Predictable runtime behaviour
  • Supporting stable software development for programming at large
  • Pragmatism and easy integration
  • Approachable project

Many examples in this course are very small, which is why we will also spend time discussing the impact of many features on large projects.

The Three Words

  • Safety
  • Performance
  • Productivity

Safety

  • Rust is memory-safe and thread-safe
    • Buffer overflows, use-after-free, double free: all impossible
    • Unless you tell the compiler you know what you're doing
  • De-allocation is automated
    • Great for files, mutexes, sockets, etc

Performance

  • These properties are guaranteed at compile time and have no runtime cost!
  • Optimizing compiler based on LLVM
  • Features with runtime cost are explicit and hard to activate "by accident"
  • Zero-cost abstractions
  • Use threads with confidence

Productivity

  • User-focused tooling
  • Comes with a build-system, dependency manager, formatter, etc
  • Compiler gives helpful error messages
  • FFI support to interface with existing systems

Where do Rustaceans come from?

From diverse backgrounds:

  • Dynamic languages (JS, Rubyists and Pythonistas)
  • Functional languages like Haskell and Scala
  • C/C++
  • Safety critical systems

Installation

Rustup

Rustup installs and manages Rust compiler toolchains

https://rust-lang.org/tools/install

It is not the Rust compiler!

Important commands

 # Installation of a toolchain (here: the stable release channel)
rustup install stable

 # Selection of a default toolchain
rustup default stable

 # Display documentation in browser
rustup doc [--std]

 # Override the default toolchain in your directory
rustup override set stable

 # List supported targets
rustup target list

 # Add and install a target to the toolchain (here: to cross-compile for an ARMv6-M target)
rustup target add thumbv6m-none-eabi

For up-to-date information, please see Rust Component History

Contents of the toolchain
[Figure: toolchain components. cargo -> rustc, rustdoc, rustfmt, clippy; rustc -> libstd, libcore; libstd -> libcore.]

Hello, world! with Cargo

$ cargo new hello-world
$ cd hello-world
$ cat src/main.rs
    fn main() {
        println!("Hello, world!");
    }
$ cargo build
    Compiling hello-world v0.1.0 (file:///Users/skade/Code/rust/scratchpad/hello-world)
    Finished debug [unoptimized + debuginfo] target(s) in 0.35 secs
$ cargo run
    Finished debug [unoptimized + debuginfo] target(s) in 0.0 secs
    Running `target/debug/hello-world`
Hello, world!

A Little Look Around

  • What is in Cargo.toml?
  • What is in Cargo.lock?

For details, check the Cargo Manifest docs.

IDEs

Basic Types

Integers

Rust comes with all standard int types, with and without sign

  • i8, u8
  • i16, u16
  • i32, u32
  • i64, u64
  • i128, u128

Kinds of variable

#![allow(unused)]
fn main() {
static X: i32 = 42;
const Y: i32 = 42;

fn some_function() {
    let x = 42;
    let x: i32 = 42;
    let mut x = 42;
    let mut x: i32 = 42;
}
}

Note:

The expression used to initialise a static or const must be evaluatable at compile time. This includes calling const fn functions. A let binding doesn't have this restriction.

The static occupies some memory at run-time and gets a symbol in the symbol table. The const does not, and is only used to initialise other values (or e.g. as an argument to a function) - it acts a bit like a C pre-processor macro.
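
As a small sketch of the note above, a const fn can initialise both a const and a static, while a let binding can call any function. The names square, LIMIT, COUNTER_START and runtime_value are made up for illustration.

```rust
// Hypothetical helper: a `const fn` can be evaluated at compile time.
const fn square(x: i32) -> i32 {
    x * x
}

// Both initialisers are evaluated by the compiler, not at run-time.
const LIMIT: i32 = square(4);
static COUNTER_START: i32 = square(3) + 1;

// An ordinary function: fine for `let`, but not for `const` or `static`.
fn runtime_value() -> i32 {
    std::env::args().count() as i32
}

fn main() {
    let x = runtime_value(); // evaluated at run-time
    println!("{} {} {}", LIMIT, COUNTER_START, x);
}
```

Swapping square for runtime_value in either initialiser should be rejected by the compiler, because runtime_value is not a const fn.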

Syntactic clarity in specifying numbers

#![allow(unused)]
fn main() {
let x = 123_456;   // underscore as separator
let x = 0x12;      // prefix 0x to indicate hex value
let x = 0o23;      // prefix 0o to indicate octal value
let x = 0b0001;    // prefix 0b to indicate binary value
let x = b'a';      // A single u8
}

Architecture-dependent Numbers

Rust comes with two architecture-dependent number types:

  • isize, usize

Casts

Casts between number types are possible, including narrowing (truncating) casts:

fn main() {
    let foo = 3_i64;
    let bar = foo as i32;
}

If the size isn’t given, or cannot be inferred, ints default to i32.
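
To illustrate what a narrowing cast does, here is a minimal sketch comparing as with the standard TryFrom conversion. The helper names truncate and checked are assumptions made for this example.

```rust
use std::convert::TryFrom; // already in the prelude on the 2021 edition

// `as` never fails: a narrowing cast keeps only the low bits.
fn truncate(x: i64) -> i32 {
    x as i32
}

// `TryFrom` reports values that do not fit instead of truncating them.
fn checked(x: i64) -> Option<i32> {
    i32::try_from(x).ok()
}

fn main() {
    // 4_294_967_297 is 2^32 + 1, so only the low 32 bits (the value 1) survive.
    println!("{}", truncate(4_294_967_297));
    println!("{:?}", checked(4_294_967_297)); // None
    println!("{:?}", checked(3)); // Some(3)
}
```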

Overflows

Integer overflow causes a panic in debug builds. In release builds it wraps around by default. This behaviour can be configured (with the overflow-checks profile setting).
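
If the overflow behaviour matters, the integer types offer explicit methods that behave the same in every build mode. A short sketch, with wrapper function names chosen only for this example:

```rust
// Returns None if the sum does not fit in a u8.
fn add_checked(a: u8, b: u8) -> Option<u8> {
    a.checked_add(b)
}

// Wraps around on overflow (modulo 256 for a u8).
fn add_wrapping(a: u8, b: u8) -> u8 {
    a.wrapping_add(b)
}

// Clamps to the largest representable value on overflow.
fn add_saturating(a: u8, b: u8) -> u8 {
    a.saturating_add(b)
}

fn main() {
    println!("{:?}", add_checked(250, 10)); // None
    println!("{}", add_wrapping(250, 10)); // 4
    println!("{}", add_saturating(250, 10)); // 255
}
```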

Floats

Rust also comes with floats of all standard sizes: f32, f64

fn main() {
    let float: f64 = 1.0;
}

Boolean

Boolean in Rust is represented by either of two values: true or false

Character

char is a Unicode Scalar Value being represented as a "single character"

  • A literal in single quotes: 'r'
  • Four (4) bytes in size
  • More than just ASCII: glyphs, emoji, accented characters, etc.
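
The difference between the fixed size of a char and its size when encoded as UTF-8 can be checked directly. This is only a sketch, and utf8_len is a helper invented for the example.

```rust
use std::mem::size_of;

// Number of bytes a char needs once encoded as UTF-8 (1 to 4).
fn utf8_len(c: char) -> usize {
    c.len_utf8()
}

fn main() {
    // A char itself is always four bytes, whatever it holds.
    println!("size_of::<char>() = {}", size_of::<char>());
    println!("U+0072 needs {} byte(s) in UTF-8", utf8_len('r'));
    println!("U+03BC needs {} byte(s) in UTF-8", utf8_len('\u{3BC}'));
    println!("U+1F60E needs {} byte(s) in UTF-8", utf8_len('\u{1F60E}'));
}
```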

Character Literals

fn main() {
    // U+0072 LATIN SMALL LETTER R
    let ascii_char = 'r';
    // U+03BC GREEK SMALL LETTER MU
    let special_char = 'μ';
    // U+0154 LATIN CAPITAL LETTER R WITH ACUTE
    let accented_char = 'Ŕ';
    // U+1F60E SMILING FACE WITH SUNGLASSES
    let emoji_char = '😎';
}

Character Literals

fn main() {
    // U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F467
    let seven_chars_emoji = '👨‍👩‍👧‍👧'; // Error: char must be one codepoint long
}

Arrays

  • Arrays have multiple elements of the same type.
  • They are of fixed size (it's part of the type).
fn main() {
    let arr: [i32; 4] = [1, 2, 3, 4];
}

Slices

  • Slices are like arrays, but with a run-time specified size.
  • Slices carry a pointer to some other array, and a length.
  • Slices cannot be resized but can be subsliced.
fn main() {
    let slice: &[i32] = &[1, 2, 3, 4];
    let sub: &[i32] = &slice[0..1];
}

Note:

  • Use the .get() method on the slice instead of indexing to avoid panics.
  • The range syntax includes the first value but excludes the last value. Use 0..=1 to include both ends.
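
Both points in the note can be tried out in a few lines. The function element below is a hypothetical helper, not part of the standard library.

```rust
// Returns the element at `index`, or None if it is out of bounds.
fn element(slice: &[i32], index: usize) -> Option<i32> {
    slice.get(index).copied()
}

fn main() {
    let slice: &[i32] = &[1, 2, 3, 4];
    println!("{:?}", element(slice, 1)); // Some(2)
    println!("{:?}", element(slice, 10)); // None, where slice[10] would panic
    println!("{:?}", &slice[0..1]); // excludes index 1
    println!("{:?}", &slice[0..=1]); // includes index 1
}
```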

String Slices

  • String Slices (&str) are a special kind of &[u8]
  • They are guaranteed to be a valid UTF-8 encoded Unicode string
  • It is undefined behaviour to create one that isn't valid UTF-8
  • Slicing must be done on character boundaries
fn main() {
    let hello_world: &str = "Hello 😀";
    println!("Start = {}", &hello_world[0..5]);
    // println!("End = {}", &hello_world[7..]);
}

Note:

Use std::str::from_utf8 to make an &str from a &[u8]. Let trainees know that strings are covered over many slides in the training, and that an Advanced Strings deck exists for completeness' sake.
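
A minimal sketch of both ideas: checked conversion from bytes, and slicing that respects character boundaries. The helper decode is made up for this example; str::get and is_char_boundary are standard library methods.

```rust
// Returns the text if `bytes` is valid UTF-8, otherwise None.
fn decode(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

fn main() {
    println!("{:?}", decode(b"Hello"));
    println!("{:?}", decode(&[0xFF, 0x65])); // 0xFF never appears in valid UTF-8

    // "Hello " is six bytes, then a four-byte emoji at bytes 6..10.
    let hello_world = "Hello \u{1F600}";
    // `get` returns None instead of panicking when a range splits a character.
    println!("{:?}", hello_world.get(0..5));
    println!("{:?}", hello_world.get(7..));
    println!("{}", hello_world.is_char_boundary(6));
}
```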

Control Flow

Control Flow primitives

  • if expressions
  • loop and while loops
  • match expressions
  • for loops
  • break and continue
  • return and ?

Using if as a statement

  • Tests if a boolean expression is true
  • Parentheses around the conditional are not necessary
  • Blocks need braces; there is no shorthand
fn main() {
    if 1 == 2 {
        println!("integers are broken");
    } else if 'a' == 'b' {
        println!("characters are broken");
    } else {
        println!("that's what I thought");
    }
}

Using if as an expression

  • Every block is an expression
  • Note the final ; to terminate the let statement.
fn main() {
    let x = if 1 == 2 {
        100
    } else if 'a' == 'b' {
        200
    } else {
        300
    };
}

Using if as the final expression

Now the if expression is the result of the function:

#![allow(unused)]
fn main() {
fn some_function() -> i32 {
    if 1 == 2 {
        100
    } else if 'a' == 'b' {
        200
    } else {
        300
    }
}
}

Looping with loop

loop is used for (potentially) infinite loops

fn main() {
    let mut i = 0;
    loop {
        i += 1;
        if i > 100 { break; }
    }
}

Looping with loop

loop blocks are also expressions...

fn main() {
    let mut i = 0;
    let loop_result = loop {
        i += 1;
        if i > 10 { break 6; }
        println!("i = {}", i);
    };
    println!("loop_result = {}", loop_result);
}

while

  • while is used for conditional loops.
  • Loops while the boolean expression is true
fn main() {
    let mut i = 0;
    while i < 10 {
        i += 1;
        println!("i = {}", i);
    }
}

Control Flow with match

  • The match keyword does pattern matching
  • You can use it a bit like an if/else if/else expression
  • The first arm to match, wins
  • _ means match anything
    fn main() {
        let a = 4;
        match a % 3 {
            0 => { println!("divisible by 3") }
            _ => { println!("not divisible by 3") }
        }
    }

for loops

  • for is used for iteration
  • Here 0..10 creates a Range, which you can iterate
fn main() {
    for num in 0..10 {
        println!("{}", num);
    }
}

for loops

Lots of things are iterable

fn main() {
    for ch in "Hello".chars() {
        println!("{}", ch);
    }
}

for under the hood

  • What Rust actually does is more like...
  • (More on this in the section on Iterators)
fn main() {
    let mut iter = "Hello".chars().into_iter();
    loop {
        match iter.next() {
            Some(ch) => println!("{}", ch),
            None => break,
        }
    }
}

Break labels

If you have nested loops, you can label them to indicate which one you want to break out of.

fn main() {
    'cols: for x in 0..5 {
        'rows: for y in 0..5 {
            println!("x = {}, y = {}", x, y);
            if x + y >= 6 {
                break 'cols;
            }
        }
    }
}

Continue

Means go around the loop again, rather than break out of the loop

fn main() {
    'cols: for x in 0..5 {
        'rows: for y in 0..5 {
            println!("x = {}, y = {}", x, y);
            if x + y >= 4 {
                continue 'cols;
            }
        }
    }
}

return

  • return can be used for early returns
  • The result of the last expression of a function is always returned
#![allow(unused)]
fn main() {
fn get_number(x: bool) -> i32 {
    if x {
        return 42;
    }
    -1
}
}

Compound Types

Structs

A struct groups and names data of different types.

Definition

#![allow(unused)]
fn main() {
struct Point {
    x: i32,
    y: i32,
}
}

Note:

The fields may not be laid out in memory in the order they are written (unless you ask the compiler to ensure that they are).
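
One way to ask the compiler for a fixed field order is the #[repr(C)] attribute. The sketch below uses two made-up structs with identical fields. The size of the default layout is deliberately not stated, since the compiler is free to choose it.

```rust
use std::mem::size_of;

// Default layout: the compiler may reorder fields to reduce padding.
#[allow(dead_code)]
struct Reordered {
    a: u8,
    b: u32,
    c: u8,
}

// `#[repr(C)]` asks for a C-compatible layout, with fields in declaration order.
#[allow(dead_code)]
#[repr(C)]
struct InOrder {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    println!("default layout: {} bytes", size_of::<Reordered>());
    // In declaration order, padding is needed before `b` and after `c`.
    println!("repr(C) layout: {} bytes", size_of::<InOrder>());
}
```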

Construction

  • there is no partial initialization
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let p = Point { x: 1, y: 2 };
}

Construction

  • but you can copy from an existing variable of the same type
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let p = Point { x: 1, y: 2 };
    let q = Point { x: 4, ..p };
}

Field Access

struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let p = Point { x: 1, y: 2 };
    println!("{}", p.x);
    println!("{}", p.y);
}

Tuples

  • Holds values of different types together.
  • Like an anonymous struct, with fields numbered 0, 1, etc.
fn main() {
    let p = (1, 2);
    println!("{}", p.0);
    println!("{}", p.1);
}

()

  • the empty tuple
  • represents the absence of data
  • we often use this similarly to how you’d use void in C
#![allow(unused)]
fn main() {
fn prints_but_returns_nothing(data: &str) -> () {
    println!("passed string: {}", data);
}
}

Tuple Structs

  • Like a struct, with fields numbered 0, 1, etc.
struct Point(i32,i32);

fn main() {
    let p = Point(1, 2);
    println!("{}", p.0);
    println!("{}", p.1);
}

Enums

  • An enum represents different variations of the same subject.
  • The different choices in an enum are called variants

enum: Definition and Construction

enum Shape {
    Square,
    Circle,
    Rectangle,
    Triangle,
}

fn main() {
    let shape = Shape::Rectangle;
}

Enums with Values

enum Movement {
    Right(i32),
    Left(i32),
    Up(i32),
    Down { speed: i32, excitement: u8 },
}

fn main() {
    let movement = Movement::Left(12);
    let movement = Movement::Down { speed: 12, excitement: 5 };
}

Enums with Values

  • An enum value is the same size, no matter which variant is picked
  • It will be the size of the largest variant (plus a tag)

Note:

The tag in an enum specifies which variant is currently valid, and is stored as the smallest integer the compiler can get away with - it depends how many variants you have. Of course, if none of the variants have any data, the enum is just the tag.

If you have a C background, you can think of this as being a struct containing an int and a union.
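
The sizes can be inspected with std::mem::size_of. This sketch reuses the Movement and Shape enums from the earlier slides. The exact numbers depend on the compiler's layout choices, so none are stated here.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum Movement {
    Right(i32),
    Left(i32),
    Up(i32),
    Down { speed: i32, excitement: u8 },
}

#[allow(dead_code)]
enum Shape {
    Square,
    Circle,
    Rectangle,
    Triangle,
}

fn main() {
    // Room for the largest variant (an i32 and a u8) plus the tag.
    println!("Movement: {} bytes", size_of::<Movement>());
    // No variant carries data, so only the tag is stored.
    println!("Shape: {} bytes", size_of::<Shape>());
}
```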

Doing a match on an enum

  • When an enum has variants, you use match to extract the data
  • New variables are created from the pattern (e.g. radius)
#![allow(unused)]
fn main() {
enum Shape {
    Circle(i32),
    Rectangle(i32, i32),
}

fn check_shape(shape: Shape) {
    match shape {
        Shape::Circle(radius) => {
            println!("It's a circle, with radius {}", radius);
        }
        _ => {
            println!("Try a circle instead");
        }
    }
}
}

Doing a match on an enum

  • There are two variables called radius
  • The binding of radius in the match pattern hides (shadows) the radius variable created by the let statement above it
#![allow(unused)]
fn main() {
enum Shape {
    Circle(i32),
    Rectangle(i32, i32),
}

fn check_shape(shape: Shape) {
    let radius = 10;
    match shape {
        Shape::Circle(radius) => {
            println!("It's a circle, with radius {}", radius);
        }
        _ => {
            println!("Try a circle instead");
        }
    }
}
}

Match guards

Match guards allow further refining of a match

#![allow(unused)]
fn main() {
enum Shape {
    Circle(i32),
    Rectangle(i32, i32),
}

fn check_shape(shape: Shape) {
    match shape {
        Shape::Circle(radius) if radius > 10 => {
            println!("It's a BIG circle, with radius {}", radius);
        }
        _ => {
            println!("Try a big circle instead");
        }
    }
}
}

Combining patterns

  • You can use the | operator to join patterns together
#![allow(unused)]
fn main() {
enum Shape {
    Circle(i32),
    Rectangle(i32, i32),
    Square(i32),
}

fn test_shape(shape: Shape) {
    match shape {
        Shape::Circle(size) | Shape::Square(size) => {
            println!("Shape has single size field {}", size);
        }
        _ => {
            println!("Not a circle, nor a square");
        }
    }
}
}

Shorthand: if let conditionals

  • You can use if let if only one case is of interest.
  • Still pattern matching
#![allow(unused)]
fn main() {
enum Shape {
    Circle(i32),
    Rectangle(i32, i32),
}

fn test_shape(shape: Shape) {
    if let Shape::Circle(radius) = shape {
        println!("Shape is a Circle with radius {}", radius);
    }
}
}

Shorthand: let else conditionals

  • If you expect it to match, but want to handle the error...
  • The else block must diverge
#![allow(unused)]
fn main() {
enum Shape {
    Circle(i32),
    Rectangle(i32, i32),
}

fn test_shape(shape: Shape) {
    let Shape::Circle(radius) = shape else {
        println!("I only like circles");
        return;
    };
    println!("Shape is a Circle with radius {}", radius);
}
}

Shorthand: while let conditionals

  • Keep looping whilst the pattern still matches
enum Shape {
    Circle(i32),
    Rectangle(i32, i32),
}

fn main() {
    while let Shape::Circle(radius) = make_shape() {
        println!("got circle, radius {}", radius);
    }
}

fn make_shape() -> Shape {
    todo!()
}

Foreshadowing! 👻

Two very important enums

#![allow(unused)]
fn main() {
enum Option<T> {
    Some(T),
    None,
}

enum Result<T, E> {
    Ok(T),
    Err(E)
}
}

We'll come back to them after we learn about error handling.

Ownership and Borrowing

Ownership

Ownership is the basis for the memory management of Rust.

Rules

  • Every value has exactly one owner
  • Ownership can be passed on, both to functions and other types
  • The owner is responsible for removing the data from memory
  • The owner always has full control over the data and can mutate it

These Rules are

  • fundamental to Rust’s type system
  • enforced at compile time
  • important for optimizations

Example

fn main() {
    let s = String::from("Hello 😀");
    print_string(s);
    // s cannot be used any more - you gave it away
}

fn print_string(s: String) {
    println!("The string is {s}")
}

Note:

The statement let s = ...; introduces a variable binding called s and gives it a value which is of type String. This distinction is important when it comes to transferring ownership.

The function String::from is an associated function called from on the String type.

The println! call is a macro, which is how we are able to do Python-style {} string interpolation.

Does this compile?

fn main() {
    let s = String::from("Hello 😀");
    print_string(s);
    print_string(s);
}

fn print_string(s: String) {
    println!("The string is {s}")
}

It does not!

error[E0382]: use of moved value: `s`
 --> src/main.rs:4:18
  |
2 |     let s = String::from("Hello 😀");
  |         - move occurs because `s` has type `String`, which does not implement the `Copy` trait
3 |     print_string(s);
  |                  - value moved here
4 |     print_string(s);
  |                  ^ value used here after move
  |
note: consider changing this parameter type in function `print_string` to borrow instead if owning the value isn't necessary
 --> src/main.rs:7:20
  |
7 | fn print_string(s: String) {
  |    ------------    ^^^^^^ this parameter takes ownership of the value
  |    |
  |    in this function
help: consider cloning the value if the performance cost is acceptable
  |
3 |     print_string(s.clone());
  |                   ++++++++
For more information about this error, try `rustc --explain E0382`.

Background

  • When calling print_string with s, the value in s is transferred into the arguments of print_string.
  • At that moment, ownership passes to print_string. We say the function consumed the value.
  • The variable binding s ceases to exist, and thus main is not allowed to access it any more.

Mutability

  • The variable binding can be immutable (the default) or mutable.
  • If you own it, you can rebind it and change this.
fn main() {
    let x = 6;
    // x += 1; ❌
    let mut x = x;
    x += 1; // ✅
}

Borrowing

  • Transferring ownership back and forth would get tiresome.
  • We can let other functions borrow the values we own.
  • The outcome of a borrow is a reference
  • There are two kinds of reference - Shared/Immutable and Exclusive/Mutable

Shared References

  • Also called an immutable reference.
  • Use the & operator to borrow (i.e. to make a reference).
  • It's like a C pointer but with special compile-time checks.
  • Rust also allows type-conversion functions to be called when you take a reference.

Note:

C pointers are convertible to/from integers. Rust references are not, and Rust pointers may or may not be, depending on what they point at.

Making a Reference

fn main() {
    let s = String::from("Hello 😀");
    // A reference to a String
    let _string_ref: &String = &s;
    // The special string-slice type (could also be a reference
    // to a string literal)
    let _string_slice: &str = &s;
}

Note:

The _ prefix just stops a warning about us not using the variable.

Taking a Reference

  • We can also say a function takes a reference
  • We use a type like &SomeType:
#![allow(unused)]
fn main() {
fn print_string(s: &String) {
    println!("The string is {s}")
}
}

Full Example

fn main() {
    let s = String::from("Hello 😀");
    print_string(&s);
    print_string(&s);
}

fn print_string(s: &String) {
    println!("The string is {s}")
}

Exclusive References

  • Also called a mutable reference
  • Use the &mut operator to borrow (i.e. to make a reference)
  • Even stricter rules than the & references
  • Only a mutable binding can make a mutable reference

Exclusive Reference Rules

  • Must be only one exclusive reference to an object at any one time
  • Cannot have shared and exclusive references alive at the same time
  • => the compiler knows an &mut reference cannot alias anything

Rust forbids shared mutability

Making an Exclusive Reference

fn main() {
    let mut s = String::from("Hello 😀");
    let s_ref = &mut s;
}

Note:

The binding for s now has to be mutable, otherwise we can't take a mutable reference to it.

Taking an Exclusive Reference

  • We can also say a function takes an exclusive reference
  • We use a type like &mut SomeType:
#![allow(unused)]
fn main() {
fn add_excitement(s: &mut String) {
    s.push_str("!");
}
}

Full Example

fn main() {
    let mut s = String::from("Hello 😀");
    add_excitement(&mut s);
    println!("The string is {s}");
}

fn add_excitement(s: &mut String) {
    s.push_str("!");
}

Note:

Try adding more excitement by calling add_excitement multiple times.
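
The exclusive-reference rules can also be seen in a slightly longer sketch. The function demo is made up for illustration, and the behaviour of the commented-out line is described rather than shown.

```rust
fn add_excitement(s: &mut String) {
    s.push_str("!");
}

fn demo() -> String {
    let mut s = String::from("Hello");

    // Any number of shared references may be alive at the same time.
    let a = &s;
    let b = &s;
    println!("{} {}", a, b);

    // `a` and `b` are not used again, so an exclusive reference is now allowed.
    let m = &mut s;
    add_excitement(m);
    // println!("{}", a); // rejected: `a` and `m` would be alive at once

    s
}

fn main() {
    println!("{}", demo());
}
```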

A Summary

|  | Borrowed | Mutably Borrowed | Owned |
|--|----------|------------------|-------|
| Type T | &T | &mut T | T |
| Type i32 | &i32 | &mut i32 | i32 |
| Type String | &String or &str | &mut String | String |
  • Mutably Borrowing gives more permissions than Borrowing
  • Owning gives more permissions than Mutably Borrowing

Note:

Why are there two types of Borrowed string types (&String and &str)? The first is a reference to a struct (std::string::String, specifically), and the latter is a built-in slice type which points at some bytes in memory which are valid UTF-8 encoded characters.
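
In practice this means a function that only reads text is often written to take &str, because a &String converts to it automatically. A minimal sketch, with shout as a hypothetical function:

```rust
// Taking `&str` accepts string literals and borrowed Strings alike.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let owned = String::from("hello");
    println!("{}", shout(&owned)); // &String converts to &str automatically
    println!("{}", shout("literal")); // a string literal is already a &str
    println!("{}", shout(&owned[0..2])); // so is a sub-slice of a String
}
```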

An aside: Method Calls

  • Rust supports Method Calls
  • The first argument of the method is either self, &self or &mut self
  • They are converted to function calls by the compiler
fn main() {
    let mut s = String::from("Hello 😀");
    // This method call...
    s.push_str("!!");
    // is the same as...
    // String::push_str(&mut s, "!!");
    println!("The string is {s}");
}

Note:

We use Type::function() for associated functions, and variable.method() for method calls, which are just Type::method(&variable), Type::method(&mut variable) or Type::method(variable), depending on how the method was declared.
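
The same applies to your own types. The Counter type below is invented for this sketch and shows one method for each kind of self argument, alongside the equivalent function-call form.

```rust
struct Counter {
    count: u32,
}

impl Counter {
    // Associated function: no `self`, called as `Counter::new()`.
    fn new() -> Counter {
        Counter { count: 0 }
    }

    // Borrows the counter mutably.
    fn increment(&mut self) {
        self.count += 1;
    }

    // Borrows the counter immutably.
    fn get(&self) -> u32 {
        self.count
    }

    // Takes ownership: the counter cannot be used afterwards.
    fn finish(self) -> u32 {
        self.count
    }
}

fn main() {
    let mut c = Counter::new();
    c.increment(); // method call syntax
    Counter::increment(&mut c); // the equivalent function call
    println!("{}", c.get()); // same as Counter::get(&c)
    println!("{}", c.finish()); // same as Counter::finish(c)
}
```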

Avoiding Borrowing

If you want to give a function its own object, and keep yours separate, you have two choices:

  • Clone
  • Copy

Clone

Some types have a .clone() method.

It makes a new object, which looks just like the original object.

fn main() {
    let s = String::from("Hello 😀");
    let mut s_clone = s.clone();
    s_clone.push_str("!!");
    println!("s = {s}");
    println!("s_clone = {s_clone}");
}

Making things Cloneable

You can mark your struct or enum with #[derive(Clone)]

(But only if every value in your struct/enum itself is Clone)

#[derive(Clone)]
struct Square {
    width: i32
}

fn main() {
    let sq = Square { width: 10 };
    let sq2 = sq.clone();
}

Copy

  • Some types, like integers and floats, are Copy
  • Compiler copies these objects automatically
  • If cloning is very cheap, you could make your type Copy
fn main() {
    let x = 6;
    do_stuff(x);
    do_stuff(x);
}

fn do_stuff(x: i32) {
    println!("Do I own x, with value {x}?");
}

Note:

If your type represents ownership of something, like a File, or a DatabaseRecord, you probably don't want to make it Copy!
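
For a small plain-data type, opting in to Copy could look like the sketch below. Copy requires Clone, so both are derived. Point and mirror are names chosen for this example.

```rust
// Cheap to duplicate, so deriving Copy (which also requires Clone) is reasonable.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn mirror(p: Point) -> Point {
    Point { x: -p.x, y: p.y }
}

fn main() {
    let p = Point { x: 1, y: 2 };
    let q = mirror(p); // `p` is copied, not moved
    println!("{:?} {:?}", p, q); // so `p` is still usable here
}
```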

Cleaning up

A value is cleaned up when its owner goes out of scope.

We call this dropping the value.

Custom Cleaning

You can define a specific behaviour to happen on drop using the Drop trait (cf. std::ops::Drop).

For example, the memory used by a String is freed when dropped:

fn main() {
    // String created here (some memory is allocated on the heap)
    let s = String::from("Hello 😀");
} // String `s` is dropped here and heap memory is freed

More drop implementations:

  • MutexGuard unlocks the appropriate Mutex when dropped
  • File closes the file handle when dropped
  • TcpStream closes the connection when dropped
  • JoinHandle detaches the thread when dropped
  • etc...
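
A hand-written Drop implementation might look like the following sketch. The Guard type and count_drops function are hypothetical. A std::cell::Cell is used only so that the example can count how many values were dropped.

```rust
use std::cell::Cell;

// Hypothetical guard type that records when it is dropped.
struct Guard<'a> {
    dropped: &'a Cell<u32>,
}

impl<'a> Drop for Guard<'a> {
    // Called automatically when a Guard goes out of scope.
    fn drop(&mut self) {
        self.dropped.set(self.dropped.get() + 1);
    }
}

fn count_drops() -> u32 {
    let dropped = Cell::new(0);
    {
        let _a = Guard { dropped: &dropped };
        let b = Guard { dropped: &dropped };
        drop(b); // dropped early, by hand
    } // `_a` is dropped automatically here
    dropped.get()
}

fn main() {
    println!("{} values were dropped", count_drops());
}
```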

Error Handling

There are no exceptions

Rust has two ways of indicating errors:

  • Returning a value
  • Panicking

Returning a value

fn parse_header(data: &str) -> bool {
    if !data.starts_with("HEADER: ") {
        return false;
    }

    true
}

It would be nice if we could return data as well as ok, or error...

Foretold enums strike back! 🤯

Remember these? They are very important in Rust.

#![allow(unused)]
fn main() {
enum Option<T> {
    Some(T),
    None,
}

enum Result<T, E> {
    Ok(T),
    Err(E)
}
}

I can't find it

If you have a function where one outcome is "can't find it", we use Option:

#![allow(unused)]
fn main() {
fn parse_header(data: &str) -> Option<&str> {
    if !data.starts_with("HEADER: ") {
        return None;
    }
    Some(&data[8..])
}
}

Note:

It's so important that the standard prelude re-exports its variants, so you can say None instead of Option::None. For any other enum you would have to write the full path (or import the variants yourself).
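
A caller then has to deal with both outcomes. This sketch repeats parse_header from above and adds a hypothetical header_or_default function that falls back to a placeholder.

```rust
fn parse_header(data: &str) -> Option<&str> {
    if !data.starts_with("HEADER: ") {
        return None;
    }
    Some(&data[8..])
}

// Falls back to a placeholder when there is no header.
fn header_or_default(data: &str) -> &str {
    match parse_header(data) {
        Some(value) => value,
        None => "<none>",
    }
}

fn main() {
    println!("{}", header_or_default("HEADER: text/plain"));
    println!("{}", header_or_default("no header here"));
}
```

The same fallback can be written more briefly as parse_header(data).unwrap_or("<none>").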

That's gone a bit wrong

When the result of a function is either Ok, or some Error value, we use Result:

#![allow(unused)]
fn main() {
enum MyError {
    BadHeader
}

// Need to describe both the Ok type and the Err type here:
fn parse_header(data: &str) -> Result<&str, MyError> {
    if !data.starts_with("HEADER: ") {
        return Err(MyError::BadHeader);
    }
    Ok(&data[8..])
}
}

Note:

It's so important that the standard prelude re-exports its variants, so you can say Ok and Err instead of Result::Ok and Result::Err. For any other enum you would have to write the full path (or import the variants yourself).

Handling Results by hand

You can handle Result like any other enum:

#![allow(unused)]
fn main() {
use std::io::prelude::*;

fn read_file(filename: &str) -> Result<String, std::io::Error> {
    let mut file = match std::fs::File::open(filename) {
        Ok(f) => f,
        Err(e) => {
            return Err(e);
        }
    };
    let mut contents = String::new();
    if let Err(e) = file.read_to_string(&mut contents) {
        return Err(e);
    }
    Ok(contents)
}
}

Handling Results with ?

It is idiomatic Rust to use ? to handle errors.

#![allow(unused)]
fn main() {
use std::io::prelude::*;

fn read_file(filename: &str) -> Result<String, std::io::Error> {
    let mut file = std::fs::File::open(filename)?;
    let mut contents = String::new();
    file.read_to_string(&mut contents)?;
    Ok(contents)
}
}

Note:

The ? operator was stabilised in Rust 1.13.

The ? operator will evaluate to the Ok value if the Result is Ok, and it will cause an early return with the error value if it is Err. It will also call .into() to perform a type conversion if necessary (and if possible).
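
The type conversion can be sketched with a small custom error type. MyError and parse_small are made up for this example. The From implementation is what allows ? to turn a ParseIntError into a MyError.

```rust
use std::num::ParseIntError;

#[derive(Debug)]
enum MyError {
    BadNumber(ParseIntError),
    TooBig,
}

// `?` uses this conversion to turn a ParseIntError into a MyError.
impl From<ParseIntError> for MyError {
    fn from(e: ParseIntError) -> MyError {
        MyError::BadNumber(e)
    }
}

fn parse_small(text: &str) -> Result<u8, MyError> {
    // On Err, `?` converts the error and returns early.
    let n: u32 = text.trim().parse()?;
    if n > 100 {
        return Err(MyError::TooBig);
    }
    Ok(n as u8)
}

fn main() {
    println!("{:?}", parse_small("42"));
    println!("{:?}", parse_small("1000"));
    println!("{:?}", parse_small("forty-two"));
}
```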

What kind of Error?

You can put anything in for the E in Result<T, E>:

#![allow(unused)]
fn main() {
fn literals() -> Result<(), &'static str> {
    Err("oh no")
}

fn strings() -> Result<(), String> {
    Err(String::from("oh no"))
}

fn enums() -> Result<(), Error> {
    Err(Error::BadThing)
}

enum Error { BadThing, OtherThing }
}

Using String Literals as the Err Type

Setting E to be &'static str lets you use "String literals"

  • It's cheap
  • It's expressive
  • But you can't change the text to include some specific value
  • And your program can't tell what kind of error it was

Using Strings as the Err Type

Setting E to be String lets you make up text at run-time:

  • It's expressive
  • You can render some values into the String
  • But it costs you a heap allocation to store the bytes for the String
  • And your program still can't tell what kind of error it was

Using enums as the Err Type

An enum is ideal to express one of a number of different kinds of thing:

#![allow(unused)]
fn main() {
/// Represents the ways this module can fail
enum Error {
    /// An error came from the underlying transport
    Io,
    /// During an arithmetic operation a result was produced that could not be stored
    NumericOverflow,
    /// etc
    DiskFull,
    /// etc
    NetworkTimeout,
}
}

Enum errors with extra context

An enum can also hold data for each variant:

#![allow(unused)]
fn main() {
/// Represents the ways this module can fail
enum Error {
    /// An error came from the underlying transport
    Io(std::io::Error),
    /// During an arithmetic operation a result was produced that could not
    /// be stored
    NumericOverflow,
    /// Ran out of disk space
    DiskFull,
    /// Remote system did not respond in time
    NetworkTimeout(std::time::Duration),
}
}

The std::error::Error trait
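
As a minimal sketch of implementing the trait by hand: the enum below is a cut-down version of the one above, and the message texts are invented for illustration. The trait requires `Debug` and `Display`, and optionally lets you expose the underlying cause through `source`.

```rust
use std::fmt;
use std::time::Duration;

#[derive(Debug)]
enum Error {
    Io(std::io::Error),
    NetworkTimeout(Duration),
}

// `std::error::Error` requires both `Debug` (derived above) and `Display`.
impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Io(_) => write!(f, "I/O error in the underlying transport"),
            Error::NetworkTimeout(d) => write!(f, "no response after {} ms", d.as_millis()),
        }
    }
}

impl std::error::Error for Error {
    // Optionally expose the lower-level cause of this error.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Error::Io(inner) => Some(inner),
            Error::NetworkTimeout(_) => None,
        }
    }
}

fn main() {
    let timeout = Error::NetworkTimeout(Duration::from_millis(250));
    let io = Error::Io(std::io::Error::new(std::io::ErrorKind::Other, "boom"));
    println!("{}", timeout);
    println!("{}", io);
}
```

That is a fair amount of boilerplate for two variants, which is the motivation for the helper crates.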

Helper Crates

So, people created helper crates like thiserror

use thiserror::Error;

#[derive(Error, Debug)]
pub enum DataStoreError {
    #[error("data store disconnected")]
    Disconnect(#[from] std::io::Error),
    #[error("the data for key `{0}` is not available")]
    Redaction(String),
    #[error("invalid header (expected {expected:?}, found {found:?})")]
    InvalidHeader { expected: String, found: String },
    #[error("unknown data store error")]
    Unknown,
}

Something universal

Exhaustively listing all the ways your dependencies can fail is hard.

One solution:

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let _f = std::fs::File::open("hello.txt")?; // IO Error
    let _s = std::str::from_utf8(&[0xFF, 0x65])?; // Unicode conversion error
    Ok(())
}

Anyhow

The anyhow crate gives you a nicer type:

fn main() -> Result<(), anyhow::Error> {
    let _f = std::fs::File::open("hello.txt")?; // IO Error
    let _s = std::str::from_utf8(&[0xFF, 0x65])?; // Unicode conversion error
    Ok(())
}

Note:

  • Use anyhow if you do not care what error type your function returns, just that it captures something.
  • Use thiserror if you must design your own error types but want easy Error trait impl.

Panicking

The other way to handle errors is to generate a controlled, program-ending, failure.

  • You can panic!("x too large ({})", x);
  • You can call an API that panics on error (like indexing, e.g. s[99])
  • You can convert a Result::Err into a panic with .unwrap() or .expect("Oh no")

Collections

Using Arrays

Arrays ([T; N]) have a fixed size.

fn main() {
    let array = [1, 2, 3, 4, 5];
    println!("array = {:?}", array);
}

(Diagram: array holds the five values 1, 2, 3, 4, 5.)

Building the array at runtime.

How do you know how many 'slots' you've used?

fn main() {
    let mut array = [0u8; 10];
    for idx in 0..5 {
        array[idx] = idx as u8;
    }
    println!("array = {:?}", array);
}

(Diagram: array holds 0, 1, 2, 3, 4 followed by five unused zeros.)

Slices

A view into some other data. Written as &[T] (or &mut [T]).

fn main() {
    let mut array = [0u8; 10];
    for idx in 0..5 {
        array[idx] = idx as u8;
    }
    let data = &array[0..5];
    println!("data = {:?}", data);
}

(Diagram: data holds ptr and len = 5; ptr points at the first element of array, which holds 0, 1, 2, 3, 4, 0, 0, 0, 0, 0.)

Note: Slices are unsized types and can only be accessed via a reference. This reference is a 'fat reference' because instead of just containing a pointer to the start of the data, it also contains a length value.
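
The 'fat reference' can be observed by comparing sizes. This is a small sketch; the sizes shown are what current Rust compilers produce on common targets rather than something the language spells out as a promise.

```rust
use std::mem::size_of;

fn main() {
    let array = [0u8; 10];
    let data: &[u8] = &array[0..5];

    // A reference to a sized type is one pointer wide...
    println!("&[u8; 10] is {} bytes", size_of::<&[u8; 10]>());
    // ...but a slice reference carries a pointer and a length.
    println!("&[u8] is {} bytes", size_of::<&[u8]>());
    println!("data.len() = {}", data.len());
}
```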

Vectors

Vec is a growable, heap-allocated, array-like type.

fn process_data(input: &[u32]) {
    let mut vector = Vec::new();
    for value in input {
        vector.push(value * 2);
    }
    println!("vector = {:?}, first = {}", vector, vector[0]);
}

fn main() { process_data(&[1, 2, 3]); }

(Diagram: vector holds ptr, len = 3 and cap = 4; ptr points at a heap block containing 2, 4, 6 and one unused slot.)

Note:

The green block of data is heap allocated.

There's a macro short-cut too...

fn main() {
    let mut vector = vec![1, 2, 3, 4];
}

Check out the docs!

Features of Vec

  • Growable (will re-allocate if needed)
  • Can borrow it as a &[T] slice
  • Can access any element (vector[i]) quickly
  • Can push/pop from the back easily

Downsides of Vec

  • Not great for insertion
  • Everything must be of the same type
  • Indices are always usize
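
A short sketch of those trade-offs in practice. The helper `total` is a made-up name; it takes a slice so that it works with arrays, slices and borrowed `Vec`s alike.

```rust
/// Sum any run of `u32` values; accepts arrays, slices and borrowed `Vec`s.
fn total(values: &[u32]) -> u32 {
    values.iter().sum()
}

fn main() {
    let mut vector: Vec<u32> = Vec::with_capacity(2);
    vector.push(10);
    vector.push(20);
    vector.push(30); // more than we asked room for, so the Vec re-allocates
    println!("len = {}, cap = {}", vector.len(), vector.capacity());

    // Borrow as a slice: `&Vec<u32>` coerces to `&[u32]`
    println!("total = {}", total(&vector));

    // Cheap at the back...
    println!("popped {:?}", vector.pop());

    // ...but inserting at the front shifts every later element along
    vector.insert(0, 5);
    println!("vector = {:?}", vector);
}
```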

String Slices

The basic string types in Rust are all UTF-8.

A String Slice (&str) is an immutable view on to some valid UTF-8 bytes

fn main() {
    let bytes = [0xC2, 0xA3, 0x39, 0x39, 0x21];
    let s = std::str::from_utf8(&bytes).unwrap();
    println!("{}", s);
}

(Diagram: s holds ptr and len = 5; ptr points at the first element of bytes, which holds 0xC2, 0xA3, 0x39, 0x39, 0x21.)

Note:

A string slice is tied to the lifetime of the data that it refers to.

String Literals

  • String Literals produce a string slice "with static lifetime"
  • Points at some bytes that live in read-only memory with your code
fn main() {
    let s: &'static str = "Hello!";
    println!("s = {}", s);
}

(Diagram: s holds ptr and len = 6; ptr points at the six bytes 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x21 in read-only memory.)

Note:

The lifetime annotation of 'static just means the string slice lives forever and never gets destroyed. We wrote out the type in full so you can see it - you can omit it on variable declarations.

There's a second string literal in this program. Can you spot it?

(It's the format string in the call to println!)

Strings (docs)

  • A growable collection of char
  • Actually stored as a Vec<u8>, with UTF-8 encoding
  • You cannot access characters by index (only bytes)
    • But you never really want to anyway
fn main() {
    let string = String::from("Hello!");
}

(Diagram: string holds ptr, len = 6 and cap = 6; ptr points at a heap block containing 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x21.)

Note:

The green block of data is heap allocated.
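
To illustrate the point about indexing by bytes rather than characters, here is a small sketch using the same "£99!" text as the string slice example. The pound sign is written as the escape `\u{A3}` purely to keep the source ASCII.

```rust
fn main() {
    let string = String::from("\u{A3}99!");

    // `len()` counts bytes, not characters: the pound sign takes two bytes
    println!("bytes = {}, chars = {}", string.len(), string.chars().count());

    // Slicing uses byte offsets and must land on a character boundary
    println!("{}", &string[2..]); // skips the two bytes of the pound sign
    println!("{:?}", string.get(1..)); // `get` gives None instead of panicking

    // To find "the Nth character", walk the iterator
    println!("{:?}", string.chars().nth(1));
}
```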

Making a String

fn main() {
    let s1 = "String literal up-conversion".to_string();
    let s2: String = "Into also works".into();
    let s3 = String::from("Or using from");
    let s4 = format!("String s1 is {:?}", s1);
    let s5 = String::new(); // empty
}

Appending to a String

fn main() {
    let mut start = "Mary had a ".to_string();
    start.push_str("little");
    let rhyme = start + " lamb";
    println!("rhyme = {}", rhyme);
    // println!("start = {}", start);
}

Joining pieces of String

fn main() {
    let pieces = ["Mary", "had", "a", "little", "lamb"];
    let rhyme = pieces.join(" ");
    println!("Rhyme = {}", rhyme);
}

VecDeque (docs)

A ring-buffer, also known as a Double-Ended Queue:

use std::collections::VecDeque;
fn main() {
    let mut queue = VecDeque::new();
    queue.push_back(1);
    queue.push_back(2);
    queue.push_back(3);
    println!("first: {:?}", queue.pop_front());
    println!("second: {:?}", queue.pop_front());
    println!("third: {:?}", queue.pop_front());
}

Features of VecDeque

  • Growable (will re-allocate if needed)
  • Can access any element (queue[i]) quickly
  • Can push/pop from the front or back easily

Downsides of VecDeque

  • Cannot borrow it as a single &[T] slice without moving items around
  • Not great for insertion in the middle
  • Everything must be of the same type
  • Indices are always usize
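
The slice limitation can be sketched like this. Where exactly the contents are split depends on the internal layout, so the output of `as_slices` is deliberately not shown.

```rust
use std::collections::VecDeque;

fn main() {
    let mut queue: VecDeque<u32> = VecDeque::with_capacity(4);
    queue.push_back(1);
    queue.push_back(2);
    queue.push_back(3);
    queue.pop_front();
    queue.push_back(4);
    queue.push_back(5);

    // The contents may be split in two where the ring buffer wraps around
    let (front, back) = queue.as_slices();
    println!("front = {:?}, back = {:?}", front, back);

    // `make_contiguous` moves items around so one slice covers everything
    let all: &mut [u32] = queue.make_contiguous();
    println!("all = {:?}", all);
}
```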

HashMap (docs)

If you want to store Values against Keys, Rust has HashMap<K, V>.

Note that the keys must be all the same type, and the values must be all the same type.

use std::collections::HashMap;
fn main() {
    let mut map = HashMap::new();
    map.insert("Triangle", 3);
    map.insert("Square", 4);
    println!("Triangles have {:?} sides", map.get("Triangle"));
    println!("Triangles have {:?} sides", map["Triangle"]);
    println!("map {:?}", map);
}

Note: The index operation will panic if the key is not found, just like with slices and arrays if the index is out of bounds. Get returns an Option.

If you run it a few times, the result will change because it is un-ordered.
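
When a missing key is an expected case rather than a bug, handling the Option from get avoids the panic. A minimal sketch, with `sides` as a made-up helper name:

```rust
use std::collections::HashMap;

/// Look up a shape, falling back to 0 for anything unknown.
fn sides(map: &HashMap<&str, u32>, shape: &str) -> u32 {
    // `get` returns Option<&u32>; `copied` turns that into Option<u32>
    map.get(shape).copied().unwrap_or(0)
}

fn main() {
    let mut map = HashMap::new();
    map.insert("Triangle", 3);
    map.insert("Square", 4);

    println!("Square: {}", sides(&map, "Square"));
    println!("Circle: {}", sides(&map, "Circle"));
    // map["Circle"] would panic here
}
```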

The Entry API

What if you want to update an existing value OR add a new value if it's not there yet?

HashMap has the Entry API:

enum Entry<K, V> {
    Occupied(...),
    Vacant(...),
}

fn entry(&mut self, key: K) -> Entry<K, V> {
    ...
}

Entry API Example

use std::collections::HashMap;
 
fn update_connection(map: &mut HashMap<i32, u64>, id: i32) {
    map.entry(id)
        .and_modify(|v| *v = *v + 1)
        .or_insert(1);
}
 
fn main() {
    let mut map = HashMap::new();
    update_connection(&mut map, 100);
    update_connection(&mut map, 200);
    update_connection(&mut map, 100);
    println!("{:?}", map);
}

Features of HashMap

  • Growable (will re-allocate if needed)
  • Can access any element (map[i]) quickly
  • Great at insertion
  • Can choose the Key and Value types independently

Downsides of HashMap

  • Cannot borrow it as a single &[T] slice
  • Everything must be of the same type
  • Unordered

BTreeMap (docs)

Like a HashMap, but kept in-order.

use std::collections::BTreeMap;
fn main() {
    let mut map = BTreeMap::new();
    map.insert("Triangle", 3);
    map.insert("Square", 4);
    println!("Triangles have {:?} sides", map.get("Triangle"));
    println!("Triangles have {:?} sides", map["Triangle"]);
    println!("map {:?}", map);
}

Features of BTreeMap

  • Growable (will re-allocate if needed)
  • Can access any element (map[i]) quickly
  • Great at insertion
  • Can choose the Key and Value types independently
  • Ordered

Downsides of BTreeMap

  • Cannot borrow it as a single &[T] slice
  • Everything must be of the same type
  • Slower than a HashMap

Sets

We also have HashSet and BTreeSet.

They are just the map types with the V type parameter set to ()!
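
A quick sketch of both set types in use. The function name `unique_sorted` is invented for the example.

```rust
use std::collections::{BTreeSet, HashSet};

/// Return the distinct words in sorted order.
fn unique_sorted<'a>(words: &[&'a str]) -> Vec<&'a str> {
    // A BTreeSet drops duplicates and keeps its contents in order
    let set: BTreeSet<&str> = words.iter().copied().collect();
    set.into_iter().collect()
}

fn main() {
    let mut seen = HashSet::new();
    // `insert` returns false if the value was already present
    println!("{}", seen.insert("lamb"));
    println!("{}", seen.insert("lamb"));

    let words = ["mary", "had", "a", "little", "lamb", "had"];
    println!("{:?}", unique_sorted(&words));
}
```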


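| Type         | Owns | Grow | Index | Slice | Cheap Insert |
|--------------|------|------|-------|-------|--------------|
| Array        | ✅   | ❌   | usize | ✅    | ❌           |
| Slice        | ❌   | ❌   | usize | ✅    | ❌           |
| Vec          | ✅   | ✅   | usize | ✅    | ↩            |
| String Slice | ❌   | ❌   | 🤔    | ✅    | ❌           |
| String       | ✅   | ✅   | 🤔    | ✅    | ↩            |
| VecDeque     | ✅   | ✅   | usize | 🤔    | ↪ / ↩        |
| HashMap      | ✅   | ✅   | T     | ❌    | ✅           |
| BTreeMap     | ✅   | ✅   | T     | ❌    | ✅           |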

Note:

The 🤔 for indexing string slices and Strings is because the index is a byte offset and the system will panic if you try and chop a UTF-8 encoded character in half.

The 🤔 for slicing VecDeque is because you might have to get the contents in two pieces (i.e. as two disjoint slices) due to wrap-around.

Technically you can insert into the middle of a Vec or a String, but we're talking about 'cheap' insertions that don't involve moving too much stuff around.

Iterators

What is Iterating?

iterate (verb): to repeat a process, especially as part of a computer program (Cambridge English Dictionary)

To iterate in Rust is to produce a sequence of items, one at a time.

How do you Iterate?

  • With an Iterator
  • Commonly .into_iter(), .iter_mut() or .iter() on some collection
  • There's also an IntoIterator trait for automatically creating an Iterator

What is an Iterator?

  • An object with a .next() method
    • The method provides Some(data), or None once the data has run out
    • The object holds the iterator's state
  • Some Iterators will take data from a collection (e.g. a Slice)
  • Some Iterators will calculate each item on-the-fly
  • Some Iterators will take data from another iterator, and then calculate something new

Note:

Technically, all iterators calculate things on-the-fly. Some own another iterator and use that as input to their calculation, and some have an internal state that they can use for calculation. fn next(&mut self) -> Option<Self::Item> can only access Self so it is about what Self contains.

  • struct SomeIter<T> where T: Iterator { inner: T }
  • struct SomeOtherIter { random_seed: u32 }
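
Both shapes can be sketched in a few lines. `Countdown` and `Doubled` are hypothetical types written for this example: the first calculates items from its own state, the second owns another iterator and transforms what it produces.

```rust
/// Calculates each item from its own internal state.
struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            None
        } else {
            self.remaining -= 1;
            Some(self.remaining + 1)
        }
    }
}

/// Owns another iterator and calculates something new from its items.
struct Doubled<I> {
    inner: I,
}

impl<I: Iterator<Item = u32>> Iterator for Doubled<I> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        self.inner.next().map(|n| n * 2)
    }
}

fn main() {
    let iter = Doubled { inner: Countdown { remaining: 3 } };
    for item in iter {
        println!("Got {}", item);
    }
}
```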

Important to note

  • Iterators are lazy
  • Iterators are used all over the Rust Standard Library
  • Iterators have hidden complexity that you can mostly ignore
  • Iterators cannot be invalidated (unlike, say, C++)
  • Some Iterators can wrap other Iterators
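
Laziness is easy to demonstrate by counting how often a closure runs. This is a sketch; `lazy_demo` is a made-up name and the `Cell` is only there so the closure can update a counter through a shared reference.

```rust
use std::cell::Cell;

/// Returns how many times the closure had run before and after
/// the iterator was consumed.
fn lazy_demo() -> (u32, u32) {
    let calls = Cell::new(0u32);
    let data = [1, 2, 3];

    // Building the chain runs nothing yet
    let doubled = data.iter().map(|n| {
        calls.set(calls.get() + 1);
        n * 2
    });
    let before = calls.get();

    // Only consuming the iterator drives the closure
    let _result: Vec<i32> = doubled.collect();
    (before, calls.get())
}

fn main() {
    let (before, after) = lazy_demo();
    println!("calls before = {}, calls after = {}", before, after);
}
```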

Basic usage

  1. You need to make an iterator
  2. You need to pump it in a loop
fn main() {
    let data = vec![1, 2, 3, 4, 5];
    let mut iterator = data.iter();
    loop {
        if let Some(item) = iterator.next() {
            println!("Got {}", item);
        } else {
            break;
        }
    }
}

Basic usage

Same thing, but with while let.

fn main() {
    let data = vec![1, 2, 3, 4, 5];
    let mut iterator = data.iter();
    while let Some(item) = iterator.next() {
        println!("Got {}", item);
    }
}

Basic usage

Same thing, but with for

fn main() {
    let data = vec![1, 2, 3, 4, 5];
    // for <variable> in <iterator>
    for item in data.iter() {
        println!("Got {}", item);
    }
}

Basic usage

Same thing, but we let for call .into_iter() for us.

fn main() {
    let data = vec![1, 2, 3, 4, 5];
    // for <variable> in <implements IntoIterator>
    for item in &data {
        println!("Got {}", item);
    }
}

Three kinds of Iterator
(Diagram: struct Vec, with fields start: *mut T, len: usize and cap: usize, offers three methods, each producing a different iterator type.)

  • fn iter(&self) -> VecIter, which holds parent: &[T] and has Item=&T
  • fn iter_mut(&mut self) -> VecIterMut, which holds parent: &mut [T] and has Item=&mut T
  • fn into_iter(self) -> VecIntoIter, which holds start: *mut T, len: usize, cap: usize and has Item=T

Three kinds of Iterator

  • Borrowed (data.iter())
  • Mutably Borrowed (data.iter_mut())
  • Owned (data.into_iter())
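
The three kinds side by side, as a small sketch (`three_kinds` is just a name for the example):

```rust
fn three_kinds() -> (i32, Vec<i32>) {
    let mut data = vec![1, 2, 3];

    // Borrowed: items are `&i32`, and `data` is left untouched
    let total: i32 = data.iter().sum();

    // Mutably borrowed: items are `&mut i32`, so we can edit in place
    for item in data.iter_mut() {
        *item *= 10;
    }

    // Owned: items are `i32`, and `data` is consumed by the call
    let moved: Vec<i32> = data.into_iter().map(|n| n + 1).collect();

    // println!("{:?}", data); // would not compile: `data` was moved
    (total, moved)
}

fn main() {
    let (total, moved) = three_kinds();
    println!("total = {}, moved = {:?}", total, moved);
}
```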

But how did that for-loop work?

If a for loop calls .into_iter() how did we get a borrowed iterator?

fn main() {
    let data = vec![1, 2, 3, 4, 5];
    for item in &data {
        // item is a &i32
        println!("Got {}", item);
    }
}

But how did that for-loop work?

The & is load-bearing...

fn main() {
    let data = vec![1, 2, 3, 4, 5];
    let temp = &data;
    // This is .into_iter() on a `&Vec` not a `Vec`!
    let iter = temp.into_iter();
    for item in iter {
        println!("Got {}", item);
    }
}

Note:

  • IntoIterator is actually dependent on the context. Depending on the context it will produce an iterator with owned elements, with references to elements, or with mutable references to elements.
  • e.g. impl<T, A> IntoIterator for Vec<T, A> for owned
  • impl<'a, T, A> IntoIterator for &'a Vec<T, A> for refs
  • impl<'a, T, A> IntoIterator for &'a mut Vec<T, A> for mut refs

Things you can make iterators from

Note:

Technically a Range is an Iterator. Some people consider this to be a mistake. Especially as Range<T> where T: Copy is not itself Copy.

How does this work?

  • Rust has some traits which describe how iterators work.
  • We'll talk more about traits later!

You can still enjoy it without knowing how it works

Useful Iterator methods (1)

These consume the old Iterator and return a new Iterator:

  • skip(N)
  • take(N)
  • cloned()
  • map(func)
  • filter(func_returns_bool)
  • filter_map(func_returns_option)
  • zip(second_iterator)

Note:

  • skip(N) will skip the first N items from the underlying iterator, then just pass every other item through
  • take(N) will take the first N items from the underlying iterator, then just tell you there is nothing left
  • cloned takes an iterator that gives you references, and calls .clone() on each reference to create a new object
  • map(func) will give you a new iterator that fetches an item from the underlying iterator, calls func with it, and gives you the result
  • filter(func) will give you a new iterator that fetches an item from the underlying iterator, calls func with it, and if it's not true, refuses to give it to you and tries the next item instead
  • filter_map(func) is both a filter and a map - the func should return an Option<T> and anything None is filtered out
  • zip will take this iterator, and the given iterator, and produce a new iterator that produces two-tuples ((itemA, itemB))
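
Several of these adapters combined, as a sketch. The function `parse_scores` and its data are invented for the example.

```rust
/// Pair each name with its parsed score, skipping the header row and
/// any row whose score is not a number, and stopping after two good rows.
fn parse_scores<'a>(names: &[&'a str], scores: &[&str]) -> Vec<(&'a str, u32)> {
    names
        .iter()
        .cloned() // &&str -> &str
        .zip(scores.iter()) // (name, score_text)
        .skip(1) // drop the header row
        .filter_map(|(name, text)| {
            // `ok()` turns a Result into an Option; None is filtered out
            text.parse::<u32>().ok().map(|score| (name, score))
        })
        .take(2)
        .collect()
}

fn main() {
    let names = ["name", "ann", "bob", "cat", "dan"];
    let scores = ["score", "10", "n/a", "30", "40"];
    println!("{:?}", parse_scores(&names, &scores));
}
```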

Useful Iterator methods (2)

These actively fetch every item from the old Iterator and produce a single value:

  • sum()
  • count()
  • collect()
  • max() and min()
  • fold(initial, func)
  • partition(func)

Note:

  • sum will add up every item, assuming they are numeric
  • count will tell you how many items the iterator produced
  • collect will take every item from the iterator and stuff it into a new collection (e.g. a Vec<T>)
  • max and min find the largest/smallest item
  • fold will maintain an accumulator, and call func with each item and the current value of the accumulator
  • partition will create two new collections by taking every item from the iterator and stuffing it into one of two new collections
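
A sketch that exercises several of these consumers on the same data. `summarise` is a hypothetical helper, and the tuple return type is only a convenience for the example.

```rust
/// Returns (count, largest, product, evens, odds) for the given numbers.
fn summarise(data: &[u32]) -> (usize, Option<u32>, u32, Vec<u32>, Vec<u32>) {
    let count = data.iter().count();

    // `max` returns None if the iterator was empty
    let largest = data.iter().copied().max();

    // The accumulator starts at 1 and is multiplied by each item in turn
    let product = data.iter().copied().fold(1u32, |acc, n| acc * n);

    // `partition` sends each item to the first or the second collection
    let (evens, odds): (Vec<u32>, Vec<u32>) =
        data.iter().copied().partition(|n| n % 2 == 0);

    (count, largest, product, evens, odds)
}

fn main() {
    println!("{:?}", summarise(&[1, 2, 3, 4]));
}
```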

Call chaining (1)

This style of code is idiomatic in Rust:

/// Sum the squares of the even numbers given
fn process_data(data: &[u32]) -> u32 {
    data.iter()
        .cloned()
        .filter(|n| n % 2 == 0)
        .map(|n| n * n)
        .sum()
}

fn main() {
    let data = [1, 2, 3, 4];
    println!("result = {}", process_data(&data));
}

Note:

  • Point out the type inference where Rust figures out data is an array of u32 and not the default i32s.

Call chaining (2)

What really happened:

/// Sum the squares of the even numbers given
fn process_data(data: &[u32]) -> u32 {
    let ref_iter = data.iter();
    let value_iter = ref_iter.cloned();
    let evens_iter = value_iter.filter(|n| n % 2 == 0);
    let squares_iter = evens_iter.map(|n| n * n);
    squares_iter.sum()
}

fn main() {
    let data = [1, 2, 3, 4];
    println!("result = {}", process_data(&data));
}

Note:

For the more advanced students, this mini quiz is a good one: https://dtolnay.github.io/rust-quiz/26

Imports and Modules

Namespaces

  • A namespace is simply a way to distinguish two things that have the same name.
  • It provides a scope to the identifiers within it.

Rust supports namespacing in two ways:

  1. Crates for re-usable software libraries
  2. Modules for breaking up your crates

Crates

  • A crate is the unit of Rust software suitable for shipping.
  • Yes, it's a deliberate pun.
  • The Rust Standard Library is a crate.
  • Binary Crates and Library Crates

There's no build file

  • Have you noticed that Cargo.toml says nothing about which files to compile?
  • Cargo starts with lib.rs for a library or the relevant main.rs for a binary
  • It then finds all the modules

Modules

  • A module is a block of source code within a crate
  • It qualifies the names of everything in it
  • It has a parent module (or it is the crate root)
  • It can have child modules
  • The crate is therefore a tree
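
The tree structure shows up directly in paths. In this sketch (module and function names are made up), `super` refers to the parent module and `::` walks down through children.

```rust
mod shapes {
    pub fn name() -> &'static str {
        "shapes"
    }

    // A child module of `shapes`
    pub mod round {
        pub fn describe() -> String {
            // `super` names the parent module
            format!("round, child of {}", super::name())
        }
    }
}

fn main() {
    // The module path qualifies the function name
    println!("{}", shapes::round::describe());
}
```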

Standard Library

We've been using modules from the Rust Standard Library...

use std::fs;
use std::io::prelude::*;

fn main() -> std::io::Result<()> {
    let mut f = fs::File::create("hello.txt")?;
    f.write(b"hello")?;
    Ok(())
}

Note:

Prelude modules, like std::io::prelude, usually contain important traits and you usually want to import all of it with a * wildcard.

In-line modules

You can declare a module in-line:

mod animals {
    pub struct Cat { name: String }

    impl Cat {
        pub fn new(name: &str) -> Cat {
            Cat { name: name.to_owned() }
        }
    }
}

fn main() {
    let c = animals::Cat::new("Mittens");
    // let c = animals::Cat { name: "Mittens".to_string() };
}

Modules in a file

You can also put modules in their own file on disk.

This will load from either ./animals/mod.rs or ./animals.rs:

mod animals;

fn main() {
    let c = animals::Cat::new("Mittens");
    // let c = animals::Cat { name: "Mittens".to_string() };
}

Modules can be nested...

~/probe-run $ tree src
src
├── backtrace
│   ├── mod.rs
│   ├── pp.rs
│   ├── symbolicate.rs
│   └── unwind.rs
├── canary.rs
├── cli.rs
├── cortexm.rs
├── dep
│   ├── cratesio.rs
│   ├── mod.rs
│   ├── rust_repo.rs
│   ├── rust_std
│   │   └── toolchain.rs
│   ├── rust_std.rs
│   └── rustc.rs
├── elf.rs
├── main.rs
├── probe.rs
├── registers.rs
├── stacked.rs
└── target_info.rs

Note:

The choice about foo.rs vs foo/mod.rs often depends on whether mod foo itself has any child modules.

The example is from the Knurling tool probe-run.

What kind of import?

Choosing whether to import the parent module, or each of the types contained within, is something of an art form.

#![allow(unused)]
fn main() {
use std::fs;
use std::collections::VecDeque;
use std::io::prelude::*;
}

Standard Library

There's also a more compact syntax for imports.

use std::{fs, io::prelude::*};

fn main() -> std::io::Result<()> {
    let mut f = fs::File::create("hello.txt")?;
    f.write(b"hello")?;
    Ok(())
}

Good Design Practices

Two types of Rust crates

  • binary - a program you can run directly
  • library - a collection of useful code that you can re-use in a binary

Binary crate

cargo new my_app
my_app/
├── src/
│   └── main.rs
└── Cargo.toml

Library crate

cargo new --lib my_library
my_library/
├── src/
│   └── lib.rs
└── Cargo.toml

How to run the code in a library?

Use tests!

#![allow(unused)]
fn main() {
pub fn add(left: usize, right: usize) -> usize {
    left + right
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn it_works() {
        let result = add(2, 2);
        assert_eq!(result, 4);
    }
}
}

Testing

  • mark your function with #[test]
  • use assert!, assert_eq!, assert_ne! for assertions
    • assert_eq!, assert_ne! will show you the difference between left and right arguments
    • all assertions take an optional custom error message argument
  • first failed assertion in a test function will stop the current test, other tests will still run
  • cargo test will run all tests

Assertions for your own types:

struct Point(i32, i32);

fn main() {
    let p = Point (1, 2);
    assert_eq!(p, Point(1, 2));
}

Errors:

  • "binary operation == cannot be applied to type Point"
    • can't compare two Points
  • "Point doesn't implement Debug"
    • can't print out a Point in error messages

Derives - adding behavior to your types

#[derive(Debug, PartialEq)]
struct Point(i32, i32);

fn main() {
    let p = Point (1, 2);
    assert_eq!(p, Point(1, 2));
}

Debug

Allows printing of values with debug formatting

#[derive(Debug)]
struct Point { x: i32, y: i32 }

#[derive(Debug)]
struct TuplePoint(i32, i32);

fn main() {
    let p = Point { y: 2, x: 1 };
    let tp = TuplePoint (1, 2);
    println!("{:?}", p);  // Point { x: 1, y: 2 }
    println!("{:?}", tp); // TuplePoint(1, 2)
}

PartialEq

  • Allows checking for equality (== and !=)
  • For complex types does a field-by-field comparison
  • For references it compares data that references observe
  • Can compare arrays and slices if their elements are PartialEq, too

PartialEq and Eq

Eq means strict mathematical equality:

  1. a == a should always be true
  2. a == b means b == a
  3. a == b and b == c means a == c

IEEE 754 floating point numbers (f32 and f64) break the first rule (NaN == NaN is always false). They are PartialEq and not Eq.

PartialOrd and Ord

  • Same as PartialEq and Eq, but they also allow other comparisons (<, <=, >=, >).
  • Generally, everything is Ord, except f32 and f64.
  • Characters are compared by their code point numerical values
  • Arrays and slices are compared element by element. Length acts as a tiebreaker.
    • "aaa" < "b", but "aaa" > "a"
    • elements themselves have to be PartialOrd or Ord
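
These rules can be checked directly. In this sketch `Version` and `newest` are invented names; the derived ordering compares fields in declaration order.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u32,
    minor: u32,
}

/// `max` needs `Ord`; it returns None for an empty slice.
fn newest(versions: &[Version]) -> Option<Version> {
    versions.iter().copied().max()
}

fn main() {
    // String slices: element by element first, length as the tiebreaker
    println!("{}", "aaa" < "b");
    println!("{}", "aaa" > "a");

    // Derived ordering compares `major` first, then `minor`
    let old = Version { major: 1, minor: 9 };
    let new = Version { major: 2, minor: 0 };
    println!("{}", old < new);
    println!("{:?}", newest(&[new, old]));

    // f64 is only PartialOrd: NaN has no ordering against anything
    println!("{:?}", f64::NAN.partial_cmp(&1.0));
}
```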

How do derives work?

  • Debug, PartialEq, Eq, etc. are simultaneously names of "Traits" and names of "derive macros".
  • If a trait has a corresponding derive macro it can be "derived":
    • Rust will generate a default implementation.
  • Not all traits have a corresponding derive macro
    • these traits have to be implemented manually.

Debug and Display

  • a pair of traits.
  • Debug is for debug printing
    • can be derived
  • Display is for user-facing printing
    • cannot be derived, and must be implemented manually
println!("{:?}", value); // uses `Debug`
println!("{:#?}", value); // uses `Debug` and pretty-prints structures
println!("{}", value); // uses `Display`
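
A minimal sketch of a hand-written Display next to a derived Debug. The output format chosen for Display is arbitrary.

```rust
use std::fmt;

#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

// Display has no derive macro, so we write the formatting ourselves
impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

fn main() {
    let p = Point { x: 1, y: 2 };
    println!("{:?}", p); // Point { x: 1, y: 2 }
    println!("{}", p); // (1, 2)
}
```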

Trait dependencies

Traits can depend on each other.

  • Eq and PartialOrd both require PartialEq.
  • Ord requires both Eq and PartialOrd
#[derive(Debug, Ord)] // will give an error

#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)] // Ok

Other useful traits:

  • Hash - a type can be used as a key for HashMap
  • Default - a type gets a default() method to produce a default value
    • 0 is used for numbers, "" for strings
    • collections start empty
    • Option fields will be None
  • Clone adds a clone() method to produce a deep copy of a value
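
A sketch showing these three derives together. The `Settings` struct and its fields are hypothetical.

```rust
use std::collections::HashMap;

#[derive(Debug, Default, Clone, PartialEq, Eq, Hash)]
struct Settings {
    name: String,
    retries: u32,
    tags: Vec<String>,
    timeout_ms: Option<u32>,
}

fn main() {
    // Default: "", 0, an empty Vec and None
    let defaults = Settings::default();
    println!("{:?}", defaults);

    // Clone gives an independent copy that we can change freely
    let mut custom = defaults.clone();
    custom.retries = 3;

    // Hash (together with Eq) lets the type be used as a HashMap key
    let mut notes = HashMap::new();
    notes.insert(defaults, "factory settings");
    notes.insert(custom, "tuned");
    println!("{} entries", notes.len());
}
```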

derive lists can get pretty long.

Documentation

Formatting and Linting

rustfmt is the default Rust formatter

cargo fmt

Clippy is a linter for Rust code

cargo clippy

Methods and Traits

Methods

Methods

  • Methods in Rust are functions in an impl block
  • They take self (or similar) as the first argument (the method receiver)
  • They can be called with the method call operator

Example

struct Square(f64);

impl Square {
    fn area(&self) -> f64 { self.0 * self.0 }
    fn double(&mut self) { self.0 *= 2.0; }
    fn destroy(self) -> f64 { self.0 }
}

fn main() {
    let mut sq = Square(5.0);
   
    sq.double();  // Square::double(&mut sq)
    println!("area is {}", sq.area()); // Square::area(&sq)
    sq.destroy(); // Square::destroy(sq)
}

Note:

You can always use the full function-call syntax. That is what the method call operator will be converted into during compilation.

For motivation for something that takes self, imagine an embedded device with a Uart object that owns two Pin objects - one for the Tx pin and one for the Rx pin. Whilst the Uart object exists, those pins are in UART mode. But if you destroy the Uart, you want to get the pins back so you can re-use them for something else (e.g. as GPIO pins). Equally you could destroy some HTTPRequest object and recover the TCPStream contained within, so you could use it for WebSocket traffic instead of HTTP traffic.

Method Receivers

  • &self means self: &Self
  • &mut self means self: &mut Self
  • self means self: Self
  • Self means whatever type this impl block is for

Method Receivers

struct Square(f64);

impl Square {
    fn by_value(self: Self) {}
    fn by_ref(self: &Self) {}
    fn by_ref_mut(self: &mut Self) {}
    fn by_box(self: Box<Self>) {}
    fn by_rc(self: Rc<Self>) {}
    fn by_arc(self: Arc<Self>) {}
    fn by_pin(self: Pin<&Self>) {}
    fn explicit_type(self: Arc<Example>) {}
    fn with_lifetime<'a>(self: &'a Self) {}
    fn nested<'a>(self: &mut &'a Arc<Rc<Box<Alias>>>) {}
    fn via_projection(self: <Example as Trait>::Output) {}
}

Notes:

This slide is only intended to show that there's lots of complexity behind the curtain, and we're ignoring almost all of it in this course. Come back for Advanced Rust if you want to know more!

Associated Functions

  • You can also just declare functions with no method receiver.
  • You call these with normal function call syntax.
  • Typically we provide a function called new
pub struct Square(f64);

impl Square {
    pub fn new(width: f64) -> Square {
        Square(width)
    }
}

fn main() {
    // Just an associated function - nothing special about `new`
    let sq = Square::new(5.0);
}

Note:

Question - can anyone just call Square(5.0) instead of Square::new(5.0)? Even from another module?

Associated Constants

impl blocks can also have const values:

#![allow(unused)]
fn main() {
pub struct Square(f64);

impl Square {
    const NUMBER_OF_SIDES: u8 = 4;

    pub fn perimeter(&self) -> f64 {
        self.0 * f64::from(Self::NUMBER_OF_SIDES)
    }
}
}

Traits

Traits

  • A trait is a list of methods and functions that a type must have.
  • A trait can provide default implementations if desired.
#![allow(unused)]
fn main() {
trait HasArea {
    /// Get the area, in m².
    fn area_m2(&self) -> f64;

    /// Get the area, in acres.
    fn area_acres(&self) -> f64 {
        self.area_m2() / 4046.86
    }
}
}

An example

trait HasArea {
    fn area_m2(&self) -> f64;
}

struct Square(f64);

impl HasArea for Square {
    fn area_m2(&self) -> f64 {
        self.0 * self.0
    }
}

fn main() {
    let sq = Square(5.0);
    println!("{}", sq.area_m2());
}

Associated Types

A trait can also have some associated types, which are type aliases chosen when the trait is implemented.

#![allow(unused)]
fn main() {
trait Iterator {
    type Item;

    fn next(&mut self) -> Option<Self::Item>;
}

struct MyRange { start: u32, len: u32 }

impl Iterator for MyRange {
    type Item = u32;

    fn next(&mut self) -> Option<Self::Item> {
        todo!();
    }
}
}

Rules for Implementing

You can only implement a Trait for a Type if:

  • The Type was declared in this crate, or
  • The Trait was declared in this crate

You can't implement someone else's trait on someone else's type!

Note:

If this was allowed, how would anyone know about it?
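
The two permitted combinations, and the forbidden one, can be sketched as follows. `Shout` and `Celsius` are invented for the example; `str`, `Vec` and `Display` stand in for "someone else's" items because they come from the standard library.

```rust
use std::fmt;

// Our trait, someone else's type: allowed
trait Shout {
    fn shout(&self) -> String;
}

impl Shout for str {
    fn shout(&self) -> String {
        self.to_uppercase()
    }
}

// Someone else's trait, our type: allowed
struct Celsius(f64);

impl fmt::Display for Celsius {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:.1} degrees C", self.0)
    }
}

// Someone else's trait, someone else's type: rejected by the compiler
// impl fmt::Display for Vec<u8> { ... }

fn main() {
    println!("{}", "oh no".shout());
    println!("{}", Celsius(21.5));
}
```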

Rules for Using

You can only use the trait methods provided by a Trait on a Type if:

  • The trait is in scope
  • (e.g. you add use Trait; in that module)
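
A small sketch using a standard library trait. Writing formatted text into a String relies on methods from `std::fmt::Write`; without the `use` line this function should fail to compile, because the trait's methods are not in scope. `describe` is a made-up name.

```rust
// Bring the trait into scope so its methods can be called on String
use std::fmt::Write;

fn describe(value: u32) -> String {
    let mut text = String::new();
    // `write!` calls a method provided by the `Write` trait
    write!(text, "value = {}", value).unwrap();
    text
}

fn main() {
    println!("{}", describe(7));
}
```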

Traits

Note:

We walk the attendees through each of these examples. They are only listed in pairs for the pleasing symmetry - nothing in Rust says they have to come in pairs.

Sneaky Workarounds

If a trait method uses &mut self and you really want it to work on some &SomeType reference, you can:

impl SomeTrait for &SomeType {
    // ...
}

The I/O traits do this.
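
Here is a minimal sketch of that workaround. `Tick` and `Counter` are hypothetical, and the `Cell` provides the interior mutability that makes changing state through a shared reference possible; the standard library's I/O types rely on the operating system for that instead.

```rust
use std::cell::Cell;

trait Tick {
    fn tick(&mut self) -> u32;
}

/// Interior mutability lets a shared reference change the count.
struct Counter {
    count: Cell<u32>,
}

// Implemented on the reference type, so `Self` is `&Counter`
// and `&mut self` is a `&mut &Counter`.
impl Tick for &Counter {
    fn tick(&mut self) -> u32 {
        self.count.set(self.count.get() + 1);
        self.count.get()
    }
}

fn main() {
    let counter = Counter { count: Cell::new(0) };
    // Two shared references to the same counter, both usable with the trait
    let mut a = &counter;
    let mut b = &counter;
    a.tick();
    b.tick();
    println!("count = {}", counter.count.get());
}
```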

Using Traits Statically

  • One way to use traits is by using impl Trait as a type.
  • This is static dispatch (monomorphisation): a new function is generated for every actual type passed.
  • You can also use impl Trait in the return position.

Using Traits Statically: Example

#![allow(unused)]
fn main() {
trait HasArea {
    fn area_m2(&self) -> f64;
}

struct AreaCalculator {
    area_m2: f64
}

impl AreaCalculator {
    // Multiple symbols may be generated by this function
    fn add(&mut self, shape: impl HasArea) {
        self.area_m2 += shape.area_m2();
    }

    fn total(&self) -> impl std::fmt::Display {
        self.area_m2
    }
}
}

Note:

The total function says "I will give you a value you can display (with println), but I am not telling you what it is". You can look up "RPIT" (return position impl trait) for the history of this feature. APIT (argument position impl trait) is probably the less useful of the two.

Using Traits Dynamically

  • Rust also supports trait references
  • The types are given at run-time through a vtable
  • The reference is now a wide pointer

Using Traits Dynamically: Example

#![allow(unused)]
fn main() {
trait HasArea {
    fn area_m2(&self) -> f64;
}

struct AreaCalculator {
    area_m2: f64
}

impl AreaCalculator {
    // Only one symbol is generated by this function. The reference contains
    // a pointer to the data, *and* a pointer to a function table.
    fn add(&mut self, shape: &dyn HasArea) {
        self.area_m2 += shape.area_m2();
    }

    fn total(&self) -> &dyn std::fmt::Display {
        &self.area_m2
    }
}
}

Note:

In earlier editions, it was just &Trait, but it was changed to &dyn Trait

Which is better?

Monomorphisation? Or Polymorphism?
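
Neither is better in every case. As a rough sketch of the trade-off (the `Circle` type and the two `total_*` functions are additions for this example): the static version can be inlined but needs every item to be the same type, while the dynamic version pays for a vtable call and can mix types.

```rust
trait HasArea {
    fn area_m2(&self) -> f64;
}

struct Square(f64);
struct Circle(f64);

impl HasArea for Square {
    fn area_m2(&self) -> f64 {
        self.0 * self.0
    }
}

impl HasArea for Circle {
    fn area_m2(&self) -> f64 {
        std::f64::consts::PI * self.0 * self.0
    }
}

/// Static: one copy of this function is generated per concrete type,
/// and every item in the slice must be that same type.
fn total_static(shapes: &[impl HasArea]) -> f64 {
    shapes.iter().map(|s| s.area_m2()).sum()
}

/// Dynamic: compiled once, each call goes through a vtable,
/// and the slice can hold a mixture of types.
fn total_dynamic(shapes: &[Box<dyn HasArea>]) -> f64 {
    shapes.iter().map(|s| s.area_m2()).sum()
}

fn main() {
    let squares = [Square(1.0), Square(2.0)];
    println!("{}", total_static(&squares));

    let mixed: Vec<Box<dyn HasArea>> = vec![Box::new(Square(2.0)), Box::new(Circle(1.0))];
    println!("{}", total_dynamic(&mixed));
}
```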

Requiring other Traits

  • Traits can also require other traits to also be implemented
#![allow(unused)]
fn main() {
trait Printable: std::fmt::Debug { 
    fn print(&self) {
        println!("I am {:?}", self);
    }
}
}

Special Traits

  • Some traits have no functions (Copy, Send, Sync, etc)
    • But code can require that the trait is implemented
    • More on this in generics!
  • Traits can be marked unsafe
    • Must use the unsafe keyword to implement
    • They're telling you to read the instructions!
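
A brief sketch of code requiring a marker trait. Both function names are invented; neither calls any method from the trait it requires, because there are none to call.

```rust
/// Accepts only types whose values can be duplicated by a plain copy.
fn duplicate<T: Copy>(value: T) -> (T, T) {
    (value, value)
}

/// Compiles only if `T` may be moved to another thread.
fn assert_send<T: Send>() {}

fn main() {
    println!("{:?}", duplicate(5));
    assert_send::<String>();
    // duplicate(String::new());          // would not compile: String is not Copy
    // assert_send::<std::rc::Rc<u8>>();  // would not compile: Rc is not Send
}
```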

Rust I/O Traits

There are two kinds of computer:

  • Windows NT based
  • POSIX based (macOS, Linux, QNX, etc)

Rust supports both.

Note:

We're specifically talking about libstd targets here. Targets that only have libcore have very little I/O support built-in - it's all third party crates.

They are very different:

HANDLE CreateFileW(
  /* [in]           */ LPCWSTR               lpFileName,
  /* [in]           */ DWORD                 dwDesiredAccess,
  /* [in]           */ DWORD                 dwShareMode,
  /* [in, optional] */ LPSECURITY_ATTRIBUTES lpSecurityAttributes,
  /* [in]           */ DWORD                 dwCreationDisposition,
  /* [in]           */ DWORD                 dwFlagsAndAttributes,
  /* [in, optional] */ HANDLE                hTemplateFile
);

int open(const char *pathname, int flags, mode_t mode);

Abstractions

To provide a common API, Rust offers some basic abstractions:

  • A Read trait for reading bytes
  • A Write trait for writing bytes
  • Buffered wrappers for the above (BufReader and BufWriter)
  • A Seek trait for adjusting the read/write offset in a file, etc
  • A File type to represent open files
  • Types for Stdin, Stdout and Stderr
  • The Cursor type to make a [u8] readable/writable
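
To get a feel for how these pieces fit together, here is a small sketch that uses Write, Seek and Read on a Cursor over an in-memory buffer. The round_trip function name is just for illustration.

```rust
use std::io::{Cursor, Read, Seek, SeekFrom, Write};

fn round_trip() -> std::io::Result<String> {
    // A Cursor wraps an in-memory buffer and tracks a read/write offset.
    let mut cursor = Cursor::new(Vec::<u8>::new());
    cursor.write_all(b"hello, world")?;
    // Writing moved the offset to the end, so seek back to byte 7.
    cursor.seek(SeekFrom::Start(7))?;
    // Read everything from the current offset to the end.
    let mut text = String::new();
    cursor.read_to_string(&mut text)?;
    Ok(text)
}

fn main() -> std::io::Result<()> {
    println!("{}", round_trip()?);
    Ok(())
}
```

The same three traits work on a File, which is the point of the abstraction.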

The Read Trait

https://doc.rust-lang.org/std/io/trait.Read.html

#![allow(unused)]
fn main() {
use std::io::Result;

pub trait Read {
    // One required method
    fn read(&mut self, buf: &mut [u8]) -> Result<usize>;
    // Lots of provided methods, such as:
    fn read_to_string(&mut self, buf: &mut String) -> Result<usize> { todo!() }
}
}

Immutable Files

  • A File on POSIX is just an integer (recall open returns an int)
  • Do you need a &mut File to write?
    • No - the OS handles shared mutability internally
  • But the trait requires &mut self...

Implementing Traits on &Type

impl Read for File {

}

impl Read for &File {

}

See the std::fs::File docs.

OS Syscalls

  • Remember, Rust is explicit
  • If you ask to read 8 bytes, Rust will ask the OS to get 8 bytes from the device
  • Asking the OS for anything is expensive!
  • Asking the OS for a million small things is really expensive...

Buffered Readers

  • There is a BufRead trait, for buffered I/O devices
  • There is a BufReader struct
    • Owns an R: Read, and implements BufRead
    • Has a buffer in RAM and reads in large-ish chunks
#![allow(unused)]
fn main() {
use std::io::BufRead;

fn print_file() -> std::io::Result<()> {
    let f = std::fs::File::open("/etc/hosts")?;
    let reader = std::io::BufReader::new(f);
    for line in reader.lines() {
        println!("{}", line?);
    }
    Ok(())
}
}
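
To see why buffering matters, the sketch below counts how many times the underlying reader is asked for data. CountingReader is a made-up wrapper for this illustration. In a real program each of those calls could be a syscall.

```rust
use std::io::{BufReader, Read};

/// Hypothetical wrapper that counts how often `read` is called on it.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Reads `data` one byte at a time, straight from the source.
fn calls_unbuffered(data: &[u8]) -> std::io::Result<usize> {
    let mut source = CountingReader { inner: data, calls: 0 };
    let mut byte = [0_u8; 1];
    while source.read(&mut byte)? != 0 {}
    Ok(source.calls)
}

/// Reads `data` one byte at a time, but through a `BufReader`.
fn calls_buffered(data: &[u8]) -> std::io::Result<usize> {
    let mut reader = BufReader::new(CountingReader { inner: data, calls: 0 });
    let mut byte = [0_u8; 1];
    while reader.read(&mut byte)? != 0 {}
    Ok(reader.get_ref().calls)
}

fn main() -> std::io::Result<()> {
    let data = vec![0_u8; 1000];
    println!("unbuffered: {} calls", calls_unbuffered(&data)?);
    println!("buffered:   {} calls", calls_buffered(&data)?);
    Ok(())
}
```

The unbuffered loop hits the source once per byte (plus once more to see the end). The buffered loop only hits it a handful of times, because BufReader fetches a large chunk and serves the small reads from RAM.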

The write! macro

  • You can println! to standard output
  • You can format! to a String
  • You can also write! to any T: std::io::Write
use std::io::Write;

fn main() -> std::io::Result<()> {
    let filling = "Cheese and Jam";
    let f = std::fs::File::create("lunch.txt")?;
    write!(&f, "I have {filling} sandwiches")?;
    Ok(())
}
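
A Vec<u8> also implements Write, which is handy for trying things out without touching the filesystem. A minimal sketch (the lunch_note function is made up for this example):

```rust
use std::io::Write;

fn lunch_note(filling: &str) -> std::io::Result<Vec<u8>> {
    // Bytes written to a Vec<u8> are simply appended to it.
    let mut buffer: Vec<u8> = Vec::new();
    write!(buffer, "I have {filling} sandwiches")?;
    Ok(buffer)
}

fn main() -> std::io::Result<()> {
    let bytes = lunch_note("Cheese and Jam")?;
    println!("{}", String::from_utf8_lossy(&bytes));
    Ok(())
}
```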

Networking

End of the Line

  • It's obvious when you've hit the end of a File
  • When do you hit the end of a TcpStream?
    • When either side does a shutdown

Note:

  • Read trait has a method read_to_end()

Binding Ports

  • TcpListener needs to know which IP address and port to bind
  • Rust has a ToSocketAddrs trait impl'd on many things
    • &str, (IpAddr, u16), (&str, u16), etc
  • It does DNS lookups automatically (which may return multiple addresses...)
fn main() -> Result<(), std::io::Error> {
    let listener = std::net::TcpListener::bind("127.0.0.1:7878")?;
    Ok(())
}
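
You can also call ToSocketAddrs yourself to see what an argument resolves to. The sketch below sticks to IP address literals, so no DNS lookup should be needed. The resolve_all function is made up for this example.

```rust
use std::net::{IpAddr, Ipv4Addr, SocketAddr, ToSocketAddrs};

/// Resolves the same address three ways and collects the results.
fn resolve_all() -> std::io::Result<Vec<SocketAddr>> {
    let mut found = Vec::new();
    // A single string containing both address and port
    found.extend("127.0.0.1:7878".to_socket_addrs()?);
    // A (host, port) tuple where the host is a string
    found.extend(("127.0.0.1", 7878_u16).to_socket_addrs()?);
    // A (IpAddr, port) tuple
    found.extend((IpAddr::V4(Ipv4Addr::LOCALHOST), 7878_u16).to_socket_addrs()?);
    Ok(found)
}

fn main() -> std::io::Result<()> {
    for addr in resolve_all()? {
        println!("{addr}");
    }
    Ok(())
}
```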

More Networking

Note:

Some current prominent examples of each -

Failures

  • Almost any I/O operation can fail
  • Almost all std::io APIs return Result<T, std::io::Error>
  • std::io::Result<T> is an alias
  • Watch out for it in the docs!
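
As a sketch of what handling such a failure can look like, the made-up function read_u32_be below returns a std::io::Result, and the caller inspects the ErrorKind when the input runs out early.

```rust
use std::io::{self, Read};

/// Reads four bytes and interprets them as a big-endian u32.
fn read_u32_be(source: &mut dyn Read) -> io::Result<u32> {
    let mut bytes = [0_u8; 4];
    // `read_exact` fails if the source ends before the buffer is full.
    source.read_exact(&mut bytes)?;
    Ok(u32::from_be_bytes(bytes))
}

fn main() {
    // A byte slice implements `Read`, so it can stand in for a file.
    let mut good: &[u8] = &[0, 0, 1, 0];
    println!("{:?}", read_u32_be(&mut good));

    let mut short: &[u8] = &[0, 1];
    match read_u32_be(&mut short) {
        Ok(value) => println!("read {value}"),
        Err(e) => println!("failed with {:?}: {e}", e.kind()),
    }
}
```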

Generics


Generics are fundamental for Rust.

Generic Structs

Structs can have type parameters.

struct Point<Precision> {
    x: Precision,
    y: Precision,
}

fn main() {
    let point = Point { x: 1_u32, y: 2 };
    let point: Point<i32> = Point { x: 1, y: 2 };
}

Note:

The part <Precision> introduces a type parameter called Precision. Often people just use T but you don't have to!

Type Inference

  • Inside a function, Rust can look at the types and infer the types of variables and type parameters.
  • Rust will only look at other signatures, never other bodies.
  • If the function signature differs from the body, the body is wrong.

Generic Enums

Enums can have type parameters.

enum Either<T, X> {
    Left(T),
    Right(X),
}

fn main() {
    let alternative: Either<i32, f64> = Either::Left(123);
}

Note:

What happens if I leave out the <i32, f64> specifier? What would type parameter X be set to?
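
One way to explore that question: Either::Left(123) says nothing about X, so Rust needs to be told. The sketch below shows two ways of telling it (the derives are added here only so the values can be compared).

```rust
#[derive(Debug, PartialEq)]
enum Either<T, X> {
    Left(T),
    Right(X),
}

fn main() {
    // Option 1: annotate the variable
    let a: Either<i32, f64> = Either::Left(123);
    // Option 2: put the type parameters on the enum path (the "turbofish")
    let b = Either::<i32, f64>::Left(123);
    assert_eq!(a, b);

    let c: Either<i32, f64> = Either::Right(1.5);
    println!("{a:?} {b:?} {c:?}");

    // Without either, the compiler complains that it needs type annotations:
    // let d = Either::Left(123);
}
```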

Generic Functions

Functions can have type parameters.

#![allow(unused)]
fn main() {
fn print_stuff<X>(value: X) {
    // What can you do with `value` here?
}
}

Note:

Default bounds are Sized, so finding the size of the type is one thing that you can do. You can also take a reference or a pointer to the value.
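
A small sketch of those few things you can do with an unbounded X (the function name is made up for this example):

```rust
fn size_and_address<X>(value: X) -> usize {
    // We can take a reference to the value...
    let reference: &X = &value;
    // ...or turn that into a raw pointer...
    let pointer: *const X = reference;
    println!("value lives at {:p}", pointer);
    // ...and, because X is Sized by default, ask how big it is.
    std::mem::size_of::<X>()
}

fn main() {
    println!("{} bytes", size_and_address(0_u64));
    println!("{} bytes", size_and_address([0_u8; 3]));
}
```

Anything more interesting (printing, comparing, adding) needs a trait bound.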

Generic Implementations

struct Vector<T> {
    x: T,
    y: T,
}

impl<T> Vector<T> {
    fn new(x: T, y: T) -> Vector<T> {
        Vector { x, y }
    }
}

impl Vector<f32> {
    fn magnitude(&self) -> f32 {
        ((self.x * self.x) + (self.y * self.y)).sqrt()
    }
}

fn main() {
    let v1 = Vector::new(1.0, 1.0);
    println!("{}", v1.magnitude());
    let v2 = Vector::new(1, 1);
    // println!("{}", v2.magnitude());
}

Note:

Can I call my_vector.magnitude() if T is ... a String? A Person? A TcpStream?

Are there some trait bounds we could place on T such that T + T -> T and T * T -> T and T::sqrt() were all available?

The error:

error[E0599]: no method named `magnitude` found for struct `Vector<{integer}>` in the current scope
  --> src/main.rs:23:23
   |
2  | struct Vector<T> {
   | ---------------- method `magnitude` not found for this struct
 ...
23 |     println!("{}", v2.magnitude());
   |                       ^^^^^^^^^ method not found in `Vector<{integer}>`
   |
   = note: the method was found for
           - `Vector<f32>`
For more information about this error, try `rustc --explain E0599`.
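
One possible answer to the question about bounds, as a sketch: std::ops::Add and std::ops::Mul cover the arithmetic, but the standard library has no trait for square roots, so the SquareRoot trait below is made up for this example.

```rust
use std::ops::{Add, Mul};

/// Hypothetical trait: std has no trait that provides `sqrt`.
trait SquareRoot {
    fn square_root(self) -> Self;
}

impl SquareRoot for f32 {
    fn square_root(self) -> Self { self.sqrt() }
}

impl SquareRoot for f64 {
    fn square_root(self) -> Self { self.sqrt() }
}

struct Vector<T> {
    x: T,
    y: T,
}

// `magnitude` now exists for every T that can be copied, added,
// multiplied and square-rooted.
impl<T> Vector<T>
where
    T: Copy + Add<Output = T> + Mul<Output = T> + SquareRoot,
{
    fn magnitude(&self) -> T {
        ((self.x * self.x) + (self.y * self.y)).square_root()
    }
}

fn main() {
    let v = Vector { x: 3.0_f64, y: 4.0 };
    println!("{}", v.magnitude());
}
```

A Vector<String> still has no magnitude method, because String meets none of these bounds.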

Adding Bounds

  • Generics aren't much use without bounds.
  • A bound says which traits must be implemented on any type used for that type parameter
  • You can apply the bounds on the type, or a function/method, or both.

Adding Bounds - Example

trait HasArea {
    fn area(&self) -> f32;
}

fn print_area<T>(shape: &T) where T: HasArea {
    let area = shape.area();
    println!("Area = {area:?}");
}

struct UnitSquare;

impl HasArea for UnitSquare {
    fn area(&self) -> f32 {
        1.0
    }
}

fn main() {
    let u = UnitSquare;
    print_area(&u);
}

Adding Bounds - Alt. Example

trait HasArea {
    fn area(&self) -> f32;
}

fn print_area<T: HasArea>(shape: &T) {
    let area = shape.area();
    println!("Area = {area:?}");
}

struct UnitSquare;

impl HasArea for UnitSquare {
    fn area(&self) -> f32 {
        1.0
    }
}

fn main() {
    let u = UnitSquare;
    print_area(&u);
}

Note:

This is exactly equivalent to the previous example, but shorter. However, if you end up with a large set of bounds, they are easier to format when at the end of the line.

General Rule

  • If you can, try and avoid adding bounds to structs.
  • Simpler to only add them to the methods.

Multiple Bounds

You can specify multiple bounds.

trait HasArea {
    fn area(&self) -> f32;
}

fn print_area<T: std::fmt::Debug + HasArea>(shape: &T) {
    println!("Shape {:?} has area {}", shape, shape.area());
}

#[derive(Debug)]
struct UnitSquare;

impl HasArea for UnitSquare {
    fn area(&self) -> f32 { 1.0 }
}

fn main() {
    let u = UnitSquare;
    print_area(&u);
}

impl Trait

  • The impl Trait syntax in argument position is just syntactic sugar.
  • (It does something special in the return position though)
#![allow(unused)]
fn main() {
trait HasArea {
    fn area_m2(&self) -> f64;
}

struct AreaCalculator {
    area_m2: f64
}

impl AreaCalculator {
    // Same: fn add(&mut self, shape: impl HasArea) {
    fn add<T: HasArea>(&mut self, shape: T) {
        self.area_m2 += shape.area_m2();
    }
}
}

Note:

Some types cannot be written out, like the type of a closure, but they can still be expressed as return types using impl. e.g. fn score(y: i32) -> impl Fn(i32) -> i32.
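
Here is a sketch of that score function. The signature is the one from the note. The body is an assumption made up for this example.

```rust
// The closure's real type has no name we can write, but we can
// promise the caller that it implements `Fn(i32) -> i32`.
fn score(y: i32) -> impl Fn(i32) -> i32 {
    // Hypothetical body: add the captured `y` to every input.
    move |x| x + y
}

fn main() {
    let add_ten = score(10);
    println!("{}", add_ten(5));
}
```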

Caution

  • Using Generics is Hard Mode Rust
  • Don't reach for it in the first instance...
    • Try and just use concrete types?

Generic over Constants

In Rust 1.51, we gained the ability to be generic over constant values too.

struct Polygon<const SIDES: u8> {
    colour: u32
}

impl<const SIDES: u8> Polygon<SIDES> {
    fn new(colour: u32) -> Polygon<SIDES> { Polygon { colour } }
    fn print(&self) { println!("{} sides, colour=0x{:06x}", SIDES, self.colour); }
}

fn main() {
    let triangle: Polygon<3> = Polygon::new(0x00FF00);
    triangle.print();
}

Note:

SIDES is a property of the type, and doesn't occupy any memory within any values of that type at run-time - the constant is pasted in wherever it is used.
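
You can check this with std::mem::size_of. In the sketch below the sides method is made up for this example.

```rust
struct Polygon<const SIDES: u8> {
    colour: u32,
}

impl<const SIDES: u8> Polygon<SIDES> {
    // The constant comes from the type, not from a field.
    fn sides(&self) -> u8 {
        SIDES
    }
}

fn main() {
    let triangle: Polygon<3> = Polygon { colour: 0x00FF00 };
    println!("{} sides, colour=0x{:06x}", triangle.sides(), triangle.colour);
    // Only the `colour` field takes up space in the value.
    println!(
        "Polygon<3> is {} bytes, u32 is {} bytes",
        std::mem::size_of::<Polygon<3>>(),
        std::mem::size_of::<u32>()
    );
}
```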

Generic Traits

Traits themselves can have type parameters too!

trait HasArea<T> {
    fn area(&self) -> T;
}
 
// Here we only accept a shape where the `U` in `HasArea<U>` is printable
fn print_area<T, U>(shape: &T) where T: HasArea<U>, U: std::fmt::Debug {
    let area = shape.area();
    println!("Area = {area:?}");
}

struct UnitSquare;

impl HasArea<f64> for UnitSquare {
    fn area(&self) -> f64 {
        1.0
    }
}
fn main() {
    let u = UnitSquare;
    print_area(&u);
}

Special Bounds

  • Some bounds apply automatically
  • Special syntax to turn them off
#![allow(unused)]
fn main() {
fn print_debug<T: std::fmt::Debug + ?Sized>(value: &T) {
    println!("value is {:?}", value);
}
}

Note:

This bound says "It must implement std::fmt::Debug, but I don't care if it has a size known at compile-time".

Things that don't have sizes known at compile time (but which may or may not implement std::fmt::Debug) include:

  • String slices (str)
  • Trait objects, such as dyn Fn() for closures
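
A sketch of what the relaxed bound lets you pass in. The function here returns a String rather than printing, purely so the result is easy to check.

```rust
use std::fmt::Debug;

fn debug_text<T: Debug + ?Sized>(value: &T) -> String {
    format!("value is {:?}", value)
}

fn main() {
    // T = str, which has no size known at compile time
    println!("{}", debug_text("hello"));
    // T = [u8], same story
    let bytes: &[u8] = &[1, 2, 3];
    println!("{}", debug_text(bytes));
    // T = dyn Debug, a trait object
    let object: &dyn Debug = &42_i32;
    println!("{}", debug_text(object));
    // Ordinary sized types still work
    println!("{}", debug_text(&17_u8));
}
```

Without ?Sized, the first three calls would be rejected.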

Lifetimes

Rust Ownership

  • Every piece of memory in a Rust program has exactly one owner at a time
  • Ownership changes ("moves")
    • fn takes_ownership(data: Data)
    • fn producer() -> Data
    • let people = [paul, john, emma];

Producing owned data

fn producer() -> String {
    String::new()
}

Producing references?

fn producer() -> &str {
    // ???
}
  • &str "looks" at some string data. Where can this data come from?

Local Data

Does this work?

fn producer() -> &str {
    let s = String::new();
    &s
}

Local Data

No, we can't return a reference to local data...

error[E0515]: cannot return reference to local variable `s`
 --> src/lib.rs:3:5
  |
3 |     &s
  |     ^^ returns a reference to data owned by the current function

Local Data

You will also see:

error[E0106]: missing lifetime specifier
 --> src/lib.rs:1:18
  |
1 | fn producer() -> &str {
  |                  ^ expected named lifetime parameter
  |

Static Data

#![allow(unused)]
fn main() {
fn producer() -> &'static str {
    "hello"
}
}
  • bytes h e l l o are "baked" into your program
  • part of static memory (not heap or stack)
  • a slice pointing to these bytes will always be valid
  • safe to return from producer function

Note:

You didn't need to specify 'static for the static variable - there's literally no other lifetime that can work here.

How big is a &'static str? Do you think the length lives with the string data, or inside the str-reference itself?

(It lives with the reference - so you can take sub-slices)

Static Data

It doesn't have to be a string literal - any reference to a static is OK.

#![allow(unused)]
fn main() {
static HELLO: [u8; 5] = [0x68, 0x65, 0x6c, 0x6c, 0x6f];

fn producer() -> &'static str {
    std::str::from_utf8(&HELLO).unwrap()
}
}

'static annotation

  • Rust never assumes 'static for function returns or fields in types
  • &'static T means this reference to T will never become invalid
  • T: 'static means that "if type T has any references inside they should be 'static"
    • T may have no references inside at all!
  • string literals are always &'static str
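
The T: 'static bound trips a lot of people up, so here is a small sketch. The describe_static function is made up for this example.

```rust
use std::fmt::Debug;

fn describe_static<T: Debug + 'static>(value: T) -> String {
    format!("{value:?}")
}

fn main() {
    // Owned data has no references inside, so it satisfies `T: 'static`,
    // even though this String only lives for a moment.
    let owned = String::from("short-lived");
    println!("{}", describe_static(owned));

    // A string literal is a `&'static str`, so that is fine too.
    println!("{}", describe_static("literal"));

    // A reference to a local variable is not allowed:
    // let local = String::from("local");
    // describe_static(&local);
}
```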

fn takes_and_returns(s: &str) -> &str {

}

Where can the returned &str come from?

  • can't be local data
  • is not marked as 'static
  • Conclusion: must come from s!

Multiple sources

fn takes_many_and_returns(s1: &str, s2: &str) -> &str {

}

Where can the returned &str come from?

  • is not marked as 'static
  • should it be s1 or s2?
  • Ambiguous. Should ask programmer for help!

Tag system

fn takes_many_and_returns<'a>(s1: &str, s2: &'a str) -> &'a str {

}

"Returned &str comes from s2"

'a

  • "Lifetime annotation"
  • often called "lifetime" for short, but that's a very bad term
    • every reference has a lifetime
    • an annotation doesn't name the lifetime of a reference, but is used to tie the lifetimes of several references together
    • builds "can't outlive" and "should stay valid for as long as" relations
  • arbitrary names: 'a, 'b, 'c, 'whatever

Lifetime annotations in action

fn first_three_of_each(s1: &str, s2: &str) -> (&str, &str) {
    (&s1[0..3], &s1[0..3])
}

fn main() {
    let amsterdam = format!("AMS Amsterdam");

    let (amsterdam_code, denver_code) = {
        let denver = format!("DEN Denver");
        first_three_of_each(&amsterdam, &denver)
    };

    println!("{} -> {}", amsterdam_code, denver_code);
}

Annotate!

fn first_three_of_each<'a, 'b>(s1: &'a str, s2: &'b str) -> (&'a str, &'b str) {
    (&s1[0..3], &s1[0..3])
}

Annotations are used to validate function body

"The source you used in code doesn't match the tags"

error: lifetime may not live long enough
 --> src/lib.rs:2:5
  |
1 | fn first_three_of_each<'a, 'b>(s1: &'a str, s2: &'b str) -> (&'a str, &'b str) {
  |                        --  -- lifetime `'b` defined here
  |                        |
  |                        lifetime `'a` defined here
2 |     (&s1[0..3], &s1[0..3])
  |     ^^^^^^^^^^^^^^^^^^^^^^ function was supposed to return data with lifetime `'b` but it is returning data with lifetime `'a`
  |
  = help: consider adding the following bound: `'a: 'b`

Annotations are used to validate reference lifetimes at a call site

"Produced reference can't outlive the source"

error[E0597]: `amsterdam` does not live long enough
   --> src/main.rs:10:29
    |
6   |     let amsterdam = format!("AMS Amsterdam");
    |         --------- binding `amsterdam` declared here
  ...
10  |         first_three_of_each(&amsterdam, &denver)
    |         --------------------^^^^^^^^^^----------
    |         |                   |
    |         |                   borrowed value does not live long enough
    |         argument requires that `amsterdam` is borrowed for `'static`
  ...
14  | }
    | - `amsterdam` dropped here while still borrowed

Lifetime annotations help the compiler help you!

  • You give Rust hints
  • Rust checks memory access for correctness
fn first_three_of_each<'a, 'b>(s1: &'a str, s2: &'b str) -> (&'a str, &'b str) {
    (&s1[0..3], &s2[0..3])
}

fn main() {
    let amsterdam = format!("AMS Amsterdam");
    let denver = format!("DEN Denver");

    let (amsterdam_code, denver_code) = {
        first_three_of_each(&amsterdam, &denver)
    };

    println!("{} -> {}", amsterdam_code, denver_code);
}

What if multiple parameters can be sources?

fn pick_one(s1: &'? str, s2: &'? str) -> &'? str {
    if coin_flip() {
        s1
    } else {
        s2
    }
}

What if multiple parameters can be sources?

fn pick_one<'a>(s1: &'a str, s2: &'a str) -> &'a str {
    if coin_flip() {
        s1
    } else {
        s2
    }
}
  • returned reference can't outlive either s1 or s2
  • potentially more restrictive

Note:

This function body does not force the two inputs to live for the same amount of time. Variables live for as long as they live and we can't change that here. This just says "I'm going to use the same label for the lifetimes these two references have, so pick whichever is the shorter".

Example

fn coin_flip() -> bool { false }

fn pick_one<'a>(s1: &'a str, s2: &'a str) -> &'a str {
    if coin_flip() {
        s1
    } else {
        s2
    }
}

fn main() {
    let a = String::from("a");
    let b = "b";
    let result = pick_one(&a, b);
    // drop(a);
    println!("{}", result);
}

Lifetime annotations for types

struct Configuration {
    database_url: &str,
}

Where does the string data come from?

Generic lifetime parameter

struct Configuration<'a> {
    database_url: &'a str,
}

 

  • An instance of Configuration can't outlive a string that it refers to via database_url.
  • The string can't be dropped while an instance of Configuration still refers to it.
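
A sketch of those two rules in action. The scheme method is made up for this example. Note that it returns &'a str, so the result is tied to the original string and not to the Configuration value.

```rust
struct Configuration<'a> {
    database_url: &'a str,
}

impl<'a> Configuration<'a> {
    /// Hypothetical helper: returns the part before "://".
    fn scheme(&self) -> &'a str {
        self.database_url.split("://").next().unwrap_or("")
    }
}

fn main() {
    let url = String::from("postgres://localhost/app");
    let scheme = {
        let config = Configuration { database_url: &url };
        config.scheme()
    }; // `config` is gone here, but `scheme` borrows from `url`, not `config`
    println!("{scheme}");
    // Moving this line above the println! would be rejected,
    // because `scheme` still refers to `url`:
    drop(url);
}
```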

Lifetimes and Generics

  • Lifetime annotations act like generics from the type system's point of view.
  • Can be used to add bounds to types: where T: Debug + 'a
    • Type T has to be printable with :?.
    • If T has references inside, they have to stay valid for as long as 'a tag requires.
  • Can be used to match lifetime generics in struct or enum with the annotations used in function signatures and in turn with exact lifetimes of references.

Complex example

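use std::borrow::Cow;

// Hypothetical stand-in for a real health check, so the example compiles.
fn is_up(_peer: &str) -> bool { true }

fn select_peer<'a>(peers: &[&'a str]) -> Option<Cow<'a, str>> {
    for p in peers {
        if is_up(p) {
            return Some(Cow::Borrowed(p))
        }
    }
    None
}

fn main() {}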

Compiler concludes:

The returned value will not be allowed to outlive any of the references in the peers list.

let selected = select_peer(&peers);

Lifetime annotations in practice

  • Like generics, annotations make function signatures verbose and difficult to read
    • they often can be glossed over when reading code
  • T: 'static means "Owned data or static references", owned data can be very short-lived
  • Using owned data in your types helps avoid borrow checker difficulties

Cargo Workspaces

Cargo Workspaces

Allow you to split your project into several packages

  • further encourages modularity
  • develop multiple applications and libraries in a single tree
  • synchronized dependency management, release process, etc.
  • a way to parallelize compilation and speed up builds
  • your internal projects should likely be workspaces even if you don't use monorepos

Anatomy of Rust Workspace

my-app/
├── Cargo.toml   # a special workspace file
├── Cargo.lock   # notice that Cargo produces a common lockfile for all packages
├── packages/      # can use any directory structure
│   ├── main-app/
│   │   ├── Cargo.toml
│   │   └── src/
│   │       └── main.rs
│   ├── admin-app/
│   │   └── ...
│   ├── common-data-model/
│   │   ├── Cargo.toml
│   │   └── src/
│   │       └── lib.rs
│   ├── useful-macros
│   ├── service-a
│   ├── service-b
│   └── ...
└── tools/       # packages don't have to be in the same directory
    ├── release-bot/
    │   ├── Cargo.toml
    │   └── src/
    │       └── main.rs
    ├── data-migration-scripts/
    │   ├── Cargo.toml
    │   └── src/
    │       └── main.rs
    └── ...

Workspace Cargo.toml

[workspace]
resolver = "2"
members = ["packages/*", "tools/*"]

[workspace.dependencies]
thiserror = "1.0.39"
...

Using wildcards for members is very handy when you want to add new member packages, split packages, etc.

Cargo.toml for a workspace member

[package]
name = "main-app"

[dependencies]
thiserror = { workspace = true }
service-a = { path = "../service-a" }
...

Cargo commands for workspaces

  • cargo run --bin main-app
  • cargo test -p service-a

Creating a workspace

#!/usr/bin/env bash
function nw() {
  local name="$1"
  local work_dir="$PWD"
  mkdir -p "$work_dir/$name/packages"
  git init -q "$work_dir/$name"
  cat > "$work_dir/$name/Cargo.toml" << EOF
[workspace]
resolver = "2"
members = ["packages/*"]

[workspace.dependencies]
EOF
  cat > "$work_dir/$name/.gitignore" << EOF
target
EOF
  code "$work_dir/$name"
}

Example:

nw spaceship
cargo new --lib spaceship/packages/fuel-control

Heap Allocation (Box, Rc and Cow)

Where do Rust variables live?

struct Square {
    width: f32
}

fn main() {
    let x: u64 = 0;
    let y = Square { width: 1.0 };
    let mut z: String = "Hello".to_string();
    z.push_str(", world!");
}

Note:

  • The variable x is an 8-byte (64-bit) value, and lives on the stack.
  • The variable y is a 4-byte value, and also lives on the stack.
  • The variable z is a 3x4-byte value on 32-bit platforms, and a 3x8-byte value on 64-bit platforms. The String itself is a struct, and the bytes contained within the struct live on the heap.

Let's see some addresses...

struct Square {
    width: f32
}

fn main() {
    let x: u64 = 0;
    let y = Square { width: 1.0 };
    let mut z: String = "Hello".to_string();
    z.push_str(", world!");
    println!("x @ {:p}", &x);
    println!("y @ {:p}", &y);
    println!("z @ {:p}", &z);
    println!("z @ {:p}", z.as_str());
}

Note:

You expect to see something like:

x @ 0x7ffc2272c618
y @ 0x7ffc2272c624
z @ 0x7ffc2272c628
z @ 0x555829f269d0

The first z @ line is the struct String { ... } itself. The second z @ line is the bytes the String contains. They have a different address because they are in the heap and not on the stack.

If you run it multiple times, you will get different results. This is due to the Operating System randomizing the virtual addresses used for the stack and the heap, to make security vulnerabilities harder to exploit.

On macOS, you can run vmmap <pid> to print the addresses for each region. On Linux you can use pmap <pid>, or you could add something like:

#![allow(unused)]
fn main() {
if let Ok(maps) = std::fs::read_to_string(&format!("/proc/{}/maps", std::process::id())) {
    println!("{}", maps);
}
}

How does Rust handle the heap?

On three levels:

  • Talking to your Operating System (or its C Library)
  • A low-level API, called the Global Allocator
  • A high-level API, with Box, Rc, Vec, etc

What's in the Box?

  • A Box<T> in Rust, is a handle to a unique, owned, heap-allocated value of type T
  • The value is the size of a pointer
  • The contents of the Box can be any T (including unsized things)

Note:

Pointers can be 'thin' (one word in length) or 'wide' (two words in length). In a wide pointer, the second word holds the length of the thing being pointed to, or a pointer to the vtable if it's a dyn-trait pointer. The same applies to Boxes.
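
You can measure this yourself. The sketch below reports the size of three kinds of Box in machine words (the function name is made up for this example).

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Returns the sizes of a thin Box, a slice Box and a dyn-trait Box,
/// each measured in machine words.
fn box_sizes_in_words() -> (usize, usize, usize) {
    let word = size_of::<usize>();
    (
        size_of::<Box<u64>>() / word,       // thin: just the pointer
        size_of::<Box<[u8]>>() / word,      // wide: pointer + length
        size_of::<Box<dyn Debug>>() / word, // wide: pointer + vtable pointer
    )
}

fn main() {
    println!("{:?}", box_sizes_in_words());
}
```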

Why not raw pointers?

Because Box<T>:

  • doesn't let you do pointer arithmetic on it
  • will automatically free the memory when it goes out of scope
  • implements Deref<Target = T> and DerefMut

Making a Box

The Deref and DerefMut trait implementations let us use a Box quite naturally:

fn main() {
    let x: Box<f64> = Box::new(1.0_f64);
    let y: f64 = x.sin() * 2.0;
    let z: &f64 = &x;
    println!("x={x}, y={y}, z={z}");
}

When should I use a Box?

  • Not very often - friendlier containers (like Vec<T>) exist
  • If you have a large value that moves around a lot
    • Moving a Box<T> is cheap, because only the pointer moves, not the contents
  • To hide the size or type of a returned value...

Boxed Traits

fn make_stuff(want_integer: bool) -> Box<dyn std::fmt::Debug> {
    if want_integer {
        Box::new(42_i32)
    } else {
        Box::new("Hello".to_string())
    }
}

fn main() {
    println!("make_stuff(true): {:?}", make_stuff(true));
    println!("make_stuff(false): {:?}", make_stuff(false));
}

Note:

An i32 and a String are very different sizes, and a function must have a single fixed size for the return value. But it does - it returns a Box and the Box itself always has the same size. The thing that varies in size is the value inside the box and that lives somewhere else - on the heap in fact.

This trick is also useful for closures, where the type cannot even be said out loud because it's compiler-generated. But you can say a closure implements the FnOnce trait, for example.
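
A sketch of the closure case. Each branch below creates a closure of a different compiler-generated type, yet both fit in the same Box<dyn Fn(i32) -> i32>. The make_op function is made up for this example.

```rust
fn make_op(double: bool) -> Box<dyn Fn(i32) -> i32> {
    if double {
        // A closure that captures nothing
        Box::new(|x: i32| x * 2)
    } else {
        // A different closure type, which captures `offset`
        let offset = 100;
        Box::new(move |x: i32| x + offset)
    }
}

fn main() {
    println!("{}", make_op(true)(21));
    println!("{}", make_op(false)(1));
}
```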

Smarter Boxes

What if I want my Box to have multiple owners? And for the memory to be freed only when all of the owners have finished with it?

We have the reference counted Rc<T> type for that!

Using Rc<T>

use std::rc::Rc;

struct Point { x: i32, y: i32 }

fn main() {
    let first_handle = Rc::new(Point { x: 1, y: 1});
    let second_handle = first_handle.clone();
    let third_handle = second_handle.clone();
}

Reference Counting

  • The Rc type is a handle to a reference-counted heap allocation
  • When you do a clone() the count goes up by one
  • When you drop it, the count goes down by one
  • The memory isn't freed until the count hits zero
  • There's a Weak version which will not keep the allocation alive - to break cycles

Note:

A cycle would be if you managed to construct two Rc wrapped structs and had each one hold an Rc reference to the other. Now neither can ever be freed, because each will always have at least one owner (the other).
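
The counting can be observed with Rc::strong_count, and a Weak handle can be tested with upgrade. A small sketch (both function names are made up for this example):

```rust
use std::rc::{Rc, Weak};

/// Returns the strong count at the start, after a clone, and after a drop.
fn strong_counts() -> (usize, usize, usize) {
    let first = Rc::new(String::from("shared"));
    let at_start = Rc::strong_count(&first);
    let second = Rc::clone(&first);
    let after_clone = Rc::strong_count(&first);
    drop(second);
    let after_drop = Rc::strong_count(&first);
    (at_start, after_clone, after_drop)
}

/// Checks that a Weak handle does not keep the allocation alive.
fn weak_does_not_keep_alive() -> bool {
    let owner = Rc::new(String::from("shared"));
    let weak: Weak<String> = Rc::downgrade(&owner);
    // While an owner exists, the weak handle can be upgraded...
    let alive_before = weak.upgrade().is_some();
    drop(owner);
    // ...but once the last owner is gone, upgrading fails.
    let alive_after = weak.upgrade().is_some();
    alive_before && !alive_after
}

fn main() {
    println!("{:?}", strong_counts());
    println!("{}", weak_does_not_keep_alive());
}
```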

Thread-safety

  • Rc cannot be sent into a thread (or through any API that requires the type to be Send).
    • If in doubt, try it! Rust will save you from yourself.
  • The trade-off is that Rc is really fast!
  • There is an Atomic Reference Counted type, Arc if you need it.
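
For comparison, here is a sketch of Arc doing the job Rc is not allowed to do: sharing one read-only Vec between two threads. The sum_in_threads function is made up for this example.

```rust
use std::sync::Arc;
use std::thread;

/// Sums the even-indexed and odd-indexed items on two separate threads.
fn sum_in_threads(data: Vec<u64>) -> u64 {
    let shared = Arc::new(data);
    let handles: Vec<_> = (0..2)
        .map(|i| {
            // Each thread gets its own handle to the same allocation.
            let local = Arc::clone(&shared);
            thread::spawn(move || local.iter().skip(i).step_by(2).sum::<u64>())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    println!("{}", sum_in_threads(vec![1, 2, 3, 4, 5]));
}
```

Swap Arc for Rc here and the thread::spawn call is rejected at compile time.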

Rc is not mutable

NB: Rc allows sharing, but not mutability...

use std::rc::{Rc, Weak};

struct Dog { name: String, owner: Weak<Human> }
struct Human { name: String, pet_dogs: Vec<Dog> }

fn main() {
    let mut sam = Rc::new(
        Human { name: "Sam".to_string(), pet_dogs: Vec::new() }
    );
    let rover = Dog { name: "Rover".to_string(), owner: Rc::downgrade(&sam) };
    // This is not allowed, because `sam` is actually immutable
    // sam.pet_dogs.push(rover);
}

Note:

You get an error like:

error[E0596]: cannot borrow data in an `Rc` as mutable
  --> src/main.rs:12:5
   |
12 |     sam.pet_dogs.push(rover);
   |     ^^^^^^^^^^^^ cannot borrow as mutable
   |
   = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Rc<Human>`
For more information about this error, try `rustc --explain E0596`.

Why do you want this structure? Because given some &Dog you might very well want to know who owns it!

Shared Mutability

We have more on this later...

use std::rc::{Rc, Weak};
use std::cell::RefCell;

struct Dog { name: String, owner: Weak<RefCell<Human>> }
struct Human { name: String, pet_dogs: Vec<Dog> }

fn main() {
    let mut sam = Rc::new(RefCell::new(
        Human { name: "Sam".to_string(), pet_dogs: Vec::new() }
    ));
    let rover = Dog { name: "Rover".to_string(), owner: Rc::downgrade(&sam) };
    // This is now allowed because `RefCell::borrow_mut` does a run-time borrow check
    sam.borrow_mut().pet_dogs.push(rover);
}

Maybe Boxed, maybe not?

Why is this function less than ideal?

/// Replaces all the ` ` characters with `_`
fn replace_spaces(input: &str) -> String {
    todo!()
}

fn main() {
    println!("{}", replace_spaces("Hello, world!"));
    println!("{}", replace_spaces("Hello!"));
}

Note:

Did the second call replace anything? Did you have to allocate a String and copy all the data anyway, even though nothing changed?

Copy-On-Write

Rust has the Cow type to handle this.

/// Replaces all the ` ` characters with `_`
fn replace_spaces(input: &str) -> std::borrow::Cow<str> {
    todo!()
}

fn main() {
    println!("{}", replace_spaces("Hello, world!"));
    println!("{}", replace_spaces("Hello!"));
}

Note:

Cow works on any T where there is both a Borrowed version and an Owned version.

For example, &[u8] and Vec<u8>.
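
One way to fill in the todo!() is sketched below. It is not necessarily how the course solves it. The function only allocates when there is actually a space to replace.

```rust
use std::borrow::Cow;

/// Replaces all the ` ` characters with `_`
fn replace_spaces(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        // Something changes, so we must build a new String.
        Cow::Owned(input.replace(' ', "_"))
    } else {
        // Nothing changes, so we just hand back the input.
        Cow::Borrowed(input)
    }
}

fn main() {
    println!("{}", replace_spaces("Hello, world!"));
    println!("{}", replace_spaces("Hello!"));
}
```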

Shared Mutability (Cell, RefCell)

Rust has a simple rule

            Immutable   Mutable
Exclusive   &mut T      &mut T
Shared      &T          🔥🔥🔥

These rules can be ... bent

(but not broken)

Why the rules exist...

  • Optimizations!
  • It is undefined behaviour (UB) to have multiple &mut references to the same object at the same time
  • You must avoid UB

Note:

If you have UB in your program (anywhere), it is entirely valid for the compiler to delete your entire program and replace it with an empty program.

Bending the rules

There is only one way to modify data through a &T reference:

UnsafeCell

UnsafeCell

use std::cell::UnsafeCell;

fn main() {
    let x: UnsafeCell<i32> = UnsafeCell::new(42);
    let (p1, p2) = (&x, &x);

    let p1_exclusive: &mut i32 = unsafe { &mut *p1.get() };
    *p1_exclusive += 27;
    drop(p1_exclusive);

    let p2_shared: &i32 = unsafe { &*p2.get() };
    assert_eq!(*p2_shared, 42 + 27);
    let p1_shared: &i32 = unsafe { &*p1.get() };
    assert_eq!(*p1_shared, *p2_shared);
}

Note:

The UnsafeCell::get(&self) -> *mut T method is safe, but dereferencing the pointer (or converting it to a &mut reference) is unsafe because a human must verify there is no aliasing.

Can we be safer?

A human must do a lot of manual checks here.

Can we make it nicer to use?

Cell

A Cell is safe to use. But you can only copy in and copy out.

A motivating example

We have some blog posts which have immutable content, and an incrementing view count.

Ideally, we would have a fn view(&self) -> &str to return the content, and increment the view count.

Without Cells

#![allow(unused)]
fn main() {
#[derive(Debug, Default)]
struct Post {
    content: String,
    viewed_times: u64,
}

impl Post {
    // `mut` is a problem here!
    fn view(&mut self) -> &str {
        self.viewed_times += 1;
        &self.content
    }
}
}

Without Cell

This isn't ideal! view takes a &mut self, meaning this won't work:

fn main() {
    let post = Post { content: "Blah".into(), ..Post::default() };
    // This line is a compile error!
    // println!("{}", post.view());
}

// From before

#[derive(Debug, Default)]
struct Post {
    content: String,
    viewed_times: u64,
}

impl Post {
    // `&mut self` is the problem here!
    fn view(&mut self) -> &str {
        self.viewed_times += 1;
        &self.content
    }
}

Without Cell

fn main() {
    // We need to make the entire struct mutable!
    let mut post = Post { content: "Blah".into(), ..Post::default() };
    println!("{}", post.view());
    // Now this is allowed too...
    post.content.push_str(" - extra content");
}

// From before

#[derive(Debug, Default)]
struct Post {
    content: String,
    viewed_times: u64,
}

impl Post {
    fn view(&mut self) -> &str {
        self.viewed_times += 1;
        &self.content
    }
}

Using Cell instead

Let's see our previous example with Cell.

fn main() {
    let post = Post {
        content: "Blah".into(),
        ..Post::default()
    };
    println!("{}", post.view());
}

#[derive(Debug, Default)]
struct Post {
    content: String,
    viewed_times: std::cell::Cell<u64>,
}

impl Post {
    fn view(&self) -> &str {
        // Note how we are making a copy, then replacing the original.
        let current_views = self.viewed_times.get();
        self.viewed_times.set(current_views + 1);
        &self.content
    }
}

Note:

As an in-depth example of the borrow checker's limitations, consider the Splitting Borrows idiom, which allows one to borrow different fields of the same struct with different mutability semantics:

#![allow(unused)]
fn main() {
struct Foo {
    a: i32,
    b: i32,
    c: i32,
}

let mut x = Foo {a: 0, b: 0, c: 0};
let a = &mut x.a;
let b = &mut x.b;
let c = &x.c;
*b += 1;
let c2 = &x.c;
*a += 10;
println!("{} {} {} {}", a, b, c, c2);
}

The code works, but, once you have mutably borrowed a field you cannot mutably borrow the whole value (e.g. by calling a method on it) at the same time - otherwise you could get two mutable references to the same field at the same time.

Here's an example where tuple fields are special-cased for the borrow checker:

let mut z = (1, 2);
let r = &z.1;
z.0 += 1;
println!("{:?}, {}", z, r);

but the same code fails on an equivalent array:

let mut z = [1, 2];
let r = &z[1];
z[0] += 1;
println!("{:?}, {}", z, r);

RefCell

A RefCell is also safe, but lets you borrow or mutably borrow the contents.

The borrow checking is deferred to run-time

Using RefCell

use std::cell::RefCell;

fn main() {
    let x: RefCell<i32> = RefCell::new(42);
    let (p1, p2) = (&x, &x);

    let mut p1_exclusive = p1.borrow_mut();
    *p1_exclusive += 27;
    drop(p1_exclusive);

    let p2_shared = p2.borrow();
    assert_eq!(*p2_shared, 42 + 27);
    // This isn't allowed here:
    // let p2_mutable = p2.borrow_mut();
    let p1_shared = p1.borrow();
    assert_eq!(*p1_shared, *p2_shared);
}

Using RefCell instead

Let's see our previous example with RefCell.

fn main() {
    let post = Post { content: "Blah".into(), ..Post::default() };
    println!("{}", post.view());
}

#[derive(Debug, Default)]
struct Post {
    content: String,
    viewed_times: std::cell::RefCell<u64>,
}

impl Post {
    fn view(&self) -> &str {
        let mut view_count_ref = self.viewed_times.borrow_mut();
        *view_count_ref += 1;
        &self.content
    }
}

RefCell Tradeoffs

Moving the borrow checking to run-time:

  • Might make your program actually compile 😀
  • Might cause your program to panic 😒

interior mutability is something of a last resort

-- The Rust Documentation
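To make the panic risk concrete, here is a small sketch. It uses try_borrow_mut, the non-panicking sibling of borrow_mut, to show that a second mutable borrow is rejected at run-time. The function name is illustrative only.

```rust
use std::cell::RefCell;

/// Returns true if a second mutable borrow is refused while the first is alive.
fn second_mutable_borrow_fails() -> bool {
    let cell = RefCell::new(5);
    let _first = cell.borrow_mut();
    // Calling `cell.borrow_mut()` here would panic; `try_borrow_mut` returns Err instead.
    let failed = cell.try_borrow_mut().is_err();
    failed
}

fn main() {
    println!("second borrow refused: {}", second_mutable_borrow_fails());
}
```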

Using with Rc

To get shared ownership and mutability you need two things:

  • Rc<RefCell<T>>
  • (Multi-threaded programs might use Arc<Mutex<T>>)
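A minimal sketch of the Rc<RefCell<T>> combination, assuming a simple shared counter (the function name is made up for illustration):

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Two owners mutate the same counter; returns (final value, number of owners).
fn shared_counter_demo() -> (u64, usize) {
    let counter = Rc::new(RefCell::new(0u64));
    let other = Rc::clone(&counter); // second owner of the same RefCell

    *counter.borrow_mut() += 1;
    *other.borrow_mut() += 1;

    let count = *counter.borrow();
    (count, Rc::strong_count(&counter))
}

fn main() {
    let (count, owners) = shared_counter_demo();
    println!("count = {count}, owners = {owners}");
}
```

Rc provides the shared ownership and RefCell provides the mutability; each borrow_mut() is released at the end of its statement, so the two updates never overlap.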

OnceCell for special cases

A OnceCell lets you initialise a value using &self, but not subsequently modify it.

fn main() {
    let post: Post = Post { content: "Blah".into(), ..Post::default() };
    println!("{:?}", post.first_viewed());
}

#[derive(Debug, Default)]
struct Post {
    content: String,
    first_viewed_at: std::cell::OnceCell<std::time::Instant>,
}

impl Post {
    fn first_viewed(&self) -> std::time::Instant {
        self.first_viewed_at.get_or_init(std::time::Instant::now).clone()
    }
}

Thread Safety (Send/Sync, Arc, Mutex)

Rust is thread-safe

But what does that mean?

An Example in C (or C++)

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

void *thread_function(void *p_arg) {
    int* p = (int*) p_arg;
    for(int i = 0; i < 1000000; i++) {
        *p += 1;
    }
    return NULL;
}

int main() {
    int value = 0;
    pthread_t thread1, thread2;
    pthread_create(&thread1, NULL, thread_function, &value);
    pthread_create(&thread2, NULL, thread_function, &value);
    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);
    printf("value = %d\n", value);
    exit(0);
}

What does that produce...

1000000 * 2 = 2000000, right?

$ ./a.out
value = 1059863

But there were no compiler errors!

(See https://godbolt.org/z/41x1dG6oY)

Let's try Rust

fn thread_function(arg: &mut i32) {
    for _ in 0..1_000_000 {
        *arg += 1;
    }
}

fn main() {
    let mut value = 0;
    std::thread::scope(|s| {
        s.spawn(|| thread_function(&mut value));
        s.spawn(|| thread_function(&mut value));
    });
    println!("value = {value}");
}

Oh!

error[E0499]: cannot borrow `value` as mutable more than once at a time
   --> src/main.rs:11:17
    |
9   |     std::thread::scope(|s| {
    |                         - has type `&'1 Scope<'1, '_>`
10  |         s.spawn(|| thread_function(&mut value));
    |         ---------------------------------------
    |         |       |                       |
    |         |       |                       first borrow occurs due to use of `value` in closure
    |         |       first mutable borrow occurs here
    |         argument requires that `value` is borrowed for `'1`
11  |         s.spawn(|| thread_function(&mut value));
    |                 ^^                      ----- second borrow occurs due to use of `value` in closure
    |                 |
    |                 second mutable borrow occurs here
For more information about this error, try `rustc --explain E0499`.

It's our old friend/enemy shared mutability!

How about a RefCell...

fn thread_function(arg: &std::cell::RefCell<i32>) {
    for _ in 0..1_000_000 {
        let mut p = arg.borrow_mut();
        *p += 1;
    }
}

fn main() {
    let mut value = std::cell::RefCell::new(0);
    std::thread::scope(|s| {
        s.spawn(|| thread_function(&value));
        s.spawn(|| thread_function(&value));
    });
    println!("value = {}", value.borrow());
}

Oh come on...

error[E0277]: `RefCell<i32>` cannot be shared between threads safely
   --> src/main.rs:11:17
    |
11  |         s.spawn(|| thread_function(&value));
    |           ----- ^^^^^^^^^^^^^^^^^^^^^^^^^^ `RefCell<i32>` cannot be shared between threads safely
    |           |
    |           required by a bound introduced by this call
    |
    = help: the trait `Sync` is not implemented for `RefCell<i32>`, which is required by `{closure@src/main.rs:11:17: 11:19}: Send`
    = note: if you want to do aliasing and mutation between multiple threads, use `std::sync::RwLock` instead
    = note: required for `&RefCell<i32>` to implement `Send`
note: required because it's used within this closure
   --> src/main.rs:11:17
    |
11  |         s.spawn(|| thread_function(&value));
    |                 ^^
note: required by a bound in `Scope::<'scope, 'env>::spawn`
   --> /home/mrg/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/scoped.rs:196:28
    |
194 |     pub fn spawn<F, T>(&'scope self, f: F) -> ScopedJoinHandle<'scope, T>
    |            ----- required by a bound in this associated function
195 |     where
196 |         F: FnOnce() -> T + Send + 'scope,
    |                            ^^^^ required by this bound in `Scope::<'scope, 'env>::spawn`
For more information about this error, try `rustc --explain E0277`.

What is Send?

  • It is a marker trait with no methods
  • We use it to mark types which are safe to send between threads
pub unsafe auto trait Send { }

What is Sync?

  • It is a marker trait with no methods
  • We use it to mark types where it is safe to send their references between threads
  • A type T is Sync if and only if &T is Send
pub unsafe auto trait Sync { }
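One way to check these marker traits is a pair of empty generic functions whose bounds the compiler must prove. This is a sketch; assert_send and assert_sync are helper names invented for this example.

```rust
use std::cell::RefCell;
use std::sync::Mutex;

// Hypothetical helpers: they do nothing at run-time, the bound is the test.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

fn main() {
    assert_send::<i32>();
    assert_sync::<i32>();
    assert_sync::<Mutex<i32>>();
    // RefCell<i32> can be moved to another thread...
    assert_send::<RefCell<i32>>();
    // ...but not shared between threads. This line does not compile:
    // assert_sync::<RefCell<i32>>();
    // Rc<T> is neither Send nor Sync. This line does not compile either:
    // assert_send::<std::rc::Rc<i32>>();
    println!("all bounds hold");
}
```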

Is there a Sync version of RefCell?

Yes, several - and the error message suggested one: std::sync::RwLock.

There's also the slightly simpler std::sync::Mutex.

Using a Mutex

fn thread_function(arg: &std::sync::Mutex<i32>) {
    for _ in 0..1_000_000 {
        let mut p = arg.lock().unwrap();
        *p += 1;
    }
}

fn main() {
    let value = std::sync::Mutex::new(0);
    std::thread::scope(|s| {
        s.spawn(|| thread_function(&value));
        s.spawn(|| thread_function(&value));
    });
    println!("value = {}", value.lock().unwrap());
}

Why the unwrap?

  • The Mutex is locked on lock()
  • It is unlocked when the value returned from lock() is dropped
  • What if you panic! whilst holding the lock?
  • -> The next lock() will return Err(...)
  • You can basically ignore it (the panic is a bigger issue...)
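The following sketch shows the poisoning behaviour described above. A thread panics while holding the lock, and the next lock() returns an Err that still gives access to the data. The function name is illustrative, and the panic message will appear on stderr when this runs.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Returns (was the lock poisoned?, the value recovered from the poisoned lock).
fn lock_after_panic() -> (bool, i32) {
    let value = Arc::new(Mutex::new(0_i32));
    let handle = thread::spawn({
        let value = Arc::clone(&value);
        move || {
            let mut guard = value.lock().unwrap();
            *guard += 1;
            panic!("panicking while holding the lock");
        }
    });
    // join() returns Err because the thread panicked; we ignore it here.
    let _ = handle.join();

    let was_poisoned = value.lock().is_err();
    let recovered = match value.lock() {
        Ok(guard) => *guard,
        // The error still carries the guard, so the data stays reachable.
        Err(poisoned) => *poisoned.into_inner(),
    };
    (was_poisoned, recovered)
}

fn main() {
    println!("{:?}", lock_after_panic());
}
```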

What about Rc<T>?

That's not thread-safe either. Use std::sync::Arc<T>.

fn thread_function(arg: &std::sync::Mutex<i32>) {
    for _ in 0..1_000_000 {
        let mut p = arg.lock().unwrap();
        *p += 1;
    }
}

fn main() {
    let value = std::sync::Arc::new(std::sync::Mutex::new(0));
    let t1 = std::thread::spawn({
        let value = value.clone();
        move || thread_function(&value)
    });
    let t2 = std::thread::spawn({
        let value = value.clone();
        move || thread_function(&value)
    });
    let _ = t1.join();
    let _ = t2.join();
    println!("value = {}", value.lock().unwrap());
}

Atomic Values

Methods on Atomics

  • We have AtomicBool, AtomicPtr, and 10 sizes of Atomic integer
  • load() and store()
  • fetch_add() and fetch_sub()
  • compare_exchange()
  • etc

Note:

  • load and store work as expected
  • fetch_add will add a value to the atomic, and return its old value
  • fetch_sub will subtract a value from the atomic, and return its old value
  • compare_exchange will swap an atomic for some new value, provided it is currently equal to some given existing value
  • All these functions require an Ordering, which explains whether you are only concerned about this value, or other operations in memory which should happen before or after this atomic access; e.g. when taking a lock.
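A short single-threaded sketch of fetch_add and compare_exchange, using Ordering::SeqCst throughout for simplicity (weaker orderings are often sufficient). The function name is made up for this example.

```rust
use std::sync::atomic::{AtomicI32, Ordering};

fn atomic_demo() -> (i32, Result<i32, i32>, Result<i32, i32>, i32) {
    let value = AtomicI32::new(5);
    // fetch_add returns the old value (5); the atomic now holds 8.
    let old = value.fetch_add(3, Ordering::SeqCst);
    // The current value is 8, so the swap to 100 succeeds and returns Ok(8).
    let swapped = value.compare_exchange(8, 100, Ordering::SeqCst, Ordering::SeqCst);
    // The current value is now 100, not 8, so this fails with Err(100).
    let rejected = value.compare_exchange(8, 0, Ordering::SeqCst, Ordering::SeqCst);
    (old, swapped, rejected, value.load(Ordering::SeqCst))
}

fn main() {
    println!("{:?}", atomic_demo());
}
```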

An Example

We highly recommend "Rust Atomics and Locks" by Mara Bos for further details.

use std::sync::atomic::{Ordering, AtomicI32};

fn thread_function(arg: &AtomicI32) {
    for _ in 0..1_000_000 {
        arg.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    let value = AtomicI32::new(0);
    std::thread::scope(|s| {
        s.spawn(|| thread_function(&value));
        s.spawn(|| thread_function(&value));
    });
    println!("value = {}", value.load(Ordering::Relaxed));
}

Closures

Rust's Function Traits

  • trait FnOnce<Args>
  • trait FnMut<Args>: FnOnce<Args>
  • trait Fn<Args>: FnMut<Args>

Note:

  • Instances of FnOnce can only be called once.
  • Instances of FnMut can be called repeatedly and may mutate state.
  • Instances of Fn can be called repeatedly without mutating state.
  • Fn (a trait) and fn (a function pointer) are different!

These traits are implemented by:

  • Function Pointers
  • Closures
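As a rough illustration of how the three traits differ from the caller's side, the sketch below defines one function per trait bound. All names here are invented for the example.

```rust
fn call_once<F: FnOnce() -> String>(f: F) -> String {
    f() // f is consumed by this call
}

fn call_twice_mut<F: FnMut() -> u32>(mut f: F) -> u32 {
    f();
    f() // may have changed state between the two calls
}

fn call_twice<F: Fn(u32) -> u32>(f: F) -> u32 {
    f(1) + f(2)
}

fn add_one(x: u32) -> u32 {
    x + 1
}

fn demo() -> (String, u32, u32, u32) {
    let name = String::from("Ferris");
    let consumed = call_once(move || name); // moves `name` out: FnOnce only

    let mut count = 0;
    let counted = call_twice_mut(|| {
        count += 1; // mutates captured state: FnMut
        count
    });

    let offset = 10;
    let summed = call_twice(|x| x + offset); // only reads captured state: Fn
    let via_fn = call_twice(add_one); // plain functions implement Fn too
    (consumed, counted, summed, via_fn)
}

fn main() {
    println!("{:?}", demo());
}
```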

Function Pointers

fn add_one(x: usize) -> usize {
    x + 1
}

fn main() {
    let ptr: fn(usize) -> usize = add_one;
    println!("ptr(5) = {}", ptr(5));
}

Closures

  • Defined with |<args>|
  • The most basic kind (capturing nothing) are just function pointers
fn main() {
    let clos: fn(usize) -> usize = |x| x + 5;
    println!("clos(5) = {}", clos(5));
}

Capturing

  • Closures can capture their environment.
  • Now it's an anonymous struct, not a fn
  • It implements Fn
fn main() {
    let increase_by = 1;
    let clos = |x| x + increase_by;
    println!("clos(5) = {}", clos(5));
}

The variable increase_by that is captured by the closure here is called an upvar or a free variable.

Capturing Mutably

  • Closures can capture their environment by mutable reference
  • Now it implements FnMut
fn main() {
    let mut total = 0;
    let mut update = |x| total += x;
    update(5);
    update(5);
    println!("total: {}", total);
}

Note:

The closure is dropped before the println!, making total accessible again (the &mut ref stored in the closure is now gone). If you try and call update() after the println! you get a compile error.

Capturing by transferring ownership

This closure implements FnOnce.

fn main() {
    let items = vec![1, 2, 3, 4];
    let update = move || {
        for item in items {
            println!("item is {}", item);
        }
    };
    update();
    // println!("items is {:?}", items);
}

But why?

  • But why is this useful?
  • It makes iterators really powerful!
fn main() {
    let items = [1, 2, 3, 4, 5, 6];
    let n = 2;
    for even_number in items.iter().filter(|x| (**x % n) == 0) {
        println!("{} is even", even_number);
    }
}

Cleaning up

It's also very powerful if you have something you need to clean up.

  1. You do some set-up
  2. You want to do some work (defined by the caller)
  3. You want to clean up after.
fn setup_teardown<F, T>(f: F) -> T where F: FnOnce(&mut Vec<u32>) -> T {
    let mut state = Vec::new();
    println!("> Setting up state");
    let t = f(&mut state);
    println!("< State contains {:?}", state);
    t
}

Cleaning up

fn setup_teardown<F, T>(f: F) -> T where F: FnOnce(&mut Vec<u32>) -> T {
    let mut state = Vec::new();
    println!("> Setting up state");
    let t = f(&mut state);
    println!("< State contains {:?}", state);
    t
}

fn main() {
    setup_teardown(|s| s.push(1));
    setup_teardown(|s| {
        s.push(1);
        s.push(2);
        s.push(3);
    });
}

Note:

In release mode, all this code is typically inlined by the optimizer.

Spawning Threads and Scoped Threads

Platform Differences - Windows

  • On Windows, a Process is just an address space, and it has one Thread by default.
  • You can start more Threads
HANDLE CreateThread(
  /* [in, optional]  */ LPSECURITY_ATTRIBUTES   lpThreadAttributes,
  /* [in]            */ SIZE_T                  dwStackSize,
  /* [in]            */ LPTHREAD_START_ROUTINE  lpStartAddress,  // <<-- function to run in thread
  /* [in, optional]  */ __drv_aliasesMem LPVOID lpParameter,     // <<-- context for thread function
  /* [in]            */ DWORD                   dwCreationFlags,
  /* [out, optional] */ LPDWORD                 lpThreadId
);

Platform Differences - POSIX

  • On POSIX, a Process includes one thread of execution.
  • You can start more Threads, typically using the POSIX Threads API
int pthread_create(
    pthread_t *restrict thread,
    const pthread_attr_t *restrict attr,
    void *(*start_routine)(void *),        // <<-- function to run in thread
    void *restrict arg                     // <<-- context for thread function
);     

Rusty Threads

The Rust thread API looks like this:

pub fn spawn<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
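The T in that signature is the closure's return value, which comes back through JoinHandle::join. A minimal sketch, with an invented function name:

```rust
use std::thread;

/// Sums the numbers on another thread and hands the result back via join().
fn sum_in_thread(numbers: Vec<u64>) -> u64 {
    // The closure owns `numbers` (move), so it satisfies the 'static bound.
    let handle = thread::spawn(move || numbers.iter().sum::<u64>());
    // join() returns Err if the thread panicked.
    handle.join().expect("worker thread panicked")
}

fn main() {
    println!("sum = {}", sum_in_thread(vec![1, 2, 3, 4]));
}
```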

Using spawn

  • You could pass a function to std::thread::spawn.
  • In almost all cases you pass a closure
use std::{thread, time};

fn main() {
    let thread_handle = thread::spawn(|| {
        thread::sleep(time::Duration::from_secs(1));
        println!("I'm a thread");
    });
    
    thread_handle.join().unwrap();
}

Why no context?

There's no void* p_context argument, because closures can close-over local variables.

use std::thread;

fn main() {
    let number_of_loops = 5; // on main's stack
    let thread_handle = thread::spawn(move || {
        for _i in 0..number_of_loops { // captured by value, not reference
            println!("I'm a thread");
        }
    });
    
    thread_handle.join().unwrap();
}

Note:

Try changing this move closure to a regular referencing closure.

Context lifetimes

However, the thread might live forever...

use std::{sync::Mutex, thread};

fn main() {
    let buffer: Mutex<Vec<i32>> = Mutex::new(Vec::new());
    let thread_handle = thread::spawn(|| {
        for i in 0..5 {
            // captured by reference, does not live long enough
            // buffer.lock().unwrap().push(i);
        }
    });
    thread_handle.join().unwrap();
    let locked_buffer = buffer.lock();
    println!("{:?}", &locked_buffer);
}

Making context live forever

If a thread can live forever, we need its context to live just as long.

use std::{sync::{Arc, Mutex}, thread};

fn main() {
    let buffer = Arc::new(Mutex::new(Vec::new()));
    let thread_buffer = buffer.clone();
    let thread_handle = thread::spawn(move || {
        for i in 0..5 {
            thread_buffer.lock().unwrap().push(i);
        }
    });
    thread_handle.join().unwrap();
    let locked_buffer = buffer.lock().unwrap();
    println!("{:?}", &locked_buffer);
}

Tidying up the handle

  • In Rust, functions take expressions
  • Blocks are expressions...
let thread_buffer = buffer.clone();
let thread_handle = thread::spawn(
    move || {
        for i in 0..5 {
            thread_buffer.lock().unwrap().push(i);
        }
    }
);

Tidying up the handle

  • In Rust, functions take expressions
  • Blocks are expressions...
let thread_handle = thread::spawn({
    let thread_buffer = buffer.clone();
    move || {
        for i in 0..5 {
            thread_buffer.lock().unwrap().push(i);
        }
    }
});

Note:

This clearly limits the visual scope of the thread_buffer variable, to match the logical scope caused by the fact it is transferred by value into the closure.

Scoped Threads

As of Rust 1.63, scoped threads let us guarantee that all the spawned threads have ended before the calling function carries on.

use std::{sync::Mutex, thread};

fn main() {
    let buffer = Mutex::new(Vec::new());
    thread::scope(|s| {
        s.spawn(|| {
            for i in 0..5 {
                buffer.lock().unwrap().push(i);
            }
        });
    });
    let locked_buffer = buffer.lock().unwrap();
    println!("{:?}", &locked_buffer);
}
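Because the scope guarantees the threads have ended, they can also borrow plain local data without a Mutex, as long as they only read it. A sketch under that assumption (parallel_sum is a made-up name):

```rust
use std::thread;

/// Sums the two halves of a borrowed slice on two scoped threads.
fn parallel_sum(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        // Both closures borrow from the caller's data; no Arc and no 'static needed.
        let l = s.spawn(|| left.iter().sum::<u64>());
        let r = s.spawn(|| right.iter().sum::<u64>());
        l.join().unwrap() + r.join().unwrap()
    })
}

fn main() {
    let data = [1, 2, 3, 4, 5];
    println!("sum = {}", parallel_sum(&data));
}
```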

Advanced Strings


There are several different kinds of strings in Rust.

Most common are String and &str.

String

  • Owns the data it stores, and can be mutated freely
  • The bytes it points at exist on the heap
  • Does not implement Copy, but implements Clone


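[Diagram: a String value holding ptr, len = 6 and cap = 8; ptr points at a heap buffer containing the bytes 0x48 0x65 0x6c 0x6c 0x6f 0x21 ("Hello!"), followed by two unused bytes shown as 0xFF 0xFF]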

&str

  • A "string slice reference" (or just "string slice")
  • Usually only seen as a borrowed value
  • The bytes it points at may be anywhere: heap, stack, or in read-only memory

[Diagram: a &str value holding ptr and len = 5; ptr points at the bytes 0xC2 0xA3 0x39 0x39 0x21 ("£99!")]

Creation

fn main() {
    // &'static str
    let this = "Hello";
    // String
    let that: String = String::from("Hello");
    // &str
    let other = that.as_str();
}

When to Use What?

  • String is the easiest to use when starting out. Refine later.
  • String owns its data, so works well as a field of a struct or enum.
  • &str is typically used in function arguments.

Deref Coercion

Just because multiple types exist doesn't mean they can't work in harmony.

fn main() {
    let part_one = String::from("Hello ");
    let part_two = String::from("there ");
    let whole = part_one + &part_two + "world!";
    println!("{}", whole);
}

This works because String implements Deref<Target = str>, so &part_two (a &String) is coerced to &str.
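The same coercion applies to function arguments and method calls. A small sketch, with an invented function name:

```rust
/// Takes a string slice; callers may pass &String thanks to deref coercion.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

fn main() {
    let owned = String::from("hello");
    println!("{}", shout(&owned)); // &String is coerced to &str
    println!("{}", shout("there")); // a &'static str is passed as-is
    // Methods defined on str can be called directly on a String.
    println!("{}", owned.starts_with("he"));
}
```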

Exotic String types

  • OsStr and OsString may show up when working with file systems or system calls.

  • CStr and CString may show up when working with FFI.

The difference between [Os|C]Str and [Os|C]String mirrors that between str and String: the former is a borrowed view, the latter owns its data.

OsString & OsStr

These types represent platform native strings. This is necessary because Unix and Windows strings have different characteristics.

Behind the OsString Scenes

  • Unix strings are often arbitrary non-zero 8-bit sequences, usually interpreted as UTF-8.
  • Windows strings are often arbitrary non-zero 16-bit sequences, usually interpreted as UTF-16.
  • Rust strings are always valid UTF-8, and may contain NUL bytes.

OsString and OsStr bridge this gap and allow for conversion to and from String and str.

Note:

In particular, UNIX file paths are not required to be valid UTF-8 and you might encounter such paths when looking at someone's disk.

Windows file paths are also not required to be valid UTF-16 (i.e. might contain invalid surrogate pairs) and you might encounter such paths when looking at someone's disk.
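A sketch of the conversions in both directions. Converting to an OsString always succeeds, while converting back to a Rust string can fail, so the API returns an Option or a Result. The function names are illustrative only.

```rust
use std::ffi::{OsStr, OsString};

/// Describes a platform string, falling back to a lossy conversion.
fn describe(name: &OsStr) -> String {
    match name.to_str() {
        Some(utf8) => format!("valid UTF-8: {utf8}"),
        // to_string_lossy replaces invalid sequences with U+FFFD.
        None => format!("not UTF-8, lossy: {}", name.to_string_lossy()),
    }
}

/// str -> OsString -> String; the last step fails if the data is not valid Unicode.
fn round_trip(text: &str) -> Result<String, OsString> {
    OsString::from(text).into_string()
}

fn main() {
    println!("{}", describe(OsStr::new("notes.txt")));
    println!("{:?}", round_trip("héllo"));
}
```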

CString & CStr

These types represent valid C compatible strings.

They are predominantly used when doing FFI with external code.

It is strongly recommended you read all of the documentation on these types before using them.
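As a small taste of what that documentation covers, this sketch shows the NUL-terminator rules without calling any foreign code. The helper names are made up for the example.

```rust
use std::ffi::{CStr, CString};

/// Rust string -> C string bytes (with the trailing NUL), or None if the
/// input contains an interior NUL byte.
fn to_c_bytes(text: &str) -> Option<Vec<u8>> {
    CString::new(text).ok().map(|c| c.as_bytes_with_nul().to_vec())
}

/// NUL-terminated bytes -> Rust String, or None if the bytes are not a
/// well-formed C string or not valid UTF-8.
fn from_c_bytes(bytes: &[u8]) -> Option<String> {
    let c_str = CStr::from_bytes_with_nul(bytes).ok()?;
    c_str.to_str().ok().map(String::from)
}

fn main() {
    println!("{:?}", to_c_bytes("hello"));
    println!("{:?}", to_c_bytes("he\0llo")); // interior NUL is rejected
    println!("{:?}", from_c_bytes(b"hello\0"));
    println!("{:?}", from_c_bytes(b"hello")); // missing terminator is rejected
}
```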

Common String Tasks

Splitting:

fn main() {
    let words = "Cow says moo";
    let each: Vec<_> = words.split(" ").collect();
    println!("{:?}", each);
}

Common String Tasks

Concatenation:

fn main() {
    let animal = String::from("Cow");
    let sound = String::from("moo");
    let words = [&animal, " says ", &sound].concat();
    println!("{:?}", words);
}

Common String Tasks

Replacing:

fn main() {
    let words = "Cow says moo";
    let replaced = words.replace("moo", "roar");
    println!("{}", replaced);
}

Accepting String or str

It's possible to accept either rather painlessly:

fn accept_either<S>(thing: S) -> String
where S: AsRef<str> {
    String::from("foo") + thing.as_ref()
}

fn main() {
    println!("{}", accept_either("blah"));
    println!("{}", accept_either(String::from("blah")));
}

Raw String Literals

  • Starts with r followed by zero or more # followed by "
  • Ends with " followed by the same number of #
  • Can span multiple lines, leading spaces become part of the line
  • Escape sequences are not processed
fn main () {
    let json = r##"
{
    "name": "Rust Analyzer",
    "brandColor": "#5bbad5"
}
"##;
    assert_eq!(r"\n", "\\n");
}

Byte String Literals

  • not really strings
  • used to declare static byte slices (the literal has type &'static [u8; N], which coerces to &[u8])
fn main() {
    let byte_string: &[u8] = b"allows ASCII and \xF0\x9F\x98\x80 only";
    println!("Can Debug fmt but not Display fmt: {:?}", byte_string);
    if let Ok(string) = std::str::from_utf8(byte_string) {
        println!("Now can Display '{}'", string);
    }
}

Building Robust Programs with Kani

Rust Guarantees are Very Strong

  • No null-dereferencing
  • No uninitialized memory access
  • No use-after-free
  • No double-free
  • No data races

Some bits can still be tricky

  • Numbers, both integer and floating point
  • Some operations can panic!
  • FFI, unsafe code

IEEE 754 Floating point numbers

  • NaN
    • propagates through operations x + NaN => NaN
    • breaks equality reflexivity (NaN != NaN)
  • Cancellation
    • subtraction of nearly equal operands may cause extreme loss of accuracy
  • Division Safety test is difficult
  • Limited exponent range leads to overflows and underflows
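A few of these effects can be observed directly with f64. The sketch below is illustrative and the function name is invented.

```rust
/// Returns (NaN != NaN, NaN propagates, result of a lossy subtraction, overflow to infinity).
fn float_surprises() -> (bool, bool, f64, bool) {
    let nan = f64::NAN;
    let nan_not_equal = nan != nan;
    let propagates = (1.0 + nan).is_nan();

    // At 1e16 the gap between adjacent f64 values is 2, so adding 1.0 is lost.
    let big = 1.0e16_f64;
    let lost = (big + 1.0) - big;

    // Exceeding the exponent range produces infinity rather than an error.
    let overflows = (f64::MAX * 2.0).is_infinite();
    (nan_not_equal, propagates, lost, overflows)
}

fn main() {
    println!("{:?}", float_surprises());
}
```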

Integers in Rust

  • a + b can overflow
    • triggers a panic at runtime in debug mode
    • wraps around in release mode
    • this is customizable!
  • a.checked_add(b) produces an Option
  • a.overflowing_add(b) produces a (value, overflow_flag)
  • a.saturating_add(b) clamps the value within the MIN..=MAX range
  • a.wrapping_add(b) allows wraparounds without panic
  • most people would still prefer writing code using operators
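The four explicit methods can be compared side by side. A sketch using u8 so that the overflow is easy to trigger (the function name is made up):

```rust
/// Returns the result of each explicit addition method for the same operands.
fn add_variants(a: u8, b: u8) -> (Option<u8>, (u8, bool), u8, u8) {
    (
        a.checked_add(b),     // None on overflow
        a.overflowing_add(b), // (wrapped value, overflow flag)
        a.saturating_add(b),  // clamps at u8::MAX
        a.wrapping_add(b),    // wraps around silently
    )
}

fn main() {
    println!("{:?}", add_variants(250, 10));
    println!("{:?}", add_variants(1, 2));
}
```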

Panics

  • "Does my program panic?" is a hard question in Rust
  • For #![no_std] code only: the panic-never crate
    • triggers a linker error if there's panicking code path in the binary
    • limited use: Standard Library and 3rd party crates have panicking code
  • clippy has lints against well-known panicking APIs in the Standard Library
  • No easy way to list all panicking call-sites across all dependencies

Unsafe Rust

  • ptr.as_ref() produces Option<&T>
    • can prevent null-dereferencing
    • cannot guarantee that the pointer is well-aligned / points to correct type
  • lifetime information can be lost
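A sketch of the null check that as_ref gives you. Note the safety comment: the Option only covers the null case, and everything else remains the caller's responsibility. The function name is invented for this example.

```rust
/// Reads through a raw pointer, treating null as "no value".
fn read_or_zero(ptr: *const i32) -> i32 {
    // SAFETY: callers must pass either a null pointer, or a pointer that is
    // aligned and points to a live i32. as_ref() only checks for null.
    match unsafe { ptr.as_ref() } {
        Some(value) => *value,
        None => 0,
    }
}

fn main() {
    let x = 7;
    println!("{}", read_or_zero(&x));
    println!("{}", read_or_zero(std::ptr::null()));
}
```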

Verifying program's behavior

  • Static analysis tools: clippy
  • Testing

Generative testing

"Let's come up with many potential program inputs

and observe program behavior"

Fuzz testing

  • Produce essentially random inputs
    • Often context-aware.
  • Time budget
    • "run the test X times" (X is often in 10_000s)
  • Outcomes are non-deterministic
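To give a feel for the idea, here is a hand-rolled sketch of a fuzz loop with a fixed run budget. Real fuzzers are far more sophisticated; the generator here is a simple fixed-seed linear congruential generator standing in for a real input source, and all names are invented.

```rust
/// Code under test: the intermediate sum `a + b` can overflow a u8.
/// Returns None when that overflow would happen.
fn naive_midpoint(a: u8, b: u8) -> Option<u8> {
    a.checked_add(b).map(|sum| sum / 2)
}

/// Tiny pseudo-random generator; a stand-in for a real fuzzer's inputs.
fn next_random(state: &mut u64) -> u8 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    (*state >> 56) as u8
}

/// Tries up to `budget` random inputs and returns the first one that fails.
fn fuzz_midpoint(seed: u64, budget: u32) -> Option<(u8, u8)> {
    let mut state = seed;
    for _ in 0..budget {
        let (a, b) = (next_random(&mut state), next_random(&mut state));
        if naive_midpoint(a, b).is_none() {
            return Some((a, b));
        }
    }
    None
}

fn main() {
    match fuzz_midpoint(42, 10_000) {
        Some((a, b)) => println!("found failing input: a = {a}, b = {b}"),
        None => println!("no failure found within the budget"),
    }
}
```

A "no failure found" result only means the budget ran out, not that the code is correct.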

Property-based testing

  • Generate the complete set of potential input combinations.
    • Test time often grows non-linearly
    • Time limit can prevent it from finding bugs
  • Still selects values at random to try to observe different behaviors earlier.
  • Observe different behavior => explore related input combinations to produce minimal test case.

Model Checking

  • Aware of your code structure
    • Including hidden code paths like panics
  • Builds a model of all of your program's states
    • Uses SAT / SMT solver to prove the validity of program behavior
    • Building a model of your code may take a long time

Generative testing is a spectrum

  • Fuzzing
    • Easier to set up
    • May miss bugs
  • Property testing
    • Middle ground
  • Model Checking
    • Harder to apply
    • Proves correctness

Installing Kani

cargo install --locked kani-verifier
cargo kani setup

Note: Kani runs natively on x86-64 Linux, and on Intel and Apple Silicon macOS. Windows users can run the example in a dev container.

How to use Kani 1

cargo new --lib hello-kani
cd hello-kani
cargo add --dev kani-verifier

How to use Kani 2

#[cfg(kani)]
mod proofs {
    use super::*;

    #[kani::proof]
    fn verify_add() {
        let a: u64 = kani::any();
        let b: u64 = kani::any();
        let result = add(a, b);

        // Assert that the result does not overflow
        assert!(result >= a);
        assert!(result >= b);
    }
}

Note: while the word "proof" is used in code, Kani calls its tests "harnesses", because technically the function verify_add acts as a test harness that runs generated tests.

How to use Kani 3

cargo kani
...
SUMMARY:
 ** 1 of 3 failed
Failed Checks: attempt to add with overflow
 File: "src/lib.rs", line 2, in add

VERIFICATION:- FAILED
...
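One possible way to respond to this failure is to make the overflow explicit in the function's return type. The sketch below assumes the add under test simply returns left + right; checked_add_u64 and verify_checked_add are hypothetical names. The harness is only compiled under cargo kani, and we would expect it to no longer report the overflow, but this has not been run as part of this material.

```rust
/// Hypothetical overflow-aware replacement for `add`.
pub fn checked_add_u64(left: u64, right: u64) -> Option<u64> {
    left.checked_add(right)
}

#[cfg(kani)]
mod proofs {
    use super::*;

    #[kani::proof]
    fn verify_checked_add() {
        let a: u64 = kani::any();
        let b: u64 = kani::any();
        // Only successful additions need to satisfy the property.
        if let Some(result) = checked_add_u64(a, b) {
            assert!(result >= a);
            assert!(result >= b);
        }
    }
}

fn main() {
    println!("{:?}", checked_add_u64(1, 2));
    println!("{:?}", checked_add_u64(u64::MAX, 1));
}
```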

How to use Kani 4

cargo kani -Z concrete-playback --concrete-playback=print
/// Test generated for harness `proofs::verify_add`
///
/// Check for `assertion`: "attempt to add with overflow"

#[test]
fn kani_concrete_playback_verify_add_7155943916565760311() {
    let concrete_vals: Vec<Vec<u8>> = vec![
        // 13835058055282163713ul
        vec![1, 0, 0, 0, 0, 0, 0, 192],
        // 9223372036854775804ul
        vec![252, 255, 255, 255, 255, 255, 255, 127],
    ];
    kani::concrete_playback_run(concrete_vals, verify_add);
}

How to use Kani 5

#[cfg(kani)]
mod proofs {
    use super::*;

    #[test]
    fn kani_concrete_playback_verify_add_7155943916565760311() {
        //
    }
}

 

# run playback tests
cargo kani playback -Z concrete-playback

Rough edges

  • kani crate
    • not published on crates.io
    • the crate is injected in your binary when you run cargo kani
    • some of Kani's dependencies rely on nightly-only code
      • this confuses Rust Analyzer / IntelliJ code assists
  • the out-of-the-box developer experience is very painful
    • but can be fixed in VSCode!

Let's fix it! - VSCode

  • Rust Analyzer
  • Kani extension
  • CodeLLDB or Microsoft C/C++ (on Windows) for debugging
  • You can use Docker and DevContainers on unsupported platforms

Let's fix it! - Cargo.toml

[dev-dependencies]
kani-verifier = "0.56.0"

[dependencies]
# enables autocomplete and code inspections for `kani::*` api
kani = { version = "0.56", git = "https://github.com/model-checking/kani", tag = "kani-0.56.0", optional = true }

# removes warnings about unknown `cfg` attributes
[lints.rust]
unexpected_cfgs = { level = "warn", check-cfg = ['cfg(rust_analyzer)', 'cfg(kani)'] }

Let's fix it! - .vscode/settings.json

{
    // tell Rust Analyzer that Kani features are active
    "rust-analyzer.cargo.features": ["kani"]
}

Let's fix it! - *.rs

Kani proc macros appear broken to Rust Analyzer

#[cfg_attr(not(rust_analyzer), cfg(kani))]
mod proofs {
    use super::*;

    #[cfg_attr(not(rust_analyzer), kani::proof)]
    fn verify_add() {
    }

    #[test]
    fn kani_concrete_playback_verify_add_7155943916565760311() {
    }
}

Full "hello world" example in our repository

See example-code/kani/kani-hello-world

Other Kani features

  • Function contracts
  • VSCode extension to run (and debug!) playbacks
  • Ability to fine-tune tests:
    • #[kani::unwind(<number>)]
    • #[kani::stub(<original>, <replacement>)]
    • #[kani::solver(<solver>)]
    • kani::any_where(<predicate>)

Feature Highlight: Function contracts

  1. Define a contract for a function
  2. Verify the function behavior
  3. Optional: let Kani stub out the function when checking a larger body of code

Function contracts

// tell Kani what kind of values to generate
#[cfg_attr(kani, kani::requires(min != 0 && max != 0))]
// tell Kani about the expectations
#[cfg_attr(kani, kani::ensures(|&result: &u8| {
        result != 0
        && max % result == 0
        && min % result == 0
    }
))]
// only needed if the function is recursive
#[cfg_attr(kani, kani::recursion)]
pub fn gcd(max: u8, min: u8) -> u8 {
    // ...
}

Verifying contracts

#[kani::proof_for_contract(gcd)]
fn check_gcd() {
    let max: u8 = kani::any();
    let min: u8 = kani::any();
    gcd(max, min);
}

Using verified contracts in other proofs

#[kani::proof]
#[kani::stub_verified(gcd)]
fn check_reduce_fraction() {
    let numerator: u8 = kani::any();
    let denominator: u8 = kani::any();
    // uses `gcd`
    reduce_fraction(numerator, denominator);
}

Limitations

  • No multithreading support
    • No support for atomic operations
    • No support for async runtimes (but the syntax is supported)
  • No inline assembly
  • No use of panic!, catch_unwind, and resume_unwind for flow control
  • Loops and deep recursion balloon the number of states that require inspection
  • ...

Test-friendly code

  • Isolate IO
  • Isolate synchronization, message passing, await
  • Isolate target-dependent code

By making our code more test-friendly we make it Kani-friendly, too!

What code to test

  • Numerical code
  • Parsers, serialization and deserialization code
  • Decision trees, complex conditional logic
  • unsafe

Debugging Rust

tl;dr

VSCode + CodeLLDB

The best debugging experience on Windows, Linux, and macOS

Honorable Mentions

  • IntelliJ Rust
  • rr / Pernosco for time-traveling and postmortem debugging

How Debuggers Work

Debuggers use special metadata embedded into the executable to match bits of machine code to lines of source code, areas of memory to variables and their types, etc.

Kinda like Source Maps for JavaScript.

How Debuggers Work

Two things have to happen for a debugger to work and provide decent developer experience:

  • The compiler has to emit debug info.
  • The debugger has to be modified / extended to understand this information.

Compiler

rustc uses LLVM, which emits debug info in either DWARF or PDB format.

  • PDB is produced by windows-msvc toolchains (like x86_64-pc-windows-msvc)
  • DWARF is used by all other toolchains, including GNU toolchains on Windows (like x86_64-pc-windows-gnu)

DWARF

  • Open standard.
  • Very C/C++ specific.
  • Has custom field types for other languages to use.
  • Rust tries to reuse existing C/C++ fields where possible, so many debuggers work out of the box.
  • A companion to ELF...

Extending DWARF

The DWARF standard grows organically over time and is largely implementation-driven.

Extending DWARF

  1. Come up with a new name for Rust-specific DWARF field.
  2. Change the compiler to emit new debug info and use this field.
  3. Change a debugger to understand this new field.
  4. Propose the new field to be standardized, so that other debuggers can reuse the field, too.

Standardizing takes almost no time due to how few people in the world actually work on DWARF.

PDB

  • Proprietary format with no documentation.
  • Like DWARF, it is very C/C++ centric.
  • Harder to extend.
  • Rust tries to reuse C/C++ fields as much as possible, so debugging is still reasonable.

You may have a better experience debugging Rust on macOS or Linux than on Windows, because of PDB.

Debuggers

IDEs and editors rely on two debuggers, GDB and LLDB, to provide GUI debugging.

GDB

  • Supports a lot of languages.
  • Adopts Rust-specific features quickly.
  • Harder to contribute in general.

LLDB

  • Default choice for Rust.
  • Part of LLVM that Rust uses for compilation.
  • Used to support many languages, but the team decided to focus on C, C++, and Objective C only.
  • Has extension API for supporting other languages, which is not enough for Rust.

LLDB

The Rust project maintains a fork of LLDB with extended support for the language.

  • Part of overall LLVM fork.
  • Constantly updated and well-maintained.
  • Non-Rust-specific bug fixes get upstreamed to main LLVM repository

Wrappers

Rust comes with rust-gdb and rust-lldb wrappers around debuggers.

They improve how Rust values are displayed when printed in the console.

Editors and IDEs

Rust-analyzer does not come with debugger support on its own.

Instead, it relies on other editor / IDE plugins for debugging support.

It prompts you to install one when you open a Rust project.

VSCode Extensions

CodeLLDB.

  • LLDB-only.
  • Maintains its own fork of Rust's LLDB with even more Rust enhancements!
  • Downloads it on first installation.
  • Seamless debugging experience.

Both Microsoft C/C++ and Native Debug support GDB and LLDB.

Microsoft's extension offers better support for displaying PDB information on Windows.

IntelliJ-Rust

  • A plugin for IDEA and CLion
  • Produced by JetBrains.
  • Like CodeLLDB, it also maintains its own fork of Rust's LLDB for better DX.
  • Requires a JetBrains license.

What to use?

  • VSCode + CodeLLDB offer the best debugging experience across all platforms.
  • Microsoft recommends CodeLLDB even for Windows use.
  • IntelliJ-Rust is great if that's your IDE of choice.
  • Native Debug and Microsoft C/C++ extensions can work for you on platforms where only GDB is available.

rr

Things may not work well

  • PDB may result in subpar debugging experience.
    • If possible try debugging your code on OSes other than Windows
    • Or try using GNU-based toolchain on Windows.
  • Watch expressions are limited.
    • Can't use match or if expressions
    • Some method calls may not produce results.
  • Some values can't be shown: function references, closures.
  • Breakpoints may sometimes not work in closures and in async code.
  • Trait objects and trait methods may be difficult for the debugger to resolve.

When the debugger fails you

  • Try to isolate the code in question into smaller functions.
  • Add debug logging / tracing.
  • Tests.

Future

  • New Rust Debugging Working Group:
    • Unites people from Rust, GDB, and rr.
    • People from LLVM, CodeLLDB, Rust-Analyzer, and IntelliJ Rust expressed interest in helping out. 🎉
  • Plans:
    • The LLVM team is open to merging Rust-specific features into LLDB directly, so a Rust fork, or the CodeLLDB / IntelliJ forks, may no longer be needed.
    • Further expand DWARF to cover tricky Rust features like trait object method references.

Deconstructing Send, Arc, and Mutex

thread::spawn Function

pub fn spawn<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T,
    F: Send + 'static,
    T: Send + 'static,
{
    // ...
}

Quick Primer on Rust Closures

  • 3 categories of data
    • data the closure closes over / captures: Upvars
      • convenient compiler terminology
      • not represented by closure type signature
    • parameters
    • returned value
let upper_threshold = 20;
let outliers: Vec<_> = data.iter().copied().filter(|n| -> bool {
    // `n` is a parameter, `upper_threshold` is an *upvar*
    // `filter` passes its parameter by reference, hence the `*n`
    *n >= upper_threshold
}).collect();

Spawn closure type

  • F: FnOnce() -> T
    • closure doesn't accept any parameters
    • closure can consume upvars ("FnOnce")
  • F: Send + 'static
    • applies to upvars
  • T: Send + 'static
    • applies to returned value

T: 'static

Two options allowed:

  • the type doesn't have any references inside ("Owned data")
    • struct User { name: String }
  • the references inside the type are 'static
    • struct Db { connection_string: &'static str }

Why F: 'static and T: 'static?

  • applies to data passed from parent thread to child thread or vice-versa
  • prevents passing references to local variables
    • one thread can finish before the other and such references may become invalid
    • + 'static avoids this by ensuring any references point to data that has the static lifetime (i.e. that lives forever)
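
A minimal sketch of the rule above. The helper name `sum_on_child_thread` is made up for illustration; only `thread::spawn` and `JoinHandle::join` come from the standard library. Both the closure and its result contain owned data only, so both `'static` bounds are met.

```rust
use std::thread;

/// Sums the numbers on a child thread and hands the result back to the caller.
fn sum_on_child_thread(numbers: Vec<u64>) -> u64 {
    // `move` transfers ownership of `numbers` into the closure, so the closure
    // holds no references into this function's stack frame (`F: 'static`).
    let handle = thread::spawn(move || numbers.iter().sum::<u64>());

    // The returned `u64` is owned data as well (`T: 'static`).
    handle.join().expect("child thread panicked")
}
```

Removing `move` makes the closure borrow `numbers` instead, and the compiler rejects the call because the borrow could outlive the parent's stack frame.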

T: Send

pub unsafe auto trait Send { }

  • auto means all types get this trait automatically
    • opt-out instead of opt-in
  • various types in standard library implement Send or !Send
  • unsafe means you have to put unsafe keyword in front of impl when implementing Send or !Send
    • precautionary measure

Why would one implement Send or !Send

  • Rust pointers (*const T, *mut T, NonNull<T>) are !Send
    • Use-case: what if the pointer comes from FFI library that assumes that all functions using this pointer are called from the same thread?
  • Arc has a NonNull<..> inside and becomes !Send automatically
    • to override this behavior Arc explicitly implements Send

Send in thread::spawn Function

F: Send and T: Send means that all data traveling from the parent thread to child thread has to be marked as Send

  • Rust compiler has no inherent knowledge of threads, but the use of marker traits and lifetime annotations let the type / borrow checker prevent data race errors
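
One way to see the marker trait at work is a compile-time check. This is a sketch, and `assert_send` and `check_send_types` are hypothetical helpers, not standard library functions. The generic bound is all that is needed; nothing happens at run time.

```rust
use std::sync::Arc;

/// Compiles only if `T` is `Send`; does nothing at run time.
fn assert_send<T: Send>() {}

fn check_send_types() {
    assert_send::<Vec<usize>>();
    assert_send::<Arc<Vec<usize>>>();
    // The next line would be rejected by the compiler, because raw pointers
    // are `!Send`:
    // assert_send::<*mut u8>();
}
```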

Sharing data between threads

Example: Message Log for TCP Echo Server

use std::{
    io::{self, BufRead as _, Write as _},
    net, thread,
};

fn handle_client(stream: net::TcpStream) -> Result<(), io::Error> {
    let mut writer = io::BufWriter::new(&stream);
    let reader = io::BufReader::new(&stream);
    for line in reader.lines() {
        let line = line?;
        writeln!(writer, "{}", line)?;
        writer.flush()?;
    }
    Ok(())
}

fn main() -> Result<(), io::Error> {
    let listener = net::TcpListener::bind("0.0.0.0:7878")?;

    for stream in listener.incoming() {
        let stream = stream?;
        thread::spawn(|| {
            let _ = handle_client(stream);
        });
    }
    Ok(())
}

Task

  • create a log of lengths of all lines coming from all streams
  • let mut log = Vec::<usize>::new();
  • log.push(line.len());

"Dream" API

fn handle_client(stream: net::TcpStream, log: &mut Vec<usize>) -> Result<(), io::Error> {
    // ...
    for line in ... {
        log.push(line.len());
        // ...
    }
    Ok(())
}

fn main() -> Result<(), io::Error> {
    let mut log = vec![];

    for stream in listener.incoming() {
        // ...
        thread::spawn(|| {
            let _ = handle_client(stream, &mut log);
        });
    }
    Ok(())
}

Errors

error[E0373]: closure may outlive the current function, but it borrows `log`, which is owned by the current function
  --> src/main.rs:26:23
   |
26 |         thread::spawn(|| {
   |                       ^^ may outlive borrowed value `log`
27 |             let _ = handle_client(stream.unwrap(), &mut log);
   |                                                         --- `log` is borrowed here
   |
note: function requires argument type to outlive `'static`

Lifetime problem

Problem:

  • local data may be cleaned up prematurely

Solution:

  • move the decision when to clean the data from compile-time to run-time
    • use reference-counting

Attempt 1: Rc

  • let mut log = Rc::new(vec![]);
  • let mut thread_log = log.clone() now doesn't clone the data, but simply increases the reference count
    • both variables now have owned type, and satisfy F: 'static requirement
error[E0277]: `Rc<Vec<usize>>` cannot be sent between threads safely

Rc in Rust Standard Library

  • uses usize for reference counting
  • explicitly marked as !Send
pub struct Rc<T> {
    ptr: NonNull<RcBox<T>>,
}

impl<T> !Send for Rc<T> {}

struct RcBox<T> {
    strong: Cell<usize>,
    weak: Cell<usize>,
    value: T,
}

Arc in Rust Standard Library

  • uses AtomicUsize for reference counting
  • explicitly marked as Send
pub struct Arc<T> {
    ptr: NonNull<ArcInner<T>>,
}

unsafe impl<T: ?Sized + Sync + Send> Send for Arc<T> {}

struct ArcInner<T: ?Sized> {
    strong: atomic::AtomicUsize,
    weak: atomic::AtomicUsize,
    data: T,
}

Rc vs Arc

  • Arc uses AtomicUsize for reference counting
    • slower
    • safe to increment / decrement from multiple threads
  • With the help of marker trait Send and trait bounds on thread::spawn, the compiler forces you to use the correct type
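
A small sketch of `Arc` in use, assuming read-only sharing (the function name `share_between_threads` is illustrative). Each clone only bumps the atomic counter; the vector is never copied.

```rust
use std::sync::Arc;
use std::thread;

/// Shares one read-only vector with several threads and returns the sum of
/// the lengths each thread observed.
fn share_between_threads(thread_count: usize) -> usize {
    let data = Arc::new(vec![1, 2, 3]);

    let handles: Vec<_> = (0..thread_count)
        .map(|_| {
            // Increments the atomic strong count; no deep copy of the vector.
            let thread_data = Arc::clone(&data);
            thread::spawn(move || thread_data.len())
        })
        .collect();

    let total: usize = handles.into_iter().map(|h| h.join().unwrap()).sum();

    // Every clone was dropped when its thread finished, so only `data` is left.
    assert_eq!(Arc::strong_count(&data), 1);
    total
}
```

Swapping `Arc` for `Rc` here fails to compile with the E0277 error shown earlier.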

Arc / Rc "transparency"

let mut log = Arc::new(Vec::new());
// how does this code work?
log.len();
// and why doesn't this work?
log.push(1);

Deref and DerefMut traits

pub trait Deref {
    type Target: ?Sized;
    fn deref(&self) -> &Self::Target;
}

pub trait DerefMut: Deref {
    fn deref_mut(&mut self) -> &mut Self::Target;
}

Deref coercions

  • Deref can convert a &self reference to a reference of another type
    • conversion function call can be inserted by the compiler for you automatically
    • in most cases the conversion is a no-op or a fixed pointer offset
    • deref functions can be inlined
  • Target is an associated type
    • can't deref() into multiple different types
  • DerefMut: Deref allows the DerefMut trait to reuse the same Target type
    • read-only and read-write references coerce to the references of the same type
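
To make the coercion concrete, here is a sketch with a hypothetical `Wrapper<T>` type (not part of the standard library) that implements both traits. The method calls in `wrapper_demo` resolve to `Vec` methods because the compiler inserts `deref()` / `deref_mut()` calls.

```rust
use std::ops::{Deref, DerefMut};

/// A hypothetical wrapper that only forwards to the wrapped value.
struct Wrapper<T> {
    inner: T,
}

impl<T> Deref for Wrapper<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.inner
    }
}

impl<T> DerefMut for Wrapper<T> {
    // Reuses `Target` from the `Deref` implementation above.
    fn deref_mut(&mut self) -> &mut T {
        &mut self.inner
    }
}

fn wrapper_demo() -> usize {
    let mut log = Wrapper { inner: Vec::new() };
    log.push(10); // via `DerefMut`
    log.push(20);
    log.len() // via `Deref`
}
```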

Arc / Rc "transparency" with Deref

let mut log = Arc::new(Vec::new());
// Arc<T> implements `Deref` from `&Arc<T>` into `&T`
log.len();
// the same as
Vec::len(<Arc<_> as Deref>::deref(&log));

// Arc<T> DOES NOT implement `DerefMut`
// log.push(1);

// the line above would have expanded to:
// Vec::push(<Arc<_> as DerefMut>::deref_mut(&mut log), 1);

Arc and mutability

  • lack of impl DerefMut for Arc prevents accidental creation of multiple &mut to underlying data
  • the solution is to move mutability decision to runtime
let log = Arc::new(Mutex::new(Vec::new()));

 

  • Arc guarantees availability of data in memory
    • prevents memory from being cleaned up prematurely
  • Mutex guarantees exclusivity of mutable access
    • provides only one &mut to underlying data simultaneously

Mutex in Action

  • log is passed as & and is deref-ed from Arc by the compiler
  • mutability is localized to a local guard variable
    • Mutex::lock method takes &self
  • MutexGuard implements Deref and DerefMut!
  • '_ lifetime annotation is needed only because guard struct has a &Mutex inside
fn handle_client(..., log: &Mutex<Vec<usize>>) -> ... {
    for line in ... {
        let mut guard: MutexGuard<'_, Vec<usize>> = log.lock().unwrap();
        guard.push(line.len());
        // line above expands to:
        // Vec::push(<MutexGuard<'_, _> as DerefMut>::deref_mut(&mut guard), line.len());
        writeln!(writer, "{}", line)?;
        writer.flush()?;
    }
}

Mutex locking and unlocking

  • we lock the mutex for exclusive access to underlying data at runtime
  • old C APIs used a pair of functions to lock and unlock the mutex
  • MutexGuard unlocks the mutex automatically when it is dropped
    • time between guard creation and drop is called the critical section

Lock Poisoning

  • MutexGuard in its Drop implementation checks if it is being dropped normally or during a panic unwind
    • in the latter case it sets a poison flag on the mutex
  • calling lock().unwrap() on a poisoned Mutex causes panic
    • if the mutex is "popular" poisoning can cause many application threads to panic, too.
  • PoisonError doesn't provide information about the panic that caused the poisoning
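
If a panic while holding the lock should not take other threads down with it, the poisoned guard can still be retrieved. The sketch below (the function name `recover_from_poisoning` is illustrative) poisons a mutex on purpose and then recovers the data with `PoisonError::into_inner`. Whether the data is still consistent after such a panic is something only the application can judge.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn recover_from_poisoning() -> Vec<usize> {
    let log = Arc::new(Mutex::new(vec![1usize, 2]));

    let thread_log = Arc::clone(&log);
    let result = thread::spawn(move || {
        let mut guard = thread_log.lock().unwrap();
        guard.push(3);
        // The guard is dropped during unwinding, which poisons the mutex.
        panic!("simulated failure inside the critical section");
    })
    .join();

    assert!(result.is_err());
    assert!(log.is_poisoned());

    // Instead of `unwrap()`, take the guard out of the `PoisonError`.
    let guard = log.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    guard.to_vec()
}
```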

Critical Section "Hygiene"

  • keep it short to reduce the window when mutex is locked
  • avoid calling functions that can panic
  • using a named variable for the Mutex guard helps avoid unexpected temporary lifetime behavior

Critical Section Example

fn handle_client(..., log: &Mutex<Vec<usize>>) -> ... {
    for line in ... {
        {
            let mut guard: MutexGuard<'_, Vec<usize>> = log.lock().unwrap();
            guard.push(line.len());
        } // critical section ends here, before all the IO
        writeln!(writer, "{}", line)?;
        writer.flush()?;
    }
}

 

  • drop(guard) also works, but extra block nicely highlights the critical section

Lessons Learned

  • careful use of traits and trait bounds lets the compiler detect problematic multi-threading code at compile time
  • Arc and Mutex let the program ensure data availability and exclusive mutability at runtime where the compiler can't predict the behavior of the program
  • Deref coercions make concurrency primitives virtually invisible and transparent to use
  • Make invalid state unrepresentable

Full Example

use std::{
    io::{self, BufRead as _, Write as _},
    net,
    sync::{Arc, Mutex},
    thread,
};

fn handle_client(stream: net::TcpStream, log: &Mutex<Vec<usize>>) -> Result<(), io::Error> {
    let mut writer = io::BufWriter::new(&stream);
    let reader = io::BufReader::new(&stream);
    for line in reader.lines() {
        let line = line?;
        {
            let mut guard = log.lock().unwrap();
            guard.push(line.len());
        }
        writeln!(writer, "{}", line)?;
        writer.flush()?;
    }
    Ok(())
}

fn main() -> Result<(), io::Error> {
    let log = Arc::new(Mutex::new(vec![]));
    let listener = net::TcpListener::bind("0.0.0.0:7878")?;

    for stream in listener.incoming() {
        let stream = stream?;
        let thread_log = log.clone();
        thread::spawn(move || {
            let _ = handle_client(stream, &thread_log);
        });
    }
    Ok(())
}

Dependency Management with Cargo

Cargo.toml - A manifest file

[package]
name = "tcp-mailbox"
version = "0.1.0"

[dependencies]
async-std = "1" # would also choose 1.5
clap = "2.2" # would also choose 2.3

Cargo.lock - A lock file

  • contains a list of all project dependencies, de-facto versions and hashes of downloaded dependencies
  • when a version is yanked from Crates.io but you have the correct hash for it in a lock file Cargo will still let you download it and use it
    • still gives you warning about that version being problematic
  • should be committed to your repository for applications

Dependency resolution

  • uses "Zero-aware" SemVer for versioning
    • 1.3.5 is compatible with versions >= 1.3.5 and < 2.0.0
    • 0.3.5 is compatible with versions >= 0.3.5 and < 0.4.0
    • 0.0.3 only allows 0.0.3
  • allows version-incompatible transitive dependencies
    • except C/C++ dependencies
  • combines dependencies with compatible requirements as much as possible
  • allows path, git, and custom registry dependencies
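
The "zero-aware" rule can be sketched in a few lines. This is an illustration only, not Cargo's implementation: the `Version` type and the helper functions are made up here, and pre-release tags, build metadata and partial requirements such as `"1"` are ignored.

```rust
/// A bare-bones `major.minor.patch` version.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

fn v(major: u64, minor: u64, patch: u64) -> Version {
    Version { major, minor, patch }
}

/// Exclusive upper bound for a default requirement: bump the left-most
/// non-zero component and reset everything to the right of it.
fn upper_bound(req: Version) -> Version {
    if req.major > 0 {
        v(req.major + 1, 0, 0)
    } else if req.minor > 0 {
        v(0, req.minor + 1, 0)
    } else {
        v(0, 0, req.patch + 1)
    }
}

/// The derived ordering compares major, then minor, then patch.
fn is_compatible(req: Version, candidate: Version) -> bool {
    candidate >= req && candidate < upper_bound(req)
}
```

With these definitions `is_compatible(v(1, 3, 5), v(1, 9, 0))` holds, while `v(0, 4, 0)` is rejected for a `0.3.5` requirement.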

How a dependency version is selected

  • for every requirement Cargo selects acceptable version intervals
    • [1.1.0; 1.6.0), [1.3.5; 2.0.0), [2.0.0; 3.0.0)
  • Cargo checks for interval intersections to reduce the number of unique intervals
    • [1.3.5; 1.6.0), [2.0.0; 3.0.0)
  • for every unique interval it selects the most recent available version
    • =1.5.18, =2.7.11
  • selected versions and corresponding package hashes are written into Cargo.lock

Dependency resolution: Example

└── my-app                      May install:
    ├── A = "1"
    │   ├── X = "1"             A = "1.0.17"
    │   └── Y = "1.3"     =>    B = "1.5.0"
    └── B = "1"                 X = "2.0.3"
        ├── X = "2"             X = "1.2.14"
        └── Y = "1.5"           Y = "1.8.5"

Where do dependencies come from?

  • Crates.io
  • Private registries (open-source, self-hosted, or hosted)
  • Git and Path dependencies
  • dependencies can be vendored

Notes:

  • private registries
    • hosted: Shipyard, JFrog, CloudSmith
    • self-hosted: Kellnr
    • open-source: Ktra - pronounced ['KO-to-ra], Meuse - [Møs]

Shipyard and Kellnr will also generate API docs for you

Crates.io

  • default package registry
    • 100k crates and counting
    • every Rust Beta release is tested against all of them every week
  • packages aren't deleted, but yanked
    • if you have a correct hash for a yanked version in your Cargo.lock your build won't break (you still get a warning)

Docs.rs

  • complete API documentation for the whole Rust ecosystem
  • automatically publishes API documentation for every version of every crate on Crates.io
  • documentation for old versions stays up, too. Easy to switch between versions.
  • links across crates just work

Other kinds of dependencies

  • git dependencies
    • both git+https and git+ssh are allowed
    • can specify branch, tag, commit hash
    • when downloaded by Cargo exact commit hash used is written into Cargo.lock
  • path dependencies
    • both relative and absolute paths are allowed
    • common in workspaces

C Libraries as dependencies

  • Rust can call functions from C libraries using unsafe code
    • integrate with operating system APIs, frameworks, SDKs, etc.
    • talk to custom hardware
    • reuse existing code (SQLite, OpenSSL, libgit2, etc.)
  • building a crate that relies on C libraries often requires customization
    • done using build.rs file

build.rs file

  • compiled and executed before the rest of the package
  • can manipulate files, execute external programs, etc.
    • download / install custom SDKs
    • call cc, cmake, etc. to build C++ dependencies
    • execute bindgen to generate Rust bindings to C libraries
  • output can be used to set Cargo options dynamically
    println!("cargo:rustc-link-lib=gizmo");
    println!("cargo:rustc-link-search=native={}/gizmo/", library_path);
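
Put together, the body of a build script could look roughly like the sketch below. The `gizmo` library name comes from the lines above; the `GIZMO_DIR` environment variable, the fallback path and the helper names are assumptions made for this example. In a real `build.rs`, `emit_link_directives` would be called from `fn main()`.

```rust
use std::env;

/// Builds the directives that tell Cargo how to link the `gizmo` library.
fn link_directives(library_path: &str) -> Vec<String> {
    vec![
        format!("cargo:rustc-link-search=native={}/gizmo/", library_path),
        "cargo:rustc-link-lib=gizmo".to_string(),
        // Re-run the script when the (hypothetical) location variable changes.
        "cargo:rerun-if-env-changed=GIZMO_DIR".to_string(),
    ]
}

/// Prints the directives to stdout, which is where Cargo reads them from.
fn emit_link_directives() {
    // Assumed convention: the SDK location is passed in via `GIZMO_DIR`.
    let library_path = env::var("GIZMO_DIR").unwrap_or_else(|_| "/usr/local/lib".to_string());
    for directive in link_directives(&library_path) {
        println!("{}", directive);
    }
}
```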

-sys crates

  • often Rust libraries that integrate with C are split into a pair of crates:
    • library-name-sys
      • thin wrapper around C functions
      • often all code is autogenerated by bindgen
    • library-name
      • depends on library-name-sys
      • exposes convenient and idiomatic Rust API to users
  • examples:
    • openssl and openssl-sys
    • zstd and zstd-sys
    • rusqlite and libsqlite3-sys

Deref Coercions

Motivation

Why does the following work?

struct Point {
    x: i32,
    y: i32
}

fn main() {
    let boxed_p = Box::new(Point { x: 1, y: 2 });
    println!("{}", boxed_p.x);
}

Box doesn't have a field named "x"!

Auto-Dereferencing

Rust automatically dereferences in certain cases. Like everything else, it must be explicitly requested:

  • Through a call or field access using the . operator
  • By explicitly dereferencing through *
  • When borrowing through &
  • This sometimes leads to the ugly &*-Pattern

This makes wrapper types very ergonomic and easy to use!


Dereferencing is described by the Deref and DerefMut traits.

impl<T> std::ops::Deref for Box<T> {
    type Target = T;

    fn deref(&self) -> &T {
        todo!()
    }
}

This call is introduced when dereferencing is requested.

Important deref behaviours

  • String -> &str
  • Vec -> &[T]

Functions that don't modify the lengths of a String or a Vector should accept a slice instead. The memory layout is chosen so that this is cost free.


fn print_me(message: &str) { println!("{}", message); }

fn main() {
    print_me("Foo");
    let a_string = String::from("Bar");
    print_me(&a_string);
    print_me(a_string.as_str())
}

Basic Design Patterns

.clone() before Lifetime Annotations

  • As a beginner, use .clone() to get past compiler errors about lifetimes and borrowing.
  • It is alright! Refactor later.

String before &str

  • Use "owned" types before references.
  • It is alright! Refactor later.

String concatenation: Use format!()

  • Owned type String can be generated easily.
  • let s: String = format!("No fear from {}", "Rust Strings")

Clippy is your friend in linting

  • A collection of lints to catch common mistakes and improve your Rust code.
  • Installation: rustup component add clippy
  • Run: cargo clippy
  • Documentation: https://rust-lang.github.io/rust-clippy/stable/index.html

Pattern: From<T>, Into<T>

Conversion of one Type into another.

If X implements From<T>, then T implements Into<X> automatically.

The usage depends on the context.

Pattern: From<T>, Into<T> - Example

fn main() {
    let string = String::from("string slice");
    let string2: String = "string slice".into();
}
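
The same pattern works for your own types. In this sketch the `Celsius` and `Fahrenheit` types are invented for illustration. Only `From` is implemented, and `.into()` becomes available through the blanket implementation.

```rust
struct Celsius(f64);
struct Fahrenheit(f64);

impl From<Fahrenheit> for Celsius {
    fn from(f: Fahrenheit) -> Self {
        Celsius((f.0 - 32.0) * 5.0 / 9.0)
    }
}

fn boiling_point() -> (f64, f64) {
    // Explicit: name the target type and call `from`.
    let explicit = Celsius::from(Fahrenheit(212.0));
    // Implicit: the type annotation tells `.into()` what to produce.
    let implicit: Celsius = Fahrenheit(212.0).into();
    (explicit.0, implicit.0)
}
```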

Pattern: What does ? do?

use std::fs::File;
use std::io::{self, Write};

enum MyError {
    FileWriteError,
}

impl From<io::Error> for MyError {
    fn from(e: io::Error) -> MyError {
        MyError::FileWriteError
    }
}

fn write_to_file_using_q() -> Result<(), MyError> {
    let mut file = File::create("my_best_friends.txt")?;
    file.write_all(b"This is a list of my best friends.")?;
    println!("I wrote to the file");
    Ok(())
}
// This is equivalent to:
fn write_to_file_using_match() -> Result<(), MyError> {
    let mut file = File::create("my_best_friends.txt")?;
    match file.write_all(b"This is a list of my best friends.") {
        Ok(v) => v,
        Err(e) => return Err(From::from(e)),
    }
    println!("I wrote to the file");
    Ok(())
}

fn main() {}

Pattern: AsRef<T>

Reference-to-reference-conversion. Indicates that a type can easily produce references to another type.

Pattern: AsRef<T> - Example

use std::fs::File;
use std::path::Path;
use std::path::PathBuf;

fn main() {
    open_file(&"test");
    let path_buf = PathBuf::from("test");
    open_file(&path_buf);
}

fn open_file<P: AsRef<Path>>(p: &P) {
    let path = p.as_ref();
    let file = File::open(path);
}

Pattern: Constructor new()

  • No constructors, but there is a convention.
  • An associated function to construct new "instances".
  • Use Default trait. Try using #[derive(Default)] first.
pub struct Stuff {
    value: i64,
}

impl Stuff {
    /// constructor by convention
    fn new(value: i64) -> Self {
        Self { value }
    }
}
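
The `Default` advice can be sketched as follows. The `Settings` type and its fields are hypothetical. The derive fills every field with its own default, and `new()` overrides only what it needs using struct update syntax.

```rust
#[derive(Debug, Default, PartialEq)]
pub struct Settings {
    retries: u32,
    verbose: bool,
    name: String,
}

impl Settings {
    /// Constructor by convention; all other fields keep their defaults.
    pub fn new(name: &str) -> Self {
        Self {
            name: name.to_string(),
            ..Self::default()
        }
    }
}
```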

Pattern: NewType

  • Use Rust type system to convey meaning to the user.
  • Especially for Types that should be similar to other Types.
  • Also used to impl external Traits on external Types
struct MyString(String);

impl MyString {
    //... my implementations for MyString
}
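
A sketch of the last bullet: `Display` and `Vec` are both external, so `Display` cannot be implemented for `Vec<String>` directly, but it can be implemented for a local newtype. The name `CommaSeparated` is made up for this example.

```rust
use std::fmt;

/// Local wrapper around an external type.
struct CommaSeparated(Vec<String>);

// External trait, local type: allowed.
impl fmt::Display for CommaSeparated {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0.join(", "))
    }
}
```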

Pattern: Extending external Types

  • Recall that at least one of Trait or Type should be local to impl.
  • This pattern allows you to extend external Type using a local Trait.
trait VecExt {
    fn magic_number(&self) -> usize;
}

impl<T> VecExt for Vec<T> {
    fn magic_number(&self) -> usize {
        42
    }
}

fn main() {
    let v = vec![1, 2, 3, 4, 5];
    println!("Magic Number = {}", v.magic_number());
}

Pattern: Narrowing variable's scope

  • Shadowing allows you to redefine a variable with let keyword again.
  • Use it to get the inner Type, say in Option.
  • Use it to your advantage to make variable immutable after it's served its purpose.
// Get the inner type from Option
let array = [1, 2, 3, 4];
let item = array.get(1);
if let Some(item) = item { 
    println!("{:?}", item);
}

// Use shadowing to make the variable immutable outside of 
// where it needs to be mutable
let mut data = 42;
// change the data 
data += 1;
// Shadow using `let` again
let data = data; 
// data is immutable from now on

Documentation

rustdoc

Rust provides a standard documentation tool called rustdoc. It is commonly used through cargo doc.

Because of this Rust code is almost always documented in a common format.

std Documentation

The standard library documentation is hosted at https://doc.rust-lang.org/std/.

A local, offline version can be opened with:

$ rustup doc --std

Crate Documentation

Documentation for crates hosted on http://crates.io/ can be found at https://docs.rs/.

Some crates may also have other documentation found via the "Documentation" link on their listing in http://crates.io/.

Example: A Module

https://doc.rust-lang.org/std/vec

This page documents the vec module.

It starts with some examples, then lists any structs, traits, or functions the module exports.

How is it Generated?

rustdoc can read Rust code and Markdown documents.

//! and /// comments are read as Markdown.

//! Module documentation. (e.g. the 'Examples' part of `std::vec`).

/// Document functions, structs, traits and values.
/// This documents a function.
fn function_with_documentation() {}

// This comment will not be shown as documentation.
// The function itself will be.
fn function_without_documentation() {}

Example: Components

https://doc.rust-lang.org/std/string/#structs

Example: Functions

https://doc.rust-lang.org/std/string/struct.String.html#method.new

Code Examples

By default code blocks in documentation are tested.

/// ```rust
/// assert_eq!(always_true(), true)
/// ```
fn always_true() -> bool { true }

No-Run Examples

This code is marked 'do not run', as it doesn't terminate.

/// ```rust,no_run
/// serve();
/// ```
fn serve() -> ! { loop {} }

The arguments and return types of functions are links to their respective types.

The sidebar on the left offers quick navigation to other parts of the module.

Cargo integration

This command builds and opens the docs to your current project:

$ cargo doc --open

Normally only pub items are documented. You can change this:

$ cargo doc --document-private-items --open

Drop, panic, and abort


What happens in detail when values drop?

Drop-Order

Rust generally guarantees drop order (RFC1857)

Drop-Order

  • Values are dropped at the end of their scope
  • The order is the reverse introduction order
  • Unbound values drop immediately
  • Structure fields are dropped first to last
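
These rules can be observed with a type that records its own destruction. The sketch below uses made-up names (`Noisy`, `Pair`, `observe_drop_order`) and a shared log instead of printing, so the order can be inspected afterwards.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

/// Records its name in the shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

#[allow(dead_code)]
struct Pair {
    first: Noisy,
    second: Noisy,
}

fn observe_drop_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let noisy = |name| Noisy { name, log: Rc::clone(&log) };
    {
        let _a = noisy("a");
        let _b = noisy("b");
        // `_` does not bind the value, so it is dropped immediately.
        let _ = noisy("unbound");
        let _pair = Pair { first: noisy("first"), second: noisy("second") };
    } // `_pair` (fields first to last), then `_b`, then `_a`
    let order = log.borrow().clone();
    order
}
```

Calling `observe_drop_order()` yields `["unbound", "first", "second", "b", "a"]`.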

Destructors

Sometimes, certain actions must be taken before deallocation.

For this, the Drop trait can be implemented.


struct LevelDB {
    handle: *mut leveldb_database_t
}

impl Drop for LevelDB {
    fn drop(&mut self) {
        unsafe { leveldb_close(self.handle) };
    }
}

Warning!

Destructors cannot return errors.

Also possible

Explicit destruction of a value through a consuming function. This cannot be statically enforced currently.

Implementing a Drop-bomb (a failing destructor) can make sure this error is caught early.
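
A drop-bomb might be sketched like this. The `Transaction` type and its `commit` method are hypothetical. The consuming method can return an error, and the destructor panics if it was skipped.

```rust
struct Transaction {
    finished: bool,
}

impl Transaction {
    fn begin() -> Self {
        Transaction { finished: false }
    }

    /// Consuming "destructor" that, unlike `Drop::drop`, can report errors.
    fn commit(mut self) -> Result<(), String> {
        // ... real work that may fail would go here ...
        self.finished = true;
        Ok(())
    } // `self` is dropped here with `finished == true`
}

impl Drop for Transaction {
    fn drop(&mut self) {
        // Do not panic while already unwinding: that would abort the program.
        if !self.finished && !std::thread::panicking() {
            panic!("Transaction dropped without calling commit()");
        }
    }
}
```

Forgetting `commit()` then fails loudly in the first test that exercises the code path, rather than going unnoticed.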

Panics

Rust also has another error mechanism: panic!

fn main() {
    panicking_function();
}

fn panicking_function() {
    panic!("gosh, don't call me!");
}

In case of a panic, the following happens:

  • The current thread immediately halts
  • The stack is unwound
  • All affected values are dropped and their destructors run

Panics are implementation-wise similar to C++-Exceptions, but should only be used for fatal errors. They cannot be (normally) caught.

The affected thread dies.

Catching Panics

Panicking across FFI boundaries is undefined behaviour. In these cases, panics must be caught. For cases like this, there are std::panic::catch_unwind and std::panic::resume_unwind.
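
A sketch of catching a panic at such a boundary. The functions `risky` and `risky_ffi` and the `-1` error code are assumptions made for this example. The panic is converted into a return value before control goes back to the foreign caller.

```rust
use std::panic;

/// Ordinary Rust code that may panic.
fn risky(input: i32) -> i32 {
    if input < 0 {
        panic!("negative input");
    }
    input * 2
}

/// Entry point with the C ABI: reports -1 instead of unwinding into the caller.
extern "C" fn risky_ffi(input: i32) -> i32 {
    panic::catch_unwind(|| risky(input)).unwrap_or(-1)
}
```

The panic message is still printed by the panic hook; `catch_unwind` only stops the unwinding.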

Hooks

std::panic::set_hook allows setting a global handler that is run before the unwinding happens.


In general, Result is always the right way to propagate errors if they are to be handled.

Abort

In some environments, unwinding on panic! is not very meaningful. For those cases, rustc and cargo have a switch that immediately aborts the program on panic.

The panic hook is executed.

Double-panics

Panicking while a panic is being handled - for example in a destructor - invokes undefined behaviour. For that reason, the program will immediately abort.

Dynamic Dispatch


Sometimes, we want to take the decision of which implementation to use at runtime instead of letting the compiler monomorphize the code.

There are two approaches.

Dispatch through Enums

If the number of possible choices is limited, an Enum can be used:

enum Operation {
    Get,
    Set(String),
    Count
}

fn execute(op: Operation) {
    match op {
        Operation::Get => { }
        Operation::Set(s) => { }
        Operation::Count => { }
    }
}

Alternative Form

enum Operation {
    Get,
    Set(String),
    Count
}

impl Operation {
    fn execute(&self) {
        match self {
            Operation::Get => { }
            Operation::Set(s) => { }
            Operation::Count => { }
        }
    }
}

Recommendation

For best performance, try to minimize repeated matches on the enum.

See https://godbolt.org/z/8Yf4751qh

Note:

It takes multiple instructions to extract the tag from the enum and then jump to the appropriate block of code based on the value of that tag. If you use the Trait Objects we describe later, the kind of thing is encoded in the pointer to the dynamic dispatch table (or v-table) and so the CPU can just do two jumps instead of 'if this is 0, do X, else if this is a 1, do Y, else ...'.

Trait Objects

We can make references which do not know the type of the value but instead only know one particular trait that the value implements.

This is a trait object.

Internally, trait objects are a pair of pointers - one to a vtable and one to the value itself.

Note:

The term vtable is short for virtual dispatch table, and it's basically a struct full of function pointers that is auto-generated by the compiler.
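
The "pair of pointers" can be checked by comparing reference sizes. This is a small sketch, and `pointer_sizes` is an illustrative helper.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Returns the size of a plain reference and of a trait-object reference.
fn pointer_sizes() -> (usize, usize) {
    (size_of::<&String>(), size_of::<&dyn Debug>())
}
```

On current compilers the second value is twice the first: one pointer to the value plus one pointer to the vtable.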

Usage

fn print(thing: &dyn std::fmt::Debug) {
    // I can call `std::fmt::Debug` methods on `thing`
    println!("{:?}", thing);
    // But I don't know what the *actual* type is
}

fn main() {
    print(&String::from("hello"));
    print(&123);
}

Limitations

  • You can only use one trait per object
    • Plus auto traits, like Send and Sync
  • This trait must fulfill certain conditions

Rules for dyn-compatible traits (abbreviated)

  • Must not have Self: Sized
  • No associated constants or GATs
  • All methods must:
    • Have no type parameters
    • Not use Self, only &self etc
    • Not return impl Trait

See the docs for details.

Note that these used to be called "object safety" rules before 1.83.
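
A sketch of a trait that stays dyn-compatible even though one method returns `Self`. The `Shape` trait and its implementors are invented for this example. The `where Self: Sized` bound on the method excludes it from the trait object instead of making the whole trait unusable with `dyn`.

```rust
trait Shape {
    fn area(&self) -> f64;

    // Returns `Self`, so it is only callable on concrete, sized types.
    fn scaled(&self, factor: f64) -> Self
    where
        Self: Sized;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
    fn scaled(&self, factor: f64) -> Self {
        Square(self.0 * factor)
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
    fn scaled(&self, factor: f64) -> Self {
        Rect(self.0 * factor, self.1 * factor)
    }
}

/// Works on a mixed collection; each `area` call goes through the vtable.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}
```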

Performance

There is a small cost for jumping via the vtable, but it's cheaper than an enum match.

See https://godbolt.org/z/cheWrvM45

Trait Objects and Closures

Closure traits are dyn-compatible.

fn factory() -> Box<dyn Fn(i32) -> i32> {
    let num = 5;

    Box::new(move |x| x + num)
}

Is this a reference to a String?

Any type that is 'static + Sized implements std::any::Any.

We can use this to ask "is this reference actually a reference to this specific type?"

fn print_if_string(value: &dyn std::any::Any) {
    if let Some(s) = value.downcast_ref::<String>() {
        println!("It's a string({}): '{}'", s.len(), s);
    } else {
        println!("Not a string...");
    }
}

fn main() {
    print_if_string(&0);
    print_if_string(&String::from("cookie monster"));
}

Note:

Be sure to check the documentation because Any has some important restrictions.

Macros

What can macros do?

Macros can be used to do things such as:

  • Generate repetitive code
  • Create Domain-Specific Languages (or DSLs)
  • Write things that would otherwise be hard without Macros

There are two kinds of macro

  • Declarative
  • Procedural

Declarative Macros

Declarative Macros

  • Defined using macro_rules!
  • Perform pattern matching and substitution
  • Can do repeated actions

Declarative Macros are:

  • Hygienic: expansion happens in a different 'syntax context'
  • Correct: they cannot expand to invalid code
  • Limited: they cannot, for example, pollute their expansion site

The vec! macro

fn main() {
    // You write:
    let v = vec![1, 2, 3];
    // The compiler sees (roughly):
    let v = {
        let mut temp_vec = Vec::new();
        temp_vec.push(1);
        temp_vec.push(2);
        temp_vec.push(3);
        temp_vec
    };
}

How does that work?

"Match zero or more expressions, and paste each into a temp_vec.push() call"

#[macro_export]
macro_rules! vec {
    ( $( $x:expr ),* ) => {
        {
            let mut temp_vec = Vec::new();
            $(
                temp_vec.push($x);
            )*
            temp_vec
        }
    };
}

Note:

The actual macro is more complicated as it sets the Vec to have the correct capacity up front, to avoid re-allocation during the pushing of the values. Any new variables we introduce are given a colour to distinguish them from any the caller had created in the same scope.
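
The "colour" mentioned in the note can be demonstrated with a deliberately clashing name. In this sketch (`add_one!` and `hygiene_demo` are made up), the macro and the caller both have a variable called `temp`, yet they do not interfere.

```rust
macro_rules! add_one {
    ($x:expr) => {{
        // This `temp` lives in the macro's own syntax context.
        let temp = 1;
        // `$x` still refers to the caller's variables.
        $x + temp
    }};
}

fn hygiene_demo() -> i32 {
    let temp = 10;
    // Expands to roughly `{ let temp' = 1; temp + temp' }`.
    add_one!(temp)
}
```

With plain textual substitution the result would be 2; because of hygiene, `hygiene_demo()` returns 11.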

println! and friends

println! is a macro, because:

  • Rust does not have variadic functions
  • Rust wants to type-check the call

Expanding println!

fn main() {
    // You write
    println!("Hello {}, aged {}", "Sam", 40);
    // The compiler sees (roughly):
    let arguments = Arguments {
        pieces: &["Hello ", ", aged ", "\n"],
        args: &[
            Argument { value: &"Sam", formatter: string_formatter },
            Argument { value: &40, formatter: integer_formatter },
        ],
    };
    ::std::io::_print(arguments);
}

Note:

This is a simplified example - the real output is slightly more complicated, and is in fact handled by a compiler built-in so you can't even see the macro source for yourself.

Downsides of Declarative Macros

  • Can be difficult to debug
  • Can be confusing to read and understand

When Should You Use Declarative Macros?

  • When there are no other good alternatives

Procedural macros

Procedural macros

  • A procedural macro is a function that takes some code as input, and produces some code.
  • It runs at compile time
  • It is written in Rust and must therefore be compiled before your program is

Three kinds of procedural macro

  • Custom #[derive] macros
  • Attribute-like macros
  • Function-like macros

Custom #[derive] macros

Work like the built-in Rust derives, once you've imported them:

use serde::Serialize;

#[derive(Debug, Clone, Serialize)]
struct Square {
    width: u32,
}

fn main() {
    let sq = Square { width: 25 };
    let json = serde_json::to_string(&sq).unwrap();
    println!("{}", json);
}

Often named after the traits they implement.

Note:

In the Rust Docs search results, the trait appears in blue, and the macro appears in green.

Rust can always work out whether you mean the trait or the macro, from the context.

Attribute-like macros

  • Placed above a type, function, or field
  • Can have optional arguments
#[tokio::main(worker_threads = 2)]
async fn main() {
    println!("Hello world");
}

Function-like macros

Called like a function:

let query = sqlx::query!("SELECT * FROM `person`");

Downsides of Procedural Macros

  • Can be difficult to debug
  • Can slow down compilation a lot
  • Have to be stored in a separate crate
    • You're basically building compiler plug-ins at build time

When Should You Use Procedural Macros?

  • When it saves your users a sufficient amount of work

Property Testing

This is your brain

  • Everything we know is subject to bias
  • Everything we build reflects these biases

Problem:

Our code reflects our biases, and our tests are often biased in the same way

Solution:

Don't write tests

Solution:

Write expectations


  • Have the machine generate random test cases
  • Make beliefs explicit, force them to pay rent

This is called property testing
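To make the idea concrete without pulling in a crate, here is a hand-rolled sketch of a property check. It is not how proptest works internally: the `XorShift` generator, `mult_and_div_holds` and `find_counterexample` are all names invented for this illustration, and there is no shrinking of failing inputs.

```rust
/// A tiny xorshift64 pseudo-random generator, so the sketch needs no crates.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// The expectation under test: multiplying by 5 and then dividing by 5
/// gives back the input. `checked_mul` turns overflow into `None`.
fn mult_and_div_holds(a: u64) -> bool {
    a.checked_mul(5).map(|product| product / 5) == Some(a)
}

/// Runs the property against `cases` generated inputs and returns the
/// first counterexample found, if any.
fn find_counterexample(seed: u64, cases: usize) -> Option<u64> {
    let mut rng = XorShift(seed);
    (0..cases)
        .map(|_| rng.next())
        .find(|&a| !mult_and_div_holds(a))
}

fn main() {
    match find_counterexample(0x2545_F491_4F6C_DD1D, 1_000) {
        Some(a) => println!("property failed for a = {a}"),
        None => println!("no counterexample found"),
    }
}
```

A real property-testing crate adds the important parts this sketch leaves out: input strategies, shrinking to a minimal failing case, and remembering failures between runs.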

Crate: proptest

// this property is false, but perhaps
// not unreasonable to expect to be true
proptest! {
  #[test]
  fn mult_and_div(ref a in any::<usize>()) {
    let result = (a * 5) / 5;
    assert_eq!(result, *a);
  }
}

Crate: proptest

$ cargo test
test mult_and_div ... FAILED
Test failed: attempt to multiply with overflow;
minimal failing input: ref a = 3689348814741910324
test result: FAILED. 0 passed; 1 failed

Crate: proptest

$ cat proptest-regressions/main.txt
 # Seeds for failure cases proptest has
 # generated. It is automatically read
 # and these particular cases re-run before
 # any novel cases are generated.
 # shrinks to ref a = 3689348814741910324
 xs 4050946508 1278147119 4151624343 875310407

Wonderful for testing codecs, serialization, compression, or any set of operations that should retain equality.

proptest! {
  #[test]
  fn compress_roundtrip(ref s in ".*") {
    let result = decompress(compress(s));
    assert_eq!(result, s);
  }
}

It's easy to generate more structured input, too

proptest! {
  #[test]
  fn parses_all_valid_dates(
    ref s in "[0-9]{4}-[0-9]{2}-[0-9]{2}"
  ) {
    parse_date(s).unwrap();
  }
}

Configuration is a great target

proptest! {
  #[test]
  fn doesnt_crash(
    bit in 0usize..1_000_000,
    page_sz_exponent in 0usize..30
  ) {
    let page_sz = 1 << page_sz_exponent;
    let mut bits = Bitfield::new(page_sz);
    assert_eq!(bits.set(bit, true), Change::Changed);
    assert_eq!(bits.get(bit), true);
  }
}

Miscellaneous Tips

  • Isolate business logic from IO concerns
  • Use assert! and debug_assert! on non-trivial things; this makes our "fuzzers" extremely effective
  • Try not to use unwrap() everywhere; at least use expect("helpful message") to speed up debugging
  • When propagating errors, include context that helps you get back to the root

Rust Projects Build Time

Understanding Rust projects build time

  • Cargo keeps track of changes you make and only rebuilds what is necessary
  • when building a crate, rustc can do most of the work in parallel, but some steps still require synchronization
  • depending on the type of build, the time spent in each build phase may differ vastly:
    • debug vs release
    • various flags for rustc and LLVM
    • a build from scratch vs an incremental build

Producing a build timings report

rm -rf target/debug && cargo build --timings

.
└── target/
    ├── cargo-timings/
    │   ├── cargo-timings.html
    │   └── cargo-timings-<timestamp>.html
    ├── debug/
    └── ...

Timings Report

Cargo Build Report for Rust Analyzer

Reading the report

  • Cargo can't start building a crate until all its dependencies have been built.
    • Cargo only waits for rustc to produce LLVM IR; further compilation by LLVM can run in the background (purple)
  • a crate can't start building until its build.rs is built and finishes running (yellow)
  • if multiple crates depend on a single crate, they can often start building in parallel
  • if a package is both a binary and a library, then the binary is built after the library
    • integration tests, examples, benchmarks, and documentation tests all produce binaries and thus take extra time to build.

Actions you can take

Keep your crates independent of each other

  • Bad dependency graph:

    D -> C -> B -> A -> App
    
  • Good dependency graph (A, B, and C can be built in parallel and with greater incrementality):

      /-> A  \
    D ->  B  -> App
      \-> C  /
    

Note: To clarify

  • more parallelism -> the compiler can do more work at the same time
  • more incrementality -> the compiler can avoid doing work it's done before

Turn off unused features

  • Before:

    [dependencies]
    tokio = { version = "1", features = ["full"] } # build all of Tokio
    
  • After:

    [dependencies]
    tokio = { version = "1", features = ["net", "io-util", "rt-multi-thread"] }
    

Prefer pure-Rust dependencies

  • crate cannot be built before build.rs is compiled and executed

    • crates using C-dependencies have to rely on build.rs
    • build.rs might trigger C/C++ compilation which in turn is often slow
  • e.g.: rustls instead of openssl

Use multi-module integration tests:

  • Before (3 binaries)
├── src/
│   └── ...
└── tests/
    ├── account-management.rs
    ├── billing.rs
    └── reporting.rs
  • After (a single binary)
├── src/
│   └── ...
└── tests/
    └── my-app-tests/
        ├── main.rs   # includes the rest as modules
        ├── account-management.rs
        ├── billing.rs
        └── reporting.rs
  • The same applies to benchmarks and examples

Other tips

  • split your large package into a few smaller ones to improve build parallelization
  • extract your binaries into separate packages
  • remove unused dependencies

Tools

  • cargo-chef to speed up your docker builds
  • sccache for caching intermediary build artifacts across multiple projects and developers

Send & Sync


There are two special traits in Rust for concurrency semantics.

  • Send marks a structure safe to send between threads.
  • Sync marks a structure safe to share between threads.
    • (&T is Send)

These traits are what Rust uses to prevent data races.

They are automatically derived for all types if appropriate.

Automatically Derived

use std::thread;

#[derive(Debug)]
struct Thing;

// Can send between threads!
fn main() {
    let thing = Thing;

    thread::spawn(move || {
        println!("{:?}", thing);
    }).join().unwrap();
}

There are some notable types which are not Send or Sync.

Such as Rc, raw pointers, and UnsafeCell.
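A common trick for checking these bounds at compile time is a pair of empty generic functions. This is a sketch; the helper names `assert_send` and `assert_sync` are our own, not part of std:

```rust
use std::cell::Cell;
use std::sync::{Arc, Mutex};

// These helpers do nothing at run time; they only compile if `T`
// satisfies the bound.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

fn main() {
    assert_send::<String>();
    assert_sync::<String>();
    assert_send::<Arc<Mutex<u32>>>();
    assert_sync::<Arc<Mutex<u32>>>();

    // `Cell<u32>` may be moved to another thread, but not shared.
    assert_send::<Cell<u32>>();
    // assert_sync::<Cell<u32>>();          // does not compile: not Sync

    // assert_send::<std::rc::Rc<u32>>();   // does not compile: not Send
    // assert_send::<*mut u32>();           // does not compile: not Send
    println!("all bounds hold");
}
```

Uncommenting any of the commented-out lines turns the missing trait into a compile error.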

Example: Rc

use std::rc::Rc;
use std::thread;

// Does not work!
fn main() {
    let value = Rc::new(true);
    thread::spawn(move || {
        println!("{:?}", value);
    }).join().unwrap();
}

Example: Rc

error[E0277]: `Rc<bool>` cannot be sent between threads safely
    --> src/main.rs:7:19
     |
7    |       thread::spawn(move || {
     |       ------------- ^------
     |       |             |
     |  _____|_____________within this `{closure@src/main.rs:7:19: 7:26}`
     | |     |
     | |     required by a bound introduced by this call
8    | |         println!("{:?}", value);
9    | |     }).join().unwrap();
     | |_____^ `Rc<bool>` cannot be sent between threads safely
     |
     = help: within `{closure@src/main.rs:7:19: 7:26}`, the trait `Send` is not implemented for `Rc<bool>`, which is required by `{closure@src/main.rs:7:19: 7:26}: Send`
note: required because it's used within this closure
    --> src/main.rs:7:19
     |
7    |     thread::spawn(move || {
     |                   ^^^^^^^
note: required by a bound in `spawn`
    --> /home/mrg/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:675:8
     |
672  | pub fn spawn<F, T>(f: F) -> JoinHandle<T>
     |        ----- required by a bound in this function
   ...
675  |     F: Send + 'static,
     |        ^^^^ required by this bound in `spawn`
For more information about this error, try `rustc --explain E0277`.
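One common way to make this example compile is to swap `Rc` for `Arc`, whose reference count is updated atomically and which is therefore `Send` (when its contents are `Send + Sync`). A minimal sketch, with `read_on_other_thread` as an illustrative helper name:

```rust
use std::sync::Arc;
use std::thread;

/// Moves one `Arc` handle into a new thread and reads the value there.
fn read_on_other_thread(value: Arc<bool>) -> bool {
    thread::spawn(move || *value).join().unwrap()
}

fn main() {
    let value = Arc::new(true);
    // Clone the handle (not the data) so `main` keeps its own copy.
    let seen = read_on_other_thread(Arc::clone(&value));
    println!("thread saw {seen}, main still has {value}");
}
```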

Implementing

It's possible to add the implementation of Send and Sync to a type.

#![allow(unused)]
fn main() {
struct Thing(*mut String);

unsafe impl Send for Thing {}
unsafe impl Sync for Thing {}
}

In these cases, the task of thread safety is left to the implementor.

Relationships

If a type implements both Sync and Copy then it can also implement Send.

Relationships

A type &T can implement Send if the type T also implements Sync.

unsafe impl<'a, T: Sync + ?Sized> Send for &'a T {}
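This rule is what lets scoped threads borrow data from their parent. In the sketch below (the function name is ours), `u32` is `Sync`, so `&[u32]` is `Send` and both threads may hold a shared reference at once:

```rust
use std::thread;

/// Sums the two halves of a slice on two threads that only borrow the data.
fn sum_in_two_threads(data: &[u32]) -> u32 {
    let (left, right) = data.split_at(data.len() / 2);
    // Scoped threads are joined before `scope` returns, so they are
    // allowed to borrow from the enclosing stack frame.
    thread::scope(|s| {
        let a = s.spawn(|| left.iter().sum::<u32>());
        let b = s.spawn(|| right.iter().sum::<u32>());
        a.join().unwrap() + b.join().unwrap()
    })
}

fn main() {
    let data = vec![1, 2, 3, 4, 5, 6];
    println!("total = {}", sum_in_two_threads(&data));
}
```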

Relationships

A type &mut T can implement Send if the type T also implements Send.

unsafe impl<'a, T: Send + ?Sized> Send for &'a mut T {}

Consequences

What are the consequences of having Send and Sync?

Consequences

Carrying this information at the type system level allows driving data race bugs down to a compile time level.

This prevents that whole class of errors from reaching production systems.

Send and Sync are independent of the choice of concurrency (async, threaded, etc.).

Serialization and Deserialization (serde)

Serialization and Deserialization

https://serde.rs

Serialize & Deserialize

To make a Rust structure (de)serializable:

#[derive(Debug, serde::Serialize, serde::Deserialize)]
struct Move {
    id: usize,
    direction: Direction,
}

#[derive(Debug, serde::Serialize, serde::Deserialize)]
enum Direction { North, South, East, West }

Formats

Serde supports a number of formats, such as:

  • JSON
  • CBOR
  • YAML
  • TOML
  • BSON
  • MessagePack
  • ... More!

Did you enjoy that acronym salad?

Serialize

To JSON:

use serde::{Serialize, Deserialize};

#[derive(Debug, Serialize, Deserialize)]
struct Move {
    id: usize,
    direction: Direction,
}

#[derive(Debug, Serialize, Deserialize)]
enum Direction { North, South, East, West }

fn main() {
    let action = Move { id: 1, direction: Direction::West };
    let payload = serde_json::to_string(&action);
    println!("{:?}", payload);
}

Deserialize

From JSON:

use serde::{Serialize, Deserialize};

#[derive(Debug, Serialize, Deserialize)]
struct Move {
    id: usize,
    direction: Direction,
}

#[derive(Debug, Serialize, Deserialize)]
enum Direction { North, South, East, West }

fn main() {
    let payload = r#"{ "id": 1, "direction": "West" }"#;
    let action = serde_json::from_str::<Move>(&payload);
    println!("{:?}", action);
}

Transcode

use serde::{Serialize, Deserialize};
use serde_transcode::transcode;

#[derive(Debug, Serialize, Deserialize)]
struct Move {
    id: usize,
    direction: Direction,
}

#[derive(Debug, Serialize, Deserialize)]
enum Direction { North, South, East, West }

fn main() {
    let payload = r#"{ "id": 1, "direction": "West" }"#;
    let mut buffer = String::new();
    {
        let mut ser = toml::Serializer::new(&mut buffer);
        let mut de = serde_json::Deserializer::from_str(&payload);
        transcode(&mut de, &mut ser)
            .unwrap();
    }
    println!("{:?}", buffer);
}

Attributes

serde has a large number of attributes you can utilize:

#[serde(deny_unknown_fields)] // Be extra strict
struct Move {
    #[serde(default)] // Call usize::default()
    id: usize,
    #[serde(rename = "dir")] // Use a different name
    direction: Direction,
}

https://serde.rs/attributes.html

Testing


Testing is fundamental to Rust.

Unit, integration, and documentation tests all come built-in.

Organizing Tests

Tests typically end up in 1 of 4 possible locations:

  • Immediately beside the functionality tested (Unit Tests)
  • In a tests submodule (Unit Tests)
  • In documentation. (Documentation Test)
  • In the tests/ directory. (Integration Tests)

Unit Tests

  • Allows testing functionality in the same module and environment.
  • Typically exist immediately near the functionality.
  • Good for testing to make sure a single action works.

Unit Tests

  • Allows testing as if the functionality is being used elsewhere in the project.
  • For testing private APIs and functionality.
  • Good for testing expected processes and use cases.

tests Submodule

#![allow(unused)]
fn main() {
enum Direction { North, South, East, West }

fn is_north(dir: Direction) -> bool {
    match dir {
        Direction::North => true,
        _ => false,
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    
    #[test]
    fn is_north_works() {
        assert!(is_north(Direction::North) == true);
        assert!(is_north(Direction::South) == false);
    }
}
}

tests Submodule

$ cargo test
running 1 test
test tests::is_north_works ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured

Documentation Tests

  • Allows testing public functionality.
  • Is displayed in rustdoc output.
  • For demonstrating expected use cases and examples.

Documentation Tests

#![allow(unused)]
fn main() {
/// ```rust
/// use example::Direction;
/// let way_home = Direction::North;
/// ```
pub enum Direction { North, South, East, West }
}

Documentation Tests

$ cargo test
running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured

   Doc-tests example

running 1 test
test Direction_0 ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured

Integration Tests

  • Tests as if the crate is an external dependency.
  • Intended for longer or full-function tests.

Integration Tests

./tests/basic.rs

use example::{is_north, Direction};

#[test]
fn is_north_works() {
    assert!(is_north(Direction::North) == true);
    assert!(is_north(Direction::South) == false);
}

Integration Tests

$ cargo test
running 1 test
test is_north_works ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured

     Running target/debug/deps/example-9f39afa5d2a1c6bf

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured

   Doc-tests example

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured

std Library Tour


It's time for a tour of some interesting parts in std.

We will focus on parts we have not otherwise covered.

PhantomData

std::marker::PhantomData

Zero-sized types are used to mark things that "act like" they own a T.

These are useful for types which require markers, generics, or use unsafe code.

use std::marker::PhantomData;

struct HttpRequest<ResponseValue> {
    // Eventually returns this type.
    response_value: PhantomData<ResponseValue>,
}

fn main() {}
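Another use, sketched here under our own naming (`Id`, `User` and `Order` are hypothetical), is tagging a value with a type it never stores, so that otherwise identical values cannot be mixed up:

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// Marker types that are never instantiated; they only exist at the type level.
enum User {}
enum Order {}

/// A numeric ID tagged with the kind of thing it identifies.
struct Id<T> {
    raw: u32,
    // Zero-sized: tells the compiler that `T` is used, without storing one.
    _kind: PhantomData<T>,
}

impl<T> Id<T> {
    fn new(raw: u32) -> Self {
        Id { raw, _kind: PhantomData }
    }

    fn raw(&self) -> u32 {
        self.raw
    }
}

fn describe_user(id: &Id<User>) -> String {
    format!("user #{}", id.raw())
}

fn main() {
    let user_id: Id<User> = Id::new(7);
    let order_id: Id<Order> = Id::new(7);
    println!("{}", describe_user(&user_id));
    // describe_user(&order_id); // does not compile: expected `Id<User>`
    println!("order #{}", order_id.raw());
    println!("size of Id<User>: {} bytes", size_of::<Id<User>>());
}
```

The marker costs nothing at run time: `Id<User>` is the same size as the `u32` it wraps.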

Command

std::process::Command

A process builder, providing fine-grained control over how a new process should be spawned.

Used for interacting with other executables.

#![allow(unused)]
fn main() {
use std::process::Command;

fn example() {
    Command::new("ls")
            .args(&["-l", "-a"])
            .spawn()
            .expect("ls command failed to start");
}
}

Filesystem Manipulation

std::fs & std::path

Path handling and file manipulation.

use std::fs::{File, canonicalize};
use std::io::Write;

fn main() {
    let mut file = File::create("foo.txt").unwrap();
    file.write_all(b"Hello, world!").unwrap();
    
    let path = canonicalize("foo.txt").unwrap();
        
    let components: Vec<_> = path.components().collect();
    println!("{:?}", components);
}
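Many `std::path` operations are purely lexical and never touch the disk. A small sketch (the file names and the `backup_path` helper are made up for illustration):

```rust
use std::path::{Path, PathBuf};

/// Returns the same path with its extension replaced by `bak`.
fn backup_path(original: &Path) -> PathBuf {
    original.with_extension("bak")
}

fn main() {
    // `join` inserts the platform's path separator for us.
    let path = Path::new("logs").join("2024").join("app.log");

    println!("file name: {:?}", path.file_name());
    println!("extension: {:?}", path.extension());
    println!("parent:    {:?}", path.parent());
    println!("backup:    {}", backup_path(&path).display());
}
```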

Using Cargo

Crates and Packages

  • Rust code is arranged into packages
  • a package is described by a Cargo.toml file
  • building a package can produce a single library, and 0 or more executables
    • these are called crates
    • unlike C/C++ compilers that compile code file by file, rustc treats all files for a crate as a single compilation unit
  • Cargo calls rustc to build each crate in the package.

Cargo

  • standard build toolchain for Rust projects
  • shipped with rustc

What Cargo does

  • resolves and installs project dependencies
  • runs rustc to compile your code
  • runs a linker to produce libraries and executables
  • runs tests and benchmarks
  • builds documentation and runs documentation tests
  • runs additional tools like code formatter and linter
  • can be extended with additional custom commands

Cargo does Everything!

Cargo commands

  • cargo new my-app
  • cargo run - runs a debug build of your program, builds it if necessary
  • cargo fmt - formats your code
  • cargo check - only reports errors, doesn't actually compile your code
  • cargo clippy - runs a linter
  • cargo test - builds your project if necessary and runs tests
    • by default runs unit tests, integration tests, and documentation tests
    • you can select which tests to run
  • cargo build --release - produces an optimized version of your application or library

Cargo commands (cont)

There are many more!

  • cargo bench - builds an optimized version of your project and runs benchmarks
  • cargo doc --open - builds documentation for your project and all its dependencies and opens it in a browser
  • cargo run --example ... - runs an example from your examples/ directory

See Cargo Book for more.

Cargo command arguments

Most cargo commands accept a few common arguments:

  • +toolchain
  • --target
  • --features, --all-features, and --no-default-features
  • --timings

Putting it all together:

cargo +nightly run --target x86_64-apple-darwin --features "a b c dependency/feature" --timings

  • use nightly Rust
  • enable features a, b, c, and a feature feature of a dependency crate
  • (assuming we are using an Apple Silicon computer) build a macOS executable for x86 processors and run it using the built-in emulation (Rosetta 2)
  • collect statistics during the build process and generate a report

Features

  • allows conditional compilation
    • support for different operating systems
    • adapters for different libraries
    • optional extensions
  • can expose features from transitive dependencies

Using Features

  • in code:

    #[cfg(feature = "json")]
    mod json_support;
    
  • in Cargo.toml

    [features]
    json = [] # list of features that this feature depends on
    default = [] # "json" feature is not enabled by default
    
  • when someone uses your dependency

    my-lib = { version = "1.0.0", features = ["json"] }
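When a feature gates a function rather than a whole module, it often helps to provide a fallback so callers compile either way. A sketch, assuming the `json` feature from the snippets above (the `export_format` function is hypothetical):

```rust
// Compiled only when the `json` feature is enabled.
#[cfg(feature = "json")]
fn export_format() -> &'static str {
    "json"
}

// Fallback compiled when the feature is disabled, so callers always
// have a function to call.
#[cfg(not(feature = "json"))]
fn export_format() -> &'static str {
    "plain text"
}

fn main() {
    println!("exporting as {}", export_format());
}
```

Exactly one of the two definitions exists in any given build, so there is no run-time cost for the check.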
    

Anatomy of Rust package

cargo new hello-world
├── Cargo.lock
├── Cargo.toml
└── src/
    └── main.rs

Anatomy of Rust package

├── Cargo.lock
├── Cargo.toml
├── build.rs
├── src/
│   ├── lib.rs
│   ├── main.rs
│   ├── ...
│   └── bin/
│       ├── additional-executable.rs
│       └── multi-file-executable/
│           ├── main.rs
│           └── ...
├── benches/
│   └── ...
├── examples/
│   └── ...
└── tests/
    ├── some-integration-tests.rs
    └── multi-file-test/
        ├── main.rs
        └── ...

Cargo.toml - A manifest file

[package]
name = "tcp-mailbox"
version = "0.1.0"

[dependencies]
async-std = "1" # would also choose 1.5
clap = "2.2" # would also choose 2.3

Using Types to encode State

Systems have state

The system state is the product of all the things in the system that can be varied.

State can often be sub-divided into smaller units - some independent, some connected.

Examples?

A GPIO pin on a microcontroller. It typically has:

  • An output driver, that allows it to drive current out of the pin (or not)
  • An input buffer, that allows the CPU to read the state of the pin
  • An output level (high or low)

Functionality can depend on state

Is this program correct?

let p = GpioPin::new(7);
if p.is_low() {
    println!("Button is pressed");
}

Note:

  • What if the pin defaults to "output mode"?
  • What does it mean to read the level of a pin in output mode?

Ignoring the problem

You don't have to solve this problem.

See Arduino, which happily uses int for GPIO pin IDs rather than values of custom types.

But we can do better?

We've got a type system with traits and a powerful static analysis engine...

let p = OutputPin::new(7);
if p.is_low() {
    println!("Button is pressed");
}

1 | struct OutputPin {}
  | ---------------- method `is_low` not found for this struct
...
9 |     if p.is_low() {
  |          ^^^^^^ method not found in `OutputPin`

How would you change state?

With a method that takes ownership:

impl OutputPin {
    fn into_input(self) -> InputPin {
        poke_hardware_registers();
        InputPin { pin_id: self.pin_id }
    }
}

impl InputPin {
    fn into_output(self) -> OutputPin {
        poke_hardware_registers();
        OutputPin { pin_id: self.pin_id }
    }
}

Note:

The function call poke_hardware_registers() is a placeholder for whatever work you need to do on that microcontroller to change the state of that pin.
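Put together, the pattern might look like the following runnable sketch. There is no real hardware here: `poke_hardware_registers` is an empty stand-in, and `is_low` always reports a pressed button.

```rust
struct OutputPin {
    pin_id: u8,
}

struct InputPin {
    pin_id: u8,
}

// Stand-in for the real register writes on a microcontroller.
fn poke_hardware_registers() {}

impl OutputPin {
    fn new(pin_id: u8) -> OutputPin {
        poke_hardware_registers();
        OutputPin { pin_id }
    }

    fn set_high(&mut self) {
        poke_hardware_registers();
    }

    // Takes `self` by value: the OutputPin is consumed by the conversion.
    fn into_input(self) -> InputPin {
        poke_hardware_registers();
        InputPin { pin_id: self.pin_id }
    }
}

impl InputPin {
    fn pin_id(&self) -> u8 {
        self.pin_id
    }

    // Simulated read; always reports "low" in this sketch.
    fn is_low(&self) -> bool {
        true
    }

    fn into_output(self) -> OutputPin {
        poke_hardware_registers();
        OutputPin { pin_id: self.pin_id }
    }
}

fn main() {
    let mut led = OutputPin::new(7);
    led.set_high();

    let button = led.into_input();
    // led.set_high(); // does not compile: `led` was moved by `into_input`
    if button.is_low() {
        println!("Button on pin {} is pressed", button.pin_id());
    }

    let _led_again = button.into_output();
}
```

Because the conversion methods take ownership, the old handle cannot be used after the pin has changed mode.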

Non-Zero Sized Types

This type consumes 1 byte of RAM (maybe 4 bytes, with alignment). Is that strictly required?

#![allow(unused)]
fn main() {
struct OutputPin {
    pin_id: u8
}
}

Zero Sized Types

This type is of zero size. But any method call on it has access to the pin number, through the type system.

struct OutputPin<const PIN: u8> { _inner: () }

impl<const PIN: u8> OutputPin<PIN> {
    fn print_id(&self) {
        println!("I am pin {}", PIN);
    }
}

fn main() {
    let p: OutputPin<5> = OutputPin { _inner: () };
    p.print_id();
    println!("size is {}", std::mem::size_of_val(&p));
}

Note:

The _inner field is not pub, and therefore ensures values of this type can't be constructed outside the module it was defined in. This forces people to use the new functions you provide!

Generic Pin Modes?

#![allow(unused)]
fn main() {
pub trait PinMode {}

pub struct Output {}
impl PinMode for Output {}

pub struct Input {}
impl PinMode for Input {}

pub struct Pin<MODE> where MODE: PinMode { mode: MODE }

impl Pin<Output> {
    pub fn set_high(&self) { }
    pub fn set_low(&self) { }
}

impl Pin<Input> {
    pub fn is_high(&self) -> bool { todo!() }
    pub fn is_low(&self) -> bool { todo!() }
}
}

Preventing mis-use.

Who can impl PinMode for Type? Turns out anyone can...

use my_driver_crate::{Pin, PinMode};

struct OnFire {}
impl PinMode for OnFire {}

let pin: Pin<OnFire> = ...;

Sealing traits

#![allow(unused)]
fn main() {
mod private { pub trait Sealed {} }
pub trait PinMode: private::Sealed {}

pub struct Output {}
impl PinMode for Output {}
impl private::Sealed for Output {}

pub struct Input {}
impl PinMode for Input {}
impl private::Sealed for Input {}
}

Note:

The 'private' module is not pub, but the trait within it is pub. This means you cannot implement the PinMode trait yourself unless you can also 'see' a path to the private::Sealed trait - which is only visible within this module.

It's a trick to ensure only this module can implement the trait, but anyone else can see the trait and which types implement it.
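A self-contained sketch of the sealed, mode-generic pin follows. The `driver` module stands in for a separate crate, and the `high` field merely simulates a pin level; both are assumptions made for this illustration.

```rust
mod driver {
    mod private {
        pub trait Sealed {}
    }

    pub trait PinMode: private::Sealed {}

    pub struct Output {}
    impl private::Sealed for Output {}
    impl PinMode for Output {}

    pub struct Input {}
    impl private::Sealed for Input {}
    impl PinMode for Input {}

    pub struct Pin<MODE: PinMode> {
        id: u8,
        high: bool, // simulated pin level
        _mode: MODE,
    }

    impl<MODE: PinMode> Pin<MODE> {
        pub fn id(&self) -> u8 {
            self.id
        }
    }

    impl Pin<Input> {
        pub fn new(id: u8) -> Self {
            Pin { id, high: false, _mode: Input {} }
        }

        pub fn is_low(&self) -> bool {
            !self.high
        }

        pub fn into_output(self) -> Pin<Output> {
            Pin { id: self.id, high: self.high, _mode: Output {} }
        }
    }

    impl Pin<Output> {
        pub fn set_high(&mut self) {
            self.high = true;
        }

        pub fn into_input(self) -> Pin<Input> {
            Pin { id: self.id, high: self.high, _mode: Input {} }
        }
    }
}

use driver::{Input, Pin};

// Outside `driver`, this cannot compile, because `driver::private` is
// not visible and `PinMode` requires `Sealed`:
//
//     struct OnFire {}
//     impl driver::PinMode for OnFire {}

fn main() {
    let button = Pin::<Input>::new(7);
    println!("pin {} low? {}", button.id(), button.is_low());

    let mut led = button.into_output();
    led.set_high();
    // led.is_low(); // does not compile: no `is_low` on `Pin<Output>`

    let button = led.into_input();
    println!("pin {} low? {}", button.id(), button.is_low());
}
```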

WebAssembly

What?

WebAssembly (WASM) enables running Rust (among others) in a sandbox environment, including the browser.

WebAssembly is supported as a compile target.

High performance

WASM is built with speed in mind and executes almost as fast as native code.

The WASM sandbox

In its initial state, WASM provides only memory and execution, with no other functionality.

This can be added through the host system in various ways.

Hello World

(module
    ;; Import the required fd_write WASI function which will write the given io vectors to stdout
    ;; The function signature for fd_write is:
    ;; (File Descriptor, *iovs, iovs_len, nwritten) -> Returns number of bytes written
    (import "wasi_unstable" "fd_write" (func $fd_write (param i32 i32 i32 i32) (result i32)))

    (memory 1)
    (export "memory" (memory 0))

    ;; Write 'hello world\n' to memory at an offset of 8 bytes
    ;; Note the trailing newline which is required for the text to appear
    (data (i32.const 8) "hello world\n")

    (func $main (export "_start")
        ;; Creating a new io vector within linear memory
        (i32.store (i32.const 0) (i32.const 8))  ;; iov.iov_base - This is a pointer to the start of the 'hello world\n' string
        (i32.store (i32.const 4) (i32.const 12))  ;; iov.iov_len - The length of the 'hello world\n' string

        (call $fd_write
            (i32.const 1) ;; file_descriptor - 1 for stdout
            (i32.const 0) ;; *iovs - The pointer to the iov array, which is stored at memory location 0
            (i32.const 1) ;; iovs_len - We're printing 1 string stored in an iov - so one.
            (i32.const 20) ;; nwritten - A place in memory to store the number of bytes written
        )
        drop ;; Discard the number of bytes written from the top of the stack
    )
)

WASM targets in Rust

Rust ships 3 WASM targets:

  • wasm32-unknown-emscripten (legacy)
    • ships with an implementation of libc for WASM
  • wasm32-unknown-unknown (stable)
    • direct compilation to WASM, with no additional tooling
  • wasm32-wasi (in development)
    • WASM with support for interface types, a structured way of adding capabilities

Installation: rustup Target

rustup allows installing multiple compilation targets.

$ rustup target install wasm32-unknown-unknown
$ rustup target install wasm32-wasi

Installing a host runtime

$ curl --proto '=https' --tlsv1.2 -sSf https://wasmtime.dev/install.sh | bash
  • Currently need building from git: https://github.com/bytecodealliance/wasmtime

Usage: Hello World!

$ cargo new hello
    Created binary (application) `hello` package
$ cargo build --target wasm32-wasi
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
$ wasmtime target/wasm32-wasi/debug/hello.wasm
Hello, world!

A Rust & WASM Tutorial

https://ferrous-systems.github.io/wasm-training-2022/

Unsafe Rust


Rust's type system provides many guarantees, but sometimes, they make specific solutions hard or impossible.

For that reason, Rust has the concept of "unsafe code".


Unsafe code is allowed to:

  • freely access memory
  • dereference raw pointers
  • call external functions
  • declare values Send and Sync
  • write to unsynced global variables

By definition, these are not unsafe:

  • conversion to raw pointers
  • memory leaks
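A short sketch of where the line falls: creating a raw pointer and leaking memory need no `unsafe`, but reading through the raw pointer does. The `leaked_answer` function is an illustrative name.

```rust
/// Deliberately leaks a heap allocation. This is safe: the memory is
/// simply never freed.
fn leaked_answer() -> &'static mut u32 {
    Box::leak(Box::new(42))
}

fn main() {
    let x = 10;

    // Creating a raw pointer is safe...
    let p = &x as *const i32;

    // ...but reading through it is not, so it needs an `unsafe` block.
    // SAFETY: `p` was derived from a reference to `x`, which is still alive.
    let read_back = unsafe { *p };
    println!("read {read_back} through the raw pointer");

    let answer = leaked_answer();
    *answer += 1;
    println!("leaked value is now {answer}");
}
```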

Making pointers

#![allow(unused_variables)]
fn main() {
    let mut x = 1;
    // The old way
    let p1 = &x as *const i32;
    let p2 = &mut x as *mut i32;
    // Added in 1.51, was unsafe until 1.82
    let p1 = core::ptr::addr_of!(x);
    let p2 = core::ptr::addr_of_mut!(x);
    // As of Rust 1.82, use this instead:
    let p1 = &raw const x;
    let p2 = &raw mut x;    
}

Unsafe code should never:

  • be used to manage memory managed by a different allocator (e.g. construct a std::vec::Vec from a C++ vector and drop it)
  • cheat on the borrow checker, for example by changing lifetimes or mutability of a type. The most common source of "but I was so sure that works" bugs.

Rust's little secret

When implementing data structures, unsafe isn't unusual.

Safe Rust is the worst language to implement linked lists. There's a full text on this


Unsafe code must always be marked unsafe.

fn main() {
    let mut x = 1;
    let p = &raw mut x;
    unsafe {
        my_write(p, 100);
    }
    println!("x is {} (or {})", x, unsafe { p.read() });
}

pub unsafe fn my_write<T>(p: *mut T, new_value: T) {
    p.write(new_value)
}

Note:

Modern Rust generally tries to have only a small number of unsafe operations per unsafe block. And any unsafe function should still use unsafe blocks for the unsafe code within, even though the function itself is unsafe to call.

Try running clippy on this example and play with clippy::multiple_unsafe_ops_per_block and clippy::undocumented_unsafe_blocks. Then try "Edition 2024".

Traps of unsafe

  • Not all examples are that simple. unsafe must guarantee the invariants that Rust expects.
  • This especially applies to ownership and mutable borrowing
  • unsafe can lead to a value having 2 owners -> double free
  • unsafe can make immutable data temporarily mutable, which will lead to broken promises and tears.

Rust allows you to shoot yourself in the foot, it just requires you to take your gun out of the holster and remove the safety first.

Practical example

As Rust forbids mutable aliasing, it is impossible in safe Rust to split a mutable slice into 2 non-overlapping mutable parts.

#![allow(unused)]
fn main() {
#[inline]
fn split_at_mut<T>(value: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    let len = value.len();
    let ptr = value.as_mut_ptr();
    assert!(mid <= len);
    unsafe {
        (std::slice::from_raw_parts_mut(ptr, mid),
         std::slice::from_raw_parts_mut(ptr.add(mid), len - mid))
    }
}
}
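Callers never see the `unsafe` block; they get a safe API. The sketch below uses the standard library's own `split_at_mut` on slices, which has the same signature as the function above. The `fold_halves` helper is a made-up example.

```rust
/// Adds each element of the first half onto the matching element of the
/// second half. Needs two live `&mut` borrows into the same slice.
fn fold_halves(data: &mut [u32]) {
    let mid = data.len() / 2;
    let (left, right) = data.split_at_mut(mid);
    for (l, r) in left.iter().zip(right.iter_mut()) {
        *r += *l;
    }
}

fn main() {
    let mut data = [1, 2, 3, 4];
    fold_halves(&mut data);
    println!("{data:?}"); // prints [1, 2, 4, 6]
}
```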

Highlight unsafe code in VSCode

  • Will highlight which function calls are unsafe inside an unsafe block
  • Helpful for longer unsafe blocks
{
    "editor.semanticTokenColorCustomizations": {
        "rules": {
            "*.unsafe:rust": "#ff00ff"
        }
    }
}

Foreign Function Interface (FFI)

What is it?

  • For interfacing Rust code with foreign functions
  • For interfacing foreign code with Rust functions

Application Binary Interface (ABI)

(Like an API, but for machine code calling machine code)


The Rust ABI is not stable.


Rust also supports your platform's ABI(s).

(There might be several...)

Note:

Processors don't understand 'function parameters'. They have registers, and they have the stack. The compiler of the caller function must decide where to place each argument - either in a register or on the stack. The compiler of the callee function (the function being called) must decide where to retrieve each argument from. There are also decisions to be made regarding which registers a function can freely re-use, and which registers must be carefully restored to their initial value on return. If a function can freely re-use a register, then the caller needs to think about saving and restoring the register contents. If each function is responsible for putting things back exactly as they were, then the caller has less work to do, but maybe you're saving and restoring registers that no-one cares about. When the stack is used, you also have to agree whether the caller or the callee is responsible for resetting the stack pointer to where it was before the caller called the callee.

Think also what happens if you have a floating-point unit - do f32 and f64 values go into FPU registers, or are they placed in integer registers?

Clearly these two compilers must agree, otherwise the callee will not receive the correct arguments and your program will exhibit Undefined Behaviour (UB)!

x86 is ~40 years old and many standards exist on how to do this. See https://en.wikipedia.org/wiki/X86_calling_conventions#Historical_background.

AMD64 is only ~20 years old, and there are two standards - the Microsoft one for Windows, and the Linux one (which is based on System V UNIX).

ARM64 has one main standard (the Arm Architecture Procedure Call Standard, or AAPCS), plus one Microsoft invented which works much more like AMD64 and lets ARM64 call emulated AMD64 much more easily. That's called ARM64EC.


CPUs have registers, and they have a pointer to the stack (in RAM)

Where does this function find its arguments? Where does the return value go?

#![allow(unused)]
fn main() {
struct SomeStruct(u32, f64);

fn hello(param1: i32, param2: f64) -> SomeStruct { todo!() }
}

Libraries

Your Rust code might want to interact with shared/static libraries.

Or be one.

Efficient bindings

There are no conversion costs moving from C to Rust or vice-versa

Using Rust from C

We have this amazing Rust library that we want to use in our existing C project.

#![allow(unused)]
fn main() {
struct MagicAdder {
	amount: u32
}

impl MagicAdder {
	fn new(amount: u32) -> MagicAdder {
		MagicAdder {
			amount
		}
	}

	fn process_value(&self, value: u32) -> u32 {
		self.amount + value
	}
}
}

Things TODO

  • Tell C these functions exist
  • Tell Rust to use C-compatible types and functions
  • Link the external code as a library
  • Provide some C types that match the Rust types
  • Call our Rust functions

C-flavoured Rust Code

#![allow(unused)]
fn main() {
#[repr(C)]
struct MagicAdder {
	amount: u32
}

impl MagicAdder {
    fn new(amount: u32) -> MagicAdder { todo!() }
    fn process_value(&self, value: u32) -> u32 { todo!() }
}

#[no_mangle]
extern "C" fn magicadder_new(amount: u32) -> MagicAdder {
	MagicAdder::new(amount)
}

#[no_mangle]
extern "C" fn magicadder_process_value(adder: *const MagicAdder, value: u32) -> u32 {
	if let Some(ma) = unsafe { adder.as_ref() } {
		ma.process_value(value)
	} else {
		0
	}
}
}

Note:

The .as_ref() method on pointers requires that the pointer either be null, or that it point at a valid, aligned, fully initialized object. If they just feed you a random integer, bad things will happen, and we can't tell if they've done that!
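To make the null-pointer case concrete, here is a minimal sketch that runs on a host machine. It repeats the wrapper from the slide above (without #[no_mangle], which is not needed when the caller is Rust) and calls it once with a valid pointer and once with a null pointer. The main function is only there for illustration.

```rust
#[repr(C)]
pub struct MagicAdder {
    amount: u32,
}

/// Same shape as the wrapper above: a null pointer yields 0.
pub extern "C" fn magicadder_process_value(adder: *const MagicAdder, value: u32) -> u32 {
    // SAFETY: the caller promises `adder` is either null, or points at a
    // valid, aligned, initialised MagicAdder. We cannot check the latter.
    match unsafe { adder.as_ref() } {
        Some(ma) => ma.amount + value,
        None => 0,
    }
}

fn main() {
    let adder = MagicAdder { amount: 10 };
    // A reference coerces to a valid raw pointer
    println!("valid: {}", magicadder_process_value(&adder, 5));
    // A null pointer is detected and handled
    println!("null:  {}", magicadder_process_value(core::ptr::null(), 5));
}
```

Only the null case can be detected. A dangling or misaligned pointer still reaches as_ref() and is undefined behaviour.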

Matching C header

/// Designed to have the exact same shape as the Rust version
typedef struct magic_adder_t {
	uint32_t amount;
} magic_adder_t;

/// Wraps MagicAdder::new
magic_adder_t magicadder_new(uint32_t amount);

/// Wraps MagicAdder::process_value
uint32_t magicadder_process_value(const magic_adder_t* self, uint32_t value);

Making a library

You can tell rustc to make:

  • binaries (bin)
  • libraries (lib)
    • rlib
    • dylib
    • staticlib
    • cdylib

Note:

See https://doc.rust-lang.org/reference/linkage.html

Cargo.toml

[package]
name = "magic_adder"
version = "1.0.0"
edition = "2021"

[lib]
crate-type = ["lib", "staticlib", "cdylib"]

Note:

See ./examples/ffi_use_rust_in_c for a working example.

Using C from Rust


We have this amazing C library that we want to use as-is in our Rust project.

cool_library.h:

/** Parse a null-terminated string */
unsigned int cool_library_function(const char* p);

cool_library.c:

#include "cool_library.h"

unsigned int cool_library_function(const char* s) {
    unsigned int result = 0;
    for(const char* p = s; *p; p++) {
        result *= 10;
        if ((*p < '0') || (*p > '9')) { return 0; }
        result += (*p - '0');
    }
    return result;
}

Things TODO

  • Tell Rust these functions exist
  • Link the external code as a library
  • Call those with unsafe { ... }
  • Transmute data for C functions

Naming things is hard

#![allow(non_camel_case_types, non_upper_case_globals, non_snake_case)]

 

Disables some Rust naming lints

Binding functions

/** Parse a null-terminated string */
unsigned int cool_library_function(const char* p);

#![allow(unused)]
fn main() {
use std::ffi::c_char; // also in core::ffi

extern "C" {
    // We state that this function exists, but there's no definition.
    // The linker looks for this 'symbol name' in the other objects
    fn cool_library_function(p: *const c_char) -> u32;
}
}

Note:

You cannot do extern "C" fn some_function(); with no function body - you must use the block.

Changes in Rust 1.82

You can now mark external functions as safe:

unsafe extern "C" {
    // This function is basically impossible to call wrong, so let's mark it safe
    safe fn do_stuff(x: i32) -> i32;
}

fn main() {
    dbg!(do_stuff(3));
}

#[unsafe(export_name = "do_stuff")]
extern "C" fn my_do_stuff(x: i32) -> i32 {
    x + 1
}

Note:

You can only mark an extern function as safe within an unsafe extern block.

Also note that in Rust 1.82, export_name became an unsafe attribute, along with no_mangle and link_section. The old form is still allowed in Edition 2021 and earlier (for backwards compatibility), but you will have to use the new syntax in Edition 2024.

Primitive types

Some C types have direct Rust equivalents. See also core::ffi.

C               Rust
int32_t         i32
unsigned int    c_uint
unsigned char   u8 (not char!)
void            ()
char*           CStr or *const c_char
T*              Box<T> (if T is sized)

Note:

On some systems, a C char is not 8 bits in size. Rust does not support those platforms, and likely never will. Rust does support platforms where int is only 16-bits in size.

If T: ?Sized, then Box<T> may be larger than a single pointer as it will also need to hold the length information. That means it is no longer the same size and layout as T*.
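The T* / Box<T> row can be sketched as follows. This is a minimal example, assuming we want C to hold a heap-allocated MagicAdder by pointer; the function names magicadder_create and magicadder_free are hypothetical and not part of the earlier example.

```rust
#[repr(C)]
pub struct MagicAdder {
    amount: u32,
}

/// Hypothetical constructor: moves the object to the heap and hands the
/// raw pointer to the (C) caller, who now owns it.
pub extern "C" fn magicadder_create(amount: u32) -> *mut MagicAdder {
    Box::into_raw(Box::new(MagicAdder { amount }))
}

/// Hypothetical destructor: takes ownership back so Rust can free the memory.
///
/// # Safety
/// `adder` must be null, or a pointer returned by `magicadder_create`
/// that has not already been freed.
pub unsafe extern "C" fn magicadder_free(adder: *mut MagicAdder) {
    if !adder.is_null() {
        // Rebuild the Box and drop it, which releases the allocation
        drop(unsafe { Box::from_raw(adder) });
    }
}

fn main() {
    let p = magicadder_create(42);
    // SAFETY: `p` came from magicadder_create and has not been freed yet
    let amount = unsafe { (*p).amount };
    println!("amount = {amount}");
    unsafe { magicadder_free(p) };
}
```

Because MagicAdder is sized, Box<MagicAdder> is the same size as a plain pointer, which is what makes the round trip through into_raw and from_raw possible.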

Calling this

use std::ffi::{c_char, c_uint};

extern "C" {
    fn cool_library_function(p: *const c_char) -> c_uint;
}

fn main() {
    let s = c"123"; // <-- a null-terminated string!
    let result: u32 = unsafe { cool_library_function(s.as_ptr()) };
    println!("cool_library_function({s:?}) => {result}");
}
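The c"123" literal only helps when the string is known at compile time. For a string built at run time, one option is std::ffi::CString, which adds the terminating NUL and rejects interior NUL bytes. The sketch below assumes a safe wrapper called parse (a hypothetical name), and uses a Rust stand-in for cool_library_function so that it runs without linking any C code.

```rust
use std::ffi::{c_char, c_uint, CStr, CString};

/// Stand-in for the C function, so this sketch runs without any C code.
/// In a real project this would be declared in an extern "C" block instead.
unsafe extern "C" fn cool_library_function(p: *const c_char) -> c_uint {
    // SAFETY: callers must pass a valid, null-terminated string
    let bytes = unsafe { CStr::from_ptr(p) }.to_bytes();
    let mut result: c_uint = 0;
    for b in bytes {
        if !b.is_ascii_digit() {
            return 0;
        }
        result = result.wrapping_mul(10).wrapping_add(c_uint::from(b - b'0'));
    }
    result
}

/// Hypothetical safe wrapper: returns None if `text` contains a NUL byte.
fn parse(text: &str) -> Option<u32> {
    // CString::new copies the text and appends the terminating NUL
    let c_text = CString::new(text).ok()?;
    // `c_text` lives until the end of this function, so the pointer stays valid
    Some(unsafe { cool_library_function(c_text.as_ptr()) })
}

fn main() {
    println!("{:?}", parse("123"));
    println!("{:?}", parse("12a"));
    println!("{:?}", parse("1\02"));
}
```

Keeping the CString in a named local matters: calling as_ptr() on a temporary would leave the pointer dangling before the C function runs.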

Some more specific details...

Cargo (build-system) support

  • Build native code via build-dependency crates:
  • build.rs can give linker extra arguments

Opaque types

When you don't know (or don't care) about the internal layout of a C type, you can use an opaque struct.

#![allow(unused)]
fn main() {
/// This is like a 'struct FoobarContext;' in C
#[repr(C)]
pub struct FoobarContext { _priv: [i32; 0] }

extern "C" {
	fn foobar_init() -> *mut FoobarContext;
	fn foobar_do(ctx: *mut FoobarContext, foo: i32);
	fn foobar_destroy(ctx: *mut FoobarContext);
}

/// Use this in your Rust code
pub struct FoobarHandle(*mut FoobarContext);
}
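One way to use such a handle is to tie foobar_destroy to Rust's Drop, so the context cannot leak. The sketch below is an assumption about how you might do that, not part of the foobar library: the three foobar_* functions are Rust stand-ins (so the example runs without C code), and the LIVE counter exists only to show that clean-up happened.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Opaque to the Rust caller, like 'struct FoobarContext;' in C
#[repr(C)]
pub struct FoobarContext { _priv: [i32; 0] }

/// Counts live contexts, only so we can observe the clean-up
static LIVE: AtomicUsize = AtomicUsize::new(0);

// Stand-ins for the C library (hypothetical implementations)
unsafe extern "C" fn foobar_init() -> *mut FoobarContext {
    LIVE.fetch_add(1, Ordering::SeqCst);
    Box::into_raw(Box::new(0i32)).cast()
}
unsafe extern "C" fn foobar_do(ctx: *mut FoobarContext, foo: i32) {
    unsafe { *ctx.cast::<i32>() += foo };
}
unsafe extern "C" fn foobar_destroy(ctx: *mut FoobarContext) {
    drop(unsafe { Box::from_raw(ctx.cast::<i32>()) });
    LIVE.fetch_sub(1, Ordering::SeqCst);
}

/// Use this in your Rust code
pub struct FoobarHandle(*mut FoobarContext);

impl FoobarHandle {
    pub fn new() -> Option<FoobarHandle> {
        let ctx = unsafe { foobar_init() };
        if ctx.is_null() { None } else { Some(FoobarHandle(ctx)) }
    }

    pub fn do_it(&mut self, foo: i32) {
        // SAFETY: self.0 came from foobar_init and is destroyed only in Drop
        unsafe { foobar_do(self.0, foo) }
    }
}

impl Drop for FoobarHandle {
    fn drop(&mut self) {
        // SAFETY: runs exactly once, when the handle goes out of scope
        unsafe { foobar_destroy(self.0) }
    }
}

fn main() {
    {
        let mut handle = FoobarHandle::new().expect("foobar_init failed");
        handle.do_it(5);
        println!("live contexts: {}", LIVE.load(Ordering::SeqCst));
    } // `handle` is dropped here, which calls foobar_destroy
    println!("live contexts: {}", LIVE.load(Ordering::SeqCst));
}
```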

Callbacks

extern "C" applies to function pointers given to extern functions too.

use std::ffi::c_void;

pub type FooCallback = extern "C" fn(state: *mut c_void);

extern "C" {
    pub fn libfoo_register_callback(state: *mut c_void, cb: FooCallback);
}

extern "C" fn my_callback(_state: *mut c_void) {
    // Do stuff here
}

fn main() {
    unsafe { libfoo_register_callback(core::ptr::null_mut(), my_callback); }
}
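The state pointer is how a callback gets at your own data. The sketch below assumes the library simply passes state back to the callback untouched; libfoo_register_callback is replaced here by a Rust stand-in that calls the callback immediately, and count_calls and run_twice are hypothetical names.

```rust
use std::ffi::c_void;

pub type FooCallback = extern "C" fn(state: *mut c_void);

/// Stand-in for the C library (hypothetical): calls the callback straight away
unsafe extern "C" fn libfoo_register_callback(state: *mut c_void, cb: FooCallback) {
    cb(state);
}

/// Our callback: treats `state` as a pointer to a u32 counter
extern "C" fn count_calls(state: *mut c_void) {
    // SAFETY: we only ever register this callback with a pointer to a live u32
    let counter = unsafe { &mut *state.cast::<u32>() };
    *counter += 1;
}

fn run_twice() -> u32 {
    let mut counter: u32 = 0;
    // Erase the type, because C only knows about void*
    let state = (&mut counter as *mut u32).cast::<c_void>();
    unsafe {
        libfoo_register_callback(state, count_calls);
        libfoo_register_callback(state, count_calls);
    }
    counter
}

fn main() {
    println!("callback ran {} times", run_twice());
}
```

With a real library that stores the callback and calls it later, the state must outlive the registration. A local variable as used here would not be enough.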

But this is a lot of manual work?

There's a better way!

Making C headers from Rust

cbindgen

Making Rust source from C headers

bindgen

Loading auto-generated Rust source

#[allow(non_camel_case_types, non_snake_case, non_upper_case_globals)]
pub mod bindings {
    include!(concat!(env!("OUT_DIR"), "/bindings.rs"));
}

Calling these tools:

  • On the command line
  • Executing a command in build.rs
  • Calling a library function in build.rs

sys crates

xxxx-sys is a Rust crate that provides a thin wrapper around some C library xxxx.

You normally have a higher-level xxxx crate that provides a Rust interface

Note:

For example libgit2-sys (wraps libgit2), or nrfxlib-sys (nRF9160 support)

Working With Nightly

Why?

  • There are many features which are not yet stable
    • language
    • library
    • cargo, rustdoc, etc
  • Dependencies may require nightly
  • You can't wait for the train
  • Compile times and error messages are sometimes better (sometimes not)

Using Nightly

Use rustup to override the version used in a specific directory.

cd /nightly_project
rustup override set nightly-2024-02-01

Pinning a version

You can also store the information in your repo:

$ cat rust-toolchain.toml
[toolchain]
channel = "nightly-2024-02-01"

Language features

Language features are parts of Rust we haven't quite agreed on yet, but there's an implementation there to be tested. Each one has a tracking issue.

Some examples:

RPIT, RPITIT, AFIT, and more

  • Return Position Impl Trait
  • Return Position Impl Trait in Trait
  • Async Function in Trait
  • A handy guide

Note:

  • RPIT would be something like fn fetch() -> impl Debug.
  • RPITIT is a trait method that has impl trait in the return position.
  • AFIT is a trait method like async fn do_stuff()

Enabling Language Features

To enable, add the feature attribute to your top-level module:

#![feature(riscv_target_feature)]

Compiler features

Unstable compiler flags start with -Z.

See them all with:

rustc +nightly -Z help

Library features

Some parts of the Standard Library are 'unstable' and only available on nightly.

Nothing special required to opt-in, just nightly Rust.

You can see them in the docs, like slice::new_zeroed_slice()

Cargo features

You can specify unstable cargo features in your .cargo/config.toml:

[unstable]
mtime-on-use = true

The Standard Library

  • The Standard Library is written in Rust
  • It must therefore be compiled
  • But stable rustc cannot compile the Standard Library
  • => rustup gives you a pre-compiled Standard Library for your target

Note:

Why does it require nightly? Because it's full of unstable library APIs, and makes use of unstable compiler features.

So how do they build libstd during a toolchain release? With a secret magic flag that makes stable Rust look like nightly Rust for the purposes of building the standard library. You should not use this flag yourself.

Compiling the Standard Library

  • If you have nightly rust, you can compile it from source yourself
  • rustup component add rust-src
  • cargo build -Z build-std=core,alloc ..., or give cargo this config:
[unstable]
build-std = ["core", "alloc"]

Availability

  • Nightly doesn't always successfully build
  • rustup can go back in time and find a working build
  • rustup-component-history can help

The books

The Shape of a Rust Program


  • Embedded systems come in many shapes and sizes
  • Rust tries to be flexible and support developers

Some Terms

  • Binary
  • Static Library
  • Dynamic Library
  • RTOS

Note:

A binary is a collection of executable machine code and data, typically but not exclusively in ELF format, with a defined 'entry point'. The CPU should jump to the address of the 'entry point' and start executing from there.

A static library is an archive containing object code, typically with a .a extension. The object code contains gaps where the run-time addresses need to be plugged in by a linker, before it can be considered executable code.

A dynamic library looks more like a binary (and is typically in ELF format), but it still contains gaps that need to be plugged by a dynamic linker (also known as a loader). Linux .so files and Windows .dll files are in this category.

A Real-Time Operating System manages the execution of one or more tasks, typically with pre-emptive context switching, but not exclusively.

1) Flat Binaries

  • Top-level is a Rust Binary
    • Typically main.rs
  • Program runs on start-up
    • Started by the reset vector, or the boot ROM
  • Can pull in an RTOS or async runtime, as a static library
  • Linker sees everything
  • Flat address space
  • The most common approach
  • See RTIC, embassy, Eclipse ThreadX, or FreeRTOS

2) Bootloader + Application

  • Two binaries, linked separately
  • First binary (e.g. bootloader) starts the second (e.g. application)
  • Sometimes the second calls back into the first
  • Use linker scripts to divide up memory
  • Also often used to implement Arm Secure Mode (TrustZone) APIs
  • See RP2350 HAL or the nRF9160 SPM

Note:

The RP2350 Bootloader is in ROM, but it's still a binary. It inspects the application in flash (optionally performing a hash check or a cryptographic signature check) before jumping to the application. The application can then make calls back into the ROM bootloader, by calling a function that lives at a well-known address (or that has a function pointer that is stored at a well-known address). The bootloader in ROM starts in the Arm Cortex-M33's 'Secure' state, but can switch the CPU into 'non-secure' state before running the application, if that's what the application metadata says to do.

The nRF9160 Secure Partition Manager is similar, but must be written to the start of the nRF9160's flash. It also expects the exclusive use of a particular block of SRAM and so you must avoid that region of SRAM in your application. See the nrf9160-hal's memory.x file for an example.

3) Tasks are Libraries

  • Each 'task' is a static library
  • The OS provides a 'skeleton' binary
    • It imports and calls your tasks
  • Tasks provide an entry point, and some mechanism to call the OS
    • Typically SVC calls
  • See Zephyr and RTEMS

Note:

SVC is the Arm mnemonic for performing a system call. These are also known as 'software interrupts' and earlier Arm architectures used the mnemonic SWI.

4) Tasks are Binaries (dynamic linking)

  • Some systems have multiple 'flash slots'
    • The run-time address is not known at link time
  • Enforces isolation between tasks - has to use SVC calls
  • Rust does not currently support RWPI or ROPI code
  • Rust has some support for PIC/PIE code
    • But then you have to write a dynamic linker to fix up the code at load time
  • See TockOS or Linux/macOS/Windows/QNX...

Note:

As of 2024, TockOS only allows Rust applications to be installed in the first flash slot, for this reason. C applications can be installed into any flash slot, because ROPI/RWPI works for C.

RWPI is read-write position independence, and involves static data not having a fixed address but instead being accessed via a reserved register that always contains the 'static base pointer' (i.e. the base address of the RW data).

ROPI is read-only position independence, and involves executable code not having a fixed address but instead being accessed via PC-relative jumps.

PIC/PIE is position independent code / executable. This involves non-PC-relative jumps to code or data being made via a Global Offset Table (GOT). The GOT needs modifying at load time, once you know where everything is in memory. Linux programs and shared libraries are PIE/PIC.

5) Tasks are Binaries (static linking)

  • Like (4), but you have a tool work out the linking once you have all the binaries
  • Doesn't require ROPI or RWPI
  • But you have to know the full set of tasks in advance
  • See Hubris

Summary

  1. Flat Binaries
  2. Bootloader + Application
  3. Tasks are Libraries
  4. Tasks are Binaries (dynamic linking)
  5. Tasks are Binaries (static linking)

Remember, these are embedded systems issues, not necessarily Rust-specific issues.

Overview of Bare-Metal Rust

A Layered Approach

When building bare-metal systems in Rust, we use Rust crates to help us build a modular system.

The elements in our system are:

  • The program you are writing
  • The MCU you are running on
  • The PCB (or Board) your MCU is on
  • The external devices connected to your MCU

The Layers

To support these elements, we (usually) have these layers.

  • Application
  • Board Support
  • External Drivers (e.g. SPI LCD Driver)
  • Hardware Abstraction Layer Traits
  • MCU Hardware Abstraction Layer Implementation
  • MCU Peripheral Access Crate
  • Core Peripherals
  • Core Runtime

---

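Dependency graph (an arrow means "depends on"):

  • Application (my_application) -> Board Support (nrf52840_dk), SPI LCD Driver (ssd1306), Core Runtime (cortex_m_rt)
  • Board Support (nrf52840_dk) -> MCU HAL Implementation (nrf52840_hal), HAL Traits (embedded_hal)
  • MCU HAL Implementation (nrf52840_hal) -> MCU PAC (nrf52840-pac), HAL Traits (embedded_hal) [implements]
  • MCU PAC (nrf52840-pac) -> Core Runtime (cortex_m_rt), Core Peripherals (cortex_m)
  • SPI LCD Driver (ssd1306) -> HAL Traits (embedded_hal)
  • Core Runtime (cortex_m_rt) -> Core Peripherals (cortex_m)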

Don't worry

There's a lot here. We're going to take it step by step, starting at the bottom.

Booting a Cortex-M Microcontroller


In this deck, we're talking specifically about Arm Cortex-M based microcontrollers.

Other Arm processors, and processors from other companies may vary.

Terms

  • Processor - the core that executes instructions
  • SoC - the system-on-a-chip that contains a processor, some peripherals, and usually some memory
  • Flash - the flash memory that the code and the constants live in
  • RAM - the random-access memory that the global variables, heap and stack live in

An example

  • Arm Cortex-M4 - a processor core from Arm
    • Use the thumbv7em-none-eabi or thumbv7em-none-eabihf targets
  • nRF52840 - a SoC from Nordic Semi that uses that processor core

An example (2)

  • Arm Cortex-M0+ - a smaller, simpler, processor core from Arm
    • Use the thumbv6m-none-eabi target
  • RP2040 - a SoC from Raspberry Pi that uses two of those processor cores

Booting a Cortex-M

The Arm Architecture Reference Manual explains:

  • The CPU boots at a well-defined address
  • The word at that address should contain a 32-bit RAM address for the stack pointer
  • The word after should contain a 32-bit code address for the 'Reset' function
  • The following 14 32-bit words are the exception handlers
  • After that come words for each interrupt handler

The chip does everything else.

The steps

  1. Make an array, or struct, with those two (or more) words in it
  2. Convince the linker to put it at the right memory address
  3. Profit

C vector table

__attribute__ ((section(".nvic_table"))) unsigned long myvectors[] =
{
    (unsigned long) &_stack_top,
    (unsigned long) rst_handler, 
    (unsigned long) nmi_handler, 
    // ...
};

Rust vector table

#[link_section=".nvic_table"]
#[no_mangle]
pub static ISR_VECTORS: [Option<Handler>; 155] = [
    Some(_stack_top),
    Some(rst_handler),
    Some(nmi_handler),
    // ...
];

Note:

The cortex-m-rt crate does it more nicely than this. Stuffing the _stack_top address in an array of function-pointers - yuck!
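One way to avoid mixing an address into an array of function pointers is a union, so each entry is either a handler or a plain word. The sketch below is a simplified illustration of that idea and makes no claim to match cortex-m-rt's actual source. The stack address 0x2000_4000 is a made-up value, and the link-section attribute is left out so that the example also builds on a host machine.

```rust
/// One entry in the vector table: either a handler, or a plain word
#[repr(C)]
pub union Vector {
    handler: unsafe extern "C" fn(),
    reserved: usize,
}

unsafe extern "C" fn rst_handler() {}
unsafe extern "C" fn nmi_handler() {}

// On a real target this static would also need a link-section attribute
// (as on the slide above) so the linker places it at the right address.
pub static ISR_VECTORS: [Vector; 4] = [
    Vector { reserved: 0x2000_4000 }, // hypothetical initial stack pointer
    Vector { handler: rst_handler },
    Vector { handler: nmi_handler },
    Vector { reserved: 0 }, // unused entry
];

fn main() {
    // Every entry is exactly one machine word, whichever variant it holds
    println!("entry size: {} bytes", core::mem::size_of::<Vector>());
    // SAFETY: entry 0 was initialised through the `reserved` field
    println!("stack top: {:#x}", unsafe { ISR_VECTORS[0].reserved });
}
```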

C Reset Handler

Can be written in C! But it's hazardous.

extern unsigned long _start_data_flash, _start_data, _end_data;
extern unsigned long _bss_start, _bss_end;

void rst_handler(void) {
    unsigned long *src = &_start_data_flash;
    unsigned long *dest = &_start_data;
    while (dest < &_end_data) {
        *dest++ = *src++;
    }
    dest = &_bss_start;
    while (dest < &_bss_end) {
        *dest++ = 0;
    }
    main();
    while(1) { }
}

Note:

Global variables are not initialised when this function is executed. What if the C code touches an uninitialised global variable? C programmers don't worry so much about this. Rust programmers definitely worry about this.

Rust Reset Handler (1)

extern "C" {
    static mut _start_data_flash: usize;
    static mut _start_data: usize;
    static mut _end_data: usize;
    static mut _bss_start: usize;
    static mut _bss_end: usize;
}

Rust Reset Handler (2)

#[no_mangle]
pub unsafe extern "C" fn rst_handler() {
    use core::ptr::addr_of_mut;

    // Copy the initial values of .data from flash into RAM
    let mut src: *mut usize = addr_of_mut!(_start_data_flash);
    let mut dest: *mut usize = addr_of_mut!(_start_data);
    while dest < addr_of_mut!(_end_data) {
        dest.write_volatile(src.read());
        dest = dest.add(1);
        src = src.add(1);
    }
    // Zero the .bss section
    dest = addr_of_mut!(_bss_start);
    while dest < addr_of_mut!(_bss_end) {
        dest.write_volatile(0);
        dest = dest.add(1);
    }
    main();
}

Note:

This is technically undefined behaviour because globals haven't been initialised yet.
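The two loops can be tried out on a host by pointing them at ordinary arrays instead of linker symbols. This is only a sketch of the copy-then-zero logic: init_ram and demo are hypothetical names, and the arrays stand in for flash, .data and .bss.

```rust
/// Copies the `.data` initial values, then zeroes `.bss`, one word at a time.
///
/// # Safety
/// All pointers must be valid and aligned, each `*_end` must be reachable
/// from its start pointer, and the regions must not overlap.
pub unsafe fn init_ram(
    mut src: *const usize,
    mut data: *mut usize,
    data_end: *mut usize,
    mut bss: *mut usize,
    bss_end: *mut usize,
) {
    unsafe {
        // Copy initial values for .data
        while data < data_end {
            data.write_volatile(src.read());
            data = data.add(1);
            src = src.add(1);
        }
        // Zero .bss
        while bss < bss_end {
            bss.write_volatile(0);
            bss = bss.add(1);
        }
    }
}

/// Runs the routine over arrays that stand in for flash and RAM
fn demo() -> ([usize; 3], [usize; 2]) {
    let flash = [1usize, 2, 3];
    let mut data = [0xAAusize; 3]; // 'random' RAM contents at power-up
    let mut bss = [0xBBusize; 2];
    let data_range = data.as_mut_ptr_range();
    let bss_range = bss.as_mut_ptr_range();
    unsafe {
        init_ram(
            flash.as_ptr(),
            data_range.start,
            data_range.end,
            bss_range.start,
            bss_range.end,
        );
    }
    (data, bss)
}

fn main() {
    let (data, bss) = demo();
    println!("data = {data:?}, bss = {bss:?}");
}
```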

Linker scripts

  • In Rust, they work exactly like they do in C.
  • Same .text, .rodata, .data, .bss sections

The cortex-m-rt crate

Does all this work for you, in raw Arm assembly language to avoid UB.

See Reset, Linker script, and Vector table

The #[entry] macro

  • Attaches your fn main() to the reset function in cmrt
  • Hides your fn main() so no-one else can call it
  • Remaps static mut FOO: T to static FOO: &mut T so they are safe

Using the crate

See Cortex-M Quickstart

PACs and svd2rust

Introduction

The Peripheral Access Crate sits near the bottom of the 'stack'. It provides access to the memory-mapped peripherals in your MCU.

Memory Mapped Peripherals

  • e.g. a UART peripheral
  • Has registers, represented by a memory address
  • Registers are usually consecutive in memory (not always)
  • Peripherals can have instances (same layout of registers, different start address)
    • UART0, UART1, etc

Note:

The Universal Asynchronous Receiver Transmitter is an IP block implementing a logic-level RS-232 interface, and one is fitted to basically every microcontroller. Also known as a serial port.

Nordic calls their peripheral UARTE, with the E standing for Easy DMA.

Registers

  • Registers are made up of one or more bitfields.
  • Each bitfield is at least 1 bit in length.
  • Sometimes bitfields can only take one of a limited set of values
  • This is all in your datasheet!

C Code

Embedded Code in C often uses shifts and bitwise-AND to extract bitfields from registers.

#define UARTE_INTEN_CTS_SHIFT (0)
#define UARTE_INTEN_CTS_MASK (0x00000001)
#define UARTE_INTEN_RXRDY_SHIFT (2)
#define UARTE_INTEN_RXRDY_MASK (0x00000001)

// The other nine fields are skipped for brevity
uint32_t cts = 0;
uint32_t rxrdy = 1;

uint32_t inten_value = ((cts & UARTE_INTEN_CTS_MASK) << UARTE_INTEN_CTS_SHIFT)
    | ((rxrdy & UARTE_INTEN_RXRDY_MASK) << UARTE_INTEN_RXRDY_SHIFT);

*((volatile uint32_t*) 0x40002300) = inten_value;

Rust Code

You could do this in Rust if you wanted...

const UARTE0_INTEN: *mut u32 = 0x4000_2300 as *mut u32;
unsafe { UARTE0_INTEN.write_volatile(0x0000_0003); }

But this still seems very error-prone. Nothing stops you putting the wrong value at the wrong address.
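A small first step is to build the register value from typed arguments instead of a magic number. The sketch below mirrors the shift constants from the C example; inten_value is a hypothetical helper name, and it only removes one class of mistake (an out-of-range field value), not the wrong-address problem.

```rust
const UARTE_INTEN_CTS_SHIFT: u32 = 0;
const UARTE_INTEN_RXRDY_SHIFT: u32 = 2;

/// Builds the INTEN register value from two single-bit fields.
/// A `bool` can only be 0 or 1, so no mask is needed.
pub const fn inten_value(cts: bool, rxrdy: bool) -> u32 {
    ((cts as u32) << UARTE_INTEN_CTS_SHIFT) | ((rxrdy as u32) << UARTE_INTEN_RXRDY_SHIFT)
}

fn main() {
    // Same field values as the C example: cts = 0, rxrdy = 1
    let value = inten_value(false, true);
    println!("INTEN = {value:#010x}");
}
```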

Adding structure

In C, the various registers for a peripheral can also be grouped into a struct:

typedef volatile struct uart0_reg_t {
    uint32_t tasks_startrx; // @ 0x000
    uint32_t tasks_stoprx; // @ 0x004
    // ...
    uint32_t inten; // @ 0x300
    uint32_t _padding[127]; // 0x304 to 0x4FF
    uint32_t baudrate; // @ 0x500
} uart0_reg_t;

uart0_reg_t* const p_uart = (uart0_reg_t*) 0x40002000;

Structures in Rust

#[repr(C)]
pub struct Uart0 {
    pub tasks_startrx: VolatileCell<u32>, // @ 0x000
    pub tasks_stoprx: VolatileCell<u32>, // @ 0x004
    // ...
    pub inten: VolatileCell<u32>, // @ 0x300
    _reserved12: [u32; 127], // 0x304 to 0x4FF
    pub baudrate: VolatileCell<u32>, // @ 0x500
}

let p_uart: &Uart0 = unsafe { &*(0x40002000 as *const Uart0) };    

The vcell::VolatileCell type ensures the compiler emits volatile pointer read/writes.

Note:

There is some discussion about whether VolatileCell technically breaks Rust's rules around references. It works in practice, but it might be technically unsound.

Other approaches

#![allow(unused)]
fn main() {
pub struct Uart { base: *mut u32 } // just a base pointer, no per-register fields

impl Uart {
    fn write_tasks_stoprx(&mut self, value: u32) {
        unsafe {
            let ptr = self.base.offset(1);
            ptr.write_volatile(value)
        }
    }

    fn read_baudrate(&self) -> u32 {
        unsafe {
            let ptr = self.base.offset(0x140);
            ptr.read_volatile()
        }
    }
}

let uart = Uart { base: 0x40002000 as *mut u32 };
}

Note:

The pointer is a *mut u32 so the offsets are all in 32-bit words, not bytes.
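To check the word-offset arithmetic without real hardware, the same driver can be pointed at an array in RAM. This is a host-side sketch only: the fake_regs array stands in for the peripheral, and the constants derive the word offsets from the byte offsets in the earlier struct (0x004 and 0x500).

```rust
pub struct Uart { base: *mut u32 }

impl Uart {
    // Byte offsets from the datasheet, divided by 4 to get word offsets
    const TASKS_STOPRX: usize = 0x004 / 4;
    const BAUDRATE: usize = 0x500 / 4;

    fn write_tasks_stoprx(&mut self, value: u32) {
        // SAFETY: `base` must point at a block covering this offset
        unsafe { self.base.add(Self::TASKS_STOPRX).write_volatile(value) }
    }

    fn read_baudrate(&self) -> u32 {
        // SAFETY: as above
        unsafe { self.base.add(Self::BAUDRATE).read_volatile() }
    }
}

/// Runs the driver against an array standing in for the register block
fn demo() -> (u32, u32) {
    let mut fake_regs = [0u32; 0x141]; // words 0x000 to 0x140 inclusive
    fake_regs[0x140] = 115_200; // pretend the hardware holds this value
    let mut uart = Uart { base: fake_regs.as_mut_ptr() };
    uart.write_tasks_stoprx(1);
    let baud = uart.read_baudrate();
    (fake_regs[1], baud)
}

fn main() {
    let (stoprx, baud) = demo();
    println!("tasks_stoprx = {stoprx}, baudrate = {baud}");
}
```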

Zero Sized Types

We could handle the address as part of the type instead...

#![allow(unused)]
fn main() {
pub struct Uart<const ADDR: usize> {}

impl<const ADDR: usize> Uart<ADDR> {
    fn write_tasks_stoprx(&mut self, value: u32) {
        unsafe {
            let ptr = (ADDR as *mut u32).offset(1);
            ptr.write_volatile(value)
        }
    }

    fn read_baudrate(&self) -> u32 {
        unsafe {
            let ptr = (ADDR as *mut u32).offset(0x140);
            ptr.read_volatile()
        }
    }
}


let uart: Uart::<0x40002000> = Uart {};
}

Note:

By itself this seems a small change, but imagine a struct which represents 75 individual peripherals. That's not impossible for a modern microcontroller. Holding one word (the base pointer) for each would take up valuable RAM, whereas a zero-sized type takes up none!
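The size difference is easy to measure with core::mem::size_of. In this sketch, PtrUart and ZstUart are hypothetical names for the two designs, and the second UART address is made up for illustration.

```rust
use core::mem::size_of;

/// The earlier design: one pointer per peripheral
pub struct PtrUart { base: *mut u32 }

/// The zero-sized design: the address lives in the type
pub struct ZstUart<const ADDR: usize> {}

/// A (very small) set of peripherals in each style
pub struct PtrPeripherals { uart0: PtrUart, uart1: PtrUart }
pub struct ZstPeripherals {
    uart0: ZstUart<0x4000_2000>,
    uart1: ZstUart<0x4002_8000>, // hypothetical second instance
}

fn main() {
    println!("pointer-based: {} bytes", size_of::<PtrPeripherals>());
    println!("zero-sized:    {} bytes", size_of::<ZstPeripherals>());
}
```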

CMSIS-SVD Files

A CMSIS-SVD (or just SVD) file is an XML description of all the peripherals, registers and fields on an MCU.

We can use svd2rust to turn this into a Peripheral Access Crate.


SVD XML -> svd2rust -> Rust Source

Note:

Although it is an Arm standard, there are examples of RISC-V based microcontrollers which use the same format SVD files and hence can use svd2rust.

Also be aware that manufacturers often assume you will only use the SVD file to inspect the microcontrollers state whilst debugging, and so accuracy has been known to vary somewhat. Rust groups often have to maintain a set of patches to fix known bugs in the SVD files.

The svd2rust generated API

Peripherals
  • .UARTE1: UARTE1
    • .baudrate: BAUDRATE
    • .inten: INTEN
  • .UARTE2: UARTE2
    • .baudrate: BAUDRATE
    • .inten: INTEN


  • The crate has a top-level struct Peripherals with members for each Peripheral
  • Each Peripheral gets a struct, like UARTE0, SPI1, etc.
  • Each Peripheral struct has members for each Register
  • Each Register gets a struct, like BAUDRATE, INTEN, etc.
  • Each Register struct has read(), write() and modify() methods
  • Each Register also has a Read Type (R) and a Write Type (W)
    • Those Read/Write Types give you access to the Bitfields

The svd2rust generated API (2)

  • The read() method returns a special proxy object, with methods for each Field
  • The write() method takes a closure, which is given a special 'proxy' object, with methods for each Field
    • All the Field changes are batched together and written in one go
    • Any un-written Fields are set to a default value
  • The modify() method gives you both
    • Any un-written Fields are left alone

Using a PAC

let p = nrf52840_pac::Peripherals::take().unwrap();
// Reading the 'baudrate' field
let contents = p.UARTE1.baudrate.read();
let current_baud_rate = contents.baudrate();
// Modifying multiple fields in one go
p.UARTE1.inten.modify(|_r, w| {
    w.cts().enabled();
    w.ncts().enabled();
    w.rxrdy().enabled();
    w    
});

Wait, what's a closure?

  • It's an anonymous function, declared in-line with your other code
  • It can 'capture' local variables (although we don't use that feature here)
  • It enables a very powerful Rust idiom, that you can't easily do in C...

Let's take it in turns

  • I, the callee, need to set some stuff up
  • You, the caller, need to do a bit of work
  • I, the callee, need to clean everything up

We can use a closure to insert the caller-provided code in the middle of our function. We see this used all over the Rust standard library!

Quiz time

What are the three steps here?

p.UARTE1.inten.modify(|_r, w| {
    w.cts().enabled();
    w.ncts().enabled();
    w.rxrdy().enabled();
    w    
});

Note:

  1. Read the peripheral MMIO register contents as an integer
  2. Call the closure to modify the integer
  3. Write the integer back to the peripheral MMIO register
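Those three steps can be modelled on a host with a Cell standing in for the hardware register. This is a simplified sketch of the read-modify-write pattern and not the code that svd2rust generates; the Reg, R and W types and their bool-taking field methods are assumptions made for illustration.

```rust
use core::cell::Cell;

/// Stand-in for a memory-mapped register
pub struct Reg { bits: Cell<u32> }
/// Read proxy: a snapshot of the register contents
pub struct R { bits: u32 }
/// Write proxy: collects the field changes
pub struct W { bits: u32 }

impl R {
    pub fn bits(&self) -> u32 { self.bits }
}

impl W {
    fn set_bit(&mut self, bit: u32, on: bool) -> &mut W {
        if on { self.bits |= 1 << bit; } else { self.bits &= !(1 << bit); }
        self
    }
    pub fn cts(&mut self, on: bool) -> &mut W { self.set_bit(0, on) }
    pub fn rxrdy(&mut self, on: bool) -> &mut W { self.set_bit(2, on) }
}

impl Reg {
    pub fn new(bits: u32) -> Reg { Reg { bits: Cell::new(bits) } }

    pub fn read(&self) -> R { R { bits: self.bits.get() } }

    pub fn modify<F>(&self, f: F)
    where
        F: for<'w> FnOnce(&R, &'w mut W) -> &'w mut W,
    {
        let r = self.read();            // 1. read the register
        let mut w = W { bits: r.bits }; //    start from the current value
        f(&r, &mut w);                  // 2. let the caller change fields
        self.bits.set(w.bits);          // 3. write everything back in one go
    }
}

fn main() {
    let inten = Reg::new(0b010);
    inten.modify(|_r, w| w.cts(true).rxrdy(true));
    // Bit 1 was not touched by the closure, so it is left alone
    println!("INTEN = {:#05b}", inten.read().bits());
}
```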

Documentation

Docs can be generated from the source code.

See https://docs.rs/nrf52840-pac

Note that uarte0 is a module and UARTE0 could mean either a struct type, or a field on the Peripherals struct.

UPPER_CASE and TitleCase

Writing Drivers


  • Writing to all those registers is tedious
    • You have to get the values right, and the order right
  • Can we wrap it up into a nicer, easier-to-use object?

Typical driver interface

let p = pac::Peripherals::take().unwrap();
let mut uarte0 = hal::uarte::Uarte::new(
    // Our singleton representing exclusive access to
    // the peripheral IP block
    p.UARTE0,
    // Some other settings we might need
    115200,
    hal::uarte::Parity::None,
    hal::uarte::Handshaking::None,
);
// Using the `uarte0` object:
uarte0.write_all(b"Hey, I'm using a UART!").unwrap();

The Hardware Abstraction Layer

  • Contains all the drivers for a chip
  • Often common/shared across chip families
    • e.g. nRF52 HAL for 52832, 52840, etc
  • Usually community developed
  • Often quite different between MCU vendors
    • Different teams came up with different designs!

Kinds of driver

  • PLL / Clock Configuration
  • Reset / Power Control of Peripherals
  • GPIO pins
  • UART
  • SPI
  • I²C
  • ADC
  • Timer/Counters
  • and more!

Handling GPIO pins with code

// Get the singletons
let p = pac::Peripherals::take().unwrap();
// Make a driver for GPIO port P0
let pins = hal::gpio::p0::Parts::new(p.P0);
// Get Pin 13 on port P0 and make it an output
let mut led_pin = pins.p0_13.into_push_pull_output(Level::High);
// Now set the output low
led_pin.set_low();

This differs widely across MCUs (ST, Nordic, Espressif, Atmel, etc). Some MCUs (e.g. Nordic) let you put any function on any pin, and some are much more restrictive!

Correctness by design

  • HALs want to make it hard to do the wrong thing
  • Is a UART driver any use, if you haven't configured at least one TX pin and one RX pin?
  • Should the UART driver check you've done that?

Giving the pins to the driver

// 'degrade()' converts a P0_08 type into a generic Pin type.
let uarte_pins =  hal::uarte::Pins {
    rxd: pins.p0_08.degrade().into_floating_input(),
    txd: pins.p0_06.degrade().into_push_pull_output(Level::High),
    cts: None,
    rts: None,
};

let uarte = hal::uarte::Uarte::new(
    periph.UARTE1, uarte_pins, Parity::EXCLUDED, Baudrate::BAUD115200
);

This example is for the nRF52, as used in some of our examples.

The Embedded HAL and its implementations

These things are different

  • STM32F030 UART Driver
  • nRF52840 UART Driver
  • But I want to write a library which is generic!
    • e.g. an AT Command Parser

How does Rust allow generic behaviour?

  • Generics!
  • where T: SomeTrait

Traits

An example:

#![allow(unused)]
fn main() {
trait GenericSerial {
    type Error;
    fn read(&mut self, buffer: &mut [u8]) -> Result<usize, Self::Error>;
    fn write(&mut self, buffer: &[u8]) -> Result<usize, Self::Error>;
}
}

My Library

struct AtCommandParser<T> {
    uart: T,
    ...
}

impl<T> AtCommandParser<T> where T: GenericSerial {
    fn new(uart: T) -> AtCommandParser<T> { ... }
    fn get_command(&mut self) -> Result<Option<AtCommand>, Error> { ... }
}

Note how AtCommandParser owns the object which meets the GenericSerial trait.

My Application

let uart = stm32_hal::Uart::new(...);
let at_parser = at_library::AtCommandParser::new(uart);
while let Some(cmd) = at_parser.get_command().unwrap() {
    ...
}

My Application (2)

let uart = nrf52_hal::Uart::new(...);
let at_parser = at_library::AtCommandParser::new(uart);
while let Some(cmd) = at_parser.get_command().unwrap() {
    ...
}
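The same pattern also lets you test the library on your laptop, by implementing the trait for a fake UART. The sketch below fills in the elided parts with assumptions: MockUart is a hypothetical test double, commands are plain newline-terminated strings rather than an AtCommand type, and the parser replies OK to every line.

```rust
trait GenericSerial {
    type Error;
    fn read(&mut self, buffer: &mut [u8]) -> Result<usize, Self::Error>;
    fn write(&mut self, buffer: &[u8]) -> Result<usize, Self::Error>;
}

/// Hypothetical UART for host-side testing: reads from `rx`, records `tx`
struct MockUart { rx: Vec<u8>, tx: Vec<u8> }

impl GenericSerial for MockUart {
    type Error = core::convert::Infallible;

    fn read(&mut self, buffer: &mut [u8]) -> Result<usize, Self::Error> {
        let n = buffer.len().min(self.rx.len());
        for (dst, src) in buffer.iter_mut().zip(self.rx.drain(..n)) {
            *dst = src;
        }
        Ok(n)
    }

    fn write(&mut self, buffer: &[u8]) -> Result<usize, Self::Error> {
        self.tx.extend_from_slice(buffer);
        Ok(buffer.len())
    }
}

struct AtCommandParser<T> { uart: T }

impl<T> AtCommandParser<T> where T: GenericSerial {
    fn new(uart: T) -> AtCommandParser<T> { AtCommandParser { uart } }

    /// Simplified: returns one line per call, or None when the input runs out
    fn get_command(&mut self) -> Result<Option<String>, T::Error> {
        let mut line = Vec::new();
        let mut byte = [0u8; 1];
        while self.uart.read(&mut byte)? == 1 {
            if byte[0] == b'\n' { break; }
            line.push(byte[0]);
        }
        if line.is_empty() { return Ok(None); }
        self.uart.write(b"OK\r\n")?;
        Ok(Some(String::from_utf8_lossy(&line).into_owned()))
    }
}

fn main() {
    let uart = MockUart { rx: b"AT\nATI\n".to_vec(), tx: Vec::new() };
    let mut at_parser = AtCommandParser::new(uart);
    while let Some(cmd) = at_parser.get_command().unwrap() {
        println!("got command: {cmd}");
    }
}
```

Note that at_parser has to be declared mut, because get_command takes &mut self.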

How do we agree on the traits?

  • The Rust Embedded Working Group has developed some traits
  • They are called the Embedded HAL
  • See https://docs.rs/embedded-hal
  • All HAL implementations should implement these traits

Blocking vs Non-blocking

  • Should a trait API stall your CPU until the data is ready?
  • Or should it return early, saying "not yet ready"
    • So you can go and do something else in the meantime?
    • Or sleep?
  • embedded_hal::blocking::serial::Write, vs
  • embedded_hal::serial::Write
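The difference between the two styles can be sketched without any hardware. The types below are hypothetical stand-ins (the nb crate offers a similar WouldBlock idea for real code): try_write returns early when the peripheral is busy, and write_blocking is built on top of it by retrying.

```rust
/// Hypothetical hardware error type
#[derive(Debug, PartialEq)]
struct UartError;

/// Either "not ready yet, try again" or a real error
#[derive(Debug, PartialEq)]
enum NbError<E> {
    WouldBlock,
    Other(E),
}

/// Fake UART that is busy for the first `busy_polls` attempts
struct SlowUart { busy_polls: u32, sent: Vec<u8> }

impl SlowUart {
    /// Non-blocking: returns immediately, even if nothing was sent
    fn try_write(&mut self, byte: u8) -> Result<(), NbError<UartError>> {
        if self.busy_polls > 0 {
            self.busy_polls -= 1;
            return Err(NbError::WouldBlock);
        }
        self.sent.push(byte);
        Ok(())
    }

    /// Blocking: spins until the byte has been accepted
    fn write_blocking(&mut self, byte: u8) -> Result<(), UartError> {
        loop {
            match self.try_write(byte) {
                Ok(()) => return Ok(()),
                Err(NbError::WouldBlock) => continue, // could sleep here instead
                Err(NbError::Other(e)) => return Err(e),
            }
        }
    }
}

fn main() {
    let mut uart = SlowUart { busy_polls: 2, sent: Vec::new() };
    println!("first try: {:?}", uart.try_write(b'a'));
    println!("blocking:  {:?}", uart.write_blocking(b'a'));
    println!("sent:      {:?}", uart.sent);
}
```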

Trade-offs

  • Some MCUs have more features than others
  • The trait design has an inherent trade-off
    • Flexibility/Performance vs Portability

Board Support Crates

Using a 'normal' PC

  • Did you tell your PC it had a mouse plugged in?
  • Did you tell it what I/O address the video card was located at?
  • No! It auto-discovers all of these things.
    • USB, PCI-Express, SATA all have "plug-and-play"

Using an Embedded System

  • Plug-and-play is extremely rare
  • Your MCU can put different functions (UART, SPI, etc) on different pins
  • The choice of which function goes on which pin was decided by the PCB designer
  • You now have to tell the software how the PCB was laid out
    • e.g. UART0 TX is on Port 0, Pin 13

A Board Support Crate

  • You can wrap this up into a Board Support Crate
  • Especially useful if you are using a widely available dev-kit
    • e.g. the nRF52840-DK, or the STM32 Discovery
  • Still useful if the board design is an in-house one-off
  • Creates the drivers and does the pin assignments for you
  • Helps make your application portable across different boards

Using a Board Support Crate

See example-code/nrf52/bsp_demo

#[entry]
fn main() -> ! {
    let mut nrf52 = Board::take().unwrap();
    loop {
        writeln!(nrf52.cdc, "On!").unwrap();
        nrf52.leds.led_2.enable();
        writeln!(nrf52.cdc, "Off!").unwrap();
        nrf52.leds.led_2.disable();
    }
}

Note:

We don't have to configure the LED pins as outputs. We don't have to configure the UART pins. The Board Support Crate did it all for us.

Making a Board Support Crate

pub struct Board {
    /// The nRF52's pins which are not otherwise occupied on the nRF52840-DK
    pub pins: Pins,
    /// The nRF52840-DK UART which is wired to the virtual USB CDC port
    pub cdc: Uarte<nrf52::UARTE0>,
    /// The LEDs on the nRF52840-DK board
    pub leds: Leds,
    ...
    /// nRF52 peripheral: PWM0
    pub PWM0: nrf52::PWM0,
    ...
}

impl Board {
  fn take() -> Option<Self> { todo!() }
  fn new(cp: CorePeripherals, p: Peripherals) -> Self { todo!() }
}

Note:

Because constructing the Board struct consumed all the peripherals from the PAC, it's important to re-export the ones the BSC didn't use so that applications can construct their own drivers using them.

More things to consider

  • Does the MCU start-up on a slow internal oscillator?
  • Are there jumpers to control routing on the board?
  • SD Cards: should you pick a driver, or let the application choose?
  • Radios: same question!

Using defmt


defmt is the Deferred Formatter

Motivation

  • You have a microcontroller
  • You want to know what it is doing

Classical Approach

  • Set up a UART,
  • have a function that writes logs to the UART, and
  • instrument your code with logger calls.
#define INFO(msg, ...) do { \
    if (g_level >= LEVEL_INFO) { \
        fprintf(g_uart, "INFO: " msg, __VA_ARGS__ ); \
    }  \
} while(0)

INFO("received %u bytes", rx_bytes);

Downsides

  • Code size - where do the strings live?
  • Waiting for the UART

An idea

  • Who actually needs the strings?
  • Your serial terminal
  • Which is on your laptop...

Do the logging strings even need to be in Flash?

defmt

  • Deferred Formatting
  • Strings are interned into a .defmt section
    • Is in the ELF file
    • Is not in Flash
  • Arguments are packed in binary format
  • Tools to reconstruct log messages on the host side

Benefits

  • Uses less flash space
  • Less data to transfer over the wire

Downsides

  • Now you need a special viewer tool
  • Which needs the exact ELF file your chip is running

Example

let rx_bytes = 300u16;
defmt::error!("received {=u16} bytes", rx_bytes);

This will transmit just: [3, 44, 1]

Note:

The string index is shown here as 3. The bytes 44, 1 are 300 encoded as a little-endian u16 (44 + 1 × 256 = 300).
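The arithmetic behind those three bytes can be checked with a toy model. This is only a sketch of the idea: the real defmt wire format differs in detail, and the TABLE contents, the single-byte index and the encode/decode helpers are assumptions made for illustration.

```rust
/// Hypothetical interned-string table, as a host tool might load it from the ELF file.
const TABLE: [&str; 4] = ["boot", "idle", "tick", "received {=u16} bytes"];

/// Device side: send the string index, then the argument as little-endian bytes.
fn encode(index: u8, value: u16) -> [u8; 3] {
    let [low, high] = value.to_le_bytes();
    [index, low, high]
}

/// Host side: look the string up and substitute the decoded argument.
fn decode(frame: [u8; 3]) -> String {
    let value = u16::from_le_bytes([frame[1], frame[2]]);
    TABLE[frame[0] as usize].replace("{=u16}", &value.to_string())
}

fn main() {
    let frame = encode(3, 300);
    println!("{:?} -> {}", frame, decode(frame));
}
```

Only the three-byte frame crosses the wire. The format string stays on the host, which is why the viewer needs the matching ELF file.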

Type Hints

The braces can contain {[pos][=Type][:Display]}:

  • pos: a numeric argument position (e.g. 0)
  • Type: a type hint
  • Display: a display hint

More Examples

defmt::info!("enabled: {=bool}, ready: {=bool}", enabled, ready);
// enabled: true, ready: false

defmt::trace!("{{ X: {0=0..8}, Y: {0=8..16}, Z: {0=16..19} }}", some_bitfield);
// { X: 125, Y: 3, Z: 2 }

defmt::error!("data = {=[u8]:#02x}", some_byte_slice);
// data = [0x00, 0x01, 0x02, 0x03]

Note:

The x..y syntax is the bitfield syntax. [u8] is the u8 slice syntax, and :#02x means two-digit hex in the alternate (0x) style.

Using type hints can produce a more efficient encoding.
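To make the bitfield ranges concrete, here is a plain-Rust sketch of the extraction a host tool would perform for {0=0..8}, {0=8..16} and {0=16..19}. The bits helper is hypothetical and is not part of defmt.

```rust
use std::ops::Range;

/// Extracts the bits in `range` (low bit first, end exclusive) from `value`.
/// Assumes `range.start < 32` and `range.start <= range.end`.
fn bits(value: u32, range: Range<u32>) -> u32 {
    let width = range.end - range.start;
    let mask = if width >= 32 { u32::MAX } else { (1u32 << width) - 1 };
    (value >> range.start) & mask
}

fn main() {
    // Pack Z = 2, Y = 3, X = 125 into one word, as in the trace! example.
    let some_bitfield: u32 = (2 << 16) | (3 << 8) | 125;
    println!(
        "{{ X: {}, Y: {}, Z: {} }}",
        bits(some_bitfield, 0..8),
        bits(some_bitfield, 8..16),
        bits(some_bitfield, 16..19)
    );
}
```

The device sends the word once and the host slices it three times, so the extraction costs nothing on the microcontroller.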

Printing structs and enums

#[derive(Debug)]
struct Data {
    x: [u8; 5],
    y: f64
}

fn print(data: &Data) {
    println!("data = {:?}", data);
}

Printing structs and enums with defmt

#[derive(defmt::Format)]
struct Data {
    x: [u8; 5],
    y: f64
}

fn print(data: &Data) {
    defmt::info!("data = {=?}", data);
}

Note:

The =? is optional, as it is the default. It means render this using the defmt::Format trait.

In defmt, there is no Debug vs Display distinction - it is up to the host to decide how best to format the values.

Optionally enabling defmt

  • If a library uses defmt::Format, the application must set up a logger
  • Portable libraries don't want this. Instead:
#[cfg_attr(feature = "defmt", derive(defmt::Format))]
struct Data {
    x: [u8; 5],
    y: f64
}

A better transport

  • UART is slow
  • Background DMA from a ring-buffer is complicated to set up
  • Can we do better?

SEGGER RTT

  • Real Time Transfer
  • Dedicated memory area
  • Marked with magic numbers
  • Can be found and read by your Debug Probe
  • Without interrupting the CPU!
  • High speed, near-zero-cost byte-pipe
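The byte-pipe idea can be modelled on a host as a ring buffer with one writer and one reader. This is a simplified sketch, not SEGGER's actual control block layout: the UpChannel type, the 8-byte buffer and the method names are assumptions, and only the "SEGGER RTT" marker text follows the real protocol.

```rust
/// Simplified, host-side model of a single RTT "up" (target-to-host) channel.
struct UpChannel {
    /// Marker the debug probe scans target RAM for.
    id: [u8; 16],
    buffer: [u8; 8],
    /// Only the target advances this.
    write_pos: usize,
    /// Only the host (via the probe) advances this.
    read_pos: usize,
}

impl UpChannel {
    fn new() -> Self {
        let mut id = [0u8; 16];
        id[..10].copy_from_slice(b"SEGGER RTT");
        UpChannel { id, buffer: [0; 8], write_pos: 0, read_pos: 0 }
    }

    /// Target side: copy as many bytes as fit and return without waiting.
    fn write(&mut self, data: &[u8]) -> usize {
        let mut written = 0;
        for &byte in data {
            let next = (self.write_pos + 1) % self.buffer.len();
            if next == self.read_pos {
                break; // buffer full: drop the rest rather than block
            }
            self.buffer[self.write_pos] = byte;
            self.write_pos = next;
            written += 1;
        }
        written
    }

    /// Host side: drain whatever the target has written so far.
    fn read(&mut self) -> Vec<u8> {
        let mut out = Vec::new();
        while self.read_pos != self.write_pos {
            out.push(self.buffer[self.read_pos]);
            self.read_pos = (self.read_pos + 1) % self.buffer.len();
        }
        out
    }
}

fn main() {
    let mut channel = UpChannel::new();
    println!("marker: {}", String::from_utf8_lossy(&channel.id[..10]));
    let sent = channel.write(b"hello");
    let received = channel.read();
    println!("sent {} bytes, probe read {:?}", sent, String::from_utf8_lossy(&received));
}
```

On real hardware the two sides run on different machines: the firmware only touches the write position, and the probe reads the buffer and updates the read position over the debug interface.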

defmt-rtt

  • Implements SEGGER's RTT protocol
  • Wired up as a defmt global logger
  • Your binary just needs to:
use defmt_rtt as _;

Note:

The defmt calls in your libraries are able to find the 'logging sink' created by the defmt-rtt crate through the use of a type in defmt-rtt annotated with:

#[defmt::global_logger]

This creates a bunch of unsafe #[no_mangle] functions, like:

#[inline(never)]
#[no_mangle]
unsafe fn _defmt_acquire() {
    <Logger as defmt::Logger>::acquire()
}

Log Level

You can control the log level at compile time with an environment variable:

DEFMT_LOG=info cargo build

Note:

Windows users will use different syntax for cmd.exe vs Powershell.

Host tools

  • Knurling's probe-run was the first
  • The probe-rs CLI now has support (recommended)
  • Or use defmt-print

Using probe-rs

$ probe-rs run --chip nRF52840_xxAA target/thumbv7em-none-eabihf/debug/radio-puzzle-solution
      Erasing ✔ [00:00:00] [#########################] 16.00 KiB/16.00 KiB @ 35.52 KiB/s (eta 0s )
  Programming ✔ [00:00:00] [#########################] 16.00 KiB/16.00 KiB @ 49.90 KiB/s (eta 0s )    Finished in 0.79s
0 DEBUG Initializing the board
└─ dk::init @ /Users/jonathan/Documents/ferrous-systems/rust-exercises/nrf52-code/boards/dk/src/lib.rs:208
1 DEBUG Clocks configured
└─ dk::init @ /Users/jonathan/Documents/ferrous-systems/rust-exercises/nrf52-code/boards/dk/src/lib.rs:219

Customise the format

$ probe-rs run --chip nRF52840_xxAA ... --log-format "{t} {f}:{l} {L} {s}"
      Erasing ✔ [00:00:00] [#########################] 16.00 KiB/16.00 KiB @ 35.52 KiB/s (eta 0s )
  Programming ✔ [00:00:00] [#########################] 16.00 KiB/16.00 KiB @ 49.90 KiB/s (eta 0s )    Finished in 0.79s
0 lib.rs:208  DEBUG Initializing the board
1 lib.rs:219  DEBUG Clocks configured

Set it as your runner

[target.thumbv7em-none-eabihf]
runner = "probe-rs run --chip nRF52840_xxAA"
$ cargo run
    Finished dev [optimized + debuginfo] target(s) in 0.03s
     Running `probe-rs run --chip nRF52840_xxAA target/thumbv7em-none-eabihf/debug/radio-puzzle-solution`
      Erasing ✔ [00:00:00] [#########################] 16.00 KiB/16.00 KiB @ 35.52 KiB/s (eta 0s )
  Programming ✔ [00:00:00] [#########################] 16.00 KiB/16.00 KiB @ 49.90 KiB/s (eta 0s )    Finished in 0.79s
0 DEBUG Initializing the board
└─ dk::init @ /Users/jonathan/Documents/ferrous-systems/rust-exercises/nrf52-code/boards/dk/src/lib.rs:208
1 DEBUG Clocks configured
└─ dk::init @ /Users/jonathan/Documents/ferrous-systems/rust-exercises/nrf52-code/boards/dk/src/lib.rs:219

More info

There's a book!

https://defmt.ferrous-systems.com

Re-entrancy

defmt::info! (etc) can be called anywhere, even from an interrupt.

How do you make that safe?

Critical Sections

  • defmt-rtt uses the critical-section crate
  • More on this elsewhere
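As a host-side analogy only, the sketch below uses a std Mutex where firmware would use a critical section, so that a frame written from one context can never be interleaved with a frame from another. On a microcontroller the critical-section crate typically achieves this by masking interrupts rather than by locking. The log_frame and run helpers are hypothetical.

```rust
use std::sync::Mutex;
use std::thread;

/// Writes one whole frame while holding the lock, so frames never interleave.
fn log_frame(sink: &Mutex<Vec<u8>>, frame: &[u8]) {
    let mut guard = sink.lock().unwrap();
    for &byte in frame {
        guard.push(byte);
    }
}

/// Starts `writers` threads that each log one 4-byte frame filled with their id.
fn run(writers: u8) -> Vec<u8> {
    let sink = Mutex::new(Vec::new());
    thread::scope(|s| {
        for id in 0..writers {
            let sink = &sink;
            s.spawn(move || log_frame(sink, &[id; 4]));
        }
    });
    sink.into_inner().unwrap()
}

fn main() {
    let output = run(8);
    // Frame order is arbitrary, but every frame is contiguous.
    for frame in output.chunks(4) {
        println!("{:?}", frame);
    }
}
```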

What is Ferrocene?

Ferrocene is

  • Rust, not a subset
  • A downstream of The Rust Project
  • Long-term stable
  • Open Source
  • Qualified per ISO 26262 (ASIL D) / IEC 61508 (SIL 4)
  • Supplied with a warranty
  • Available with support
  • Tested differently

Rust, not a subset

  • We didn't write a new Rust toolchain
  • We qualified The Rust Toolchain
  • The subset of Rust for safety-critical use is Rust

A downstream of The Rust Project

  • One of the Ferrocene pillars is that the standard library and the compiler must not diverge from upstream.
  • We've been pulling the master branch of rust-lang/rust into our tree since 2021

Patches

  • Of course, some changes were required
  • So, we upstreamed all of them
  • Like [#93717], [#108659], [#111936], [#108898]...
  • [#111992], [#112314], [#112418], [#112454], ...

Virtuous Cycle

  • Sometimes we find bugs that upstream missed
  • So we upstreamed the fixes
  • Like [#108905] or [#114613].

Long-term Stable

As of 3 September 2024, the Ferrocene releases are:

  • nightly (upstream nightly)
  • pre-rolling (upstream beta)
  • rolling (upstream stable)
  • stable-24.05 (upstream 1.76)
  • stable-24.08 (upstream 1.79)

Note:

We strive to make each stable release available for two years, including tracking of Known Problems.

Open Source

Qualified per ISO 26262 (ASIL D) / IEC 61508 (SIL 4)

We're in the TÜV SÜD database

TÜV SÜD logo

cargo isn't qualified

  • Qualifying a tool that touches the Internet is hard
  • You don't need a build system...
  • You can just call rustc (which is qualified) from a simple script for production

libstd isn't certified, libcore will be

  • It doesn't make sense to certify the Standard Library
    • It's mostly "If Windows, do X; if POSIX, do Y"
  • We are looking at certifying libcore

Supplied with a warranty

If you find a bug in the compiler, we will fix it or give you details on how to work around it

Available with support

  • A subscription gets you binary downloads and access to the Known Problems list
  • Signed Qualification Documents are available (call us)
  • If you need additional support with your Rust development, we can help

Tested Differently

  • The Rust Project only tests Tier 1 targets
  • We have developed our own CI
    • Separate and parallel to that used by The Rust Project
    • They have different goals!
  • Having multiple independent, parallel, rock solid CI pipelines can only benefit Rust
  • Our CI produces the artefacts we need for qualification

Installing and Using Ferrocene

What's in the box?

  • rustc - a compiler (★)
    • lld - the LLVM linker (★)
    • rustdoc - the docs generator
  • cargo/rustfmt/clippy - our usual friends
  • llvm-tools - objcopy, size, etc
  • rust-analyzer - for IDE integration
  • rust-src - libstd source code
  • rust-std-xxx - precompiled standard libraries (☆)
  • ferrocene-self-test - checks your installation
  • ferrocene-docs-xxx - documentation

★: qualified tool ☆: certification in progress

Note:

The lld linker and rustdoc come with the rustc-${rustc-host} package.


Portal

https://releases.ferrocene.dev

Note:

channels contain releases

Examples of channels include:

  • nightly
  • pre-rolling
  • rolling
  • beta-24.05
  • beta-24.08
  • stable-24.05
  • stable-24.08
  • etc

Examples of releases include:

  • nightly-2024-08-29
  • pre-rolling-2024-08-28
  • rolling-2024-08-08
  • beta-24.05-2024-06-19
  • beta-24.08-2024-08-22
  • stable-24.05.0
  • stable-24.08.0
  • etc

Portal

https://docs.ferrocene.dev

Targets

We have two dimensions:

  • Qualified, or not
  • Host or Cross-compiled

Qualified Targets

  • Production Ready
  • Passes the Rust Test Suite
  • Support is available
  • Signed qualification material
    • stable channel only

Note:

In stable-24.08 and earlier, these were called "Supported Targets"

Each release has a User Manual and it is important to follow the instructions for that target in that release otherwise you may be outside the qualification scope. As an example, we don't let you give arbitrary arguments to the linker - you can only pass the arguments we say are OK.

Quality Managed (QM) Targets

  • Production Ready
  • Passes the Rust Test Suite
  • Support is available
  • No signed qualification material

Note:

It may be that the target is en-route to being a Qualified Target, or it may be that it is deemed unlikely that the target would be useful in a safety critical context. Talk to us if you would like a QM Target available as a Qualified Target.

Experimental Targets

  • Not Production Ready
  • Not qualified
  • Might not pass the test suite
  • But useful for getting started early

Note:

A Ferrocene 'Experimental Target' is broadly equivalent to an upstream Tier 2 or Tier 1 target, depending on whether we're running the Test Suite in CI. And, to be fair, plenty of people use upstream Rust in production.

Host Targets

  • Ferrocene runs on a limited number of hosts:
  • Ferrocene is installed with criticalup
    • It's also open-source
    • Or, you can install a specific Ferrocene release from tarballs
  • Hosts always compile for themselves (proc-macros, build.rs, etc)

Cross-Compilation Targets

Using criticalup

  • Our equivalent of rustup
  • Fetches the appropriate Ferrocene toolchain packages
  • Need a criticalup.toml file for each project, and a global login token
    • Token only required to download a toolchain
    • You can burn the toolchain to a CD-R if you want

criticalup.toml

manifest-version = 1

[products.ferrocene]
release = "stable-24.08.0"
packages = [
  "rustc-${rustc-host}", "rust-std-${rustc-host}", "cargo-${rustc-host}",
  "rust-src", "rust-std-aarch64-unknown-none"
]

Installing Ferrocene

  1. Install criticalup
  2. Make a token
  3. Store your token with criticalup auth set
  4. Go to your project dir
  5. Run criticalup install

Example

$ criticalup auth set
$ criticalup install
info: installing product 'ferrocene' (stable-24.08.0)
info: downloading component 'cargo-x86_64-unknown-linux-gnu' for 'ferrocene' (stable-24.08.0)
...
info: downloading component 'rustc-x86_64-unknown-linux-gnu' for 'ferrocene' (stable-24.08.0)
info: installing component 'rustc-x86_64-unknown-linux-gnu' for 'ferrocene' (stable-24.08.0)
$ criticalup run rustc --version

Local State

Criticalup maintains local state in one of the following locations:

  • Linux: ~/.local/share/criticalup
  • macOS: ~/Library/Application Support/criticalup
  • Windows: %APPDATA%\criticalup

Running Ferrocene


You can execute the tool directly from the install dir

$ criticalup which rustc
/home/user/.local/criticalup/toolchains/cbfe2b...21e8b/bin/rustc

$ /home/user/.local/criticalup/toolchains/cbfe2b...21e8b/bin/rustc --version
rustc 1.79.0 (02baf75fd 2024-08-23) (Ferrocene by Ferrous Systems)

NB: cargo uses whichever rustc is in your PATH.


You can use the tool proxies:

$ ls /home/user/.local/criticalup/bin
cargo       rust-gdb    rust-gdbgui rust-lldb   rustc       rustdoc

$ /home/user/.local/criticalup/bin/rustc --version
rustc 1.79.0 (02baf75fd 2024-08-23) (Ferrocene by Ferrous Systems)

NB: cargo uses the corresponding rustc


You can use criticalup as a proxy:

$ criticalup run rustc --version
rustc 1.79.0 (02baf75fd 2024-08-23) (Ferrocene by Ferrous Systems)

NB: cargo uses the corresponding rustc

rust-analyzer in VS Code

Set RUSTC to tell it which rustc to use

$ RUSTC=$(criticalup which rustc) code .

PS D:\project> $Env:RUSTC=$(criticalup which rustc)
PS D:\project> code .

Ensure you have the rust-src package installed.


Our Rust Training has both 32-bit and 64-bit Arm bare-metal examples:

https://github.com/ferrous-systems/rust-training/tree/main/example-code

What is Rust?

The 100-foot view

A free and open-source systems programming language

A language empowering everyone to build reliable and efficient software.

Hello, World

fn main() {
    println!("Hello, world!");
}

You can build...

  • Network Services
  • Command-line Apps
  • Web Apps
  • Desktop Apps
  • Bootloaders
  • Device Drivers
  • Hypervisors
  • Embedded Systems
  • Libraries/plugins for applications in other languages

Front-end or Back-end?

It's applicable at every point in the stack!

The Three Words

  • Safety
  • Performance
  • Productivity

Stack Overflow Survey 2023:

Rust is on its eighth year as the most admired (formerly "most loved") language, with 85% of developers saying they want to continue using it.

Note:

Stack Overflow used to use the term most loved, which Rust won seven years in a row. In 2023 they changed the terms to desired and admired. Rust was the most admired language in 2023.

Cross-platform

  • Windows, macOS, Linux
  • iOS, Android, Web, QNX, Bare-metal, etc

Portable

  • Source code is portable across multiple architectures:
    • x86, RISC-V and Arm
    • Power, MIPS, SPARC, ...

Rust can import C-compatible libraries

Want to use zlib, OpenSSL, SomeSpecialDriverLib? Sure!
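As a small illustration, the sketch below calls strlen from the platform's C library, which is already linked into ordinary hosted Rust programs. The c_length wrapper is a hypothetical name. A library such as zlib would follow the same shape, plus a link instruction and, usually, generated bindings.

```rust
use std::ffi::{c_char, CString};

// Declare the C function we want to call.
// `unsafe extern` needs Rust 1.82 or newer; older toolchains write plain `extern "C"`.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Safe wrapper: builds a NUL-terminated copy, then asks C for its length.
fn c_length(text: &str) -> usize {
    let c_text = CString::new(text).expect("text must not contain NUL bytes");
    // SAFETY: `c_text` is a valid NUL-terminated string that outlives the call.
    unsafe { strlen(c_text.as_ptr()) }
}

fn main() {
    println!("strlen says {}", c_length("Ferrocene"));
}
```

The unsafe block is confined to the wrapper, so callers get an ordinary safe Rust function.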

Rust can export C-compatible libraries

  • Python extension modules? Ok!
  • Android native libraries? No problem.
  • Replace the file parser in your Very Large C++ Application? Can-do.
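Going the other way, a Rust function can be given a C-compatible signature and an unmangled symbol name. This is a minimal sketch: demo_checksum is a hypothetical function, and a real library would also be built with a crate type such as cdylib or staticlib and shipped with a matching C header.

```rust
/// Sums the bytes in a caller-provided buffer, callable from C as
/// `uint32_t demo_checksum(const uint8_t *data, size_t len);`
///
/// `#[unsafe(no_mangle)]` needs Rust 1.82 or newer; older toolchains write `#[no_mangle]`.
#[unsafe(no_mangle)]
pub extern "C" fn demo_checksum(data: *const u8, len: usize) -> u32 {
    if data.is_null() {
        return 0;
    }
    // SAFETY: the caller promises `data` points to `len` readable bytes.
    let bytes = unsafe { std::slice::from_raw_parts(data, len) };
    bytes.iter().fold(0u32, |acc, &b| acc.wrapping_add(u32::from(b)))
}

fn main() {
    // Called from Rust here just to show the behaviour.
    let payload = [1u8, 2, 3];
    println!("checksum = {}", demo_checksum(payload.as_ptr(), payload.len()));
}
```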

Where did Rust come from?

A Little Bit of History

  • Rust began around 2008
  • An experimental project by Graydon Hoare
  • Adopted by Mozilla
  • Presented to the general public as version 0.4 in 2012

Focus

  • Rust lost many features from 2012 to 2014
    • garbage collector
    • evented runtime
    • complex error handling
    • etc
  • Rust oriented itself towards being a usable systems programming language

Development

  • Always together with a larger project (e.g. Servo)
  • Early adoption of regular releases
  • RFC process
  • Editions

Public Release

Who's in charge now?

The Rust Project

https://www.rust-lang.org/governance

  • The Leadership Council
  • Compiler Team
  • Dev Tools Team
  • Infrastructure Team
  • Language Team
  • Library Team
  • Moderation Team
  • Launching Pad Team

Working Groups

  • Async WG
  • Command-line Interface WG
  • Embedded devices WG
  • Game Development WG
  • Rust by Example WG
  • Secure Code WG
  • Security Response WG
  • WebAssembly (WASM) WG

The Rust Foundation

... is an independent non-profit organization dedicated to stewarding the Rust programming language, nurturing the Rust ecosystem, and supporting the set of maintainers governing and developing the project.

It has a powerful list of members

https://foundation.rust-lang.org/members/

Who decides on new features?

  • Discuss in chat/forums
  • Open a Request For Comments (RFC)
  • Relevant team takes a vote
  • Tracking ticket is created
  • Pull Request(s) to implement the change
  • Stabilisation

Summary

  • Rust is a collaborative open-source project that prides itself on inclusion
  • There is no "owner", nor "BDFL"
  • It has strong financial backing
  • It remains a work-in-progress

Is this a community I can engage with?

A strong Code of Conduct

The Rust Project, and pretty much the whole Community, follow a Code of Conduct:

We are committed to providing a friendly, safe and welcoming environment for all, regardless of level of experience, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, religion, nationality, or other similar characteristic.

A strong Code of Conduct

Likewise any spamming, trolling, flaming, baiting or other attention-stealing behavior is not welcome.

  • Builds on efforts in other communities

Why?

  • Because a community is only as strong as its members

Going beyond technical points, Rust has a vibrant, welcoming community - (Stack Overflow Blog)

Why?

  • If you allow both wolves and sheep into your space, you won't get any sheep
  • The Rust Community seems to have a higher than average representation from the LGBTQI+ community

So beginners are welcome?

  • Absolutely!
  • Relatively speaking, we're all still beginners
  • You even see open tickets on the rust-lang GitHub marked as E-easy: Good first issue.

This extends to the compiler's interface...

  • Any Rust error message which is unclear or ambiguous...
  • ... is considered a bug and will be fixed ...
  • ... if you open a ticket (or post @ the right people)

Compiler Error Driven Development works!

error[E0502]: cannot borrow `name` as mutable because it is also borrowed as immutable
 --> src/main.rs:4:5
  |
3 |     let nickname = &name[..3];
  |                     ---- immutable borrow occurs here
4 |     name.clear();
  |     ^^^^^^^^^^^^ mutable borrow occurs here
5 |     println!("Hello there, {}!", nickname);
  |                                  -------- immutable borrow later used here
Some errors have detailed explanations: E0502, E0596.
For more information about an error, try `rustc --explain E0502`.
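One way to act on that message is to stop borrowing from name before it is cleared. The sketch below is a guess at the surrounding program, since only lines 3 to 5 appear in the error, and greet is a hypothetical wrapper around them.

```rust
fn greet(mut name: String) -> String {
    // Copy the first three bytes' worth of text into its own String,
    // so no borrow of `name` is still alive when we clear it.
    let nickname = name[..3].to_string();
    name.clear();
    format!("Hello there, {}!", nickname)
}

fn main() {
    println!("{}", greet(String::from("Ferris")));
}
```

Slicing with [..3] works on bytes and panics if the string is shorter or if byte 3 falls inside a multi-byte character, so this sketch assumes a short ASCII name.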

What does Rust run on?

Host vs Target

  • The machine you develop on
  • The machine the program runs on

Rust is a cross-compiler

  • It uses LLVM to generate machine code
  • Every Rust install is a cross-compiler
    • No rummaging for extra installers for your specific target
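A program can report which target it was compiled for, which may differ from the host it was built on. A minimal sketch using constants from the standard library (target_description is a hypothetical helper):

```rust
/// Describes the target this binary was compiled for, not the build host.
fn target_description() -> String {
    format!(
        "{}-{} ({}-bit pointers)",
        std::env::consts::ARCH,
        std::env::consts::OS,
        usize::BITS
    )
}

fn main() {
    println!("compiled for {}", target_description());
}
```

Building the same source with a different --target changes what this prints, provided that target's standard library is installed.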

Hosts

  • Windows (x86, Arm)
  • macOS (x86, Arm)
  • Linux (x86, Arm, RISC-V, MIPS, Power, S390x, SPARC...)
  • FreeBSD, NetBSD, Illumos, ...

Targets

  • All of the above, plus...
  • Android
  • iOS/watchOS/tvOS
  • Bare-metal Embedded
  • QNX, VxWorks, AIX
  • WebAssembly
  • UEFI
  • Nintendo Switch, Sony PSP and PS Vita...
  • Add your own!

What does Rust cost?

Rust is Open Source

Binaries are provided free of charge

  • Available using the rustup tool
  • AWS sponsor the project
  • Nothing to sign, no USB dongle required

Support is available

  • There are lots of places you can go for help
    • Forums, Discord, Reddit
    • Professional consulting firms
    • Rust Toolchain vendors

No-one is an expert overnight

  • Budget for some training
  • Budget for some time for the team to gain experience
  • Budget for some support when the team have questions

You might need a bigger computer...

Today, compiling the Rust compiler on a 4-core CPU, that is typically found in a standard laptop, takes up to 15 minutes with another 5-10 minutes for tests. However, a 96-core cloud virtual machine can complete the same build in less than 5 minutes with tests completing in 35 seconds.

Compile time checks vs run-time checks

  • Rust does a lot of work up front
  • The faster your checks run, the more productive you are!
  • A Raspberry Pi 4 technically works, but it takes a while...

Can I build safety-critical systems?

Some terminology

  • a system is certified as being sufficiently safe/correct
  • that system is often built using qualified tools
  • quality is the result of an ongoing process

Note:

Some industries use the terms certification and qualification interchangeably.

What is a safety-critical system?

Generally built following a standard, like ISO 26262:

ISO 26262 is intended to be applied to safety-related systems that include one or more electrical and/or electronic (E/E) systems and that are installed in series production passenger cars with a maximum gross vehicle mass up to 3500 kg.

What is a safety-critical system?

Generally built following a standard, like ISO 26262:

This document describes a framework for functional safety to assist the development of safety-related E/E systems. This framework is intended to be used to integrate functional safety activities into a company-specific development framework.

And for other applications:

  • DO-178C Software Considerations in Airborne Systems and Equipment Certification
  • IEC 61508 Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems
  • IEC 62278 Railway applications - Specification and demonstration of reliability, availability, maintainability and safety
  • IEC 62304 Medical device software – Software life cycle processes
  • There are many others...

Can I use Rust?

  • Well you can use C
  • And C is kinda risky...
  • But processes have been developed to manage that risk
  • And C toolchains have been qualified so you can rely on them doing what they say they are going to do
    • If you hold them the right way

Language Specifications

  • C has ISO/IEC 9899:2018 (C17)
  • C++ has ISO/IEC 14882:2020(E) (C++20)
  • Rust doesn't have a standard
    • The open-source compiler is the standard
    • The first ISO C standard (C90) came 17 years after C was invented, largely because there were a lot of different competing compilers

Ferrocene

Ferrocene is the open-source qualified Rust compiler toolchain for safety- and mission-critical systems. Qualified for automotive and industrial development.

ISO 26262 (ASIL D) and IEC 61508 (SIL 4) available for x86 and ARM platforms.

Ferrocene

Safety, Performance and Productivity

1) Safety

Rust is memory-safe

  • Every value has one owner
  • You can create either:
    • One exclusive, mutable, reference
    • Multiple shared, immutable, references
    • Never both!
  • These rules are checked at compile time
    • Or at run-time if you choose
  • Rust applies bounds checks to array and slice accesses
    • Where possible (e.g. the indices are constant) those checks are optimized out

Index Example

fn process(items: &mut [i32]) {
    items[10] = 6;
}

If items isn't long enough, this raises a run-time panic instead of corrupting memory.
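If a panic is not acceptable, the checked accessor get_mut returns an Option instead. A small sketch of the same function written that way; returning a bool to report success is a choice made for this example.

```rust
/// Writes 6 into element 10, or reports that the slice is too short.
fn process(items: &mut [i32]) -> bool {
    match items.get_mut(10) {
        Some(slot) => {
            *slot = 6;
            true
        }
        None => false,
    }
}

fn main() {
    let mut short = [0; 4];
    let mut long = [0; 11];
    println!("short: {}, long: {}", process(&mut short), process(&mut long));
}
```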

Iter Example

/// Adds 0x00 padding for every 0xCC found
fn process(data: &mut Vec<u8>) {
    for item in data.iter_mut() {
        if *item == 0xCC {
            data.push(0);
        }
    }
}

Rust won't let you modify the Vec<u8> whilst you iterate through it - this breaks the rules around exclusive borrows.

Note:

This is trivial to do in C++ and causes silent corruption.

Iter Example (fixed)

/// Adds 0x00 padding for every 0xCC found
fn process(data: &mut Vec<u8>) {
    let padding_byte_count = data.iter().filter(|&&x| x == 0xCC).count();
    for _ in 0..padding_byte_count {
        data.push(0);
    }
}

Rust is thread-safe

  • Types must be marked as safe for:
    • Transferring ownership between threads, and/or
    • Transferring a reference between threads
  • You cannot create race-hazards!

APIs can reason about thread-safety

  • Rust channels require types to be marked as thread-safe
  • Passing values when starting a spawned thread - same checks
  • The ref-counting allocation type Rc<T> is not thread-safe
  • The atomic-ref-counting allocation type Arc<T> is (but is slightly slower)
  • Make the wrong choice? Compiler stops you!
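A short sketch of those checks in action: worker threads share a label through Arc and report back over a channel. The collect_results helper and the "worker" label are made up for this example.

```rust
use std::sync::{mpsc, Arc};
use std::thread;

/// Spawns `workers` threads; each sends back its id and the shared label's length.
fn collect_results(workers: usize) -> Vec<(usize, usize)> {
    let label = Arc::new(String::from("worker"));
    let (tx, rx) = mpsc::channel();

    for id in 0..workers {
        let tx = tx.clone();
        let label = Arc::clone(&label);
        // Replacing Arc with Rc above makes this `spawn` call fail to compile,
        // because Rc<String> is not marked as safe to send to another thread.
        thread::spawn(move || {
            tx.send((id, label.len())).unwrap();
        });
    }
    // Drop the original sender so the receiver stops once all workers finish.
    drop(tx);

    let mut results: Vec<(usize, usize)> = rx.iter().collect();
    results.sort();
    results
}

fn main() {
    println!("{:?}", collect_results(3));
}
```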

Thread Example

fn main() {
    let mut total = 0;
    for _ in 0..10 {
        std::thread::spawn(|| {
            total += 1;
        });
    }
    println!("{total}");
}

Note:

  • Failure 1 - threads can live forever, but they are trying to borrow a variable on the stack of the main function
  • Failure 2 - multiple threads trying to take mutable (exclusive) access to a variable

Thread Example (Fixed)

use std::sync::atomic::{AtomicU32, Ordering};
fn main() {
    let total = AtomicU32::new(0);
    std::thread::scope(|s| {
        for _ in 0..10 {
            s.spawn(|| total.fetch_add(1, Ordering::Relaxed));
        }
    });
    println!("{}", total.load(Ordering::Relaxed));
}

There's an escape hatch

  • Where the compiler cannot verify the rules are upheld, you can tell it you've done the checks manually
  • We create unsafe { } blocks and unsafe fn functions
  • Lets you access raw pointers (e.g. for memory-mapped I/O)
  • When you audit/review the code, you pay close attention to these parts!
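A sketch of what such code can look like. The "register" here is just a local variable standing in for a memory-mapped address, and set_bit is a hypothetical helper; on real hardware the pointer would come from the chip's documentation.

```rust
use std::ptr;

/// Sets one bit in a (pretend) hardware register.
///
/// # Safety
/// `register` must be valid for reads and writes and properly aligned.
unsafe fn set_bit(register: *mut u32, bit: u32) {
    // Volatile accesses tell the compiler not to reorder or remove them.
    unsafe {
        let current = ptr::read_volatile(register);
        ptr::write_volatile(register, current | (1 << bit));
    }
}

fn demo() -> u32 {
    let mut fake_register: u32 = 0;
    // SAFETY: the pointer comes from a live, aligned local variable.
    unsafe { set_bit(&mut fake_register, 3) };
    fake_register
}

fn main() {
    println!("register = {:#06b}", demo());
}
```

The SAFETY comments record what the author checked by hand, which is where a reviewer should look first.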

2) Performance

A Comparison

Let's use Python to calculate the sum of the cubes of the first 100 million integers.

import datetime

def run():
    start = datetime.datetime.now()
    cube_sum = sum(
        map(
            lambda x: x * x * x,
            range(0, 100_000_000)
        )
    )
    print(f"Took {datetime.datetime.now() - start}")
    print(f"cube_sum = {cube_sum}")
>>> run()
Took 0:00:09.076986
cube_sum = 24999999500000002500000000000000

In Rust?

fn main() {
    let start = std::time::Instant::now();
    let sum: u128 = (0..100_000_000u32)
        .into_iter()
        .map(|n| {
            let n = u128::from(n);
            n * n * n
        })
        .sum();
    println!("Took {:?}", start.elapsed());
    println!("sum = {sum}");
}
$ cargo run --release
   Compiling process v0.1.0 (/Users/jonathan/process)
    Finished release [optimized] target(s) in 0.34s
Took 45ns
sum = 24999999500000002500000000000000

OK, but it's cheating

fn main() {
    let start = std::time::Instant::now();
    let sum: u128 = (0..100_000_000u32)
        .into_iter()
        .map(|n| {
            let n = u128::from(n);
            std::hint::black_box(n * n * n)
        })
        .sum();
    println!("Took {:?}", start.elapsed());
    println!("sum = {sum}");
}
$ cargo run --release
   Compiling process v0.1.0 (/Users/jonathan/process)
    Finished release [optimized] target(s) in 0.34s
Took 68.014583ms
sum = 24999999500000002500000000000000

Let's use all our CPU cores...

// Import the rayon library
use rayon::prelude::*;

fn main() {
    let start = std::time::Instant::now();
    // Swap `into_iter` for `into_par_iter`
    let sum: u128 = (0..100_000_000u32)
        .into_par_iter()
        .map(|n| {
            let n = u128::from(n);
            std::hint::black_box(n * n * n)
        })
        .sum();
    println!("Took {:?}", start.elapsed());
    println!("sum = {sum}");
}

Let's use all our CPU cores...

$ cargo add rayon
    Updating crates.io index
      Adding rayon v1.6.1 to dependencies.
$ cargo run --release
...
   Compiling rayon v1.6.1
   Compiling process v0.1.0 (/Users/jonathan/process)
    Finished release [optimized] target(s) in 2.38s
     Running `target/release/process`
Took 9.928125ms
sum = 24999999500000002500000000000000
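For comparison, a similar split can be written with only the standard library's scoped threads. This is a hand-rolled sketch and not how rayon works internally, since rayon uses a work-stealing thread pool. The cube_sum_parallel helper is hypothetical, and black_box is left out because it only existed to defeat the optimiser in the benchmark.

```rust
use std::thread;

/// Sums the cubes of 0..n, split into `workers` contiguous chunks.
/// Assumes `workers` is at least 1.
fn cube_sum_parallel(n: u32, workers: u32) -> u128 {
    // Round up so the chunks cover the whole range.
    let chunk = (n + workers - 1) / workers;
    thread::scope(|s| {
        // Collecting first makes sure every thread is started before any is joined.
        let handles: Vec<_> = (0..workers)
            .map(|w| {
                let start = w.saturating_mul(chunk).min(n);
                let end = start.saturating_add(chunk).min(n);
                s.spawn(move || (start..end).map(|i| u128::from(i).pow(3)).sum::<u128>())
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u128>()
    })
}

fn main() {
    let start = std::time::Instant::now();
    let sum = cube_sum_parallel(100_000_000, 8);
    println!("Took {:?}", start.elapsed());
    println!("sum = {sum}");
}
```

The scope guarantees every thread has finished before it returns, which is why the threads are allowed to borrow from the enclosing function.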

Sure, but C can do this too, right?

$ clang -o ./target/main src/main.c -O3 -mcpu=native -std=c17 && ./target/main
sum 0x13b8b5ae675d38cb7260b704000
Took 70.3 milliseconds

And was getting that performance ... enjoyable?

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>
#include <time.h>

int main(int argc, char** argv) {
    uint64_t start = clock_gettime_nsec_np(CLOCK_MONOTONIC);
    __uint128_t x = 0;
    for(uint32_t idx = 0; idx < 100000000; idx++) {
        __uint128_t i = (__uint128_t) idx;
        volatile __uint128_t result = i * i * i;
        x += result;
    }
    uint64_t end = clock_gettime_nsec_np(CLOCK_MONOTONIC);
    printf("sum 0x%llx%016llx\n", (unsigned long long) (x >> 64), (unsigned long long) x);
    printf("Took %.3g milliseconds\n", ((double) (end - start)) / (1000.0 * 1000.0) );
    return 0;
}

3) Productivity

libstd

  • Filesystem access and Path handling
  • Heap allocation, with optional reference-counting
  • Threads, with Mutexes, Condition Variables, and Channels
  • Strings, and a powerful value formatting system
  • Growable arrays, hash-tables, B-Trees
  • First-class Unicode text support
  • Networking support (IPv4/IPv6, TCP/UDP, etc)
  • I/O traits for working with files, strings, sockets, etc
  • Time handling: Duration and Instant
  • Environment Variables and CLI arguments

Much less time chasing down weird bugs

  • If it compiles, it'll probably work right
  • No data races across threads
  • No double frees, buffer overflows

Async Programming

  • Third-party libraries (e.g. tokio) give you all that but with an asynchronous API
  • Great if your code spends a lot of time waiting (for the disk, for the network)

Tools like rust-analyzer have powerful auto-completion

  • Filling in functions to meet a trait definition
  • Covering all the arms in a match expression
  • Importing modules or qualifying a given type

Built in testing

  • The test-runner compiles and runs:
    • All your unit tests
    • All your integration tests
    • All the code examples in your docs!
  • It also compiles all your examples
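A minimal sketch of a unit test sitting next to the code it checks. The cube function is made up for this example. In a library crate, an example written inside the doc comment would be compiled and run by cargo test as well.

```rust
/// Returns the cube of `n`.
pub fn cube(n: u64) -> u64 {
    n * n * n
}

// Only compiled when the test-runner builds the crate (e.g. `cargo test`).
#[test]
fn cubes_small_numbers() {
    assert_eq!(cube(0), 0);
    assert_eq!(cube(4), 64);
}

fn main() {
    println!("3 cubed is {}", cube(3));
}
```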

It's completely cross-platform

  • Windows, Linux and macOS devs all working with the same tools
  • You can build stand-alone binaries that are trivial to deploy

Tradeoffs

OK, but what's the catch?

You can't write C in Rust

  • You have to think about memory up-front
    • Who owns any given value?
    • Who needs to borrow it and when?
    • Does it live long enough to satisfy those borrows?
    • Are you borrowing something that might move?

Rust exposes underlying complexity

  • There are at least six kinds of "String" in Rust
    • Owned or Borrowed, Rust-native, C-compatible and OS API-compatible
  • There is no garbage collector - you manage your own memory
    • Maybe you'd be OK with the performance of Go, or C# or Java?
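The six string types mentioned above can be seen side by side in a short sketch. All of them come from the standard library; string_lengths is a hypothetical helper.

```rust
use std::ffi::{CStr, CString, OsStr, OsString};

/// Converts `text` through each pair of string types and returns the length
/// of the Rust-native, C-compatible (with NUL) and OS-compatible forms.
fn string_lengths(text: &str) -> (usize, usize, usize) {
    // Rust-native: always valid UTF-8
    let owned: String = String::from(text);
    let borrowed: &str = &owned;

    // C-compatible: NUL-terminated, no interior NUL bytes
    let c_owned: CString = CString::new(borrowed).expect("no interior NUL bytes");
    let c_borrowed: &CStr = &c_owned;

    // OS API-compatible: whatever the platform's native string encoding allows
    let os_owned: OsString = OsString::from(borrowed);
    let os_borrowed: &OsStr = &os_owned;

    (
        borrowed.len(),
        c_borrowed.to_bytes_with_nul().len(),
        os_borrowed.len(),
    )
}

fn main() {
    println!("{:?}", string_lengths("ferris"));
}
```

Each conversion is explicit, so the places where a string could fail to convert (an interior NUL, invalid UTF-8) are visible in the code.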

Rust doesn't interact well with C++ code

  • Rust doesn't understand classes or templates
  • Neither Rust nor C++ have a stable ABI
  • Projects do exist to auto-generate bindings, like cxx

Touching the hardware requires unsafe

Hardware is a blob of shared mutable state and you have to manually verify your access to it is correct

What you have works just fine

If it's safe enough, maintainable enough and fast enough, then you should keep it!

Definitely don't do too many new things at once.

It's early days for building critical systems in Rust

Ferrocene is good, but C and Ada have a multi-decade head start

Is the juice worth the squeeze?

Only you can decide!

But we can show you what other people have found...

Some quotes...

  • Mozilla
  • Microsoft
  • Google
  • CISA
  • Amazon
  • Linux Kernel
  • Cloudflare
  • Dropbox
  • Meta
  • Infineon
  • Volvo

Mozilla

With the release of Firefox 48, we shipped the very first browser component to be written in the Rust programming language - an MP4 parser for video files. Streaming media files in your browser can be particularly risky if you don't know or trust the source of the file, as these can maliciously take advantage of bugs in a browser's code. Rust's memory-safe capabilities prevent these vulnerabilities from being built into the code in the first place.

– Firefox Blog (2017)

Microsoft

We believe Rust changes the game when it comes to writing safe systems software. Rust provides the performance and control needed to write low-level systems, while empowering software developers to write robust, secure programs.

– MSRC Blog (2019)


Speaking of languages, it's time to halt starting any new projects in C/C++ and use Rust for those scenarios where a non-GC language is required. For the sake of security and reliability, the industry should declare those languages as deprecated.

– Mark Russinovich, CTO Azure (2022)

Note:

Microsoft are following up on this. As of October 2024, there is Rust in the Windows 11 kernel, and user-land APIs like DWriteCore are (at least partially) written in Rust.

Google

More than 2/3 of respondents are confident in contributing to a Rust codebase within two months or less when learning Rust.

Anecdotally, these ramp-up numbers are in line with the time we've seen for developers to adopt other languages, both inside and outside of Google.

– Google Open Source Blog (2023)


Rust teams at Google are as productive as ones using Go, and more than twice as productive as teams using C++.

and

In every case, we've seen a decrease by more than 2x in the amount of effort required to both build the services written in Rust, as well as maintain and update those services. [...] C++ is very expensive for us to maintain.

– Lars Bergstrom, Google (2024)


...the percentage of memory safety vulnerabilities in Android dropped from 76% to 24% over 6 years as development shifted to memory safe languages.

We see the (Safe Coding) shift showing up in important metrics such as rollback rates (emergency code revert due to an unanticipated bug). The Android team has observed that the rollback rate of Rust changes is less than half that of C++.

– Google Security Blog (2024)

CISA

There are, however, a few areas that every software company should investigate. First, there are some promising memory safety mitigations in hardware. ... Second, companies should investigate memory safe programming languages.

– "The Urgent Need for Memory Safety in Software Products", CISA (2023)

Note:

CISA is the US Government's Cybersecurity and Infrastructure Security Agency

Amazon

Here at AWS, we love Rust, too, because it helps AWS write highly performant, safe infrastructure-level networking and other systems software. ... we also use Rust to deliver services such as S3, EC2, CloudFront, Route 53, and more ... Our Amazon EC2 team uses Rust as the language of choice for new AWS Nitro System components...

– AWS Open Source Blog (2020)

Linux Kernel

Like we mentioned last time, the Rust support is still to be considered experimental. However, support is good enough that kernel developers can start working on the Rust abstractions for subsystems and write drivers and other modules.

– Linux Kernel Mailing List (2022)

Note:

  • Asahi Linux wrote the Apple Silicon GPU driver in Rust.
  • The new Nova open-source driver for nVidia GPUs will be written in Rust.

Dropbox

We wrote Nucleus in Rust! Rust has been a force multiplier for our team, and betting on Rust was one of the best decisions we made. More than performance, its ergonomics and focus on correctness has helped us tame sync's complexity. We can encode complex invariants about our system in the type system and have the compiler check them for us.

– Dropbox.Tech (2022)

Cloudflare

In production, Pingora consumes about 70% less CPU and 67% less memory compared to our old service with the same traffic load.

– Cloudflare Blog (2022)

Meta

[Our Rust Engineers] came from Python and Javascript backgrounds. They appreciated Rust's combination of high performance with compile-time error detection. As more success stories, such as performance improvements at two to four orders of magnitude, circulated within the company, interest grew in using Rust for back-end service code and exploring its use in mobile apps as well.

– Engineering at Meta (2021)

Infineon

With Infineon's support, we can expect Rust's usage in Embedded Systems to become more widespread, standardizing the usage of Rust in the industry while engaging with the Rust FOSS community.

– Infineon Developer Community Blog (2023)

SEGGER

Rust is fast, memory-efficient and safe. With first-class tool support, it has the potential to overtake C and C++.

– Rolf Segger, SEGGER (2024)

Volvo

I always had the feeling, is Rust too good to be true? I'm always looking for the big pitfall. So far I have not found anything bad. Only some small things...

[We have] a bigger and bigger pile of proof that Rust does actually work well.

– Julius Gustavsson, Volvo (2024)

Note:

As of October 2024, the Volvo EX30 and the Polestar 3 are shipping with some firmware written in Rust, particularly in the Low-Power ECU.

Volvo

I think we're at that point where instead of asking 'Can we use Rust for this?', we should be asking 'Why can't we use Rust for this?'

– Julius Gustavsson, Volvo (2024)

Where Next?

On-line Self-Taught Courses

Desktop-based Self-Taught Courses

Project Documentation

Ferrocene Documentation

https://public-docs.ferrocene.dev

Working Group Materials

Online Books

  • Rust in Action
  • Rust for Rustaceans

Consultancy and Support

There are a growing number of Rust-based consultancies.

Professional Training

Ferrous Systems offer professional training for small teams: