Safety, Performance and Productivity
1) Safety
Rust is memory-safe
- Every value has one owner
- You can create either:
  - One exclusive (mutable) reference
  - Multiple shared (immutable) references
  - Never both!
- These rules are checked at compile time
  - Or at run-time if you choose (e.g. with RefCell<T>)
- Rust applies bounds checks to array and slice accesses
  - Where the compiler can prove an access is in bounds (e.g. the indices are constant) those checks are optimized out
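As a minimal sketch of the "at run-time if you choose" option, the standard library's RefCell<T> applies the same borrowing rules while the program runs. The helper names below are hypothetical and only serve the illustration.

```rust
use std::cell::RefCell;

/// Hypothetical helper: tries to take an exclusive borrow while a shared
/// borrow is still alive. RefCell tracks borrows at run-time and refuses.
fn exclusive_while_shared(cell: &RefCell<Vec<i32>>) -> bool {
    let shared = cell.borrow(); // shared (immutable) borrow
    let allowed = cell.try_borrow_mut().is_ok(); // exclusive borrow attempt
    drop(shared);
    allowed
}

/// Hypothetical helper: tries to take an exclusive borrow with no other
/// borrows outstanding.
fn exclusive_alone(cell: &RefCell<Vec<i32>>) -> bool {
    cell.try_borrow_mut().is_ok()
}

fn main() {
    let cell = RefCell::new(vec![1, 2, 3]);
    println!("exclusive while shared: {}", exclusive_while_shared(&cell));
    println!("exclusive alone: {}", exclusive_alone(&cell));
}
```

Using borrow_mut() instead of try_borrow_mut() would panic on the conflicting borrow rather than return an error.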
Index Example
fn process(items: &mut [i32]) {
    items[10] = 6;
}
If items isn't long enough, this raises a run-time panic instead of corrupting memory.
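If a panic is not the behaviour you want, slices also offer checked accessors that return an Option. A minimal sketch, where process_checked is a hypothetical variant of the function above:

```rust
/// Hypothetical variant of `process`: reports whether the write happened
/// instead of panicking when the slice is too short.
fn process_checked(items: &mut [i32]) -> bool {
    match items.get_mut(10) {
        Some(slot) => {
            *slot = 6;
            true
        }
        None => false,
    }
}

fn main() {
    let mut short = [0; 4];
    let mut long = [0; 16];
    println!("short: {}", process_checked(&mut short));
    println!("long: {}", process_checked(&mut long));
    println!("long[10] = {}", long[10]);
}
```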
Iter Example
/// Adds 0x00 padding for every 0xCC found
fn process(data: &mut Vec<u8>) {
    for item in data.iter_mut() {
        if *item == 0xCC {
            // Does not compile: `data` is already mutably borrowed by the iterator
            data.push(0);
        }
    }
}
Rust won't let you modify the Vec<u8> whilst you iterate through it - this breaks the rules around exclusive borrows.
Note:
This is trivial to write in C++, where pushing to a std::vector whilst iterating can invalidate the iterator and cause silent corruption.
Iter Example (fixed)
/// Adds 0x00 padding for every 0xCC found
fn process(data: &mut Vec<u8>) {
    let padding_byte_count = data.iter().filter(|&&x| x == 0xCC).count();
    for _ in 0..padding_byte_count {
        data.push(0);
    }
}
Rust is thread-safe
- Types must be marked as safe for:
  - Transferring ownership between threads (the Send trait), and/or
  - Transferring a reference between threads (the Sync trait)
- Safe Rust cannot create data races!
APIs can reason about thread-safety
- Rust channels require types to be marked as thread-safe
- Passing values when starting a spawned thread - same checks
- The ref-counting allocation type Rc<T> is not thread-safe
- The atomic-ref-counting allocation type Arc<T> is (but is slightly slower)
- Make the wrong choice? Compiler stops you!
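A minimal sketch of these checks in practice: shared data goes into an Arc<T>, each spawned thread receives its own clone of the handle, and partial results come back over a standard-library channel. The function sum_in_threads and its striding scheme are hypothetical, chosen only for illustration. Swapping Arc for Rc here should be rejected by the compiler, because Rc<T> is not marked as safe to send between threads.

```rust
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: sums `data` using `workers` threads.
/// Worker `id` handles elements id, id + workers, id + 2 * workers, ...
fn sum_in_threads(data: Vec<u32>, workers: usize) -> u32 {
    let data = Arc::new(data);
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for id in 0..workers {
        // Each thread owns its own Arc handle and its own Sender.
        let data = Arc::clone(&data);
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            let partial: u32 = data.iter().skip(id).step_by(workers).sum();
            tx.send(partial).expect("receiver is still alive");
        }));
    }
    // Drop the original Sender so the receiving loop ends once all workers finish.
    drop(tx);
    let total: u32 = rx.iter().sum();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    total
}

fn main() {
    let total = sum_in_threads((1..=100).collect(), 4);
    println!("total = {total}");
}
```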
Thread Example
fn main() {
    let mut total = 0;
    for _ in 0..10 {
        std::thread::spawn(|| {
            total += 1;
        });
    }
    println!("{total}");
}
Note:
- Failure 1 - a spawned thread can outlive the main function, but the closure is trying to borrow a variable on the stack of the main function
- Failure 2 - multiple threads are each trying to take mutable (exclusive) access to the same variable
Thread Example (Fixed)
use std::sync::atomic::{AtomicU32, Ordering};

fn main() {
    let total = AtomicU32::new(0);
    std::thread::scope(|s| {
        for _ in 0..10 {
            s.spawn(|| total.fetch_add(1, Ordering::Relaxed));
        }
    });
    println!("{}", total.load(Ordering::Relaxed));
}
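Atomics only cover simple values such as integers. For shared state that has no atomic equivalent, one common alternative is a Mutex<T> combined with the same scoped threads. This is a sketch of that swapped-in technique, not part of the original example, and collect_ids is a hypothetical name.

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical helper: every scoped thread pushes its id into a shared Vec.
fn collect_ids(thread_count: u32) -> Vec<u32> {
    let ids = Mutex::new(Vec::new());
    thread::scope(|s| {
        for id in 0..thread_count {
            // Borrow the Mutex; the scope guarantees the threads end before `ids` does.
            let ids = &ids;
            s.spawn(move || {
                ids.lock().expect("mutex poisoned").push(id);
            });
        }
    });
    // All threads have finished, so the Mutex can be unwrapped.
    let mut ids = ids.into_inner().expect("mutex poisoned");
    // Threads may run in any order; sort for a stable result.
    ids.sort_unstable();
    ids
}

fn main() {
    println!("{:?}", collect_ids(10));
}
```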
There's an escape hatch
- Where the compiler cannot verify the rules are upheld, you can tell it you've done the checks manually
- We create unsafe { } blocks and unsafe fn functions
- Lets you access raw pointers (e.g. for memory-mapped I/O)
- When you audit/review the code, you pay close attention to these parts!
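A minimal sketch of what such a block can look like. Real memory-mapped I/O would use a fixed hardware address; here an ordinary local variable stands in for the register so the example runs anywhere, and write_then_read is a hypothetical name.

```rust
use std::ptr;

/// Hypothetical helper: writes `value` through a raw pointer and reads it
/// back, using volatile accesses as MMIO code typically would.
fn write_then_read(register: &mut u32, value: u32) -> u32 {
    let reg_ptr: *mut u32 = register;
    // SAFETY: `reg_ptr` was just derived from a live `&mut u32`, so it is
    // non-null, aligned, and valid for reads and writes for this whole block.
    unsafe {
        ptr::write_volatile(reg_ptr, value);
        ptr::read_volatile(reg_ptr)
    }
}

fn main() {
    // Stand-in for a hardware register (an assumption for this sketch).
    let mut fake_register = 0u32;
    let read_back = write_then_read(&mut fake_register, 0xDEAD_BEEF);
    println!("{read_back:#010x}");
}
```

The SAFETY comment records the manual check, which is the part a reviewer would focus on.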
2) Performance
A Comparison
Let's use Python to calculate the sum of the cubes of the first 100 million integers.
import datetime

def run():
    start = datetime.datetime.now()
    cube_sum = sum(
        map(
            lambda x: x * x * x,
            range(0, 100_000_000)
        )
    )
    print(f"Took {datetime.datetime.now() - start}")
    print(f"cube_sum = {cube_sum}")
>>> run()
Took 0:00:09.076986
24999999500000002500000000000000
In Rust?
fn main() {
    let start = std::time::Instant::now();
    let sum: u128 = (0..100_000_000u32)
        .into_iter()
        .map(|n| {
            let n = u128::from(n);
            n * n * n
        })
        .sum();
    println!("Took {:?}", start.elapsed());
    println!("sum = {sum}");
}
$ cargo run --release
Compiling process v0.1.0 (/Users/jonathan/process)
Finished release [optimized] target(s) in 0.34s
Took 45ns
sum = 24999999500000002500000000000000
OK, but it's cheating
fn main() {
    let start = std::time::Instant::now();
    let sum: u128 = (0..100_000_000u32)
        .into_iter()
        .map(|n| {
            let n = u128::from(n);
            std::hint::black_box(n * n * n)
        })
        .sum();
    println!("Took {:?}", start.elapsed());
    println!("sum = {sum}");
}
$ cargo run --release
Compiling process v0.1.0 (/Users/jonathan/process)
Finished release [optimized] target(s) in 0.34s
Took 68.014583ms
sum = 24999999500000002500000000000000
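The jump from 45ns to 68ms is consistent with the optimizer having removed the loop in the first version, which black_box then prevents. What the optimizer actually emitted is not shown here, so treat that as an assumption. What can be checked is that the sum has a closed form, ((n - 1) * n / 2)^2 for the range 0..n. The function names in this sketch are hypothetical.

```rust
/// Sum of the cubes of 0..count, computed the slow way.
fn cube_sum_loop(count: u32) -> u128 {
    (0..count)
        .map(|n| {
            let n = u128::from(n);
            n * n * n
        })
        .sum()
}

/// The same sum via the closed form: the square of the triangular number
/// 0 + 1 + ... + (count - 1).
fn cube_sum_closed_form(count: u32) -> u128 {
    let count = u128::from(count);
    let triangular = count * count.saturating_sub(1) / 2;
    triangular * triangular
}

fn main() {
    // Cross-check the two on a small input, then evaluate the large case directly.
    assert_eq!(cube_sum_loop(1_000), cube_sum_closed_form(1_000));
    println!("sum = {}", cube_sum_closed_form(100_000_000));
}
```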
Let's use all our CPU cores...
// Import the rayon library
use rayon::prelude::*;

fn main() {
    let start = std::time::Instant::now();
    // Swap `into_iter` for `into_par_iter`
    let sum: u128 = (0..100_000_000u32)
        .into_par_iter()
        .map(|n| {
            let n = u128::from(n);
            std::hint::black_box(n * n * n)
        })
        .sum();
    println!("Took {:?}", start.elapsed());
    println!("sum = {sum}");
}
Let's use all our CPU cores...
$ cargo add rayon
Updating crates.io index
Adding rayon v1.6.1 to dependencies.
$ cargo run --release
...
Compiling rayon v1.6.1
Compiling process v0.1.0 (/Users/jonathan/process)
Finished release [optimized] target(s) in 2.38s
Running `target/release/process`
Took 9.928125ms
sum = 24999999500000002500000000000000
Sure, but C can do this too, right?
$ clang -o ./target/main src/main.c -O3 -mcpu=native -std=c17 && ./target/main
sum 0x13b8b5ae675d38cb7260b704000
Took 70.3 milliseconds
And was getting that performance ... enjoyable?
#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>
#include <time.h>

int main(int argc, char** argv) {
    // clock_gettime_nsec_np is a macOS extension
    uint64_t start = clock_gettime_nsec_np(CLOCK_MONOTONIC);
    __uint128_t x = 0;
    for (uint32_t idx = 0; idx < 100000000; idx++) {
        __uint128_t i = (__uint128_t) idx;
        // volatile stops the compiler from optimizing the calculation away
        volatile __uint128_t result = i * i * i;
        x += result;
    }
    uint64_t end = clock_gettime_nsec_np(CLOCK_MONOTONIC);
    // Print the high and low 64-bit halves; the low half must be zero-padded to 16 digits
    printf("sum 0x%08llx%016llx\n", (unsigned long long) (x >> 64), (unsigned long long) x);
    printf("Took %.3g milliseconds\n", ((double) (end - start)) / (1000.0 * 1000.0));
    return 0;
}
3) Productivity
libstd
- Filesystem access and Path handling
- Heap allocation, with optional reference-counting
- Threads, with Mutexes, Condition Variables, and Channels
- Strings, and a powerful value formatting system
- Growable arrays, hash-tables, B-Trees
- First-class Unicode text support
- Networking support (IPv4/IPv6, TCP/UDP, etc)
- I/O traits for working with files, strings, sockets, etc
- Time handling: Duration and Instant
- Environment Variables and CLI arguments
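A small sketch that touches a few of these pieces at once (strings, a B-Tree map, value formatting and Instant), using nothing outside the standard library. The word_counts function and the sample sentence are hypothetical.

```rust
use std::collections::BTreeMap;
use std::time::Instant;

/// Hypothetical helper: counts case-insensitive word occurrences.
/// A BTreeMap keeps the words in sorted order.
fn word_counts(text: &str) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word.to_lowercase()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let start = Instant::now();
    let counts = word_counts("the quick brown fox jumps over the lazy dog The end");
    for (word, n) in &counts {
        // Right-align each word in an 8-character column.
        println!("{word:>8}: {n}");
    }
    println!("Took {:?}", start.elapsed());
}
```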
Much less time chasing down weird bugs
- If it compiles, it'll probably work right
- No data races across threads
- No double frees, buffer overflows
Async Programming
- Third-party libraries (e.g. tokio) give you much of what libstd offers (files, networking, timers, channels) but with an asynchronous API
- Great if your code spends a lot of time waiting (for the disk, for the network)
Tools like rust-analyzer have powerful auto-completion
- Filling in functions to meet a trait definition
- Covering all the arms in a match expression
- Importing modules or qualifying a given type
Built in testing
- The test-runner compiles and runs:
  - All your unit tests
  - All your integration tests
  - All the code examples in your docs!
- It also compiles all your examples
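As a minimal sketch of a unit test, the test module can sit in the same file as the code it covers and is only compiled when running cargo test. The function pad_count and the test name are hypothetical.

```rust
/// Hypothetical helper: counts the 0xCC marker bytes in a buffer.
pub fn pad_count(data: &[u8]) -> usize {
    data.iter().filter(|&&x| x == 0xCC).count()
}

// Compiled only for `cargo test`; ignored in normal builds.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn counts_marker_bytes() {
        assert_eq!(pad_count(&[0xCC, 0x01, 0xCC]), 2);
        assert_eq!(pad_count(&[]), 0);
    }
}

fn main() {
    println!("{}", pad_count(&[0xCC, 0x00]));
}
```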
It's completely cross-platform
- Windows, Linux and macOS devs all working with the same tools
- You can build stand-alone binaries that are trivial to deploy