Heap Allocation (Box, Rc and Cow)
Where do Rust variables live?
struct Square { width: f32 } fn main() { let x: u64 = 0; let y = Square { width: 1.0 }; let mut z: String = "Hello".to_string(); z.push_str(", world!"); }
Note:
- The variable
x
is an 8-byte (64-bit) value, and lives on the stack. - The variable
y
is a 4-byte value, and also lives on the stack. - The variable
z
is a 3x4-byte value on 32-bit platforms, and a 3x8-byte value on 64-bit platforms. TheString
itself is a struct, and the bytes contained within the struct live on the heap.
Let's see some addresses...
struct Square { width: f32 } fn main() { let x: u64 = 0; let y = Square { width: 1.0 }; let mut z: String = "Hello".to_string(); z.push_str(", world!"); println!("x @ {:p}", &x); println!("y @ {:p}", &y); println!("z @ {:p}", &z); println!("z @ {:p}", z.as_str()); }
Note:
You expect to see something like:
x @ 0x7ffc2272c618
y @ 0x7ffc2272c624
z @ 0x7ffc2272c628
z @ 0x555829f269d0
The first z @
line is the struct String { ... }
itself. The second z @
line are the bytes the String
contains. They have a different addresses because they are in the heap and not on the stack.
If you run it multiple times, you will get different results. This is due to the Operating System randomizing the virtual addresses used for the stack and the heap, to make security vulnerabilities harder to exploit.
On macOS, you can run vmmap <pid>
to print the addresses for each region. On Linux you can use pmap <pid>
, or you could add something like:
#![allow(unused)] fn main() { if let Ok(maps) = std::fs::read_to_string(&format!("/proc/{}/maps", std::process::id())) { println!("{}", maps); } }
How does Rust handle the heap?
On three levels:
- Talking to your Operating System (or its C Library)
- A low-level API, called the Global Allocator
- A high-level API, with
Box
,Rc
,Vec
, etc
What's in the Box?
- A
Box<T>
in Rust, is a handle to a unique, owned, heap-allocated value of typeT
- The value is the size of a pointer
- The contents of the Box can be any T (including unsized things)
Note:
Pointers can be 'thin' (one word in length) or 'wide' (two words in length). In a wide pointer, the second word holds the length of the thing being pointed to, or a pointer to the vtable if it's a dyn-trait pointer. The same applies to Boxes.
Why not raw pointers?
Because Box<T>
:
- doesn't let you do pointer arithmetic on it
- will automatically free the memory when it goes out of scope
- implements
Deref<T>
andDerefMut<T>
Making a Box
The Deref
and DerefMut
trait implementations let us use a Box quite naturally:
fn main() { let x: Box<f64> = Box::new(1.0_f64); let y: f64 = x.sin() * 2.0; let z: &f64 = &x; println!("x={x}, y={y}, z={z}"); }
When should I use a Box?
- Not very often - friendlier containers (like
Vec<T>
) exist - If you have a large value that moves around a lot
- Moving a
Box<T>
is cheap, because only the pointer moves, not the contents
- Moving a
- To hide the size or type of a returned value...
Boxed Traits
fn make_stuff(want_integer: bool) -> Box<dyn std::fmt::Debug> { if want_integer { Box::new(42_i32) } else { Box::new("Hello".to_string()) } } fn main() { println!("make_stuff(true): {:?}", make_stuff(true)); println!("make_stuff(false): {:?}", make_stuff(false)); }
Note:
An i32
and a String
are very different sizes, and a function must have a single fixed size for the return value. But it does - it returns a Box
and the Box
itself always has the same size. The thing that varies in size is the value inside the box and that lives somewhere else - on the heap in fact.
This trick is also useful for closures, where the type cannot even be said out loud because it's compiler-generated. But you can say a closure implements the FnOnce
trait, for example.
Smarter Boxes
What if I want my Box to have multiple owners? And for the memory to be freed when both of the owners have finished with it?
We have the reference counted Rc<T>
type for that!
Using Rc<T>
use std::rc::Rc; struct Point { x: i32, y: i32 } fn main() { let first_handle = Rc::new(Point { x: 1, y: 1}); let second_handle = first_handle.clone(); let third_handle = second_handle.clone(); }
Reference Counting
- The
Rc
type is a handle to reference-counted heap allocation - When you do a
clone()
the count goes up by one - When you drop it, the count goes down by one
- The memory isn't freed until the count hits zero
- There's a
Weak
version which will not keep the allocation alive - to break cycles
Note:
A cycle would be if you managed to construct two Rc
wrapped structs and had
each one hold an Rc
reference to the other. Now neither can ever be freed,
because each will always have at least one owner (the other).
Thread-safety
Rc
cannot be sent into a thread (or through any API that requires the type to beSend
).- If in doubt, try it! Rust will save you from yourself.
- The trade-off is that
Rc
is really fast! - There is an Atomic Reference Counted type,
Arc
if you need it.
Rc is not mutable
NB: Rc
allows sharing, but not mutability...
use std::rc::{Rc, Weak}; struct Dog { name: String, owner: Weak<Human> } struct Human { name: String, pet_dogs: Vec<Dog> } fn main() { let mut sam = Rc::new( Human { name: "Sam".to_string(), pet_dogs: Vec::new() } ); let rover = Dog { name: "Rover".to_string(), owner: Rc::downgrade(&sam) }; // This is not allowed, because `sam` is actually immutable // sam.pet_dogs.push(rover); }
Note:
You get an error like:
error[E0596]: cannot borrow data in an `Rc` as mutable
--> src/main.rs:12:5
|
12 | sam.pet_dogs.push(rover);
| ^^^^^^^^^^^^ cannot borrow as mutable
|
= help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Rc<Human>`
For more information about this error, try `rustc --explain E0596`.
Why do you want this structure? Because given some &Dog
you might very well want to know who owns it!
Shared Mutability
We have more on this later...
use std::rc::{Rc, Weak}; use std::cell::RefCell; struct Dog { name: String, owner: Weak<RefCell<Human>> } struct Human { name: String, pet_dogs: Vec<Dog> } fn main() { let mut sam = Rc::new(RefCell::new( Human { name: "Sam".to_string(), pet_dogs: Vec::new() } )); let rover = Dog { name: "Rover".to_string(), owner: Rc::downgrade(&sam) }; // This is now allowed because `RefCell::borrow_mut` does a run-time borrow check sam.borrow_mut().pet_dogs.push(rover); }
Maybe Boxed, maybe not?
Why is this function less than ideal?
/// Replaces all the ` ` characters with `_` fn replace_spaces(input: &str) -> String { todo!() } fn main() { println!("{}", replace_spaces("Hello, world!")); println!("{}", replace_spaces("Hello!")); }
Note:
Did the second call replace anything? Did you have to allocate a String
and copy all the data anyway, even though nothing changed?
Copy-On-Write
Rust has the Cow
type to handle this.
/// Replaces all the ` ` characters with `_` fn replace_spaces(input: &str) -> std::borrow::Cow<str> { todo!() } fn main() { println!("{}", replace_spaces("Hello, world!")); println!("{}", replace_spaces("Hello!")); }
Note:
Cow
works on any T
where there is both a Borrowed version and an Owned version.
For example, &[u8]
and Vec<u8>
.