Understanding Smart Pointers in Rust: A Comprehensive Guide
Introduction
Before we look at smart pointers, let’s establish what a pointer is. In programming, a pointer is a piece of data that holds the location of another piece of data, much like your home address tells people where you live. Smart pointers also point to data, but they come with additional capabilities, such as giving a value multiple owners, interior mutability, and more.
P.S. A reference, written with & in Rust, can also be regarded as a pointer, since it holds the address of the data it refers to.
In this article, we’ll look at common smart pointers in Rust and how to use them. They include Box, Rc, Arc, RefCell, and Mutex.
Box
The Box smart pointer allocates data on the heap. With Box, a value such as an i32, which would normally live on the stack, is stored on the heap instead. This is helpful when you have large data that you don’t want on the stack because stack space is limited.
Here’s how you use the Box smart pointer:
// Our i32 will be allocated on the heap instead of the stack
let heap_allocated_i32 = Box::new(1);
Here’s how you can store large data with Box:
struct LargeData {
    data: [i32; 1000000], // An array with 1 million elements
}
fn main() {
    let boxed_data = Box::new(LargeData {
        data: [0; 1000000], // Initialize the array with zeros
    });
}
Normally, a variable of type [i32; 1000000] would be stored on the stack. At roughly 4 MB, it could take up a large share of a thread’s stack or even overflow it, which is why we store it on the heap instead.
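To make the size difference concrete, here is a minimal sketch that compares the size of the value itself with the size of the Box that points to it. The helper names inline_size and boxed_handle_size are made up for this illustration.

```rust
use std::mem::size_of;

// Same shape as the LargeData struct used earlier.
#[allow(dead_code)]
struct LargeData {
    data: [i32; 1_000_000],
}

/// Size in bytes of the value itself (what would have to fit on the stack).
fn inline_size() -> usize {
    size_of::<LargeData>()
}

/// Size in bytes of the Box handle that points to the heap allocation.
fn boxed_handle_size() -> usize {
    size_of::<Box<LargeData>>()
}

fn main() {
    println!("LargeData: {} bytes", inline_size());
    println!("Box<LargeData>: {} bytes", boxed_handle_size());
}
```

The Box itself is only as big as a pointer, so passing it around moves a few bytes rather than the whole array.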
Another reason to use a Box is to create recursive data structures like binary trees and linked lists. Recursive data structures, such as trees, refer to themselves, so Rust cannot determine their size at compile time. Box gets around this by storing a fixed-size pointer in place of the nested value, while the actual data resides on the heap. This enables the creation of recursive structures without needing to know their size in advance.
Here’s a simple implementation of a binary tree with Box:
#[derive(Debug)]
struct TreeNode<T> {
    value: T,
    left: Option<Box<TreeNode<T>>>,
    right: Option<Box<TreeNode<T>>>,
}
impl<T> TreeNode<T> {
    fn new(value: T) -> TreeNode<T> {
        TreeNode {
            value,
            left: None,
            right: None,
        }
    }
}
fn main() {
    let left_leaf = TreeNode::new("left leaf");
    let right_leaf = TreeNode::new("right leaf");
    let root = TreeNode {
        value: "root",
        left: Some(Box::new(left_leaf)),
        right: Some(Box::new(right_leaf)),
    };
    println!("{:#?}", root);
}
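The same idea applies to the linked lists mentioned earlier. The following is a minimal sketch, not a production list type: the List enum and its sum method are names chosen for this illustration.

```rust
// A singly linked list: each node owns the rest of the list through a Box.
// Without the Box, List would contain itself and have no finite size.
#[derive(Debug)]
enum List {
    Cons(i32, Box<List>),
    Nil,
}

impl List {
    /// Adds up every value by recursing through the nodes.
    fn sum(&self) -> i32 {
        match self {
            List::Cons(value, rest) => value + rest.sum(),
            List::Nil => 0,
        }
    }
}

fn main() {
    let list = List::Cons(
        1,
        Box::new(List::Cons(2, Box::new(List::Cons(3, Box::new(List::Nil))))),
    );
    println!("{:?}", list);
    println!("sum = {}", list.sum());
}
```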
Another important difference between Box and a regular reference is that Box is an owning pointer: when a Box is dropped, the T it contains is dropped too, and its heap memory is freed.
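One way to observe this is to box a value whose Drop implementation records that it ran. This is only a sketch: Tracked and drops_after_box_scope are hypothetical names, and the Cell from std::cell is used here purely as a simple counter.

```rust
use std::cell::Cell;

// Hypothetical helper type: bumps a counter every time a value is dropped.
struct Tracked<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Boxes a Tracked value, lets the Box go out of scope,
/// and reports how many times the inner value was dropped.
fn drops_after_box_scope() -> u32 {
    let drops = Cell::new(0);
    {
        let _boxed = Box::new(Tracked { drops: &drops });
        // _boxed goes out of scope at the closing brace below,
        // which drops the Tracked value it owns.
    }
    drops.get()
}

fn main() {
    println!("drops: {}", drops_after_box_scope());
}
```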
Rc (Reference counting)
Rust’s ownership rules say that every value has exactly one owner, but the Rc smart pointer lets us relax that rule. As the name implies, the reference counting (Rc) smart pointer keeps count of how many owners the data it wraps has, and the data is deallocated when that count reaches zero.
Here’s an example:
// This is how we bring Rc into scope
use std::rc::Rc;
#[derive(Debug)]
struct Person {
    name: String,
    age: u32,
}
fn main() {
    let person1 = Rc::new(Person {
        name: "Alice".to_string(),
        age: 25,
    });
    // Clone the Rc pointer to create additional owners of the same Person
    let person2 = Rc::clone(&person1);
    let person3 = Rc::clone(&person1);
    println!("Name: {}, Age: {}", person1.name, person1.age);
    println!("Name: {}, Age: {}", person2.name, person2.age);
    println!("Name: {}, Age: {}", person3.name, person3.age);
    println!("Reference Count: {}", Rc::strong_count(&person1)); // 3
}
It is worth noting that calling clone on an Rc does not copy the data it wraps. Instead, it creates another Rc that points to the same data on the heap and increments the reference count.
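To see the count rise and fall, here is a small sketch that samples Rc::strong_count at three points. The function name observe_counts is chosen for this illustration.

```rust
use std::rc::Rc;

/// Returns the strong count after one clone, inside a nested scope
/// that holds a second clone, and after that scope has ended.
fn observe_counts() -> (usize, usize, usize) {
    let first = Rc::new(String::from("shared"));
    let second = Rc::clone(&first);
    let after_clone = Rc::strong_count(&first);

    let inside_scope = {
        let _third = Rc::clone(&first);
        Rc::strong_count(&first)
    }; // _third is dropped here, so the count goes back down

    let after_scope = Rc::strong_count(&first);

    // Both handles point at the same heap allocation; no String was copied.
    assert!(Rc::ptr_eq(&first, &second));

    (after_clone, inside_scope, after_scope)
}

fn main() {
    println!("{:?}", observe_counts()); // (2, 3, 2)
}
```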
Arc (Atomic reference counting)
The Arc smart pointer is just like Rc with one bonus: it is thread-safe. This means Arc lets us give a piece of data multiple owners and access it from multiple threads. Let’s try our previous Rc example with multiple threads and see what happens:
use std::thread;
use std::rc::Rc;
use std::time::Duration;
struct Person {
    name: String,
    age: u32,
}
fn main() {
    let person = Rc::new(Person {
        name: "Alice".to_string(),
        age: 25,
    });
    let person_clone1 = Rc::clone(&person);
    let person_clone2 = Rc::clone(&person);
    let thread1 = thread::spawn(move || {
        println!("Thread 1: Name={}, Age={}", person_clone1.name, person_clone1.age);
        // Simulate some work being done in thread 1
        thread::sleep(Duration::from_millis(1000));
    });
    let thread2 = thread::spawn(move || {
        println!("Thread 2: Name={}, Age={}", person_clone2.name, person_clone2.age);
        // Simulate some work being done in thread 2
        thread::sleep(Duration::from_millis(1000));
    });
    thread1.join().unwrap();
    thread2.join().unwrap();
    println!("Reference Count: {}", Rc::strong_count(&person));
}
When we try to compile this code, we get this error:
error[E0277]: `Rc<Person>` cannot be sent between threads safely
--> src/main.rs:18:33
We get this error because Rc is not thread-safe: it updates its reference count non-atomically, so it does not implement Send and is meant for a single thread only.
To share data among multiple threads, we use the Arc smart pointer instead:
use std::thread;
use std::sync::Arc;
use std::time::Duration;
struct Person {
    name: String,
    age: u32,
}
fn main() {
    let person = Arc::new(Person {
        name: "Alice".to_string(),
        age: 25,
    });
    let person_clone1 = Arc::clone(&person);
    let person_clone2 = Arc::clone(&person);
    let thread1 = thread::spawn(move || {
        println!("Thread 1: Name={}, Age={}", person_clone1.name, person_clone1.age);
        // Simulate some work being done in thread 1
        thread::sleep(Duration::from_millis(1000));
    });
    let thread2 = thread::spawn(move || {
        println!("Thread 2: Name={}, Age={}", person_clone2.name, person_clone2.age);
        // Simulate some work being done in thread 2
        thread::sleep(Duration::from_millis(1000));
    });
    thread1.join().unwrap();
    thread2.join().unwrap();
    // Prints 1: both clones were moved into the threads and dropped when those threads finished
    println!("Reference Count: {}", Arc::strong_count(&person));
}
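The final reference count in the example above is 1 because each clone is moved into its thread’s closure and dropped when that thread finishes. The sketch below checks this by sampling the count before spawning and after joining; counts_around_threads is a name made up for this illustration.

```rust
use std::sync::Arc;
use std::thread;

/// Returns the strong count before the threads start and after they are joined.
fn counts_around_threads() -> (usize, usize) {
    let data = Arc::new(vec![1, 2, 3]);
    let for_t1 = Arc::clone(&data);
    let for_t2 = Arc::clone(&data);
    let before = Arc::strong_count(&data); // original + two clones

    // Each closure takes ownership of one clone via `move`.
    let t1 = thread::spawn(move || for_t1.len());
    let t2 = thread::spawn(move || for_t2.iter().sum::<i32>());
    t1.join().unwrap();
    t2.join().unwrap();

    // The clones were dropped inside the finished threads.
    let after = Arc::strong_count(&data);
    (before, after)
}

fn main() {
    println!("{:?}", counts_around_threads()); // (3, 1)
}
```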
Although Arc is the right choice when data is shared across threads, its atomic reference-count updates make it slower than Rc, so prefer Rc in single-threaded code.
RefCell (Reference Cell)
We saw how Rc and Arc relax the ownership rules by giving a value multiple owners, in single-threaded and multi-threaded code respectively. With the RefCell smart pointer, we can bend the borrowing rules by mutating a value through an immutable reference. This pattern is referred to as interior mutability in Rust. The borrowing rules still apply, but RefCell checks them at runtime instead of at compile time.
One of the borrowing rules in Rust says that you cannot take a mutable reference to a value that was not declared mutable, so when we try something like this:
fn main(){
let a : i32 = 14;
*&mut a += 1;
println!("{}", a);
}
We would get an error like this:
error[E0596]: cannot borrow `a` as mutable, as it is not declared as mutable
--> src/main.rs:3:3
|
3 | *&mut a += 1;
| ^^^^^^ cannot borrow as mutable
|
help: consider changing this to be mutable
|
2 | let mut a : i32 = 14;
| +++
Of course, we could just do what the compiler says and make a mutable by writing let mut a: i32, but what if we can’t? Then we can use the RefCell smart pointer like so:
use std::cell::RefCell;
fn main() {
    let a: RefCell<i32> = RefCell::new(14);
    *a.borrow_mut() += 1;
    println!("{}", *a.borrow());
}
Note: You can think of the .borrow and .borrow_mut methods as & and &mut respectively for the RefCell smart pointer.
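RefCell does not switch the borrowing rules off; it enforces them while the program runs. A second borrow_mut while the first is still alive would panic, and try_borrow_mut reports the same conflict as an error instead. The sketch below shows both sides; the function names are made up for this illustration.

```rust
use std::cell::RefCell;

/// Tries to take a second mutable borrow while the first is still alive.
fn second_mutable_borrow_is_rejected() -> bool {
    let cell = RefCell::new(14);
    let _first = cell.borrow_mut();
    // try_borrow_mut returns Err instead of panicking like borrow_mut would.
    let rejected = cell.try_borrow_mut().is_err();
    rejected
}

/// Mutates through one borrow, releases it, then reads the value back.
fn borrow_after_release() -> i32 {
    let cell = RefCell::new(14);
    {
        let mut first = cell.borrow_mut();
        *first += 1;
    } // first is dropped here, so the cell can be borrowed again
    let value = *cell.borrow();
    value
}

fn main() {
    println!("rejected: {}", second_mutable_borrow_is_rejected());
    println!("value: {}", borrow_after_release());
}
```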
Let’s look at a simple real-life scenario where we need RefCell and the interior mutability pattern. Imagine you want to implement a trait for a data type, and one of the trait’s methods takes an immutable reference, &self, but you need to mutate the value’s state inside that method. The borrowing rules normally forbid this, but thankfully we have RefCell in our toolkit.
For example, say this is the trait we wanted to implement for our data type:
trait Counter {
    fn increment(&self);
    fn get(&self) -> i32;
}
Here’s our data type, called Count:
use std::cell::RefCell;
struct Count {
    value: RefCell<i32>,
}
impl Count {
    fn new() -> Self {
        Count {
            value: RefCell::new(0),
        }
    }
}
Here’s how we would implement the trait for our Count type:
impl Counter for Count {
    fn increment(&self) {
        // borrow_mut gives us mutable access to the inner i32 even though we only have `&self`
        let mut value = self.value.borrow_mut();
        // then mutate the inner value through that borrow
        *value += 1;
    }
    fn get(&self) -> i32 {
        *self.value.borrow()
    }
}
Our entire code, including a main function for testing, should look like this:
use std::cell::RefCell;
trait Counter {
    fn increment(&self);
    fn get(&self) -> i32;
}
struct Count {
    value: RefCell<i32>,
}
impl Count {
    fn new() -> Self {
        Count {
            value: RefCell::new(0),
        }
    }
}
impl Counter for Count {
    fn increment(&self) {
        // borrow_mut gives us mutable access to the inner i32 even though we only have `&self`
        let mut value = self.value.borrow_mut();
        // then mutate the inner value through that borrow
        *value += 1;
    }
    fn get(&self) -> i32 {
        *self.value.borrow()
    }
}
fn main() {
    let count = Count::new();
    count.increment();
    count.increment();
    println!("Count: {}", count.get()); // Output will be "Count: 2"
}
Mutex (Mutual Exclusion)
The Mutex smart pointer is helpful when we want to mutate shared data from multiple threads safely. As the full name “mutual exclusion” implies, a thread locks the value before using it and holds the lock until the lock guard goes out of scope. While the lock is held, any other thread that tries to lock the value has to wait. Let’s look at an example:
use std::sync::Mutex;
fn main() {
    // wrap an integer in a Mutex
    let value = Mutex::new(0);
    // lock `value` and keep the resulting guard in this variable
    let mut value_changer = value.lock().unwrap();
    // dereference the guard, then increment the wrapped integer by 1
    *value_changer += 1;
    println!("{}", value_changer); // Output: 1
}
In this example, we wrap an integer in a Mutex and assign it to a variable called value. We then lock the Mutex, store the resulting guard in value_changer, and increment the integer through it on the next line. While value_changer holds the lock, nothing else will be able to mutate or even read the data. Take for example:
use std::sync::Mutex;
fn main() {
    // same code
    let value = Mutex::new(0);
    let mut value_changer = value.lock().unwrap();
    *value_changer += 1;
    // Look here!
    println!("{:?}", &value); // Mutex { data: <locked>, poisoned: false, .. }
}
When we try to print value, we get Mutex { data: <locked>, poisoned: false, .. }. Notice how the data shows as locked. That’s because of the lock() call we made on value earlier when creating value_changer. To “unlock” value, we need to wait for value_changer to go out of scope or drop it explicitly with std::mem::drop:
use std::sync::Mutex;
fn main() {
    // same code
    let value = Mutex::new(0);
    let mut value_changer = value.lock().unwrap();
    *value_changer += 1;
    // Dropping the guard releases the lock, just as if it had gone out of scope
    std::mem::drop(value_changer);
    // Look here!
    println!("{:?}", value); // Mutex { data: 1, poisoned: false, .. }
}
Notice how, once value_changer is dropped, the data inside value is accessible again. The same happens with threads: while a thread holds the lock on a Mutex, only that thread can read or mutate the value, and the lock is released as soon as its guard goes out of scope. We do this to protect data that is used by multiple threads, which prevents race conditions and keeps our concurrent programs thread-safe.
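To see a Mutex doing this across real threads, it is commonly paired with Arc so that every thread owns a handle to the same lock. The following is a minimal sketch of that combination; the function name count_with_threads and its parameters are chosen for this illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `threads` threads that each add 1 to a shared counter
/// `increments_per_thread` times, then returns the final total.
fn count_with_threads(threads: usize, increments_per_thread: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let mut handles = Vec::with_capacity(threads);

    for _ in 0..threads {
        // Each thread gets its own Arc handle to the same Mutex.
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..increments_per_thread {
                // The guard is dropped at the end of each iteration,
                // which releases the lock for the other threads.
                let mut guard = counter.lock().unwrap();
                *guard += 1;
            }
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("total = {}", count_with_threads(4, 1000)); // total = 4000
}
```

Because every increment happens while the lock is held, no update is lost, and the total always equals threads multiplied by increments_per_thread.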
Conclusion
In this article, we have looked at smart pointers in Rust, what they are, and how we can use them to our advantage. We have looked at some common smart pointers in Rust, including Box, Rc, Arc, RefCell, and Mutex. We have also seen how we can use smart pointers to allocate data directly on the heap, create recursive data structures like binary trees and linked lists, relax the ownership rules, and implement the interior mutability pattern in Rust. Finally, we have looked at how we can use the Mutex smart pointer to protect data when it is being used in multiple threads and prevent race conditions to make our concurrent programs thread-safe.