Derive proc macro step by step
Thu 26 September 2024
I recently tried to write a proc_macro in Rust, but I could not find enough examples, and I lacked some of the background needed to fully understand the ones I found.
The use case
Let's take a toy use case. Say we define simple enums (without using the full power of Rust enums) and we want to iterate over their variants, which we will call categories.
enum Direction {
    Est,
    West,
    North,
    South,
}
We want a function on our enum that lets us iterate over its categories:
use std::collections::HashSet;
let wanted = [
    Direction::Est,
    Direction::West,
    Direction::North,
    Direction::South,
]
.iter()
.collect::<HashSet<_>>();
assert_eq!(Direction::categories().iter().collect::<HashSet<_>>(), wanted);
Since the test uses a HashSet and checks equality, the enum must derive Hash, Eq and PartialEq (and Debug, which assert_eq! uses to print the values when the assertion fails).
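Before writing any macro, it may help to write by hand the code we would like the derive to generate. The sketch below is plain Rust with no macro involved; `as_set` is a hypothetical helper used only to compare the variants regardless of their order.

```rust
use std::collections::HashSet;

// Hash + Eq + PartialEq: needed to store the variants in a HashSet.
// Debug: needed by assert_eq! to print the values when an assertion fails.
#[derive(Hash, Eq, PartialEq, Debug)]
enum Direction {
    Est,
    West,
    North,
    South,
}

// Hand-written version of what the derive should eventually produce.
impl Direction {
    pub fn categories() -> Vec<Self> {
        vec![Self::Est, Self::West, Self::North, Self::South]
    }
}

// Hypothetical helper: collect references into a set so that order does not matter.
fn as_set(items: &[Direction]) -> HashSet<&Direction> {
    items.iter().collect()
}
```

Once the macro works, this hand-written impl is exactly what it should make unnecessary.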
Token
One of the first steps in compilation (as in most text processing) is to split the source code into lexical tokens. A token is a meaningful piece of text. In Rust, a token can be:
- an identifier (variable, function or type names, for example)
- a literal (strings, numbers, ...)
- a keyword
- a punctuation symbol (`+`, `&`, `==`, ...)
- other things, such as delimiters (`(`, `[`, `{`) and lifetimes
Source code is thus turned into a sequence of tokens. If the compiler finds that the sequence is not syntactically valid, it fails with a message such as:
expected one of `.`, `;`, `?`, `else`, or an operator
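To make the idea of tokens concrete, here is a deliberately tiny tokenizer. It is an illustration under simplifying assumptions, not how rustc's lexer works: it only knows identifiers (keywords included), integer literals, and a handful of one- or two-character symbols. The `Token` type and the `tokenize` function are hypothetical names.

```rust
/// Toy token kinds (hypothetical, much simpler than rustc's).
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),   // identifiers and keywords, e.g. `let`, `x`
    Literal(String), // integer literals only in this sketch
    Punct(String),   // symbols such as `+`, `=`, `==`, `;`
}

/// Split `src` into tokens, skipping whitespace.
fn tokenize(src: &str) -> Vec<Token> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c.is_alphabetic() || c == '_' {
            // identifier: a letter or `_`, then letters, digits or `_`
            let start = i;
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            tokens.push(Token::Ident(chars[start..i].iter().collect()));
        } else if c.is_ascii_digit() {
            // integer literal: a run of digits
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            tokens.push(Token::Literal(chars[start..i].iter().collect()));
        } else {
            // try a known two-character symbol first, else take one character
            let two: String = chars[i..(i + 2).min(chars.len())].iter().collect();
            if ["==", "!=", "&&", "||", "->", "::"].contains(&two.as_str()) {
                tokens.push(Token::Punct(two));
                i += 2;
            } else {
                tokens.push(Token::Punct(c.to_string()));
                i += 1;
            }
        }
    }
    tokens
}
```

For instance, `tokenize("let x = a + 1;")` yields seven tokens: `let`, `x`, `=`, `a`, `+`, `1` and `;`.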
Token Stream
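A procedural macro takes a token stream as input and returns a token stream.
A token stream is a sequence of tokens (as its name suggests). Tokens enclosed in matching delimiters ((), [] or {}) are nested into a single group.
When you define a #[proc_macro_derive], the input is the token stream of the item the derive is applied to (here, our enum).
The token stream your macro returns must also be a valid sequence of tokens. For a derive macro, it is emitted after the original item, which is left unchanged.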
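In the real proc_macro API, a token stream is made of token trees: Group, Ident, Punct and Literal, where a Group holds everything between a pair of matching delimiters. The sketch below is a simplified, hypothetical mirror of that structure (the real types also carry spans and spacing information, omitted here). It shows how `enum Direction { Est, West, }` would be laid out: three top-level trees, the body being a single brace-delimited group. `Tt`, `Delimiter`, `render` and `direction_stream` are illustrative names only.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Delimiter {
    Parenthesis,
    Brace,
    Bracket,
}

// Simplified stand-in for proc_macro::TokenTree.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Tt {
    Ident(String),
    Punct(char),
    Literal(String),
    // Everything between matching delimiters becomes ONE nested tree.
    Group(Delimiter, Vec<Tt>),
}

// Print a stream back as text, with one space between tokens.
fn render(stream: &[Tt]) -> String {
    stream
        .iter()
        .map(|tt| match tt {
            Tt::Ident(s) | Tt::Literal(s) => s.clone(),
            Tt::Punct(c) => c.to_string(),
            Tt::Group(d, inner) => {
                let (open, close) = match d {
                    Delimiter::Parenthesis => ("(", ")"),
                    Delimiter::Brace => ("{", "}"),
                    Delimiter::Bracket => ("[", "]"),
                };
                if inner.is_empty() {
                    format!("{open}{close}")
                } else {
                    format!("{open} {} {close}", render(inner))
                }
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}

// `enum Direction { Est, West, }` modeled as three top-level trees.
fn direction_stream() -> Vec<Tt> {
    vec![
        Tt::Ident("enum".into()),
        Tt::Ident("Direction".into()),
        Tt::Group(
            Delimiter::Brace,
            vec![
                Tt::Ident("Est".into()),
                Tt::Punct(','),
                Tt::Ident("West".into()),
                Tt::Punct(','),
            ],
        ),
    ]
}
```

This layout is why, in the code that follows, index 1 of the stream holds the enum name and index 2 holds the whole body.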
Simple implementation
Toy implementation
We can implement a function for our enum.
First, a procedural macro must live in its own crate, whose Cargo.toml must contain:
[lib]
proc-macro = true
The simplest way to produce a TokenStream is to parse it from a string, so let's do that. The function signature is the one given in the Rust book:
extern crate proc_macro;
use proc_macro::TokenStream;
#[proc_macro_derive(Category)]
pub fn derive_category(item: TokenStream) -> TokenStream {
    // assumes the input starts with `enum Name` (no attribute, no `pub`)
    let name = item.into_iter().nth(1).unwrap();
    let new_func = vec![
        format!("impl {name}{{"),
        "pub fn categories() -> Vec<Self> {".into(),
        "Vec::new()".into(),
        "}".into(),
        "}".into(),
    ];
    new_func.join("\n").parse().unwrap()
}
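To see exactly which source text gets handed to parse(), the string-building part can be pulled out into a plain function that runs outside a proc-macro crate. `impl_source` is a hypothetical helper: it takes the enum name as a &str instead of a TokenTree, and writes the return type as Vec<Self>, matching the cargo expand output further down. Note the doubled `{{`: inside format!, it produces a single literal `{`.

```rust
// Hypothetical helper mirroring the string the first macro version builds.
fn impl_source(name: &str) -> String {
    let lines: Vec<String> = vec![
        // `{{` is how format! writes a literal `{`
        format!("impl {name}{{"),
        "pub fn categories() -> Vec<Self> {".into(),
        "Vec::new()".into(),
        "}".into(),
        "}".into(),
    ];
    lines.join("\n")
}
```

Calling `impl_source("Direction")` gives the five lines of an `impl Direction` block with balanced braces, which is what makes the final `parse()` succeed.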
This code generates an associated function categories() that returns an empty vector. Let's check with cargo expand. To expand tests, we must specify which target to expand. Moreover, a proc macro cannot be used inside the crate that defines it, so it cannot be tested in unit tests; it must be tested in integration tests.
Let's write an integration test file, e.g. tests/simple.rs:
#[cfg(test)]
mod tests {
    use categories_macro::Category;
    use std::collections::HashSet;
    #[derive(Category, Hash, Eq, PartialEq, Debug)]
    enum Direction {
        Est,
        West,
        North,
        South,
    }
    #[test]
    fn simple_expansion() {
        let wanted = [
            Direction::Est,
            Direction::West,
            Direction::North,
            Direction::South,
        ]
        .iter()
        .collect::<HashSet<_>>();
        assert_eq!(
            Direction::categories().iter().collect::<HashSet<_>>(),
            wanted
        )
    }
}
Hash, Eq and PartialEq are derived so that directions can be put in a HashSet and compared; Debug is required by assert_eq!.
Now let's run
cargo expand --test simple
And as expected:
...
enum Direction {
    Est,
    West,
    North,
    South,
}
impl Direction {
    pub fn categories() -> Vec<Self> {
        Vec::new()
    }
}
...
With this basic implementation in place, we can start working with the input token stream. Remember that the macro is expanded in the first steps of compilation, before type checking: it only sees tokens, not types.
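The `nth(1)` shortcut assumes the stream starts with `enum Name`. With `pub enum Direction`, index 1 is `enum`; attributes placed on the enum, such as doc comments, are also part of the input and shift the indices further. A more tolerant lookup could search for the identifier that follows the `enum` keyword. The sketch below works on a toy token type; `Tok` and `enum_name` are hypothetical names, not the proc_macro API.

```rust
// Toy token type (hypothetical), just enough for this example.
#[allow(dead_code)]
#[derive(Debug)]
enum Tok {
    Ident(String),
    Punct(char),
    Group(Vec<Tok>),
}

// Return the identifier right after the `enum` keyword, wherever it sits,
// so that leading attributes or a `pub` are simply skipped.
fn enum_name(stream: &[Tok]) -> Option<&str> {
    stream.windows(2).find_map(|pair| match pair {
        [Tok::Ident(kw), Tok::Ident(name)] if kw == "enum" => Some(name.as_str()),
        _ => None,
    })
}
```

Returning an Option rather than unwrapping also leaves room for a clear compile error when the derive is applied to something that is not an enum.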
Useful implementation
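Now that we can generate code, let's generate the code that makes our test pass. We just need to iterate over the tokens of the enum body: if two consecutive tokens are (Ident, Punct), we can assume the identifier is a category. Note that this relies on every variant, including the last one, being followed by a comma.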
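The pairwise scan can be tried outside a proc-macro crate on a toy token type. `Tok` and `variant_names` are hypothetical names; `windows(2)` is the same std slice method the macro uses. Running it on a body without a trailing comma shows the limitation: the last variant has no Punct after it and is silently dropped.

```rust
// Toy token type (hypothetical): only what an enum body needs here.
#[allow(dead_code)]
#[derive(Debug)]
enum Tok {
    Ident(String),
    Punct(char),
    Group(Vec<Tok>), // e.g. the `(usize)` of a tuple variant
}

// Same rule as the macro: slide a window of two tokens over the body and
// keep the identifier of every (Ident, Punct) pair.
fn variant_names(body: &[Tok]) -> Vec<String> {
    body.windows(2)
        .filter_map(|pair| match pair {
            [Tok::Ident(name), Tok::Punct(_)] => Some(name.clone()),
            _ => None,
        })
        .collect()
}
```

On `Est, West,` this returns `["Est", "West"]`, while on `Est, West` it returns only `["Est"]`.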
Let's modify our code. Here is src/lib.rs:
extern crate proc_macro;
use proc_macro::TokenStream;
use proc_macro::TokenTree::{Group, Ident, Punct};
#[proc_macro_derive(Category)]
pub fn derive_category(item: TokenStream) -> TokenStream {
    let name = item.clone().into_iter().nth(1).unwrap();
    let mut new_func = vec![
        format!("impl {name}{{"),
        "pub fn categories() -> Vec<Self>{".into(),
        "let mut v = Vec::new();".into(),
    ];
    let variants = match item.into_iter().nth(2).unwrap() {
        // should be a group delimited by curly braces
        Group(g) => g.stream().into_iter().collect::<Vec<_>>(),
        _ => unreachable!(),
    };
    let tokens = variants.windows(2);
    for elts in tokens {
        match (elts.first(), elts.get(1)) {
            (Some(Ident(x)), Some(Punct(_))) => new_func.push(format!("v.push(Self::{x});")),
            _ => {
                // TODO: handle other cases
            }
        }
    }
    new_func.push("v}\n}".into());
    new_func.join("\n").parse().unwrap()
}
And now, as expected, the test passes.
More complicated implementation
Let's now look at a more complex case, and give one of the categories a field:
...
enum Ingredient {
    Butter,
    Eggs(usize),
    Flour,
    Sugar,
}
...
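Since `(usize)` is parsed as a single TokenTree::Group (delimited by parentheses), the change is straightforward: we also match (Ident, Group) and fill the field with Default::default(), which requires the field type to implement Default: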
...
let tokens = variants.windows(2);
for elts in tokens {
    match (elts.first(), elts.get(1)) {
        (Some(Ident(x)), Some(Punct(_))) => new_func.push(format!("v.push(Self::{x});")),
        (Some(Ident(x)), Some(Group(_))) => {
            new_func.push(format!("v.push(Self::{x}(Default::default()));"))
        }
        _ => {
            // TODO: handle other cases
        }
    }
}
...
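The remaining TODO could be handled in several ways. One alternative, sketched here and not the approach taken in this article, is to split the enum body on its top-level commas and take the first identifier of each segment. Commas inside `(usize)` are nested in a Group and do not split, a trailing comma is no longer required, and variant attributes such as `#[default]` (a `#` Punct followed by a bracket Group) are skipped. `Tok` and `variant_names` are hypothetical names on a toy token type; a real implementation would also need to remember whether each variant has a field.

```rust
// Toy token type (hypothetical).
#[allow(dead_code)]
#[derive(Debug)]
enum Tok {
    Ident(String),
    Punct(char),
    Group(Vec<Tok>),
}

// Split the body on top-level commas, then take the first identifier found
// in each segment. Tokens nested inside a Group are never inspected, so
// their commas and identifiers are ignored. Empty segments (for instance
// after a trailing comma) yield nothing.
fn variant_names(body: &[Tok]) -> Vec<String> {
    body.split(|t| matches!(t, Tok::Punct(',')))
        .filter_map(|segment| {
            segment.iter().find_map(|t| match t {
                Tok::Ident(name) => Some(name.clone()),
                _ => None,
            })
        })
        .collect()
}
```

On a body equivalent to `Butter, Eggs(usize), #[default] Flour` (no trailing comma), this returns `["Butter", "Eggs", "Flour"]`.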
We can now add a test:
...
#[derive(Category, Eq, PartialEq, Debug)]
enum Ingredient {
    Butter,
    Eggs(usize),
    Flour,
    Sugar,
}
#[test]
fn complex_expansion() {
    let ingredients = Ingredient::categories();
    assert_eq!(ingredients.len(), 4);
    assert!(ingredients.contains(&Ingredient::Eggs(usize::default())));
    assert!(ingredients.contains(&Ingredient::Butter));
    assert!(ingredients.contains(&Ingredient::Flour));
    assert!(ingredients.contains(&Ingredient::Sugar));
}
...
And the output of cargo expand:
...
enum Ingredient {
    Butter,
    Eggs(usize),
    Flour,
    Sugar,
}
impl Ingredient {
    pub fn categories() -> Vec<Self> {
        let mut v = Vec::new();
        v.push(Self::Butter);
        v.push(Self::Eggs(Default::default()));
        v.push(Self::Flour);
        v.push(Self::Sugar);
        v
    }
}
...
Crate organisation
A proc-macro crate has some limitations: it can only export procedural macros, and it cannot use the macros it defines. To deal with them, some popular crates such as quickcheck nest the proc_macro crate inside the main crate, while others such as serde have crates dedicated to their proc macros.