Derive Proc macro step by step

Thu 26 September 2024

I recently tried to write a proc_macro in Rust, but I could not find enough examples, and I lacked the background knowledge to fully understand the ones I did find.

The use case

Let's take a toy use case: we define a simple enum (one that does not use the full power of Rust enums) and we want to iterate over its categories.

enum Direction {
    Est,
    West,
    North,
    South,
}

We want a function on our enum that lets us iterate over its categories:

let wanted = [
    Direction::Est,
    Direction::West,
    Direction::North,
    Direction::South,
].iter().collect::<HashSet<_>>();

assert_eq!(Direction::categories().iter().collect::<HashSet<_>>(), wanted);

Since I use a HashSet and test for equality, I must derive Hash, Eq and PartialEq.

Token

One of the first steps in the compilation process (and in most text processing) is to split the source code into lexical tokens. A token is a meaningful piece of text. In Rust, a token can be:

  • an identifier (mostly variable names)
  • a literal (strings, numbers,...)
  • a keyword
  • a symbol (+, &, ==...)
  • other things
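
To make the categories above concrete, here is a toy sketch in plain Rust that labels the already-separated pieces of `let total = price + 42 ;`. The `Kind` enum and the `classify` and `classify_all` functions are hypothetical names invented for this illustration; the real lexer in rustc is far more involved and handles many more cases.

```rust
/// Hypothetical token categories, mirroring the list above.
#[derive(Debug, PartialEq)]
enum Kind {
    Keyword,
    Identifier,
    Literal,
    Symbol,
}

/// Toy classifier: expects one already-separated piece of source text.
/// Only a handful of keywords and integer literals are recognised.
fn classify(piece: &str) -> Kind {
    const KEYWORDS: [&str; 5] = ["let", "fn", "enum", "impl", "pub"];
    if KEYWORDS.contains(&piece) {
        Kind::Keyword
    } else if !piece.is_empty() && piece.chars().all(|c| c.is_ascii_digit()) {
        Kind::Literal
    } else if !piece.is_empty() && piece.chars().all(|c| c.is_alphanumeric() || c == '_') {
        Kind::Identifier
    } else {
        Kind::Symbol
    }
}

/// Classify every whitespace-separated piece of `source`.
fn classify_all(source: &str) -> Vec<Kind> {
    source.split_whitespace().map(classify).collect()
}
```

Calling `classify_all("let total = price + 42 ;")` labels `let` as a keyword, `total` and `price` as identifiers, `42` as a literal, and `=`, `+` and `;` as symbols.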

Source code can be parsed into a sequence of tokens. If the compiler finds that the sequence is not syntactically valid, it fails with a message that looks like:

expected one of `.`, `;`, `?`, `else`, or an operator

Token Stream

A procedural macro takes a token stream as input. As its name suggests, a token stream is a sequence of tokens. When you define a #[proc_macro_derive], the input is the token stream of the item (struct, enum or union) the derive is applied to. The token stream your macro returns must itself be a valid sequence of tokens, because the compiler appends it to the code.
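
As a rough mental model, the stream a derive macro receives for the `Direction` enum can be pictured as three top-level items: the `enum` keyword, the name, and a brace-delimited group holding the variants. The sketch below uses a hypothetical `Tree` enum whose variant names echo those of `proc_macro::TokenTree`. It is a simplified, std-only stand-in, since the real `proc_macro` API can only be called from inside a procedural macro.

```rust
/// Simplified stand-in for `proc_macro::TokenTree` (hypothetical, std only).
#[derive(Debug, Clone, PartialEq)]
enum Tree {
    Ident(String),    // identifiers, and keywords such as `enum`
    Punct(char),      // single punctuation characters such as `,`
    Group(Vec<Tree>), // a delimited sub-stream, e.g. `{ ... }`
}

/// Hand-built model of the tokens of
/// `enum Direction { Est, West, North, South, }`.
fn direction_stream() -> Vec<Tree> {
    let mut body = Vec::new();
    for variant in ["Est", "West", "North", "South"].iter() {
        body.push(Tree::Ident(variant.to_string()));
        body.push(Tree::Punct(','));
    }
    vec![
        Tree::Ident("enum".to_string()),
        Tree::Ident("Direction".to_string()),
        Tree::Group(body),
    ]
}
```

In this model the enum name is the second top-level item and the variants all live inside the third one. A `pub` in front of `enum`, or a doc comment on the enum, would add tokens at the start and shift those positions.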

Simple implementation

Toy implementation

We can implement a function for our enum.

First, a procedural macro must live in its own dedicated crate. The Cargo.toml of that crate must contain:

[lib]
proc-macro = true

We can generate a valid TokenStream from a string. Let's do it. The signature is the one mentioned in the Rust book.

extern crate proc_macro;
use proc_macro::TokenStream;

#[proc_macro_derive(Category)]
pub fn derive_category(item: TokenStream) -> TokenStream {
    let name = item.into_iter().nth(1).unwrap(); // first 2 tokens should be `enum name`
    let new_func = vec![
        format!("impl {name}{{"),
        "pub fn categories() -> Vec<Self> {".into(), // the body returns a Vec, not Self
        "Vec::new()".into(),
        "}".into(),
        "}".into(),
    ];
    new_func.join("\n").parse().unwrap()
}
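
The string-building half of the macro can be tried on its own, outside a proc-macro crate. The sketch below isolates it in a hypothetical helper, `render_empty_impl`, which takes the enum name as a plain `&str`. In the macro, the resulting text is what gets handed to `parse()` to become a `TokenStream`.

```rust
/// Build the source text of an `impl` block whose `categories()` function
/// returns an empty vector. `render_empty_impl` is a hypothetical helper
/// written for this illustration.
fn render_empty_impl(name: &str) -> String {
    let lines = vec![
        // `{{` is an escaped `{` inside a format string.
        format!("impl {name} {{"),
        "pub fn categories() -> Vec<Self> {".to_string(),
        "Vec::new()".to_string(),
        "}".to_string(),
        "}".to_string(),
    ];
    lines.join("\n")
}
```

Printing `render_empty_impl("Direction")` is a quick way to check the generated text before involving the compiler.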

This code produces a categories() function that returns an empty vector. Let's check with cargo expand. To expand tests, we must specify which target we want to expand. Moreover, a procedural macro cannot be used inside the crate that defines it, so it cannot be tested in that crate's unit tests; it must be tested in integration tests.

Let's create an integration test file, e.g. tests/simple.rs

#[cfg(test)]
mod tests {

    use categories_macro::Category;
    use std::collections::HashSet;
    #[derive(Category, Hash, Eq, PartialEq, Debug)]
    enum Direction {
        Est,
        West,
        North,
        South,
    }

    #[test]
    fn simple_expansion() {
        let wanted = [
            Direction::Est,
            Direction::West,
            Direction::North,
            Direction::South,
        ]
        .iter()
        .collect::<HashSet<_>>();

        assert_eq!(
            Direction::categories().iter().collect::<HashSet<_>>(),
            wanted
        )
    }
}

As we want to put our directions in a HashSet and test equality between elements, we must derive Hash, Eq and PartialEq; Debug is needed by assert_eq! to print a failure.

Now let's run

cargo expand --test simple

And as expected:

...

 enum Direction {
     Est,
     West,
     North,
     South,
 }
 impl Direction {
     pub fn categories() -> Vec<Self> {
         Vec::new()
     }
 }

 ...

Link to the code

Given this basic implementation, we can play with the input token stream. Remember that macros are expanded in the early steps of compilation.

Useful implementation

Now that we can generate code, let's generate the code that makes our test pass. We just need to iterate over the tokens of the enum body. If two consecutive tokens are (Ident, Punct), we can assume the identifier is the name of a category.

Let's modify our code. Here is the new src/lib.rs

extern crate proc_macro;
use proc_macro::TokenStream;
use proc_macro::TokenTree::{Group, Ident, Punct};

#[proc_macro_derive(Category)]
pub fn derive_category(item: TokenStream) -> TokenStream {
    let name = item.clone().into_iter().nth(1).unwrap();
    let mut new_func = vec![
        format!("impl {name}{{"),
        "pub fn categories() -> Vec<Self>{".into(),
        "let mut v = Vec::new();".into(),
    ];

    let variants = match item.into_iter().nth(2).unwrap() {
        // should be a group delimited by curly braces
        Group(g) => g.stream().into_iter().collect::<Vec<_>>(),
        _ => unreachable!(),
    };
    let tokens = variants.windows(2);
    for elts in tokens {
        match (elts.first(), elts.get(1)) {
            (Some(Ident(x)), Some(Punct(_))) => new_func.push(format!("v.push(Self::{x});")),
            _ => {
                // TODO: handle other cases
            }
        }
    }
    new_func.push("v}\n}".into());
    new_func.join("\n").parse().unwrap()
}

And now, as expected, the test passes.
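
The pairing logic relies on every variant being followed by a punctuation token. A minimal sketch with a hypothetical, std-only `Tok` enum (standing in for `proc_macro::TokenTree`) shows the consequence: a last variant written without a trailing comma is never paired with a `Punct` and is silently skipped. The helper names are invented for this illustration.

```rust
/// Simplified stand-in for the token kinds the macro inspects (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String),
    Punct(char),
}

/// Collect every identifier that is immediately followed by punctuation,
/// mirroring the `windows(2)` loop of the macro.
fn variants_followed_by_punct(tokens: &[Tok]) -> Vec<String> {
    let mut names = Vec::new();
    for pair in tokens.windows(2) {
        if let (Tok::Ident(name), Tok::Punct(_)) = (&pair[0], &pair[1]) {
            names.push(name.clone());
        }
    }
    names
}

/// Build the tokens of a variant list such as `A, B,` or `A, B`.
fn variant_tokens(names: &[&str], trailing_comma: bool) -> Vec<Tok> {
    let mut tokens = Vec::new();
    for (i, name) in names.iter().enumerate() {
        tokens.push(Tok::Ident(name.to_string()));
        // Every variant but the last is always followed by a comma.
        if trailing_comma || i + 1 < names.len() {
            tokens.push(Tok::Punct(','));
        }
    }
    tokens
}
```

With `variant_tokens(&["North", "South"], true)` both names are found; with `false` only `North` is. A more robust macro would also treat an identifier at the very end of the group as a variant.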

Link to the code

More complicated implementation

Let's now see a more complex implementation. Let's use an enum in which one category carries a parameter:

...

 enum Ingredient {
     Eggs(usize),
     Butter,
     Flour,
     Sugar,
 }

 ...

Since (usize) is parsed as a TokenTree::Group, the modification is straightforward:

...

let tokens = variants.windows(2);
for elts in tokens {
    match (elts.first(), elts.get(1)) {
        (Some(Ident(x)), Some(Punct(_))) => new_func.push(format!("v.push(Self::{x});")),
        (Some(Ident(x)), Some(Group(_))) => new_func.push(format!("v.push(Self::{x}(Default::default()));")),
        _ => {
            // TODO: handle other cases
        }
    }
}

...
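
The generated `Self::Eggs(Default::default())` compiles because the compiler infers the payload type from the variant definition, so the macro never has to read the tokens inside the group. Here is a hand-written equivalent of the function the macro is expected to generate for the `Ingredient` enum used in the test; it is written by hand for illustration, not produced by the macro.

```rust
#[derive(Debug, PartialEq)]
enum Ingredient {
    Butter,
    Eggs(usize),
    Flour,
    Sugar,
}

impl Ingredient {
    /// Hand-written version of the function the derive macro generates.
    pub fn categories() -> Vec<Self> {
        let mut v = Vec::new();
        v.push(Self::Butter);
        // The payload type (`usize`) is inferred; its default value is 0.
        v.push(Self::Eggs(Default::default()));
        v.push(Self::Flour);
        v.push(Self::Sugar);
        v
    }
}
```

This only works when the payload type implements `Default` and the variant has exactly one field. A variant such as `Pair(usize, usize)` would need one `Default::default()` per field, which the macro above does not handle.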

We can now add a test:

...

 #[derive(Category, Eq, PartialEq, Debug)]
 enum Ingredient {
     Butter,
     Eggs(usize),
     Flour,
     Sugar,
 }

 #[test]
 fn complexe_expansion() {
     let ingredients = Ingredient::categories();
     assert_eq!(ingredients.len(), 4);
     assert!(ingredients.contains(&Ingredient::Eggs(usize::default())));
     assert!(ingredients.contains(&Ingredient::Butter));
     assert!(ingredients.contains(&Ingredient::Flour));
     assert!(ingredients.contains(&Ingredient::Sugar));
 }

 ...

and the result of cargo expand:

...

enum Ingredient {
    Butter,
    Eggs(usize),
    Flour,
    Sugar,
}
impl Ingredient {
    pub fn categories() -> Vec<Self> {
        let mut v = Vec::new();
        v.push(Self::Butter);
        v.push(Self::Eggs(Default::default()));
        v.push(Self::Flour);
        v.push(Self::Sugar);
        v
    }
}

...

Link to the code

Crate organisation

A crate made for a proc_macro has some limitations; in particular, it can only export procedural macros. To handle those limitations, popular crates such as quickcheck put the proc_macro crate inside the main crate. Others, such as serde, have a crate dedicated to the proc_macro.

Category: howto Tagged: rust programming tutorial, howto


R mclapply cores option

Mon 11 August 2014

This morning, I found that the default behavior of the mclapply() function was hardly different from that of lapply(). After a quick investigation, I found it was due to an option that was not set correctly: the number of cores to use.

> getOption("cores")
NULL

I must then overwrite the default mc …

Category: programming Tagged: R programming useless


unfactor R factor

Thu 17 July 2014
Consider the following R code:

a <- factor(floor(rnorm(100)))
class(a)
summary(a)

Why would you have that? Great question. Maybe to save memory. I really don’t know, but I received some R datasets in the form of a factor. My problem is to revert this process …

Category: R Tagged: how to programming R useless
