Derive Proc macro step by step

Thu 26 September 2024
Tokens stream in macro (photography) (Photo credit: wikipedia)

I recently tried to write a proc_macro in Rust, but I could not find enough examples, and I lacked the background needed to fully understand the ones I did find.

The use case

Let's take a toy use case. Let's say we define simple enums (without taking advantage of all the power of rust enums) and we want to iterate over the categories.

enum Direction {
    Est,
    West,
    North,
    South,
}

We want a function on our enum that lets us iterate over its categories:

let wanted = [
    Direction::Est,
    Direction::West,
    Direction::North,
    Direction::South,
].iter().collect::<HashSet<_>>();

assert_eq!(Direction::categories().iter().collect::<HashSet<_>>(), wanted);

Since I use a HashSet and test for equality, I must derive Hash, Eq and PartialEq. assert_eq! also needs Debug to print the values when the assertion fails.
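Before writing any macro, it can help to write by hand the code we hope to generate. The following is only a sketch of that target, not the macro itself:

```rust
use std::collections::HashSet;

// Debug is derived too, so that assert_eq! can print a failing value.
#[derive(Debug, Hash, Eq, PartialEq)]
enum Direction {
    Est,
    West,
    North,
    South,
}

// Hand-written version of what the derive macro is meant to generate.
impl Direction {
    pub fn categories() -> Vec<Self> {
        vec![Self::Est, Self::West, Self::North, Self::South]
    }
}

fn main() {
    let wanted = [
        Direction::Est,
        Direction::West,
        Direction::North,
        Direction::South,
    ];
    let wanted = wanted.iter().collect::<HashSet<_>>();
    let got = Direction::categories();
    assert_eq!(got.iter().collect::<HashSet<_>>(), wanted);
}
```

The rest of the article is about getting the compiler to write that impl block for us.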

Token

One of the first steps in the compilation process (and in most text processing) is to split the source code into lexical tokens. A token is a meaningful piece of text. In Rust, a token can be:

  • an identifier (mostly variable names)
  • a literal (strings, numbers,...)
  • a keyword
  • a symbol (+, &, ==...)
  • other things (lifetimes, delimiters such as braces and parentheses)

Source code can be parsed into a sequence of tokens. If the compiler finds that the sequence is not syntactically valid, it fails with a message that looks like:

` expected one of `.`, `;`, `?`, `else`, or an operator `
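To make the notion of a token concrete, here is a toy tokenizer. It is only a sketch: the `Token` enum and the `tokenize` function are hypothetical names, it knows a handful of keywords, and it ignores strings, comments and multi-character operators, so it is far from what rustc really does.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Keyword(String),
    Ident(String),
    Number(String),
    Symbol(char),
}

// Toy subset of the Rust keywords, for illustration only.
const KEYWORDS: [&str; 4] = ["enum", "fn", "let", "impl"];

fn tokenize(src: &str) -> Vec<Token> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            // Whitespace only separates tokens.
            i += 1;
        } else if c.is_alphabetic() || c == '_' {
            // A word is either a keyword or an identifier.
            let start = i;
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            let word: String = chars[start..i].iter().collect();
            if KEYWORDS.contains(&word.as_str()) {
                tokens.push(Token::Keyword(word));
            } else {
                tokens.push(Token::Ident(word));
            }
        } else if c.is_ascii_digit() {
            // A run of digits is an integer literal.
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            tokens.push(Token::Number(chars[start..i].iter().collect()));
        } else {
            // Anything else is treated as a one-character symbol.
            tokens.push(Token::Symbol(c));
            i += 1;
        }
    }
    tokens
}

fn main() {
    for token in tokenize("let x = 42;") {
        println!("{token:?}");
    }
}
```

For `let x = 42;` this yields a keyword, an identifier, the symbol `=`, a number and the symbol `;`.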

Token Stream

A procedural macro takes a token stream as input. A token stream is, as its name suggests, a sequence of tokens. When you define a #[proc_macro_derive], the input is the token stream obtained from the item (struct, enum or union) it is applied to. The token stream your macro returns must also be a valid sequence of tokens.
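The real types live in the `proc_macro` crate, which can only be used from inside a procedural macro. To show the shape of the data without that constraint, the sketch below uses a hypothetical, simplified `Tree` enum standing in for `proc_macro::TokenTree` (the real one also has a `Literal` variant and carries span information). The point is that everything between a pair of delimiters is nested inside a single group, and that keywords such as `enum` show up as identifiers.

```rust
// Simplified stand-in for proc_macro::TokenTree (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum Tree {
    Ident(String),
    Punct(char),
    Group(Vec<Tree>), // the tokens found between a pair of delimiters
}

// Small helper to build an identifier (illustrative name).
fn ident(s: &str) -> Tree {
    Tree::Ident(s.to_string())
}

// Model of the derive input for `enum Direction { Est, West, }`.
fn direction_stream() -> Vec<Tree> {
    vec![
        ident("enum"),
        ident("Direction"),
        Tree::Group(vec![
            ident("Est"),
            Tree::Punct(','),
            ident("West"),
            Tree::Punct(','),
        ]),
    ]
}

fn main() {
    let stream = direction_stream();
    // Only three top-level trees: `enum`, the name, and the braced body.
    println!("{} top-level trees", stream.len());
    println!("name: {:?}", stream[1]);
}
```

This is why the enum name can be found at index 1 of the stream and the whole body at index 2.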

Simple implementation

Toy implementation

We can implement a function for our enum.

First, a procedural macro must live in a dedicated crate. In that crate's Cargo.toml, you must add:

[lib]
proc-macro = true

We can generate a valid TokenStream from a string. Let's do it. The signature is the one given in the Rust book.

extern crate proc_macro;
use proc_macro::TokenStream;

#[proc_macro_derive(Category)]
pub fn derive_category(item: TokenStream) -> TokenStream {
    // The first 2 tokens should be `enum name`
    // (this assumes no `pub` and no attribute before `enum`)
    let name = item.into_iter().nth(1).unwrap();
    // Build the generated code as text, one line per entry
    let new_func = vec![
        format!("impl {name}{{"),
        "pub fn categories() -> Vec<Self> {".into(),
        "Vec::new()".into(),
        "}".into(),
        "}".into(),
    ];
    // Parse the text back into a TokenStream
    new_func.join("\n").parse().unwrap()
}

This code generates a categories() function that returns an empty Vec for the enum. Let's check with cargo expand. To expand tests, we must specify which target we want to expand. Moreover, a macro cannot be tested from unit tests inside the crate that defines it; it must be tested from integration tests.

Let's create an integration test file, e.g. tests/simple.rs

#[cfg(test)]
mod tests {

    use categories_macro::Category;
    use std::collections::HashSet;
    #[derive(Category, Hash, Eq, PartialEq, Debug)]
    enum Direction {
        Est,
        West,
        North,
        South,
    }

    #[test]
    fn simple_expansion() {
        let wanted = [
            Direction::Est,
            Direction::West,
            Direction::North,
            Direction::South,
        ]
        .iter()
        .collect::<HashSet<_>>();

        assert_eq!(
            Direction::categories().iter().collect::<HashSet<_>>(),
            wanted
        )
    }
}

As we want to put our directions in a HashSet and test equality between elements, we must derive Hash, Eq and PartialEq. Debug is there so that assert_eq! can print the values on failure.

Now let's run

cargo expand --test simple

And as expected:

...

 enum Direction {
     Est,
     West,
     North,
     South,
 }
 impl Direction {
     pub fn categories() -> Vec<Self> {
         Vec::new()
     }
 }

 ...

Link to the code

Given this basic implementation, we can play with the input token stream. Remember that the macro is expanded in the early steps of compilation.

Useful implementation

Now that we can generate code, let's generate code that makes our test pass. We just need to iterate over the tokens. If two consecutive tokens are (Ident, Punct), we can assume the identifier is a category we need.
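The pairing rule can be tried outside a macro on a simplified model. In the sketch below, `Tok` is a hypothetical stand-in for `proc_macro::TokenTree` and `variant_names` is an illustrative helper; the scan itself uses the standard `windows(2)` slice method. It also shows a limit of the rule: a last variant written without a trailing comma is not followed by any punctuation, so it is missed.

```rust
// Hypothetical stand-in for the token trees found inside the enum body.
#[derive(Debug)]
enum Tok {
    Ident(&'static str),
    Punct(char),
}

// Collect every identifier that is directly followed by a punctuation token.
fn variant_names(body: &[Tok]) -> Vec<&'static str> {
    body.windows(2)
        .filter_map(|pair| match pair {
            [Tok::Ident(name), Tok::Punct(_)] => Some(*name),
            _ => None,
        })
        .collect()
}

fn main() {
    use Tok::{Ident, Punct};
    let with_trailing_comma = [Ident("Est"), Punct(','), Ident("West"), Punct(',')];
    let without_trailing_comma = [Ident("Est"), Punct(','), Ident("West")];
    // Both variants are found.
    println!("{:?}", variant_names(&with_trailing_comma));
    // `West` is the last token, so no window starts with it and it is skipped.
    println!("{:?}", variant_names(&without_trailing_comma));
}
```

The enums in this article all end with a trailing comma, so the rule is enough here.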

Let's modify our code. Here is the src/lib.rs

extern crate proc_macro;
use proc_macro::TokenStream;
use proc_macro::TokenTree::{Group, Ident, Punct};

#[proc_macro_derive(Category)]
pub fn derive_category(item: TokenStream) -> TokenStream {
    let name = item.clone().into_iter().nth(1).unwrap();
    let mut new_func = vec![
        format!("impl {name}{{"),
        "pub fn categories() -> Vec<Self>{".into(),
        "let mut v = Vec::new();".into(),
    ];

    let variants = match item.into_iter().nth(2).unwrap() {
        // should be a group delimited by curly braces
        Group(g) => g.stream().into_iter().collect::<Vec<_>>(),
        _ => unreachable!(),
    };
    let tokens = variants.windows(2);
    for elts in tokens {
        match (elts.first(), elts.get(1)) {
            (Some(Ident(x)), Some(Punct(_))) => new_func.push(format!("v.push(Self::{x});")),
            _ => {
                // TODO: handle other cases
            }
        }
    }
    new_func.push("v}\n}".into());
    new_func.join("\n").parse().unwrap()
}

And now, as expected, the test passes.
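Since the macro itself can only be exercised from integration tests, one option is to keep the string-building part in a plain function that an ordinary unit test can check. The sketch below assumes a hypothetical helper `render_impl` that is not part of the original code; the macro entry point would only extract the names and call `.parse()` on the result.

```rust
// Hypothetical helper: builds the source text of the generated impl block.
fn render_impl(name: &str, variants: &[&str]) -> String {
    let mut lines = vec![
        format!("impl {name} {{"),
        "pub fn categories() -> Vec<Self> {".to_string(),
        "let mut v = Vec::new();".to_string(),
    ];
    // One push per unit variant.
    for variant in variants {
        lines.push(format!("v.push(Self::{variant});"));
    }
    lines.push("v".to_string());
    lines.push("}".to_string());
    lines.push("}".to_string());
    lines.join("\n")
}

fn main() {
    println!("{}", render_impl("Direction", &["Est", "West"]));
}
```

This does not replace the integration test, which is still the only place where the generated code is actually compiled.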

Link to the code

More complicated implementation

Let's now look at a more complex implementation. Let's modify our enum so that one category takes a parameter:

...

 enum Ingredient {
     Eggs(usize),
     Butter,
     Flour,
     Sugar,
 }

 ...

Given that (usize) is parsed as a TokenTree::Group, the modification is straightforward. Note that it only works when the variant has a single field whose type implements Default:

...

let tokens = variants.windows(2);
for elts in tokens {
    match (elts.first(), elts.get(1)) {
        (Some(Ident(x)), Some(Punct(_))) => new_func.push(format!("v.push(Self::{x});")),
        (Some(Ident(x)), Some(Group(_))) => new_func.push(format!("v.push(Self::{x}(Default::default()));")),
        _ => {
            // TODO: handle other cases
        }
    }
}

...

We can now add a test:

...

 #[derive(Category, Eq, PartialEq, Debug)]
 enum Ingredient {
     Butter,
     Eggs(usize),
     Flour,
     Sugar,
 }

 #[test]
 fn complexe_expansion() {
     let ingredients = Ingredient::categories();
     assert_eq!(ingredients.len(), 4);
     assert!(ingredients.contains(&Ingredient::Eggs(usize::default())));
     assert!(ingredients.contains(&Ingredient::Butter));
     assert!(ingredients.contains(&Ingredient::Flour));
     assert!(ingredients.contains(&Ingredient::Sugar));
 }

 ...

and the result of cargo expand:

...

enum Ingredient {
    Butter,
    Eggs(usize),
    Flour,
    Sugar,
}
impl Ingredient {
    pub fn categories() -> Vec<Self> {
        let mut v = Vec::new();
        v.push(Self::Butter);
        v.push(Self::Eggs(Default::default()));
        v.push(Self::Flour);
        v.push(Self::Sugar);
        v
    }
}

...

Link to the code

Crate organisation

A crate made for a proc_macro has some limitations: in particular, it can only export procedural macros. To handle those limitations, popular crates such as quickcheck put the proc_macro crate inside the initial crate. Others, such as serde, have crates dedicated to the proc_macro.

Category: howto Tagged: rust programming tutorial, howto


GNSS as an optimization problem

Thu 20 July 2023
Satellite optimized (Photo credit: Wikipedia)

GNSS's basic functionality works as follows: satellites frequently send their position and their local time. The time is synchronized between satellites thanks to the most accurate timekeeping devices ever made (atomic clocks). Each GNSS constellation implements its own time (e.g. GPS time). The …

Category: maths Tagged: python maths optimization


Suicide burn

Sat 27 August 2022
Rocket burn (Photo credit: Wikipedia)

Principle overview

The goal of the suicide burn (also called hoverslam) [1] is to land a rocket whose minimal thrust is not low enough to hover [2]. Thus, the engine must be turned off at the point where the rocket lands.

The challenge is then …

Category: aviation Tagged: aviation Flight dynamics rocket science


Differential thrust

Wed 27 July 2022
Differential thrust (Photo credit: GE)

Principle overview

The basic principle is to create a torque offsetting the total thrust. This offset is obtained by throttling multiple engines whose thrust is not aligned with the center of gravity.

On the drawing, the thrust is aligned with CG for the left engine …

Category: aviation Tagged: aviation Flight dynamics rocket science


Voting method reflections

Fri 28 January 2022
Transparent voting box (Photo credit: Wikipedia)

This article presents personal thoughts on voting methods. More specifically, it presents the guarantees offered by the traditional non-electronic voting method and won’t elaborate on electronic voting.

Specifications

First, let’s define a few specifications I’d like to develop.

  • any voter can understand how …

Category: reflections Tagged: geopolitics reflection vote


Mobile OS alternatives and European sovereignty

Tue 04 May 2021
Sailfish (not OS) (photo credit: wikipedia)

Operating systems for mobile devices have evolved over the past years. European actors (e.g. Nokia) were once in a leading position. Now it's hard to find a non-Android mobile OS. Each mobile OS comes with its software and hardware environments, and its geopolitical considerations.

Here is …

Category: tools Tagged: smartphone geopolitics
