Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Foreword

Carefully we wipe them hour by hour, And let no dust alight.

— stuuupidcat, 2025-04-24, Beijing

Getting Started with RPL

Installation and Quick Start

  1. Clone the repository and enter the directory: git clone https://github.com/RPL-Toolchain/RPL.git && cd RPL

  2. Install RPL as a cargo subcommand: cargo install --path .

  3. Run RPL analysis on your Rust project:

    • RPL_PATS=/path/to/RPL/docs/patterns-pest cargo +nightly-2025-02-14 rpl (using built-in RPL pattern definitions based on inline MIR)
    • RUSTFLAGS="-Zinline-mir=false" RPL_PATS=/path/to/RPL/docs/patterns-pest cargo +nightly-2025-02-14 rpl (using built-in RPL pattern definitions based on MIR)

We have the following plans for easing the usage of RPL:

  • Integration of standard patterns: Provide a set of commonly used patterns as part of RPL’s standard library.
  • Configuration-based pattern management: Introduce a rpl.toml configuration file to manage RPL patterns in a structured and centralized manner.

Introduction To MIR

Before delving into the specifics of RPL's pattern modeling language, this chapter will first provide an introduction to Rust's Mid-level Intermediate Representation (MIR). A foundational understanding of MIR is essential, as RPL's entire approach to modeling function logic is directly based on its structure and semantics. Accordingly, the following sections will begin with an overview of MIR before proceeding to a detailed guide on the pattern modeling language itself.

Intermediate Representation

What is IR?

In the architecture of a modern compiler, the process of translating human-readable source code into machine-executable code is not a single, monolithic step. Instead, the compiler is typically divided into three main stages: the frontend, the middle-end, and the backend. The Intermediate Representation (IR) is the crucial data structure or code format that acts as the bridge between the frontend and the backend.

  1. Frontend: Parses the source code (like Rust, C++, or Swift), checks for syntax errors, and performs semantic analysis. It then translates the source code into an IR.

  2. Middle-end (Optimizer): Takes the IR from the frontend, performs a series of machine-independent optimizations to improve the code's performance and efficiency. The output of this stage is still the (now optimized) IR.

  3. Backend (Code Generator): Takes the optimized IR and translates it into machine code for a specific target architecture (like x86-64, ARM64, etc.).

The use of an IR provides a powerful abstraction that decouples the source language from the target machine. This means you can write frontends for many different languages and backends for many different architectures, and have them all work together through the common IR.

LLVM (Low Level Virtual Machine) is a collection of modular and reusable compiler and toolchain technologies. Its Intermediate Representation, LLVM IR, is one of its most influential components. It's a low-level, statically typed, and language-independent IR designed to be the target of a wide variety of frontends and the source for many backends.A key feature of LLVM IR is that it is in Static Single Assignment (SSA) form. In SSA form, every variable is assigned a value exactly once. If a variable's value needs to be updated (e.g., in a loop), a new variable is created instead. This makes many optimizations, like data-flow analysis, significantly simpler to implement.

The following example shows the LLVM IR and machine code of a simple C code:

C code:

int add(int a, int b) {
    int result = a + b;
    return result;
}

LLVM IR:

; Generated by Clang 17.0.0 (clang-1700.0.13.3)
; clang -S -emit-llvm add.c -o add.ll -O1
; ModuleID = 'add.c'
source_filename = "add.c"
target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128-Fn32"
target triple = "arm64-apple-macosx15.0.0"

; Function Attrs: mustprogress nofree norecurse nosync nounwind ssp willreturn memory(none) uwtable(sync)
define i32 @add(i32 noundef %0, i32 noundef %1) local_unnamed_addr #0 {
  %3 = add nsw i32 %1, %0
  ret i32 %3
}

attributes #0 = { mustprogress nofree norecurse nosync nounwind ssp willreturn memory(none) uwtable(sync) "frame-pointer"="non-leaf" "no-trapping-math"="true" "probe-stack"="__chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="apple-m1" "target-features"="+aes,+altnzcv,+bti,+ccdp,+ccidx,+complxnum,+crc,+dit,+dotprod,+flagm,+fp-armv8,+fp16fml,+fptoint,+fullfp16,+jsconv,+lse,+neon,+pauth,+perfmon,+predres,+ras,+rcpc,+rdm,+sb,+sha2,+sha3,+specrestrict,+ssbs,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8a,+zcm,+zcz" }

!llvm.module.flags = !{!0, !1, !2, !3, !4}
!llvm.ident = !{!5}

!0 = !{i32 2, !"SDK Version", [2 x i32] [i32 15, i32 4]}
!1 = !{i32 1, !"wchar_size", i32 4}
!2 = !{i32 8, !"PIC Level", i32 2}
!3 = !{i32 7, !"uwtable", i32 1}
!4 = !{i32 7, !"frame-pointer", i32 1}
!5 = !{!"Apple clang version 17.0.0 (clang-1700.0.13.3)"}

Machine code:

add:
    lea     eax, [rdi+rsi]
    ret

The LLVM IR represents a optimized version of a function named @add which computes the sum of two 32-bit integers. It directly takes the two input registers (%0 and %1), performs the addition with a single add instruction, and immediately returns the result (%3). The function attributes, such as memory(none) (indicating it doesn't read or write from memory) and willreturn, confirm that the compiler has aggressively optimized the code, eliminating all unnecessary memory operations.

Rust's Intermediate Representations

The Rust compiler transforms source code into machine code through a pipeline of intermediate representations (IRs). This journey from high-level to low-level code involves several key stages:

  • HIR (High-level IR)
  • THIR (Typed HIR)
  • MIR (Mid-level IR)
  • LLVM IR

To see this process in action, we'll examine the IRs generated for the following generic add function.

#![allow(unused)]
fn main() {
use std::ops::Add;

fn add<T: Add<Output = T>>(a: T, b: T) -> T {
    a + b
}
}

HIR - High-Level Intermediate Representation

The HIR is the primary IR used in most of rustc. It is a compiler-friendly representation of the abstract syntax tree (AST).

  • HIR is generated after parsing, macro expansion, and name resolution.

HIR is used for type checking.

#![allow(unused)]
fn main() {
// cargo rustc -- -Z unpretty=hir
#[prelude_import]
use std::prelude::rust_2024::*;
#[attr = MacroUse {arguments: UseAll}]
extern crate std;
use std::ops::Add;

fn add<T>(a: T, b: T) -> T where T: Add<Output = T> { a + b }
}

THIR - Typed High-Level Intermediate Representation

The THIR is a lowered version of the HIR where all the types have been filled in, which is possible after type checking has completed. But it has some other features that distinguish it from the HIR:

  • Like the MIR, the THIR only represents bodies, i.e. "executable code"; this includes function bodies, but also const initializers.
  • Each body of THIR is only stored temporarily and is dropped as soon as it's no longer needed, as opposed to being stored until the end of the compilation process.
  • Besides making the types of all nodes available, the THIR also has additional desugaring compared to the HIR. For example, automatic references and dereferences are made explicit, and method calls and overloaded operators are converted into plain function calls. Destruction scopes are also made explicit.
  • Statements, expressions, and match arms are stored separately.

THIR is used for MIR construction, exhaustiveness checking, and unsafety checking.

#![allow(unused)]
fn main() {
// cargo rustc -- -Z unpretty=thir-flat
DefId(0:4 ~ rust_playground[a18a]::add):
Thir {
    body_type: Fn(
        fn(T/#0, T/#0) -> T/#0,
    ),
    arms: [],
    blocks: [
        Block {
            targeted_by_break: false,
            region_scope: Node(5),
            span: src/lib.rs:3:45: 5:2 (#0),
            stmts: [],
            expr: Some(
                e6,
            ),
            safety_mode: Safe,
        },
    ],
    exprs: [
        Expr {
            kind: VarRef {
                id: LocalVarId(
                    HirId(DefId(0:4 ~ rust_playground[a18a]::add).2),
                ),
            },
            ty: T/#0,
            temp_lifetime: TempLifetime {
                temp_lifetime: Some(
                    Node(11),
                ),
                backwards_incompatible: None,
            },
            span: src/lib.rs:4:5: 4:6 (#0),
        },
        Expr {
            kind: Scope {
                region_scope: Node(7),
                lint_level: Explicit(
                    HirId(DefId(0:4 ~ rust_playground[a18a]::add).7),
                ),
                value: e0,
            },
            ty: T/#0,
            temp_lifetime: TempLifetime {
                temp_lifetime: Some(
                    Node(11),
                ),
                backwards_incompatible: None,
            },
            span: src/lib.rs:4:5: 4:6 (#0),
        },
        Expr {
            kind: VarRef {
                id: LocalVarId(
                    HirId(DefId(0:4 ~ rust_playground[a18a]::add).4),
                ),
            },
            ty: T/#0,
            temp_lifetime: TempLifetime {
                temp_lifetime: Some(
                    Node(11),
                ),
                backwards_incompatible: None,
            },
            span: src/lib.rs:4:9: 4:10 (#0),
        },
        Expr {
            kind: Scope {
                region_scope: Node(9),
                lint_level: Explicit(
                    HirId(DefId(0:4 ~ rust_playground[a18a]::add).9),
                ),
                value: e2,
            },
            ty: T/#0,
            temp_lifetime: TempLifetime {
                temp_lifetime: Some(
                    Node(11),
                ),
                backwards_incompatible: None,
            },
            span: src/lib.rs:4:9: 4:10 (#0),
        },
        Expr {
            kind: ZstLiteral {
                user_ty: None,
            },
            ty: FnDef(
                DefId(2:3797 ~ core[f655]::ops::arith::Add::add),
                [
                    T/#0,
                    T/#0,
                ],
            ),
            temp_lifetime: TempLifetime {
                temp_lifetime: Some(
                    Node(11),
                ),
                backwards_incompatible: None,
            },
            span: src/lib.rs:4:5: 4:10 (#0),
        },
        Expr {
            kind: Call {
                ty: FnDef(
                    DefId(2:3797 ~ core[f655]::ops::arith::Add::add),
                    [
                        T/#0,
                        T/#0,
                    ],
                ),
                fun: e4,
                args: [
                    e1,
                    e3,
                ],
                from_hir_call: false,
                fn_span: src/lib.rs:4:5: 4:10 (#0),
            },
            ty: T/#0,
            temp_lifetime: TempLifetime {
                temp_lifetime: Some(
                    Node(11),
                ),
                backwards_incompatible: None,
            },
            span: src/lib.rs:4:5: 4:10 (#0),
        },
        Expr {
            kind: Scope {
                region_scope: Node(6),
                lint_level: Explicit(
                    HirId(DefId(0:4 ~ rust_playground[a18a]::add).6),
                ),
                value: e5,
            },
            ty: T/#0,
            temp_lifetime: TempLifetime {
                temp_lifetime: Some(
                    Node(11),
                ),
                backwards_incompatible: None,
            },
            span: src/lib.rs:4:5: 4:10 (#0),
        },
        Expr {
            kind: Block {
                block: b0,
            },
            ty: T/#0,
            temp_lifetime: TempLifetime {
                temp_lifetime: Some(
                    Node(11),
                ),
                backwards_incompatible: None,
            },
            span: src/lib.rs:3:45: 5:2 (#0),
        },
        Expr {
            kind: Scope {
                region_scope: Node(11),
                lint_level: Explicit(
                    HirId(DefId(0:4 ~ rust_playground[a18a]::add).11),
                ),
                value: e7,
            },
            ty: T/#0,
            temp_lifetime: TempLifetime {
                temp_lifetime: Some(
                    Node(11),
                ),
                backwards_incompatible: None,
            },
            span: src/lib.rs:3:45: 5:2 (#0),
        },
    ],
    stmts: [],
    params: [
        Param {
            pat: Some(
                Pat {
                    ty: T/#0,
                    span: src/lib.rs:3:28: 3:29 (#0),
                    kind: Binding {
                        name: "a",
                        mode: BindingMode(
                            No,
                            Not,
                        ),
                        var: LocalVarId(
                            HirId(DefId(0:4 ~ rust_playground[a18a]::add).2),
                        ),
                        ty: T/#0,
                        subpattern: None,
                        is_primary: true,
                    },
                },
            ),
            ty: T/#0,
            ty_span: Some(
                src/lib.rs:3:31: 3:32 (#0),
            ),
            self_kind: None,
            hir_id: Some(
                HirId(DefId(0:4 ~ rust_playground[a18a]::add).1),
            ),
        },
        Param {
            pat: Some(
                Pat {
                    ty: T/#0,
                    span: src/lib.rs:3:34: 3:35 (#0),
                    kind: Binding {
                        name: "b",
                        mode: BindingMode(
                            No,
                            Not,
                        ),
                        var: LocalVarId(
                            HirId(DefId(0:4 ~ rust_playground[a18a]::add).4),
                        ),
                        ty: T/#0,
                        subpattern: None,
                        is_primary: true,
                    },
                },
            ),
            ty: T/#0,
            ty_span: Some(
                src/lib.rs:3:37: 3:38 (#0),
            ),
            self_kind: None,
            hir_id: Some(
                HirId(DefId(0:4 ~ rust_playground[a18a]::add).3),
            ),
        },
    ],
}
}

MIR - Mid-Level Intermediate Representation

MIR is Rust's Mid-level Intermediate Representation. It is constructed from THIR.

Some of the key characteristics of MIR are:

  • It is based on a control-flow graph.
  • It does not have nested expressions.
  • All types in MIR are fully explicit.

MIR is used for certain flow-sensitive safety checks (borrow checker) , optimization, code generation.

#![allow(unused)]
fn main() {
// cargo rustc -- -Z unpretty=mir
fn add(_1: T, _2: T) -> T {
    debug a => _1;
    debug b => _2;
    let mut _0: T;

    bb0: {
        _0 = <T as Add>::add(move _1, move _2) -> [return: bb1, unwind continue];
    }

    bb1: {
        return;
    }
}
}

The Core Concepts of MIR

Control-Flow Graph

The foundational structure of MIR is the Control-Flow Graph. The CFG represents a program as a directed graph where nodes are basic blocks and edges represent the flow of control between them.

BasicBlock

A BasicBlock is a node in the CFG. It is a straight-line sequence of code with a single entry point and a single exit point, where no branches or jumps occur within the block. In Rust MIR, a basic block is defined as a sequence of zero or more Statements followed by exactly one Terminator. This structure guarantees that control flow can only enter at the beginning of the block and can only exit at the end, via the terminator. There are no branches or jumps in the middle of a basic block.

In current version of rustc (1.91.0-nightly), the data structure of a basic block is defined as follows:

#![allow(unused)]
fn main() {
#[non_exhaustive]
pub struct BasicBlockData<'tcx> {
    pub statements: Vec<Statement<'tcx>>,
    pub terminator: Option<Terminator<'tcx>>,
    pub is_cleanup: bool, // Whether this block is a cleanup block in an unwind path
}
}

Statement

A Statement represents an action that occurs within a basic block. Crucially, statements have a single, implicit successor: the next statement in the block or, if it is the last one, the block's terminator.

In current version of rustc (1.91.0-nightly), there are 14 kinds of statements.

#![allow(unused)]
fn main() {
pub enum StatementKind<'tcx> {
    Assign(Box<(Place<'tcx>, Rvalue<'tcx>)>),
    FakeRead(Box<(FakeReadCause, Place<'tcx>)>),
    SetDiscriminant {
        place: Box<Place<'tcx>>,
        variant_index: VariantIdx,
    },
    Deinit(Box<Place<'tcx>>),
    StorageLive(Local),
    StorageDead(Local),
    Retag(RetagKind, Box<Place<'tcx>>),
    PlaceMention(Box<Place<'tcx>>),
    AscribeUserType(Box<(Place<'tcx>, UserTypeProjection)>, Variance),
    Coverage(CoverageKind),
    Intrinsic(Box<NonDivergingIntrinsic<'tcx>>),
    ConstEvalCounter,
    Nop,
    BackwardIncompatibleDropHint {
        place: Box<Place<'tcx>>,
        reason: BackwardIncompatibleDropReason,
    },
}
}

In most cases, we only use Assign statements when modeling the logic of a function body.

Terminator

A Terminator is the final instruction in every basic block and is the sole mechanism for directing control flow between blocks. It explicitly defines the successor(s) to the current block. This is where all branching, function calls, returns, and panics are represented. A terminator can have zero successors (e.g., return), one successor (e.g., goto), or multiple successors (e.g., switchInt for an if or match, or Call which has a success path and a potential unwind path).

In current version of rustc (1.91.0-nightly), there are 10 kinds of terminators.

#![allow(unused)]
fn main() {
pub enum TerminatorKind<'tcx> {
    Goto {
        target: BasicBlock,
    },
    SwitchInt {
        discr: Operand<'tcx>,
        targets: SwitchTargets,
    },
    UnwindResume,
    UnwindTerminate(UnwindTerminateReason),
    Return,
    Unreachable,
    Drop {
        place: Place<'tcx>,
        target: BasicBlock,
        unwind: UnwindAction,
        replace: bool,
        drop: Option<BasicBlock>,
        async_fut: Option<Local>,
    },
    Call {
        func: Operand<'tcx>,
        args: Box<[Spanned<Operand<'tcx>>]>,
        destination: Place<'tcx>,
        target: Option<BasicBlock>,
        unwind: UnwindAction,
        call_source: CallSource,
        fn_span: Span,
    },
    TailCall {
        func: Operand<'tcx>,
        args: Box<[Spanned<Operand<'tcx>>]>,
        fn_span: Span,
    },
    Assert {
        cond: Operand<'tcx>,
        expected: bool,
        msg: Box<AssertMessage<'tcx>>,
        target: BasicBlock,
        unwind: UnwindAction,
    },
    Yield {
        value: Operand<'tcx>,
        resume: BasicBlock,
        resume_arg: Place<'tcx>,
        drop: Option<BasicBlock>,
    },
    CoroutineDrop,
    FalseEdge {
        real_target: BasicBlock,
        imaginary_target: BasicBlock,
    },
    FalseUnwind {
        real_target: BasicBlock,
        unwind: UnwindAction,
    },
    InlineAsm {
        asm_macro: InlineAsmMacro,
        template: &'tcx [InlineAsmTemplatePiece],
        operands: Box<[InlineAsmOperand<'tcx>]>,
        options: InlineAsmOptions,
        line_spans: &'tcx [Span],
        targets: Box<[BasicBlock]>,
        unwind: UnwindAction,
    },
}
}

In most cases, we only use Goto, SwitchInt, Return, Drop, and Call statements when modeling the logic of a function body.

Desugared Data Representation

In parallel with simplifying control flow, MIR also simplifies the representation of data and computations. Complex, nested expressions are broken down into a sequence of simple operations on a small set of data-related concepts.

Local

Memory locations allocated on the stack (conceptually, at least).

This includes function arguments, user-declared variables, and compiler-generated temporary variables used to hold intermediate results. In MIR, locals are not identified by name but by a simple index, such as \_0, \_1, \_2, and so on. The local \_0 is always reserved for the return value.

Place

Expressions that identify a location in memory. It is the MIR equivalent of an "l-value" in C.

  • The simplest Place is just a Local (e.g., _1).
  • More complex Places are formed by starting with a base Local and applying a sequence of projections, such as field access, array indexing, or pointer dereferencing. For example, _1.f represents the field f of the struct stored in local _1, and *_2 represents the memory location pointed to by the local _2.

In current version of rustc (1.91.0-nightly), the data structure of a place is defined as follows:

#![allow(unused)]
fn main() {
pub struct Place<'tcx> {
    pub local: Local,
    pub projection: &'tcx List<PlaceElem<'tcx>>,
}

pub enum PlaceElem<'tcx> {
    Deref,
    Field(FieldIdx, Ty<'tcx>),
    Index(Local),
    ConstantIndex {
        offset: u64,
        min_length: u64,
        from_end: bool,
    },
    Subslice {
        from: u64,
        to: u64,
        from_end: bool,
    },
    Downcast(Option<Symbol>, VariantIdx),
    OpaqueCast(Ty<'tcx>),
    UnwrapUnsafeBinder(Ty<'tcx>),
    Subtype(Ty<'tcx>),
}
}

Rvalue

Expressions that produce a value.

The "R" stands for the fact that these are the "right-hand side" of an assignment.

In current version of rustc (1.91.0-nightly), the data structure of an rvalue is defined as follows:

#![allow(unused)]
fn main() {
pub enum Rvalue<'tcx> {
    Use(Operand<'tcx>),
    Repeat(Operand<'tcx>, Const<'tcx>),
    Ref(Region<'tcx>, BorrowKind, Place<'tcx>),
    ThreadLocalRef(DefId),
    RawPtr(RawPtrKind, Place<'tcx>),
    Len(Place<'tcx>),
    Cast(CastKind, Operand<'tcx>, Ty<'tcx>),
    BinaryOp(BinOp, Box<(Operand<'tcx>, Operand<'tcx>)>),
    NullaryOp(NullOp<'tcx>, Ty<'tcx>),
    UnaryOp(UnOp, Operand<'tcx>),
    Discriminant(Place<'tcx>),
    Aggregate(Box<AggregateKind<'tcx>>, IndexVec<FieldIdx, Operand<'tcx>>),
    ShallowInitBox(Operand<'tcx>, Ty<'tcx>),
    CopyForDeref(Place<'tcx>),
    WrapUnsafeBinder(Operand<'tcx>, Ty<'tcx>),
}
}

Operand

The arguments to an Rvalue.

In current version of rustc (1.91.0-nightly), the data structure of an operand is defined as follows:

#![allow(unused)]
fn main() {
pub enum Operand<'tcx> {
    Copy(Place<'tcx>),
    Move(Place<'tcx>),
    Constant(Box<ConstOperand<'tcx>>),
}
}

rpl::dump_mir: A Gadget for MIR Dumping

To effectively model a code pattern, it is often necessary to first inspect the MIR of the target function. RPL provides the #[rpl::dump_mir] procedural macro to simplify this process. This macro instructs the customized compiler to output the MIR of a target function, providing clear intermediate artifacts that can be used to model/define a new RPL pattern.

Usage

Using the macro involves a simple two-step process.

(1) Annotate the Target Function: First, apply the #[rpl::dump_mir] attribute directly to the function whose MIR you wish to inspect. The macro accepts optional arguments, such as dump_cfg and dump_ddg, to also generate graph visualizations.

#![allow(unused)]
fn main() {
use std::mem;

#[rpl::dump_mir(dump_cfg, dump_ddg)]
pub unsafe fn get_data<T: ?Sized>(val: *const T) -> *const () {
    unsafe { *mem::transmute::<*const *const T, *const *const ()>(&val) }
}
}

(2) Run the RPL Linter: Next, run cargo rpl from your project's root directory. For this debugging macro to work, the RPL_PATS environment variable must be set, but it can point to an empty directory if you are only dumping MIR.

RPL_PATS=/path/to/any/pattern/dir cargo rpl

Output

The macro produces output in two places: directly in the console and in a new directory in your project root.

Console Output

The compiler will print the formatted MIR and other diagnostic information directly to your terminal. This includes a breakdown of the function's local variables and the statements within each basic block.

note: MIR of `get_data`
 --> src/main.rs:4:1
  |
3 |   #[rpl::dump_mir(dump_cfg, dump_ddg)]
  |   ------------------------------------ MIR dumped because of this attribute
...
note: bb0: {
        _4 = &_1;                       // scope[0]
        _3 = &raw const (*_4);          // scope[0]
        _2 = move _3 as *const *const () (Transmute); // scope[0]
        _0 = copy (*_2);                // scope[0]
        return;                         // scope[0]
    }
...
error: abort due to debugging
 --> src/main.rs:3:1
  |
3 | #[rpl::dump_mir(dump_cfg, dump_ddg)]
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: remove this attribute

Note that the process intentionally ends with an error: abort due to debugging. This is a feature designed to remind you to remove the debugging attribute before committing your code.

Generated Artifacts

The macro will also create a mir_dump directory in your project root containing several files. These artifacts provide a more detailed and persistent record of the function's structure.

mir_dump/
├── rust_playground.get_data.-------.dump_mir..mir
├── rust_playground.get_data.-------.dump_mir..mir.cfg.dot
└── rust_playground.get_data.-------.dump_mir..mir.ddg.dot
  • .mir file: The raw, textual representation of the function's MIR.

  • .cfg.dot file: A representation of the Control-Flow Graph (CFG) in DOT format, which can be visualized with tools like Graphviz.

  • .ddg.dot file: A representation of the Data-Dependence Graph (DDG) in DOT format.

The following image shows the CFG and DDG of the get_data function.

CFG of get_data function

DDG of get_data function

Modeling a RPL Pattern from MIR

The primary use of the dumped MIR is to serve as a template for a new RPL pattern. The process involves "hollowing out" the concrete MIR by replacing its specific local variables with abstract metavariables.

For example, consider the dumped MIR above:

#![allow(unused)]
fn main() {
// Dumped MIR from console
bb0: {
    _4: &*const T = &_1;
    _3: *const *const T = &raw const (*_4);
    _2: *const *const () = move _3 as *const *const () (Transmute);
    _0: *const () = copy (*_2);
    return;
}
}

Converting the raw MIR dump into a functional RPL pattern is a systematic process. The key steps are to add the necessary syntactic structure and then abstract the logic with metavariables.

  1. Add let bindings: Convert each raw MIR assignment (e.g., _4 = &_1;) into a full let statement.
  2. Annotate types: Use the list of locals provided in the MIR dump to add explicit type annotations for each local variable.
  3. Abstract the pattern with metavariables. This is where you "hollow out" the concrete MIR to make it a general template. You replace the specific, compiler-generated names for locals and types with descriptive, abstract metavariables (prefixed with $). For example, the concrete local _1 of type *const T becomes the abstract statement let $ptr: *const $T = _;. By applying this process to all statements, you transform the specific MIR dump into a reusable pattern.
#![allow(unused)]
fn main() {
let $ptr: *const $T = _; // _1
let $ref_to_ptr: &*const $T = &$ptr; // _4
let $ptr_to_ptr_t: *const *const $T = &raw const (*$ref_to_ptr); // _3
let $ptr_to_ptr: *const *const () = move $ptr_to_ptr_t as *const *const () (Transmute); // _2
let $data_ptr: *const () = copy (*$ptr_to_ptr); // _0
}

Architecture of RPL

This chapter provides an overview of the internal architecture of RPL. It first introduces the algorithmic workflow. Following this, it presents the implementation architecture, detailing how the system is structured into components.

Workflow

Algorithmic Workflow Architecture

The workflow of RPL is shown in the figure above, which conceptually comprises four main activities: (1) pattern modeling, (2) pattern analysis, (3) target-code analysis, and (4) instance detection.

The process begins when a user models a set of patterns with the pattern modeling DSL, thereby supplying the inputs for subsequent analysis. During pattern analysis, each pattern is parsed into an Abstract Syntax Tree (AST) and then processed into its core components: a symbol table, a structural representation, and a set of semantic constraints. The symbol tables are used to check the internal errors, providing a feedback loop for the developer to revise the patterns. After validation, the structural representation is next transformed into the pattern graph representation. Target-code analysis compiles the Rust repository and retrieves the compiler’s intermediate representations. These artifacts are converted into the target graph representation.

Once the graph representations for both the pattern and the target code are ready, the analysis proceeds to a three-stage matching, filtering, and reporting process. First, the detection engine uses a graph matching algorithm to find all target code segments that structurally match the pattern's graph. Next, these candidate matches pass through a filtering stage where semantic constraints declared in the pattern are applied. Only the matches that satisfy both the structural and semantic conditions are considered valid findings. Finally, these validated results are processed into compiler-friendly diagnostic messages to report the potential issues in the target code to the developer.

Implementation Architecture

Algorithmic Workflow Architecture