Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Intermediate Representation

What is IR?

In the architecture of a modern compiler, the process of translating human-readable source code into machine-executable code is not a single, monolithic step. Instead, the compiler is typically divided into three main stages: the frontend, the middle-end, and the backend. The Intermediate Representation (IR) is the crucial data structure or code format that acts as the bridge between the frontend and the backend.

  1. Frontend: Parses the source code (like Rust, C++, or Swift), checks for syntax errors, and performs semantic analysis. It then translates the source code into an IR.

  2. Middle-end (Optimizer): Takes the IR from the frontend, performs a series of machine-independent optimizations to improve the code's performance and efficiency. The output of this stage is still the (now optimized) IR.

  3. Backend (Code Generator): Takes the optimized IR and translates it into machine code for a specific target architecture (like x86-64, ARM64, etc.).

The use of an IR provides a powerful abstraction that decouples the source language from the target machine. This means you can write frontends for many different languages and backends for many different architectures, and have them all work together through the common IR.

LLVM (Low Level Virtual Machine) is a collection of modular and reusable compiler and toolchain technologies. Its Intermediate Representation, LLVM IR, is one of its most influential components. It's a low-level, statically typed, and language-independent IR designed to be the target of a wide variety of frontends and the source for many backends.A key feature of LLVM IR is that it is in Static Single Assignment (SSA) form. In SSA form, every variable is assigned a value exactly once. If a variable's value needs to be updated (e.g., in a loop), a new variable is created instead. This makes many optimizations, like data-flow analysis, significantly simpler to implement.

The following example shows the LLVM IR and machine code of a simple C code:

C code:

int add(int a, int b) {
    int result = a + b;
    return result;
}

LLVM IR:

; Generated by Clang 17.0.0 (clang-1700.0.13.3)
; clang -S -emit-llvm add.c -o add.ll -O1
; ModuleID = 'add.c'
source_filename = "add.c"
target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128-Fn32"
target triple = "arm64-apple-macosx15.0.0"

; Function Attrs: mustprogress nofree norecurse nosync nounwind ssp willreturn memory(none) uwtable(sync)
define i32 @add(i32 noundef %0, i32 noundef %1) local_unnamed_addr #0 {
  %3 = add nsw i32 %1, %0
  ret i32 %3
}

attributes #0 = { mustprogress nofree norecurse nosync nounwind ssp willreturn memory(none) uwtable(sync) "frame-pointer"="non-leaf" "no-trapping-math"="true" "probe-stack"="__chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="apple-m1" "target-features"="+aes,+altnzcv,+bti,+ccdp,+ccidx,+complxnum,+crc,+dit,+dotprod,+flagm,+fp-armv8,+fp16fml,+fptoint,+fullfp16,+jsconv,+lse,+neon,+pauth,+perfmon,+predres,+ras,+rcpc,+rdm,+sb,+sha2,+sha3,+specrestrict,+ssbs,+v8.1a,+v8.2a,+v8.3a,+v8.4a,+v8.5a,+v8a,+zcm,+zcz" }

!llvm.module.flags = !{!0, !1, !2, !3, !4}
!llvm.ident = !{!5}

!0 = !{i32 2, !"SDK Version", [2 x i32] [i32 15, i32 4]}
!1 = !{i32 1, !"wchar_size", i32 4}
!2 = !{i32 8, !"PIC Level", i32 2}
!3 = !{i32 7, !"uwtable", i32 1}
!4 = !{i32 7, !"frame-pointer", i32 1}
!5 = !{!"Apple clang version 17.0.0 (clang-1700.0.13.3)"}

Machine code:

add:
    lea     eax, [rdi+rsi]
    ret

The LLVM IR represents a optimized version of a function named @add which computes the sum of two 32-bit integers. It directly takes the two input registers (%0 and %1), performs the addition with a single add instruction, and immediately returns the result (%3). The function attributes, such as memory(none) (indicating it doesn't read or write from memory) and willreturn, confirm that the compiler has aggressively optimized the code, eliminating all unnecessary memory operations.

Rust's Intermediate Representations

The Rust compiler transforms source code into machine code through a pipeline of intermediate representations (IRs). This journey from high-level to low-level code involves several key stages:

  • HIR (High-level IR)
  • THIR (Typed HIR)
  • MIR (Mid-level IR)
  • LLVM IR

To see this process in action, we'll examine the IRs generated for the following generic add function.

#![allow(unused)]
fn main() {
use std::ops::Add;

fn add<T: Add<Output = T>>(a: T, b: T) -> T {
    a + b
}
}

HIR - High-Level Intermediate Representation

The HIR is the primary IR used in most of rustc. It is a compiler-friendly representation of the abstract syntax tree (AST).

  • HIR is generated after parsing, macro expansion, and name resolution.

HIR is used for type checking.

#![allow(unused)]
fn main() {
// cargo rustc -- -Z unpretty=hir
#[prelude_import]
use std::prelude::rust_2024::*;
#[attr = MacroUse {arguments: UseAll}]
extern crate std;
use std::ops::Add;

fn add<T>(a: T, b: T) -> T where T: Add<Output = T> { a + b }
}

THIR - Typed High-Level Intermediate Representation

The THIR is a lowered version of the HIR where all the types have been filled in, which is possible after type checking has completed. But it has some other features that distinguish it from the HIR:

  • Like the MIR, the THIR only represents bodies, i.e. "executable code"; this includes function bodies, but also const initializers.
  • Each body of THIR is only stored temporarily and is dropped as soon as it's no longer needed, as opposed to being stored until the end of the compilation process.
  • Besides making the types of all nodes available, the THIR also has additional desugaring compared to the HIR. For example, automatic references and dereferences are made explicit, and method calls and overloaded operators are converted into plain function calls. Destruction scopes are also made explicit.
  • Statements, expressions, and match arms are stored separately.

THIR is used for MIR construction, exhaustiveness checking, and unsafety checking.

#![allow(unused)]
fn main() {
// cargo rustc -- -Z unpretty=thir-flat
DefId(0:4 ~ rust_playground[a18a]::add):
Thir {
    body_type: Fn(
        fn(T/#0, T/#0) -> T/#0,
    ),
    arms: [],
    blocks: [
        Block {
            targeted_by_break: false,
            region_scope: Node(5),
            span: src/lib.rs:3:45: 5:2 (#0),
            stmts: [],
            expr: Some(
                e6,
            ),
            safety_mode: Safe,
        },
    ],
    exprs: [
        Expr {
            kind: VarRef {
                id: LocalVarId(
                    HirId(DefId(0:4 ~ rust_playground[a18a]::add).2),
                ),
            },
            ty: T/#0,
            temp_lifetime: TempLifetime {
                temp_lifetime: Some(
                    Node(11),
                ),
                backwards_incompatible: None,
            },
            span: src/lib.rs:4:5: 4:6 (#0),
        },
        Expr {
            kind: Scope {
                region_scope: Node(7),
                lint_level: Explicit(
                    HirId(DefId(0:4 ~ rust_playground[a18a]::add).7),
                ),
                value: e0,
            },
            ty: T/#0,
            temp_lifetime: TempLifetime {
                temp_lifetime: Some(
                    Node(11),
                ),
                backwards_incompatible: None,
            },
            span: src/lib.rs:4:5: 4:6 (#0),
        },
        Expr {
            kind: VarRef {
                id: LocalVarId(
                    HirId(DefId(0:4 ~ rust_playground[a18a]::add).4),
                ),
            },
            ty: T/#0,
            temp_lifetime: TempLifetime {
                temp_lifetime: Some(
                    Node(11),
                ),
                backwards_incompatible: None,
            },
            span: src/lib.rs:4:9: 4:10 (#0),
        },
        Expr {
            kind: Scope {
                region_scope: Node(9),
                lint_level: Explicit(
                    HirId(DefId(0:4 ~ rust_playground[a18a]::add).9),
                ),
                value: e2,
            },
            ty: T/#0,
            temp_lifetime: TempLifetime {
                temp_lifetime: Some(
                    Node(11),
                ),
                backwards_incompatible: None,
            },
            span: src/lib.rs:4:9: 4:10 (#0),
        },
        Expr {
            kind: ZstLiteral {
                user_ty: None,
            },
            ty: FnDef(
                DefId(2:3797 ~ core[f655]::ops::arith::Add::add),
                [
                    T/#0,
                    T/#0,
                ],
            ),
            temp_lifetime: TempLifetime {
                temp_lifetime: Some(
                    Node(11),
                ),
                backwards_incompatible: None,
            },
            span: src/lib.rs:4:5: 4:10 (#0),
        },
        Expr {
            kind: Call {
                ty: FnDef(
                    DefId(2:3797 ~ core[f655]::ops::arith::Add::add),
                    [
                        T/#0,
                        T/#0,
                    ],
                ),
                fun: e4,
                args: [
                    e1,
                    e3,
                ],
                from_hir_call: false,
                fn_span: src/lib.rs:4:5: 4:10 (#0),
            },
            ty: T/#0,
            temp_lifetime: TempLifetime {
                temp_lifetime: Some(
                    Node(11),
                ),
                backwards_incompatible: None,
            },
            span: src/lib.rs:4:5: 4:10 (#0),
        },
        Expr {
            kind: Scope {
                region_scope: Node(6),
                lint_level: Explicit(
                    HirId(DefId(0:4 ~ rust_playground[a18a]::add).6),
                ),
                value: e5,
            },
            ty: T/#0,
            temp_lifetime: TempLifetime {
                temp_lifetime: Some(
                    Node(11),
                ),
                backwards_incompatible: None,
            },
            span: src/lib.rs:4:5: 4:10 (#0),
        },
        Expr {
            kind: Block {
                block: b0,
            },
            ty: T/#0,
            temp_lifetime: TempLifetime {
                temp_lifetime: Some(
                    Node(11),
                ),
                backwards_incompatible: None,
            },
            span: src/lib.rs:3:45: 5:2 (#0),
        },
        Expr {
            kind: Scope {
                region_scope: Node(11),
                lint_level: Explicit(
                    HirId(DefId(0:4 ~ rust_playground[a18a]::add).11),
                ),
                value: e7,
            },
            ty: T/#0,
            temp_lifetime: TempLifetime {
                temp_lifetime: Some(
                    Node(11),
                ),
                backwards_incompatible: None,
            },
            span: src/lib.rs:3:45: 5:2 (#0),
        },
    ],
    stmts: [],
    params: [
        Param {
            pat: Some(
                Pat {
                    ty: T/#0,
                    span: src/lib.rs:3:28: 3:29 (#0),
                    kind: Binding {
                        name: "a",
                        mode: BindingMode(
                            No,
                            Not,
                        ),
                        var: LocalVarId(
                            HirId(DefId(0:4 ~ rust_playground[a18a]::add).2),
                        ),
                        ty: T/#0,
                        subpattern: None,
                        is_primary: true,
                    },
                },
            ),
            ty: T/#0,
            ty_span: Some(
                src/lib.rs:3:31: 3:32 (#0),
            ),
            self_kind: None,
            hir_id: Some(
                HirId(DefId(0:4 ~ rust_playground[a18a]::add).1),
            ),
        },
        Param {
            pat: Some(
                Pat {
                    ty: T/#0,
                    span: src/lib.rs:3:34: 3:35 (#0),
                    kind: Binding {
                        name: "b",
                        mode: BindingMode(
                            No,
                            Not,
                        ),
                        var: LocalVarId(
                            HirId(DefId(0:4 ~ rust_playground[a18a]::add).4),
                        ),
                        ty: T/#0,
                        subpattern: None,
                        is_primary: true,
                    },
                },
            ),
            ty: T/#0,
            ty_span: Some(
                src/lib.rs:3:37: 3:38 (#0),
            ),
            self_kind: None,
            hir_id: Some(
                HirId(DefId(0:4 ~ rust_playground[a18a]::add).3),
            ),
        },
    ],
}
}

MIR - Mid-Level Intermediate Representation

MIR is Rust's Mid-level Intermediate Representation. It is constructed from THIR.

Some of the key characteristics of MIR are:

  • It is based on a control-flow graph.
  • It does not have nested expressions.
  • All types in MIR are fully explicit.

MIR is used for certain flow-sensitive safety checks (borrow checker) , optimization, code generation.

#![allow(unused)]
fn main() {
// cargo rustc -- -Z unpretty=mir
fn add(_1: T, _2: T) -> T {
    debug a => _1;
    debug b => _2;
    let mut _0: T;

    bb0: {
        _0 = <T as Add>::add(move _1, move _2) -> [return: bb1, unwind continue];
    }

    bb1: {
        return;
    }
}
}