Pattern Matching

Table of Contents

Published on 13 October 2024.

Last modified 13 October 2024.

Pattern matching is an advanced method of conditional control flow in higher level languages. Pattern matching is superficially like an advanced switch statement in the sense of C . In C and C++, switches allow fallthrough - that is, for a given case , by not including a break statement, that case is handled by the code for the following case , up until the first break is encountered. This feature has been (ab)used to create Duff's device. Pattern matching does not allow fallthrough, but it can be used for destructuring, and to handle types that are not integrals. In some languages, pattern matching is restricted to constant cases, while more dynamic languages allow for cases that can only be resolved at runtime.

int integralValue = foo(params);
switch (integralValue) {
case 0: // fallthrough
case 1:
    printf("Value %d is less than 2\n", integralValue);
    break;
case 42:
    puts("Value is special");
    break;
default:
    puts("Nothing to say about value");
    break;
}

1. A trivial example

Each language implements pattern matching according to its syntax and constraints. First, let's come up with a simple example to demonstrate the differences between switches, if-else, and pattern matching.

In C, this first example is most quickly expressed as an ordinary chain of if, else if, and else.

#include <stdio.h>
int main(void)
{
    int x = 42;
    if (x >= 0 && x < 10) {
        puts("x is decimal");
    } else if (x == 42) {
        puts("x is special");
    } else {
        puts("x is not special");
    }
    return 0;
}

Now let's compare that first to rust:

pub fn main() {
    let x = 42;
    match x {
        0..10 => println!("x is decimal"),
        42 => println!("x is special"),
        _ => println!("x is not special")
    }
}

That's a lot more condensed, even excluding the extra formalities the C code must have. What about another language? Let's try it in Python, which has gained pattern matching since version 3.10.

x = 9
match x:
    case decimal if x in range(0, 10):
        print("x is decimal")
    case 42:
        print("x is special")
    case _:
        print("x is not special")

Not as concise as the rust syntax; one less line to express it than the C method.

For fun, let's add in an example with scheme ( guile flavor ). Of note, scheme's syntax does not include pattern matching, but is implemented with hygienic syntactic macros .

(use-modules (ice-9 match))
(let ((x 42))
  (match x
    ((? (lambda (n) (<= 0 n 9))) (display "x is decimal\n"))
    (42 (display "x is special\n"))
    (_ (display "x is not special\n"))))

This is a trivial example, so there's not much meaningful difference between the expressions. It's time to kick it up a notch to start seeing the actual power of pattern matching.

2. Destructuring with pattern matching

Let's match on an object which has a timestamp, and a 3d vector describing its location.

#include <cstdio>

struct Blip {
    double timestamp;
    double x;
    double y;
    double z;
};

void describeBlip(Blip blip) {
    if (blip.timestamp < 0.0) {
        puts("Invalid timestamp for blip");
        return;
    }

    auto xPos = blip.x >= 0;
    auto yPos = blip.y >= 0;
    auto zPos = blip.z >= 0;

    if (xPos && yPos && zPos) {
        puts("Octant 1");
    } else if (!xPos && yPos && zPos) {
        puts("Octant 2");
    } else if (!xPos && !yPos && zPos) {
        puts("Octant 3");
    } else if (xPos && !yPos && zPos) {
        puts("Octant 4");
    } else if (xPos && yPos && !zPos) {
        puts("Octant 5");
    } else if (!xPos && yPos && !zPos) {
        puts("Octant 6");
    } else if (!xPos && !yPos && !zPos) {
        puts("Octant 7");
    } else {
        puts("Octant 8");
    }
}

int main()
{
    describeBlip(Blip{-1.0, 1.0, 1.0, 1.0}); // invalid
    describeBlip(Blip{1.0, 1.0, 1.0, 1.0}); // 1
    describeBlip(Blip{1.0, -1.0, -1.0, 1.0}); // 3
    describeBlip(Blip{1.0, 1.0, -1.0, -1.0}); // 8
    return 0;
}

A bit cumbersome, and longer in lines because of the convenience of xPos, yPos, and zPos. I wouldn't call this fun to read, but I think it's tractable.

struct Blip {
    timestamp: f64,
    x: f64,
    y: f64,
    z: f64,
}

fn describe_blip(blip: Blip) {
    match blip {
        Blip { timestamp: ts, .. } if ts < 0.0 => println!("Invalid"),
        Blip { timestamp: _, x: a, y: b, z: c } =>
            println!("{}", match (a >= 0.0, b >= 0.0, c >= 0.0) {
                (true,   true,  true) => "Octant 1",
                (false,  true,  true) => "Octant 2",
                (false, false,  true) => "Octant 3",
                (true,  false,  true) => "Octant 4",
                (true,   true, false) => "Octant 5",
                (false,  true, false) => "Octant 6",
                (false, false, false) => "Octant 7",
                _ => "Octant 8",
            })
    }
}

pub fn main() {
    describe_blip(Blip{timestamp: -1.0,  x:  1.0, y:  1.0, z:  1.0}); // inv
    describe_blip(Blip{timestamp:  1.0,  x:  1.0, y:  1.0, z:  1.0}); // 1
    describe_blip(Blip{timestamp:  1.0,  x: -1.0, y: -1.0, z:  1.0}); // 3
    describe_blip(Blip{timestamp:  1.0,  x:  1.0, y: -1.0, z: -1.0}); // 8
}

I would call the rust version significantly more compact. Let's check in with python again.

def describe_blip(blip):
    match blip:
        case {"timestamp": ts} if ts < 0.0:
            print("Invalid")
        case {"x": x, "y": y, "z": z}:
            match (x >= 0.0, y >= 0.0, z >= 0.0):
                case (True, True, True): print("Octant 1")
                case (False, True, True): print("Octant 2")
                case (False, False, True): print("Octant 3")
                case (True, False, True): print("Octant 4")
                case (True, True, False): print("Octant 5")
                case (False, True, False): print("Octant 6")
                case (False, False, False): print("Octant 7")
                case _: print("Octant 8")


describe_blip({"timestamp": -1.0, "x": 1.0, "y": 1.0, "z": 1.0})  # inv
describe_blip({"timestamp": 1.0, "x": 1.0, "y": 1.0, "z": 1.0})  # 1
describe_blip({"timestamp": 1.0, "x": -1.0, "y": -1.0, "z": 1.0})  # 3
describe_blip({"timestamp": 1.0, "x": 1.0, "y": -1.0, "z": -1.0})  # 8

I can't say I'm a fan of the level of indentation that Python reaches here. I've worked in C and C++ codebases where the style guidelines similarly required indentation of case labels for switch statements, and I've always thought it led to excessive indentation. The linux kernel coding style happens to say the same, not that it is the sole source of what good style is.

(use-modules (ice-9 match))
(define (describe-blip blip)
  (match blip
    ((? (lambda (x) (< (cadr (assq 'timestamp x)) 0.0))) (display "Invalid\n"))
    ((ts ('x x) ('y y) ('z z))
     (match (list (>= x 0.0) (>= y 0.0) (>= z 0.0))
       ((#t #t #t) (display "Octant 1\n"))
       ((#f #t #t) (display "Octant 2\n"))
       ((#f #f #t) (display "Octant 3\n"))
       ((#t #f #t) (display "Octant 4\n"))
       ((#t #t #f) (display "Octant 5\n"))
       ((#f #t #f) (display "Octant 6\n"))
       ((#f #f #f) (display "Octant 7\n"))
       (_ (display "Octant 8\n"))))))

(describe-blip '((timestamp -1.0) (x  0.0) (y  0.0) (z  0.0))) ; invalid
(describe-blip '((timestamp  1.0) (x  1.0) (y  1.0) (z  1.0))) ; 1
(describe-blip '((timestamp  1.0) (x -1.0) (y -1.0) (z  1.0))) ; 3
(describe-blip '((timestamp  1.0) (x  1.0) (y -1.0) (z -1.0))) ; 8

The scheme version is about equivalent to the rust version.

3. Algebraic Data Types

Now let's consider a more advanced example where an algebraic data type is constructed.

First, the C++ version.

#include <cstdio>
#include <string>
#include <variant>
#include <type_traits> // provides decay_t, is_same_v

struct Point2d {
    double x;
    double y;
};

using Adt = std::variant<Point2d, std::string, std::monostate>;

void handle_adt(const Adt& adt) {
    std::visit([](const auto& arg) {
        using T = std::decay_t<decltype(arg)>;
        if constexpr (std::is_same_v<T, Point2d>) {
            if (arg.x == 0.0 && arg.y == 0) {
                puts("Origin");
            } else {
                printf("Point: (%f, %f)\n", arg.x, arg.y);
            }
        } else if constexpr (std::is_same_v<T, std::string>) {
            printf("Message %s\n", arg.c_str());
        } else if constexpr (std::is_same_v<T, std::monostate>) {
            puts("Nil");
        } else {
            static_assert(false, "non-exhaustive visitor!");
        }
    }, adt);
}

int main() {
    Adt adt{Point2d{0.0, 0.0}};
    handle_adt(adt);
    adt = Point2d{42.0, 13.0};
    handle_adt(adt);
    adt = std::string("secret message");
    handle_adt(adt);
    adt = std::monostate{};
    handle_adt(adt);
    return 0;
}

This version uses a lot of stuff that people unfamiliar with C++ will have questions about. Adt is an alias for std::variant templated on Point2d, std::string, and std::monostate. std::monostate is an "empty" type for std::variant. handle_adt uses std::visit to handle adt depending on its inner type. It effectively generates 3 functions by using Constexpr if to only have code for each type that is being guarded.

Some people have criticized the design of std::visit, calling it too complicated. I think showing how this works in rust will illustrate their point.

struct Point2d {
    x: f64,
    y: f64
}

enum Adt {
    Point(Point2d),
    Msg(String),
    Nil
}

fn handle_adt(adt: &Adt)
{
    match adt {
        &Adt::Point(Point2d{x: 0.0, y: 0.0}) => println!("Point: Origin"),
        &Adt::Point(Point2d{x, y}) => println!("Point: ({}, {})", x, y),
        &Adt::Msg(ref msg) => println!("Message: {}", msg),
        &Adt::Nil => println!("Nil")
    }
}

pub fn main() {
    handle_adt(&Adt::Point(Point2d{x: 0.0, y: 0.0}));
    handle_adt(&Adt::Point(Point2d{x: 42.0, y: 13.0}));
    handle_adt(&Adt::Msg("secret message".to_string()));
    handle_adt(&Adt::Nil);
}

I won't bother showing this example in either python or scheme - those languages are so dynamic that there is no need to specially handle, or even express the concept of, algebraic data types.

4. Conclusion

Pattern matching is more convenient for succinctly handling conditional cases than if and select, and is very handy for destructuring data. Both of these use cases may be waved away as syntactic sugar. However, their application in algebraic data types enables the use of ADTs as a first class concept in strongly typed static languages, increasing their dynamism without resorting to using inheritance on pointers. The speed of handling a variant in C++ vs going through a vtable is not necessarily a selling point for variants, but there may be other benefits or design reasons to use a variant - improving cache performance, not coupling objects through inheritance, working with native machine types, and having static compile time checks are good reasons to consider the use of std::variant. The inclusion of pattern matching in C++26, at the time of writing this article, is dubious.