Pattern Matching
Table of Contents
Published on 13 October 2024.
Last modified 13 October 2024.
Pattern matching is an advanced method of conditional control flow in higher
level languages. Pattern matching is superficially like an advanced
switch
statement
in the sense of C
. In C and C++, switches allow fallthrough - that is,
for a given
case
, by not including a
break
statement, that
case
is handled
by the code for the following
case
, up until the first
break
is encountered.
This feature has been (ab)used to create
Duff's device.
Pattern matching does
not allow fallthrough, but it can be used for destructuring, and to handle types
that are not integrals. In some languages, pattern matching is restricted to
constant cases, while more dynamic languages allow for cases that can only be
resolved at runtime.
int integralValue = foo(params); switch (integralValue) { case 0: // fallthrough case 1: printf("Value %d is less than 2\n", integralValue); break; case 42: puts("Value is special"); break; default: puts("Nothing to say about value"); break; }
1. A trivial example
Each language implements pattern matching according to its syntax and constraints. First, let's come up with a simple example to demonstrate the differences between switches, if-else, and pattern matching.
In C, this first example is most quickly expressed as an ordinary chain of if, else if, and else.
#include <stdio.h> int main(void) { int x = 42; if (x >= 0 && x < 10) { puts("x is decimal"); } else if (x == 42) { puts("x is special"); } else { puts("x is not special"); } return 0; }
Now let's compare that first to rust:
pub fn main() { let x = 42; match x { 0..10 => println!("x is decimal"), 42 => println!("x is special"), _ => println!("x is not special") } }
That's a lot more condensed, even excluding the extra formalities the C code must have. What about another language? Let's try it in Python, which has gained pattern matching since version 3.10.
x = 9 match x: case decimal if x in range(0, 10): print("x is decimal") case 42: print("x is special") case _: print("x is not special")
Not as concise as the rust syntax; one less line to express it than the C method.
For fun, let's add in an example with scheme ( guile flavor ). Of note, scheme's syntax does not include pattern matching, but is implemented with hygienic syntactic macros .
(use-modules (ice-9 match)) (let ((x 42)) (match x ((? (lambda (n) (<= 0 n 9))) (display "x is decimal\n")) (42 (display "x is special\n")) (_ (display "x is not special\n"))))
This is a trivial example, so there's not much meaningful difference between the expressions. It's time to kick it up a notch to start seeing the actual power of pattern matching.
2. Destructuring with pattern matching
Let's match on an object which has a timestamp, and a 3d vector describing its location.
#include <cstdio> struct Blip { double timestamp; double x; double y; double z; }; void describeBlip(Blip blip) { if (blip.timestamp < 0.0) { puts("Invalid timestamp for blip"); return; } auto xPos = blip.x >= 0; auto yPos = blip.y >= 0; auto zPos = blip.z >= 0; if (xPos && yPos && zPos) { puts("Octant 1"); } else if (!xPos && yPos && zPos) { puts("Octant 2"); } else if (!xPos && !yPos && zPos) { puts("Octant 3"); } else if (xPos && !yPos && zPos) { puts("Octant 4"); } else if (xPos && yPos && !zPos) { puts("Octant 5"); } else if (!xPos && yPos && !zPos) { puts("Octant 6"); } else if (!xPos && !yPos && !zPos) { puts("Octant 7"); } else { puts("Octant 8"); } } int main() { describeBlip(Blip{-1.0, 1.0, 1.0, 1.0}); // invalid describeBlip(Blip{1.0, 1.0, 1.0, 1.0}); // 1 describeBlip(Blip{1.0, -1.0, -1.0, 1.0}); // 3 describeBlip(Blip{1.0, 1.0, -1.0, -1.0}); // 8 return 0; }
A bit cumbersome, and longer in lines because of the convenience of xPos, yPos, and zPos. I wouldn't call this fun to read, but I think it's tractable.
struct Blip { timestamp: f64, x: f64, y: f64, z: f64, } fn describe_blip(blip: Blip) { match blip { Blip { timestamp: ts, .. } if ts < 0.0 => println!("Invalid"), Blip { timestamp: _, x: a, y: b, z: c } => println!("{}", match (a >= 0.0, b >= 0.0, c >= 0.0) { (true, true, true) => "Octant 1", (false, true, true) => "Octant 2", (false, false, true) => "Octant 3", (true, false, true) => "Octant 4", (true, true, false) => "Octant 5", (false, true, false) => "Octant 6", (false, false, false) => "Octant 7", _ => "Octant 8", }) } } pub fn main() { describe_blip(Blip{timestamp: -1.0, x: 1.0, y: 1.0, z: 1.0}); // inv describe_blip(Blip{timestamp: 1.0, x: 1.0, y: 1.0, z: 1.0}); // 1 describe_blip(Blip{timestamp: 1.0, x: -1.0, y: -1.0, z: 1.0}); // 3 describe_blip(Blip{timestamp: 1.0, x: 1.0, y: -1.0, z: -1.0}); // 8 }
I would call the rust version significantly more compact. Let's check in with python again.
def describe_blip(blip): match blip: case {"timestamp": ts} if ts < 0.0: print("Invalid") case {"x": x, "y": y, "z": z}: match (x >= 0.0, y >= 0.0, z >= 0.0): case (True, True, True): print("Octant 1") case (False, True, True): print("Octant 2") case (False, False, True): print("Octant 3") case (True, False, True): print("Octant 4") case (True, True, False): print("Octant 5") case (False, True, False): print("Octant 6") case (False, False, False): print("Octant 7") case _: print("Octant 8") describe_blip({"timestamp": -1.0, "x": 1.0, "y": 1.0, "z": 1.0}) # inv describe_blip({"timestamp": 1.0, "x": 1.0, "y": 1.0, "z": 1.0}) # 1 describe_blip({"timestamp": 1.0, "x": -1.0, "y": -1.0, "z": 1.0}) # 3 describe_blip({"timestamp": 1.0, "x": 1.0, "y": -1.0, "z": -1.0}) # 8
I can't say I'm a fan of the level of indentation that Python reaches here. I've worked in C and C++ codebases where the style guidelines similarly required indentation of case labels for switch statements, and I've always thought it led to excessive indentation. The linux kernel coding style happens to say the same, not that it is the sole source of what good style is.
(use-modules (ice-9 match)) (define (describe-blip blip) (match blip ((? (lambda (x) (< (cadr (assq 'timestamp x)) 0.0))) (display "Invalid\n")) ((ts ('x x) ('y y) ('z z)) (match (list (>= x 0.0) (>= y 0.0) (>= z 0.0)) ((#t #t #t) (display "Octant 1\n")) ((#f #t #t) (display "Octant 2\n")) ((#f #f #t) (display "Octant 3\n")) ((#t #f #t) (display "Octant 4\n")) ((#t #t #f) (display "Octant 5\n")) ((#f #t #f) (display "Octant 6\n")) ((#f #f #f) (display "Octant 7\n")) (_ (display "Octant 8\n")))))) (describe-blip '((timestamp -1.0) (x 0.0) (y 0.0) (z 0.0))) ; invalid (describe-blip '((timestamp 1.0) (x 1.0) (y 1.0) (z 1.0))) ; 1 (describe-blip '((timestamp 1.0) (x -1.0) (y -1.0) (z 1.0))) ; 3 (describe-blip '((timestamp 1.0) (x 1.0) (y -1.0) (z -1.0))) ; 8
The scheme version is about equivalent to the rust version.
3. Algebraic Data Types
Now let's consider a more advanced example where an algebraic data type is constructed.
First, the C++ version.
#include <cstdio> #include <string> #include <variant> #include <type_traits> // provides decay_t, is_same_v struct Point2d { double x; double y; }; using Adt = std::variant<Point2d, std::string, std::monostate>; void handle_adt(const Adt& adt) { std::visit([](const auto& arg) { using T = std::decay_t<decltype(arg)>; if constexpr (std::is_same_v<T, Point2d>) { if (arg.x == 0.0 && arg.y == 0) { puts("Origin"); } else { printf("Point: (%f, %f)\n", arg.x, arg.y); } } else if constexpr (std::is_same_v<T, std::string>) { printf("Message %s\n", arg.c_str()); } else if constexpr (std::is_same_v<T, std::monostate>) { puts("Nil"); } else { static_assert(false, "non-exhaustive visitor!"); } }, adt); } int main() { Adt adt{Point2d{0.0, 0.0}}; handle_adt(adt); adt = Point2d{42.0, 13.0}; handle_adt(adt); adt = std::string("secret message"); handle_adt(adt); adt = std::monostate{}; handle_adt(adt); return 0; }
This version uses a lot of stuff that people unfamiliar with C++ will have
questions about.
Adt
is an alias for
std::variant
templated on
Point2d,
std::string,
and
std::monostate.
std::monostate
is an "empty" type for
std::variant.
handle_adt
uses
std::visit
to handle
adt
depending on its inner type. It effectively generates 3 functions by using
Constexpr if
to only have code for each type that is being guarded.
Some people have criticized
the design of
std::visit,
calling it too
complicated. I think showing how this works in rust will illustrate their point.
struct Point2d { x: f64, y: f64 } enum Adt { Point(Point2d), Msg(String), Nil } fn handle_adt(adt: &Adt) { match adt { &Adt::Point(Point2d{x: 0.0, y: 0.0}) => println!("Point: Origin"), &Adt::Point(Point2d{x, y}) => println!("Point: ({}, {})", x, y), &Adt::Msg(ref msg) => println!("Message: {}", msg), &Adt::Nil => println!("Nil") } } pub fn main() { handle_adt(&Adt::Point(Point2d{x: 0.0, y: 0.0})); handle_adt(&Adt::Point(Point2d{x: 42.0, y: 13.0})); handle_adt(&Adt::Msg("secret message".to_string())); handle_adt(&Adt::Nil); }
I won't bother showing this example in either python or scheme - those languages are so dynamic that there is no need to specially handle, or even express the concept of, algebraic data types.
4. Conclusion
Pattern matching is more convenient for succinctly handling conditional cases
than if and select, and is very handy for destructuring data. Both of these use
cases may be waved away as syntactic sugar. However, their application in
algebraic data types enables the use of ADTs as a first class concept in
strongly typed static languages, increasing their dynamism without resorting to
using inheritance on pointers. The speed of handling a variant in C++ vs going
through a vtable is
not necessarily a selling point
for variants, but there may
be other benefits or design reasons to use a variant - improving cache
performance, not coupling objects through inheritance, working with native
machine types, and having static compile time checks are good reasons to
consider the use of
std::variant.
The inclusion of pattern matching in C++26,
at the time of writing this article, is dubious.