{ The Rithm Blog. }

Coming Up in Python: `match`

March 03, 2021

The next version of Python, 3.10, is in late alpha and is expected to be released in October 2021. It has a number of small changes to the language, but also one large new piece of syntax, the `match` keyword.

It's common (in any programming language) to check a variable against different conditions, acting on the first case that matches. Imagine code like this, which might be useful in a graphing application:

 

def find_point(point):
    """Given a tuple of (x, y), find that point on a plane."""

    if not isinstance(point, tuple) or len(point) != 2:
        print("not a point")

    elif point[0] == 0 and point[1] == 0:
        print("at origin")

    elif point[0] == 0:
        print(f"on y axis at {point[1]}")

    elif point[1] == 0:
        print(f"on x axis at {point[0]}")

    else:
        print("somewhere else")

 

We could use this like:

 

>>> find_point((0, 3))
on y axis at 3

 

This code is fairly straightforward, but it has a lot of repetition: we keep having to examine the items in the tuple by index. We could restructure our code to get around this, but then we couldn't keep it in a single set of `if/elif/else` blocks, since we'd first have to check whether we were given a valid two-item tuple. It's also unfortunate that our *not-a-point* case happens at the beginning; it would be clearer to have that at the end, as a catch-all for anything not matched previously.
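To make that trade-off concrete, here is a rough sketch of the restructured version. The name `find_point_unpacked` is made up for this illustration, and it returns the message instead of printing it so the result is easy to check. Notice that the validity check has to sit in its own `if` before the tuple can be unpacked:

```python
def find_point_unpacked(point):
    """Hypothetical rewrite: validate first, then unpack the tuple once."""

    # This check can no longer be part of the same if/elif chain,
    # because unpacking below would fail on anything else.
    if not isinstance(point, tuple) or len(point) != 2:
        return "not a point"

    x, y = point  # unpack once instead of indexing repeatedly

    if x == 0 and y == 0:
        return "at origin"
    elif x == 0:
        return f"on y axis at {y}"
    elif y == 0:
        return f"on x axis at {x}"
    else:
        return "somewhere else"
```

The repetition is gone, but the *not-a-point* case is still stuck at the top.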

 

Python 3.10 introduces the new keyword `match`, which provides a feature that is a bit like `switch` (which is found in JavaScript and many other languages, but not in Python), but with additional features. Let's look at our code using `match`:

def find_point(point):
    """Given a tuple of (x, y), find that point on a plane."""

    match point:
        case (0, 0):
            print("at origin")

        case (0, y):
            print(f"on y axis at {y}")

        case (x, 0):
            print(f"on x axis at {x}")

        case (x, y):
            print("somewhere else")

        case _:
            print("not a point")

 

Our `match` statement matches against the `point` variable, based on the "shape" of the case clauses: we can use literals to mean exactly what they say, or names to match part of the pattern. So `case (0, 0)` will match only a two-item sequence equal to `(0, 0)` (a list like `[0, 0]` matches too, not just a tuple). Matching `case (0, y)` requires a two-item sequence with the first element being 0; the second element can be anything, and it will be captured in the new variable `y`.

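A catch-all case is optional: if no case matches, the `match` statement simply does nothing. The `case _` wildcard catches everything, so it has to come last. Putting it (or any other pattern that always matches) before other cases is a syntax error, which is a nice, protective feature that keeps programmers from writing cases that can never run.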

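There's a good deal more here. From PEP 622, the Python Enhancement Proposal that launched this feature (in the design finally accepted for 3.10, PEP 634, the walrus pattern became an `as` pattern, written like `datetime(year=2020, month=m) as d`):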

  • A literal pattern is useful to filter constant values in a structure. It looks like a Python literal (including some values like `True`, `False` and `None`). It only matches objects equal to the literal, and never binds.
  • A capture pattern looks like `x` and is equivalent to an identical assignment target: it always matches and binds the variable with the given (simple) name.
  • The wildcard pattern is a single underscore: `_`. It always matches, but does not capture any variable (which prevents interference with other uses for `_` and allows for some optimizations).
  • A constant value pattern works like the literal but for certain named constants. Note that it must be a qualified (dotted) name, given the possible ambiguity with a capture pattern. It looks like `Color.RED` and only matches values equal to the corresponding value. It never binds.
  • A sequence pattern looks like `[a, *rest, b]` and is similar to a list unpacking. An important difference is that the elements nested within it can be any kind of patterns, not just names or sequences. It matches only sequences of appropriate length, as long as all the sub-patterns also match. It makes all the bindings of its sub-patterns.
  • A mapping pattern looks like `{"user": u, "emails": [*es]}`. It matches mappings with at least the set of provided keys, and if all the sub-patterns match their corresponding values. It binds whatever the sub-patterns bind while matching with the values corresponding to the keys. Adding `**rest` at the end of the pattern to capture extra items is allowed.
  • A class pattern is similar to the above but matches attributes instead of keys. It looks like `datetime.date(year=y, day=d)`. It matches instances of the given type, having at least the specified attributes, as long as the attributes match with the corresponding sub-patterns. It binds whatever the sub-patterns bind when matching with the values of the given attributes. An optional protocol also allows matching positional arguments.
  • An OR pattern looks like `[*x] | {"elems": [*x]}`. It matches if any of its sub-patterns match. It uses the binding for the leftmost pattern that matched.
  • A walrus pattern looks like `d := datetime(year=2020, month=m)`. It matches only if its sub-pattern also matches. It binds whatever the sub-pattern match does, and also binds the named variable to the entire object.

That's a lot of flexibility, and it should make handling complex, switch-like branching in Python a lot easier!

Python is generally conservative in adding new features to the language, to keep it a language that can be easily learned and used by non-experts as well as experts. However, the idea of pattern matching has been present in some other newer languages, like Swift and Kotlin, so it's exciting to see this idea being brought to Python.

Written by Joel
