What’s New in Python 3.7: Data Classes

In June, the Python Software Foundation released the newest version of Python, 3.7. In this post, we'll look at one of the nicest new features, "data classes".

Still On Python 2?

You may be used to working in an older version of Python, Python 2, particularly if you've been reading older books, working on legacy software, or following outdated curricula. (If you're not certain, you can check the version number that Python prints when you start it up.)
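If you would rather check from inside a script than read the startup banner, a minimal sketch using the standard sys module looks like this. The variable name supports_dataclasses is just an illustrative choice, and the check assumes you want the dataclasses module that ships in the standard library.

```python
import sys

# sys.version_info is a tuple-like object: (major, minor, micro, ...)
major, minor = sys.version_info[:2]
print(f"Running Python {major}.{minor}")

# The standard-library dataclasses module arrived in Python 3.7
supports_dataclasses = (major, minor) >= (3, 7)
print("dataclasses available:", supports_dataclasses)
```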

Years ago, Python split into two lines of development: one continuing the existing language (Python 2) and the other creating a more modern version, Python 3. The differences between the two include string/Unicode handling, printing, and lazy evaluation, and are covered well elsewhere. Python 2 has continued to be developed, but in maintenance mode: no new features or syntax have been added to the language in years, and only bug fixes are contributed to it. The PSF announced years ago that Python 2.7 would be the last release in the Python 2 line, and its maintenance ends in 2020; after that, even bugs in it will no longer be fixed.

The upshot of this is that all new language features land in Python 3, which already includes many features that Python 2 lacks.

As 2020 approaches, it becomes even more critical to move to Python 3, and new features like those in 3.7 can make the move even more enjoyable than before.

Data Classes

The Challenge

It’s common when creating classes in Python (or any OO language) to have a fair amount of boilerplate code. For example, imagine an Employee class:

class Employee:
    """Employee of our company."""

However, we want to be able to instantiate employees by providing attributes, so we need to write an __init__ method:

class Employee:  # ...
    def __init__(self, fname, lname, ssn):
        self.fname = fname
        self.lname = lname
        self.ssn = ssn

Now we can do this:

>>> joel = Employee(fname="Joel", lname="Burton", ssn="123-45-6789")

However, we’re still not done: if we print out one of those employees, we get a very unhelpful, generic representation, like <Employee object at 0x1234>. This makes it tedious to debug programs, as we have to constantly print out other information to figure out which employee this is.

Therefore, it’s common to add a __repr__ method; this is called when you inspect an object, such as in an interactive session or a debugger (print() also falls back to it when the class defines no __str__). A common __repr__ method for a class like this would be:

class Employee:  # ...
    def __repr__(self):
        return (f'Employee(fname="{self.fname}", lname="{self.lname}",'
                f' ssn="{self.ssn}")')

Now, when we look at an employee, we’ll see something nicer:

>>> joel
Employee(fname="Joel", lname="Burton", ssn="123-45-6789")

There’s another thing to think about: if we make the same employee twice, Python won’t consider them the same person. By default, object comparison in Python is by “identity” — essentially, is this the same exact object in memory.

We can see this with:

>>> e1 = Employee(fname="Joel", lname="Burton", ssn="123-45-6789")
>>> e2 = Employee(fname="Joel", lname="Burton", ssn="123-45-6789")

>>> e1 is e2   # check object identity
False

>>> e1 == e2
False

That’s definitely the right answer for is: they’re not the exact same object in memory, so this should be false. However, in many applications, we’d want them to be considered equal to each other.

To do this, we’d need to implement another special method, __eq__:

class Employee:  # ...
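    def __eq__(self, other):
        # Only compare against other employees; let Python handle the rest
        if not isinstance(other, Employee):
            return NotImplemented
        return (self.fname == other.fname and
                self.lname == other.lname and
                self.ssn == other.ssn)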

And now they will compare as equal:

>>> e1 = Employee(fname="Joel", lname="Burton", ssn="123-45-6789")
>>> e2 = Employee(fname="Joel", lname="Burton", ssn="123-45-6789")

>>> e1 is e2   # check object identity, we still want this false
False

>>> e1 == e2
True
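For reference, here is one way the pieces above could fit together in a single runnable file. It is a sketch rather than the only way to write it: the three methods are the ones discussed so far, and __eq__ returns NotImplemented for anything that is not an Employee, so that comparing an employee with an unrelated object does not raise an error.

```python
class Employee:
    """Employee of our company."""

    def __init__(self, fname, lname, ssn):
        self.fname = fname
        self.lname = lname
        self.ssn = ssn

    def __repr__(self):
        return (f'Employee(fname="{self.fname}", lname="{self.lname}",'
                f' ssn="{self.ssn}")')

    def __eq__(self, other):
        # Defer to Python's default handling for non-Employee objects
        if not isinstance(other, Employee):
            return NotImplemented
        return (self.fname == other.fname and
                self.lname == other.lname and
                self.ssn == other.ssn)


e1 = Employee(fname="Joel", lname="Burton", ssn="123-45-6789")
e2 = Employee(fname="Joel", lname="Burton", ssn="123-45-6789")

print(repr(e1))   # uses our __repr__
print(e1 is e2)   # False: two separate objects in memory
print(e1 == e2)   # True: same attribute values
```

That is roughly fifteen lines of method code for a class with only three attributes, and every new attribute has to be added in three places.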

Introducing: Dataclasses

Python 3.7 introduces a new feature, “data classes”, which make this much easier.

To use them, decorate a class with the dataclass decorator from the dataclasses module. Then list your class attributes, along with their types, in the body of the class itself:

from dataclasses import dataclass

@dataclass
class EmployeeDC:
    fname: str
    lname: str
    ssn: str

Now, we get all of the behavior above:

>>> e1 = EmployeeDC(fname="Joel", lname="Burton", ssn="123-45-6789")

>>> e1
EmployeeDC(fname='Joel', lname='Burton', ssn='123-45-6789')

>>> e2 = EmployeeDC(fname="Joel", lname="Burton", ssn="123-45-6789")

>>> e1 is e2   # check object identity, we still want this false
False

>>> e1 == e2
True
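To see where that behavior comes from, the following sketch inspects the decorated class. It relies only on the standard dataclasses module; fields() is its helper for listing the attributes the decorator found. The class is repeated here so the snippet runs on its own.

```python
from dataclasses import dataclass, fields


@dataclass
class EmployeeDC:
    fname: str
    lname: str
    ssn: str


# The decorator writes these methods onto the class itself
for name in ("__init__", "__repr__", "__eq__"):
    print(name, name in vars(EmployeeDC))

# fields() lists the attributes the decorator picked up, in declaration order
field_names = [f.name for f in fields(EmployeeDC)]
print(field_names)

# Positional arguments follow the order of the field declarations
e1 = EmployeeDC("Joel", "Burton", "123-45-6789")
e2 = EmployeeDC(fname="Joel", lname="Burton", ssn="123-45-6789")
print(e1 == e2)
```

In other words, the decorator generates ordinary methods once, when the class is created; instances behave like those of any hand-written class.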

There’s More

That’s a basic example of dataclasses — however, there are many more features that can provide more flexibility:

you can decide which fields count for comparison (perhaps two employees should be considered the same if their SSN attributes are the same, but their names could be different)
you can make instances “frozen” (attributes cannot be changed)
frozen classes can be hashed; this would allow them to be keys in a dictionary or added to a set (normally, instances of a class that defines __eq__ cannot be used as keys unless you also implement the special method __hash__)
you can give attributes default values, including dynamic or mutable values
you can add a __post_init__ method to do extra work on instantiation
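The features in this list can be sketched briefly as follows. The class names FrozenEmployee and TeamMember, and the title, skills, and full_name attributes, are made up for illustration; the field(), frozen=True, default_factory, and __post_init__ mechanisms are part of the standard dataclasses module.

```python
from dataclasses import dataclass, field, FrozenInstanceError
from typing import List


@dataclass(frozen=True)
class FrozenEmployee:
    # compare=False leaves the names out of == and hash(); only ssn counts
    fname: str = field(compare=False)
    lname: str = field(compare=False)
    ssn: str


@dataclass
class TeamMember:
    fname: str
    lname: str
    title: str = "Engineer"                          # simple default value
    skills: List[str] = field(default_factory=list)  # fresh list per instance
    full_name: str = field(init=False)               # set in __post_init__

    def __post_init__(self):
        # Runs right after the generated __init__ has assigned the fields
        self.full_name = f"{self.fname} {self.lname}"


a = FrozenEmployee("Joel", "Burton", "123-45-6789")
b = FrozenEmployee("J.", "Burton", "123-45-6789")
print(a == b)        # True: names are ignored, SSNs match
print(len({a, b}))   # 1: equal and hashable, so the set keeps one entry

try:
    a.ssn = "000-00-0000"
    frozen_error_raised = False
except FrozenInstanceError:
    frozen_error_raised = True
print("assignment blocked:", frozen_error_raised)

m1 = TeamMember("Joel", "Burton")
m2 = TeamMember("Ada", "Lovelace")
m1.skills.append("Python")
print(m1.full_name, m1.skills, m2.skills)
```

Using default_factory rather than writing skills: List[str] = [] matters: the dataclass decorator rejects a bare list default, because a single list would otherwise be shared by every instance.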

To read about these features, see A Brief Tour of Python 3.7 Data Classes.
