In June, the Python Software Foundation released the newest version of Python, 3.7. In this post, we'll look at one of the nicest new features, "data classes".

Still On Python 2?
You may be used to working in an older version of Python, Python 2, particularly if you've been reading older books, working on legacy software, or following outdated curricula. (If you're not certain, you can check the version number that Python prints when you start it up.)
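If you'd rather check from inside a script than read the startup banner, a minimal sketch like the following should work on both Python 2 and Python 3. The variable name has_dataclasses is just an illustrative choice.

```python
import sys

# sys.version_info behaves like a tuple: (major, minor, micro, ...)
major, minor = sys.version_info[:2]
print("Running Python {}.{}".format(major, minor))

# Data classes are part of the standard library from Python 3.7 onward
has_dataclasses = (major, minor) >= (3, 7)
print("dataclasses in the standard library: {}".format(has_dataclasses))
```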
Years ago, Python split into two lines of development: one continuing the existing language (Python 2) and the other creating a more modern version, Python 3. The differences between the two include string/Unicode handling, printing, and lazy evaluation, and are covered well elsewhere. Python 2 has continued to be developed, but in maintenance mode: no new features or syntax have been added to the language in years, and only bug fixes are contributed to it. The PSF announced years ago that Python 2.7 would be the last release in the Python 2 line, and it reaches its maintenance end of life in 2020. After that, even bugs in it will no longer be fixed.
The upshot of this is that Python 3 is far more innovative, and includes many new features not in Python 2.
As 2020 approaches, it becomes even more critical to move to Python 3, and new features like those in 3.7 can make the move even more enjoyable than before.
Data Classes
The Challenge
It’s common when creating classes in Python (or any OO language) to have a fair amount of boilerplate code. For example, imagine an Employee class:
class Employee:
    """Employee of our company."""
However, we want to be able to instantiate employees by providing attributes, so we need to write an __init__ method:
class Employee:
    # ...
    def __init__(self, fname, lname, ssn):
        self.fname = fname
        self.lname = lname
        self.ssn = ssn
Now we can do this:
>>> joel = Employee(fname="Joel", lname="Burton", ssn="123-45-6789")
However, we’re still not done: if we print out one of those employees, we get a very unhelpful, generic representation, like <Employee object at 0x1234>. This makes it tedious to debug programs, as we have to constantly print out other information to figure out which employee this is.
Therefore, it’s common to add a __repr__ method; this is called to produce the object’s representation, for example when you evaluate it at the interactive prompt, print it, or inspect it in a debugging session. A common __repr__ method for a class like this would be:
class Employee:
    # ...
    def __repr__(self):
        return (f'Employee(fname="{self.fname}", lname="{self.lname}",'
                f' ssn="{self.ssn}")')
Now, when we look at an employee, we’ll see something nicer:
>>> joel
Employee(fname="Joel", lname="Burton", ssn="123-45-6789")
There’s another thing to think about: if we make the same employee twice, Python won’t consider them the same person. By default, object comparison in Python is by “identity” — essentially, is this the same exact object in memory.
We can see this with:
>>> e1 = Employee(fname="Joel", lname="Burton", ssn="123-45-6789")
>>> e2 = Employee(fname="Joel", lname="Burton", ssn="123-45-6789")
>>> e1 is e2  # check object identity
False
>>> e1 == e2
False
That’s definitely the right answer for is: they’re not the exact same object in memory, so this should be false. However, in many applications, we’d want them to be considered equal to each other.
To do this, we’d need to implement another special method, __eq__:
class Employee:
    # ...
    def __eq__(self, other):
        return (self.fname == other.fname and
                self.lname == other.lname and
                self.ssn == other.ssn)
And now they will compare as equal:
>>> e1 = Employee(fname="Joel", lname="Burton", ssn="123-45-6789")
>>> e2 = Employee(fname="Joel", lname="Burton", ssn="123-45-6789")
>>> e1 is e2  # check object identity, we still want this false
False
>>> e1 == e2
True
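For reference, here is a sketch of the three hand-written methods assembled into one runnable class. It follows the snippets above with one addition that is our own assumption rather than part of the original example: an isinstance guard in __eq__, so that comparing an employee with an unrelated object returns False instead of raising AttributeError.

```python
class Employee:
    """Employee of our company."""

    def __init__(self, fname, lname, ssn):
        self.fname = fname
        self.lname = lname
        self.ssn = ssn

    def __repr__(self):
        return (f'Employee(fname="{self.fname}", lname="{self.lname}",'
                f' ssn="{self.ssn}")')

    def __eq__(self, other):
        # Added guard (not in the snippets above): defer to Python's
        # default handling when other is not an Employee
        if not isinstance(other, Employee):
            return NotImplemented
        return (self.fname == other.fname and
                self.lname == other.lname and
                self.ssn == other.ssn)


e1 = Employee(fname="Joel", lname="Burton", ssn="123-45-6789")
e2 = Employee(fname="Joel", lname="Burton", ssn="123-45-6789")
print(e1)            # print falls back to __repr__ when __str__ is not defined
print(e1 == e2)      # True: same attribute values
print(e1 == "Joel")  # False: not an Employee, and no error is raised
```

That is roughly twenty lines of boilerplate for a class with three attributes, and every new attribute has to be added in three places.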
Introducing: Dataclasses
Python 3.7 introduces a new feature, “data classes”, which make this much easier.
To use it, decorate a class with the dataclass decorator from the dataclasses module in the standard library. Then list the class’s attributes, along with their types, in the body of the class itself:
from dataclasses import dataclass

@dataclass
class EmployeeDC:
    fname: str
    lname: str
    ssn: str
Now, we get all of the behavior above:
>>> e1 = EmployeeDC(fname="Joel", lname="Burton", ssn="123-45-6789")
>>> e1
EmployeeDC(fname='Joel', lname='Burton', ssn='123-45-6789')
>>> e2 = EmployeeDC(fname="Joel", lname="Burton", ssn="123-45-6789")
>>> e1 is e2  # check object identity, we still want this false
False
>>> e1 == e2
True
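To see that the decorator really did write those methods for us, we can inspect the class. The sketch below repeats the EmployeeDC definition so it runs on its own; the variable names generated and odd are only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class EmployeeDC:
    fname: str
    lname: str
    ssn: str


# The decorator attaches the generated methods directly to the class
generated = [name for name in ("__init__", "__repr__", "__eq__")
             if name in vars(EmployeeDC)]
print(generated)

# fields() returns the declared fields in definition order
print([f.name for f in fields(EmployeeDC)])

# The annotations are not checked at runtime, so this does not raise
odd = EmployeeDC(fname=1, lname=2, ssn=3)
print(odd)
```

Note the last line: the type annotations tell dataclass which attributes to create, but Python itself does not enforce them. A separate type checker is needed to catch mistakes like that.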
There’s More
That’s a basic example of dataclasses — however, there are many more features that can provide more flexibility:
- you can decide which fields count for comparison (perhaps two employees should be considered the same if their SSN attributes are the same, even if their names differ)
- you can make instances “frozen” (attributes cannot be changed)
- frozen classes can be hashed; this allows their instances to be keys in a dictionary or added to a set (a class that defines __eq__ is otherwise unhashable unless you also implement the special method __hash__)
- you can give attributes default values, including dynamic or mutable values
- you can add a __post_init__ method that does extra work on instantiation
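Here is a rough sketch of how the features in this list might look in practice. The class names FrozenEmployee and Team, and the slug attribute, are made up for this illustration; the field() options, the frozen=True argument, and FrozenInstanceError all come from the dataclasses module.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import List


@dataclass(frozen=True)
class FrozenEmployee:
    # compare=False keeps the names out of ==, so only ssn decides equality
    fname: str = field(compare=False)
    lname: str = field(compare=False)
    ssn: str


@dataclass
class Team:
    name: str
    # default_factory builds a fresh list for every instance
    members: List[FrozenEmployee] = field(default_factory=list)
    # init=False: not a constructor argument; filled in by __post_init__
    slug: str = field(init=False)

    def __post_init__(self):
        self.slug = self.name.lower().replace(" ", "-")


a = FrozenEmployee(fname="Joel", lname="Burton", ssn="123-45-6789")
b = FrozenEmployee(fname="J.", lname="Burton", ssn="123-45-6789")
print(a == b)        # True: same ssn, names are ignored
print(len({a, b}))   # 1: equal frozen instances hash alike

try:
    a.ssn = "000-00-0000"
except FrozenInstanceError:
    print("frozen instances cannot be modified")

team = Team(name="Core Dev")
team.members.append(a)
print(team.slug)                  # core-dev
print(Team(name="Docs").members)  # []: each team gets its own list
```

Using default_factory rather than a plain `= []` default matters here: dataclass rejects a bare list default with a ValueError, precisely because a single list would otherwise be shared by every instance.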
To read about these features, see A Brief Tour of Python 3.7 Data Classes.