Mastering Python Dataclasses: Tips and Tricks
Dataclasses are a relatively recent addition to Python. They were introduced in PEP 557 and included in Python 3.7 and later versions. A data class is designed to hold only data values.
class Person:
    def __init__(self, name, age, city):
        self.name = name
        self.age = age
        self.city = city
In this example, we’ve defined a simple Person class with attributes for name, age, and city. This class requires an __init__ method to initialise its attributes. Now, let’s reimplement the same class using a data class:
from dataclasses import dataclass
@dataclass
class PersonDataclass:
    name: str
    age: int
    city: str
With dataclasses, the process becomes simpler. You no longer need to manually write an explicit __init__ method or manage attribute assignments. The @dataclass decorator will generate these methods for you, making your code more concise and easier to understand.
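But it is not limited to __init__. By default, the decorator also generates __repr__ and __eq__, saving you the time and effort of writing them yourself. __hash__ is handled more conservatively: with the default settings it is set to None, so instances are unhashable, and a hash method is generated only when you ask for it, for example with frozen=True.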
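The following is a minimal sketch of what the decorator generates with its default settings and how hashing changes with frozen=True. The Point and FrozenPoint classes are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Point:  # hypothetical example class
    x: int
    y: int


@dataclass(frozen=True)
class FrozenPoint:  # hypothetical immutable variant
    x: int
    y: int


p = Point(1, 2)
print(p)                   # Point(x=1, y=2), from the generated __repr__
print(p == Point(1, 2))    # True, from the generated __eq__
print(Point.__hash__)      # None: default dataclass instances are unhashable

fp = FrozenPoint(1, 2)
# Frozen instances get a __hash__ based on their field values,
# so equal instances collapse to one element in a set.
print(len({fp, FrozenPoint(1, 2)}))   # 1
```

Calling hash(p) on the non-frozen Point raises a TypeError, which is why such instances cannot be used as dictionary keys or set members.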
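Let’s create objects of Person and PersonDataclass and compare them:
person1 = Person('John Doe', 30, 'New York')
person2 = Person('John Doe', 30, 'New York')
persondc1 = PersonDataclass('Jack', 32, 'Seattle')
persondc2 = PersonDataclass('Jack', 32, 'Seattle')
print(person1 == person2)      # False: a plain class compares by identity
print(persondc1 == persondc2)  # True: the generated __eq__ compares field values
print(person1)    # default object repr, e.g. <__main__.Person object at 0x...>
print(persondc2)  # PersonDataclass(name='Jack', age=32, city='Seattle')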
To get the same readable output from a regular class, we have to write __repr__ ourselves:
class Person():
    def __init__(self, name, age, height, email):
        self.name = name
        self.age = age
        self.height = height
        self.email = email
    def __repr__(self):
        return (f'{self.__class__.__name__}(name={self.name}, age={self.age}, height={self.height}, email={self.email})')
person = Person('Joe', 25, 1.85, 'joe@dataquest.io')
print(person)
With a data class, __repr__ is generated for us, but we can always override it if we want to customise the representation of our class:
@dataclass
class Person():
    name: str
    age: int
    height: float
    email: str
    def __repr__(self):
        return (f'''This is a {self.__class__.__name__} called {self.name}.''')
person = Person('Joe', 25, 1.85, 'joe@dataquest.io')
print(person)
We can also combine dataclass with the typing module to annotate attributes of any type. For instance, let’s add a house_coordinates attribute to Person:
from typing import Tuple
@dataclass
class Person():
    name: str
    age: int
    height: float
    email: str
    house_coordinates: Tuple[float, float]
print(Person('Joe', 25, 1.85, 'joe@dataquest.io', (40.748441, -73.985664)))
Following the same logic, we can create a data class to hold multiple instances of the Person class:
from typing import List
@dataclass
class People():
    people: List[Person]
joe = Person('Joe', 25, 1.85, 'joe@dataquest.io', (40.748441, -73.985664))
mary = Person('Mary', 43, 1.67, 'mary@dataquest.io', (-73.985664, 40.748441))
print(People([joe, mary]))
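One caveat about these annotations: the dataclass decorator uses them to decide which attributes are fields, but it does not validate values against them at runtime. The sketch below shows this with a hypothetical Location class that is not part of the examples above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Location:  # hypothetical example class
    name: str
    coordinates: Tuple[float, float]


# No error is raised: annotations are not checked when the object is created.
loc = Location(123, 'not a tuple')
print(loc)  # Location(name=123, coordinates='not a tuple')
```

If you need the types enforced, a static type checker or explicit validation in your own code is one way to do it.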
As we saw above, when using the dataclass decorator, the __init__, __repr__, and __eq__ methods are implemented for us.
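But what about other operations on data classes, such as sorting and comparison? Passing order=True to the decorator also generates __lt__, __le__, __gt__, and __ge__, which compare instances as tuples of their fields, in the order the fields are defined: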
@dataclass(order=True)
class Person():
    name: str
    age: int
    height: float
    email: str
joe = Person('Joe', 25, 1.85, 'joe@dataquest.io')
mary = Person('Mary', 43, 1.67, 'mary@dataquest.io')
print(joe > mary)  # False: name is compared first, and 'Joe' < 'Mary'
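To make the field-by-field comparison concrete, here is a small sketch that repeats the class above and compares the result with a plain tuple comparison. It uses astuple from the dataclasses module; nothing else is assumed beyond the example data already shown.

```python
from dataclasses import dataclass, astuple


@dataclass(order=True)
class Person:
    name: str
    age: int
    height: float
    email: str


joe = Person('Joe', 25, 1.85, 'joe@dataquest.io')
mary = Person('Mary', 43, 1.67, 'mary@dataquest.io')

# The generated ordering methods behave like a tuple comparison of the fields.
print(joe > mary)                     # False
print(astuple(joe) > astuple(mary))   # False, the same result

# Because __lt__ exists, sorted() works without a key function.
print([p.name for p in sorted([mary, joe])])   # ['Joe', 'Mary']
```

The order here is alphabetical by name only because name happens to be the first field.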
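Here the comparison is decided by name, simply because it is the first field. To compare on an attribute of our choosing, we need two more tools. The first is the field function. It customises one attribute of a data class individually, which lets us define attributes that depend on other attributes and are only set after the object is instantiated.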
In our sorting problem, we’ll use field to create a sort_index attribute in our class. Its value can only be set after the object is instantiated, and because it is declared first, it is the first value dataclasses compares when ordering instances:
from dataclasses import dataclass, field
@dataclass(order=True)
class Person():
    sort_index: int = field(init=False, repr=False)
    name: str
    age: int
    height: float
    email: str
The two arguments we passed as False state that this attribute isn’t a parameter of __init__ and shouldn’t be displayed when we call __repr__. The documentation lists the other parameters that the field function accepts.
With this new attribute declared, we’ll use the second new tool: the __post_init__ method. As the name suggests, this method is executed right after the __init__ method. We’ll use __post_init__ to set sort_index right after the object is created. For example, let’s say we want to compare people based on age. Here’s how:
@dataclass(order=True)
class Person():
    sort_index: int = field(init=False, repr=False)
    name: str
    age: int
    height: float
    email: str
    def __post_init__(self):
        self.sort_index = self.age
If we make the same comparison, the result now depends on age, and Joe is younger than Mary:
joe = Person('Joe', 25, 1.85, 'joe@dataquest.io')
mary = Person('Mary', 43, 1.67, 'mary@dataquest.io')
print(joe > mary)  # False: 25 < 43
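The same mechanism makes sorted(), min(), and max() order people by age. The sketch below repeats the class so that it runs on its own; the third record ('Ann') is a hypothetical addition to make the ordering visible.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Person:
    sort_index: int = field(init=False, repr=False)
    name: str
    age: int
    height: float
    email: str

    def __post_init__(self):
        # Runs right after the generated __init__
        self.sort_index = self.age


people = [
    Person('Mary', 43, 1.67, 'mary@dataquest.io'),
    Person('Joe', 25, 1.85, 'joe@dataquest.io'),
    Person('Ann', 31, 1.70, 'ann@example.com'),  # hypothetical extra record
]

print([p.name for p in sorted(people)])   # ['Joe', 'Ann', 'Mary']
print(max(people).name)                   # Mary
print(people[1])  # Person(name='Joe', age=25, height=1.85, email='joe@dataquest.io')
```

Note that sort_index does not appear in the printed representation because of repr=False. If two people have the same age, the comparison falls through to the remaining fields, starting with name.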
Inheritance with dataclasses
The dataclasses module also supports inheritance, which means we can create a data class that uses the attributes of another data class. Still using our Person class, we’ll create a new Employee class that inherits all the attributes from Person. So we have Person:
@dataclass(order=True)
class Person():
    name: str
    age: int
    height: float
    email: str
And the new Employee class:
@dataclass(order=True)
class Employee(Person):
    salary: int
    department: str
Now, we can create an object of the Employee class using all the attributes of the Person class:
print(Employee('Joe', 25, 1.85, 'joe@dataquest.io', 100000, 'Marketing'))
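To see how the inherited attributes are arranged, the sketch below repeats both classes and inspects them with the fields function from the dataclasses module. It only restates the example above in a self-contained form.

```python
from dataclasses import dataclass, fields


@dataclass(order=True)
class Person:
    name: str
    age: int
    height: float
    email: str


@dataclass(order=True)
class Employee(Person):
    salary: int
    department: str


joe = Employee('Joe', 25, 1.85, 'joe@dataquest.io', 100000, 'Marketing')

# Fields of the base class come first, followed by those of the subclass.
print([f.name for f in fields(Employee)])
# ['name', 'age', 'height', 'email', 'salary', 'department']

print(isinstance(joe, Person))   # True
```

This ordering matters once defaults are involved: if a Person field had a default value, the Employee fields without defaults would follow it in the generated __init__, and the class definition would fail with a TypeError.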
More details about dataclasses are available here
Conclusion
In conclusion, dataclasses in Python offer a powerful and concise way to define classes focused on data storage. By automating method generation, reducing boilerplate code, and providing flexibility, dataclasses enhance code readability and streamline development. While they might not be a one-size-fits-all solution, incorporating dataclasses into your Python toolkit can lead to more maintainable, expressive, and efficient code, especially in scenarios where simplicity and data-centric design are vital considerations.