Mastering Python Dataclasses: Tips and Tricks

Published May 29, 2024

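Dataclasses are a relatively recent addition to Python. They were proposed in PEP 557 and have been part of the standard library since Python 3.7. A data class is a regular class that exists mainly to hold data, with the boilerplate methods generated for it. To see what that saves, consider an ordinary class first: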

class Person:
    def __init__(self, name, age, city):
        self.name = name
        self.age = age
        self.city = city

In this example, we’ve defined a simple Person class with attributes for name, age, and city. This class requires an __init__ method to initialise its attributes. Now, let’s reimplement the same class using a data class:

from dataclasses import dataclass

@dataclass
class PersonDataclass:
    name: str
    age: int
    city: str

With dataclasses, the process becomes simpler. You no longer need to write an explicit __init__ method or manage attribute assignments by hand. The @dataclass decorator generates __init__ from the annotated fields, making your code more concise and easier to understand.
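
As a quick check of what the decorator generates, the following sketch inspects the constructor signature with the standard-library inspect module. The class mirrors PersonDataclass above; the variable names init_params and jack are only for illustration.

```python
import inspect
from dataclasses import dataclass


@dataclass
class PersonDataclass:
    name: str
    age: int
    city: str


# The generated __init__ takes one parameter per annotated field, in order.
init_params = list(inspect.signature(PersonDataclass).parameters)
print(init_params)  # ['name', 'age', 'city']

# Positional and keyword arguments both work, as with a hand-written __init__.
jack = PersonDataclass('Jack', age=32, city='Seattle')
print(jack.name, jack.age, jack.city)
```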

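But it is not limited to __init__. By default, dataclasses also generate __repr__ and __eq__, saving you the effort of writing these methods yourself. A __hash__ method is only generated on request, for example with frozen=True or unsafe_hash=True; a default data class sets __hash__ to None and is therefore unhashable.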
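
The hashing behaviour is easy to verify. The sketch below, with hypothetical Point and FrozenPoint classes, shows that a default data class compares by value but cannot be hashed, while frozen=True (a real option of the decorator) yields hashable instances.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


@dataclass(frozen=True)
class FrozenPoint:
    x: int
    y: int


# Default data classes compare by value...
print(Point(1, 2) == Point(1, 2))  # True

# ...but hashing them raises TypeError, because __hash__ is set to None.
try:
    hash(Point(1, 2))
    default_is_hashable = True
except TypeError:
    default_is_hashable = False
print(default_is_hashable)  # False

# Frozen data classes hash by field values, so equal instances collapse in a set.
unique_points = {FrozenPoint(1, 2), FrozenPoint(1, 2)}
print(len(unique_points))  # 1
```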

Let’s create objects of both classes and compare how they behave:

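person1 = Person('John Doe', 30, 'New York')
person2 = Person('John Doe', 30, 'New York')
persondc1 = PersonDataclass('Jack', 32, 'Seattle')
persondc2 = PersonDataclass('Jack', 32, 'Seattle')

# A regular class compares by identity, so two equal-looking objects differ.
print(person1 == person2)      # False
# The generated __eq__ compares field values.
print(persondc1 == persondc2)  # True

print(person1)    # default repr, e.g. <__main__.Person object at 0x...>
print(persondc2)  # PersonDataclass(name='Jack', age=32, city='Seattle')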

To get the same readable output from a regular class, we have to write __repr__ ourselves:

class Person():
    def __init__(self, name, age, height, email):
        self.name = name
        self.age = age
        self.height = height
        self.email = email

    def __repr__(self):
        return (f'{self.__class__.__name__}(name={self.name}, age={self.age}, height={self.height}, email={self.email})')

person = Person('Joe', 25, 1.85, 'joe@dataquest.io')
print(person)

In a data class, __repr__ is generated for us, but we can always override it if we want to customise the representation of our class:

@dataclass
class Person():
    name: str
    age: int
    height: float
    email: str

    def __repr__(self):
        return (f'''This is a {self.__class__.__name__} called {self.name}.''')

person = Person('Joe', 25, 1.85, 'joe@dataquest.io')
print(person)

We can also combine dataclass with the typing module to annotate attributes of any type. For instance, let’s add a house_coordinates attribute to Person:

from typing import Tuple

@dataclass
class Person():
    name: str
    age: int
    height: float
    email: str
    house_coordinates: Tuple[float, float]

print(Person('Joe', 25, 1.85, 'joe@dataquest.io', (40.748441, -73.985664)))

Following the same logic, we can create a data class to hold multiple instances of the Person class:

from typing import List

@dataclass
class People():
    people: List[Person]

joe = Person('Joe', 25, 1.85, 'joe@dataquest.io', (40.748441, -73.985664))
mary = Person('Mary', 43, 1.67, 'mary@dataquest.io', (-73.985664, 40.748441))

print(People([joe, mary]))
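
A container field like people often needs an empty default. Writing people: List[Person] = [] raises ValueError when the class is defined, because a mutable default would be shared between instances; field(default_factory=list) is the supported alternative. The sketch below assumes a trimmed-down Person with only name and age so that it stays self-contained.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    name: str
    age: int


@dataclass
class People:
    # default_factory calls list() for every new instance,
    # so each People object gets its own independent list.
    people: List[Person] = field(default_factory=list)


team_a = People()
team_b = People()
team_a.people.append(Person('Joe', 25))

print(team_a)  # People(people=[Person(name='Joe', age=25)])
print(team_b)  # People(people=[])
```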

As we saw above, when using the dataclass decorator, the __init__, __repr__, and __eq__ methods are implemented for us.

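But what about other things we might want from data classes, like hashing, sorting, and comparison? For ordering, we can pass order=True to the decorator, which generates __lt__, __le__, __gt__, and __ge__: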

@dataclass(order=True)
class Person():
    name: str
    age: int
    height: float
    email: str


joe = Person('Joe', 25, 1.85, 'joe@dataquest.io')
mary = Person('Mary', 43, 1.67, 'mary@dataquest.io')

print(joe > mary)
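
To see what order=True actually compares, this sketch sorts a few instances. The generated methods compare the fields as a tuple in definition order, so name decides first and age is only consulted on a tie. The class is trimmed to two fields for brevity.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Person:
    name: str
    age: int


people = [Person('Mary', 43), Person('Joe', 25), Person('Joe', 19)]

# sorted() relies on the generated __lt__, which compares (name, age) tuples.
ordered = sorted(people)
for p in ordered:
    print(p)

# 'Joe' < 'Mary' alphabetically, so this is False regardless of age.
print(Person('Joe', 25) > Person('Mary', 43))  # False
```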

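This prints False, but not because of age: the generated methods compare the fields as a tuple in definition order, so the result is decided by name ('Joe' sorts before 'Mary'). To sort by an attribute of our choice, we need two new tools. The first is the field function. This function customises one attribute of a data class individually, allowing us to define attributes that depend on other attributes and are only set after the object is instantiated.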

In our sorting problem, we’ll use field to create a sort_index attribute in our class. Its value can only be set after the object is instantiated, and because it is the first field, it is the first thing dataclasses compares when sorting:

from dataclasses import dataclass, field

@dataclass(order=True)
class Person():
    sort_index: int = field(init=False, repr=False)
    name: str
    age: int
    height: float
    email: str

The two arguments we set to False state that this attribute isn’t a parameter of __init__ and shouldn’t be displayed by __repr__. The documentation describes the other parameters that field accepts.
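
As an illustration of those other parameters, the sketch below uses default, default_factory, and compare, all of which are documented arguments of field. The Product class and its attributes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Product:
    name: str
    # A simple default value.
    price: float = field(default=0.0)
    # A fresh list for every instance.
    tags: List[str] = field(default_factory=list)
    # Excluded from == and from the printed representation.
    internal_id: int = field(default=0, compare=False, repr=False)


a = Product('Pen', 1.5, internal_id=1)
b = Product('Pen', 1.5, internal_id=2)

print(a)       # Product(name='Pen', price=1.5, tags=[])
print(a == b)  # True, because internal_id is ignored by the comparison
```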

After declaring this new attribute, we’ll use the second new tool: the __post_init__ method. As the name suggests, this method is executed right after the generated __init__ has assigned the fields. We’ll use __post_init__ to set sort_index right after the object is created. For example, let’s say we want to compare people based on age. Here’s how:

@dataclass(order=True)
class Person():
    sort_index: int = field(init=False, repr=False)
    name: str
    age: int
    height: float
    email: str

    def __post_init__(self):
        self.sort_index = self.age

If we make the same comparison now, the result is based on age. It prints False because Joe is younger than Mary:

joe = Person('Joe', 25, 1.85, 'joe@dataquest.io')
mary = Person('Mary', 43, 1.67, 'mary@dataquest.io')

print(joe > mary)
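
With sort_index in place, the built-in sorted() orders people by age. The sketch repeats the class so that it runs on its own; the third person, Ann, is made up so that the list has something to reorder.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Person:
    sort_index: int = field(init=False, repr=False)
    name: str
    age: int
    height: float
    email: str

    def __post_init__(self):
        # Called at the end of the generated __init__.
        self.sort_index = self.age


people = [
    Person('Mary', 43, 1.67, 'mary@dataquest.io'),
    Person('Joe', 25, 1.85, 'joe@dataquest.io'),
    Person('Ann', 31, 1.70, 'ann@example.com'),  # hypothetical extra person
]

youngest_first = sorted(people)
print([p.name for p in youngest_first])  # ['Joe', 'Ann', 'Mary']
```

If two people share the same age, the comparison falls through to the remaining fields (name, then height, then email), since sort_index is only the first element of the compared tuple.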

Inheritance with dataclasses
The dataclasses module also supports inheritance, which means we can create a data class that uses the attributes of another data class. Still using our Person class, we’ll create a new Employee class that inherits all the attributes from Person. So we have Person:

@dataclass(order=True)
class Person():
    name: str
    age: int
    height: float
    email: str

And the new Employee class:

@dataclass(order=True)
class Employee(Person):
    salary: int
    department: str

Now, we can create an object of the Employee class using all the attributes of the Person class:

print(Employee('Joe', 25, 1.85, 'joe@dataquest.io', 100000, 'Marketing'))
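
The field order in the generated __init__ follows the inheritance chain: the parent’s fields come first, then the child’s. A small sketch using dataclasses.fields makes this visible. One consequence, stated here as a caveat rather than something the example demonstrates: if a parent field has a default value, every field the child adds must have one too, otherwise defining the class raises TypeError.

```python
from dataclasses import dataclass, fields


@dataclass(order=True)
class Person:
    name: str
    age: int
    height: float
    email: str


@dataclass(order=True)
class Employee(Person):
    salary: int
    department: str


# fields() returns Field objects in the order used by __init__ and __repr__.
field_names = [f.name for f in fields(Employee)]
print(field_names)

joe = Employee('Joe', 25, 1.85, 'joe@dataquest.io', 100000, 'Marketing')
print(isinstance(joe, Person))  # True
```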

More details about dataclasses are available in the official Python documentation for the dataclasses module and in PEP 557.

Conclusion
In conclusion, dataclasses in Python offer a powerful and concise way to define classes focused on data storage. By automating method generation, reducing boilerplate code, and providing flexibility, dataclasses enhance code readability and streamline development. While they might not be a one-size-fits-all solution, incorporating dataclasses into your Python toolkit can lead to more maintainable, expressive, and efficient code, especially in scenarios where simplicity and data-centric design are vital considerations.


Surya Pratap Singh

Python Programmer | Data Engineer | ETL developer
