Python Object-Oriented Programming: Build robust and maintainable object-oriented Python applications and libraries

Product type Paperback

Published in Jul 2021

Publisher Packt

ISBN-13 9781801077262

Length 714 pages

Edition 4th Edition

Languages

Python

Concepts

Object Oriented Programming

Author (1):

Dusty Phillips

Dataclasses

Since Python 3.7, dataclasses let us define ordinary objects with a clean syntax for specifying attributes. They look – superficially – very similar to named tuples. This is a pleasant approach that makes it easy to understand how they work.

Here's a dataclass version of our Stock example:

>>> from dataclasses import dataclass
>>> @dataclass
... class Stock:
...     symbol: str
...     current: float
...     high: float
...     low: float

For this case, the definition is nearly identical to the NamedTuple definition.

The dataclass function is applied as a class decorator, using the @ operator. We encountered decorators in Chapter 6, Abstract Base Classes and Operator Overloading. We'll dig into them deeply in Chapter 11, Common Design Patterns. This class definition syntax isn't much less verbose than an ordinary class with __init__(), but it gives us access to several additional dataclass features.

It's important to recognize that the names are provided at the class level, but are not actually creating class-level attributes. The class level names are used to build several methods, including the __init__() method; each instance will have the expected attributes. The decorator transforms what we wrote into the more complex definition of a class with the expected attributes and parameters to __init__().
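
As a rough sketch of this point, using only the standard library, we can inspect what the decorator built. The annotated names appear as dataclass fields and as __init__() parameters, but a field declared without a default is not a class-level attribute. The variable names field_names and init_params are ours, for illustration only.

```python
from dataclasses import dataclass, fields
import inspect


@dataclass
class Stock:
    symbol: str
    current: float
    high: float
    low: float


# The decorator records each annotated name as a field...
field_names = [f.name for f in fields(Stock)]
print(field_names)

# ...and uses those names to build the __init__() parameters.
init_params = list(inspect.signature(Stock.__init__).parameters)
print(init_params)

# A field with no default value is not a class-level attribute.
print(hasattr(Stock, "symbol"))

# Each instance gets its own attribute values.
s = Stock("AAPL", 123.52, 137.98, 53.15)
print(vars(s))
```

One caveat: a field that is given a simple default value does leave that default behind as a class attribute, so the statement above applies most cleanly to fields without defaults.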

Because dataclass objects can be stateful, mutable objects, there are a number of extra features available. We'll start with some basics. Here's an example of creating an instance of the Stock dataclass.

>>> s = Stock("AAPL", 123.52, 137.98, 53.15)

Once instantiated, the Stock object can be used like any ordinary class. You can access and update attributes as follows:

>>> s
Stock(symbol='AAPL', current=123.52, high=137.98, low=53.15)
>>> s.current
123.52
>>> s.current = 122.25
>>> s
Stock(symbol='AAPL', current=122.25, high=137.98, low=53.15)

As with other objects, we can add attributes beyond those formally declared as part of the dataclass. This isn't always the best idea, but it's supported because this is an ordinary mutable object:

>>> s.unexpected_attribute = 'allowed'
>>> s.unexpected_attribute
'allowed'

Adding attributes isn't available for frozen dataclasses, which we'll talk about later in this section.

At first glance, it seems like dataclasses don't give many benefits over an ordinary class definition with an appropriate constructor. Here's an ordinary class that's similar to the dataclass:

>>> class StockOrdinary:
...     def __init__(self, name: str, current: float, high: float, low: float) -> None:
...         self.name = name
...         self.current = current
...         self.high = high
...         self.low = low
>>> s_ord = StockOrdinary("AAPL", 123.52, 137.98, 53.15)

One obvious benefit to a dataclass is we only need to state the attribute names once, saving the repetition in the __init__() parameters and body. But wait, that's not all! The dataclass also provides a much more useful string representation than we get from the implicit superclass, object. By default, dataclasses also include an equality comparison that compares attribute values. This can be turned off with eq=False in the cases where it doesn't make sense. The following example shows how the manually built class behaves without these dataclass features:

>>> s_ord
<__main__.StockOrdinary object at 0x7fb833c63f10>
>>> s_ord_2 = StockOrdinary("AAPL", 123.52, 137.98, 53.15)
>>> s_ord == s_ord_2
False

The class built manually has an awful default representation, and its inherited equality test, which only checks whether both operands are the very same object, can make life difficult. We'd prefer the behavior of the Stock class defined as a dataclass:

>>> stock2 = Stock(symbol='AAPL', current=122.25, high=137.98, low=53.15)
>>> s == stock2
True
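
To make the saving concrete, here is a hand-written approximation of the representation and equality methods that the decorator generates for Stock. StockManual is a hypothetical name, and the real generated code differs in detail; this is only a sketch of the equivalent behavior.

```python
from dataclasses import dataclass


@dataclass
class Stock:
    symbol: str
    current: float
    high: float
    low: float


class StockManual:
    """Hand-written approximation of what @dataclass generates for Stock."""

    def __init__(self, symbol: str, current: float, high: float, low: float) -> None:
        self.symbol = symbol
        self.current = current
        self.high = high
        self.low = low

    def __repr__(self) -> str:
        return (
            f"{self.__class__.__name__}(symbol={self.symbol!r}, "
            f"current={self.current!r}, high={self.high!r}, low={self.low!r})"
        )

    def __eq__(self, other: object) -> bool:
        # Only instances of exactly the same class are compared field by field.
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.symbol, self.current, self.high, self.low) == (
            other.symbol, other.current, other.high, other.low
        )


m1 = StockManual("AAPL", 123.52, 137.98, 53.15)
m2 = StockManual("AAPL", 123.52, 137.98, 53.15)
print(m1)
print(m1 == m2)
print(Stock("AAPL", 123.52, 137.98, 53.15))
```

Every attribute name appears five times in StockManual, against once in the dataclass version.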

Class definitions decorated with @dataclass also have many other useful features. For example, you can specify a default value for the attributes of a dataclass. Perhaps the market is currently closed and you don't know what the values for the day are:

@dataclass
class StockDefaults:
    name: str
    current: float = 0.0
    high: float = 0.0
    low: float = 0.0

You can construct this class with just the stock name; the rest of the values will take on the defaults. But you can still specify values if you prefer, as follows:

>>> StockDefaults("GOOG")
StockDefaults(name='GOOG', current=0.0, high=0.0, low=0.0)
>>> StockDefaults("GOOG", 1826.77, 1847.20, 1013.54)
StockDefaults(name='GOOG', current=1826.77, high=1847.2, low=1013.54)
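
Two details about defaults may be worth a quick sketch. Fields with defaults must come after fields without them, just as with function parameters, and a mutable default such as a list is supplied through dataclasses.field(default_factory=...). The StockHistory and BadOrder classes, and the closes attribute, are hypothetical names that are not part of the running example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StockHistory:
    name: str
    current: float = 0.0
    # Each instance gets its own fresh list; a bare `= []` raises ValueError.
    closes: List[float] = field(default_factory=list)


a = StockHistory("GOOG")
b = StockHistory("GOOG", current=1826.77)
a.closes.append(1826.77)
print(a)
print(b)

# A field without a default cannot follow one that has a default.
bad_order_rejected = False
try:
    @dataclass
    class BadOrder:
        current: float = 0.0
        name: str
except TypeError as ex:
    bad_order_rejected = True
    print(f"TypeError: {ex}")
```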

We saw earlier that dataclasses support equality comparison by default. If all the attributes compare as equal, then the dataclass objects as a whole also compare as equal. By default, dataclasses do not support other comparisons, such as less than or greater than, and they can't be sorted. However, you can easily add comparisons if you wish, demonstrated as follows:

@dataclass(order=True)
class StockOrdered:
    name: str
    current: float = 0.0
    high: float = 0.0
    low: float = 0.0

It's okay to ask "Is that all that's needed?" The answer is yes. The order=True parameter to the decorator leads to the creation of all of the comparison special methods. This change gives us the opportunity to sort and compare the instances of this class. It works like this:

>>> stock_ordered1 = StockOrdered("GOOG", 1826.77, 1847.20, 1013.54)
>>> stock_ordered2 = StockOrdered("GOOG")
>>> stock_ordered3 = StockOrdered("GOOG", 1728.28, high=1733.18, low=1666.33)
>>> stock_ordered1 < stock_ordered2
False
>>> stock_ordered1 > stock_ordered2
True
>>> from pprint import pprint
>>> pprint(sorted([stock_ordered1, stock_ordered2, stock_ordered3]))
[StockOrdered(name='GOOG', current=0.0, high=0.0, low=0.0),
 StockOrdered(name='GOOG', current=1728.28, high=1733.18, low=1666.33),
 StockOrdered(name='GOOG', current=1826.77, high=1847.2, low=1013.54)]

When the dataclass decorator receives the order=True argument, it will, by default, compare the values based on each of the attributes in the order they were defined. So, in this case, it first compares the name attribute values of the two objects. If those are the same, it compares the current attribute values. If those are also the same, it will move on to high and will even include low if all the other attributes are equal. The rules follow the definition of a tuple: the order of definition is the order of comparison.
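
The tuple-like ordering can be checked with a short sketch. It reuses the StockOrdered definition from above; the instance names and the AAPL values are ours, chosen so that the name attribute decides one comparison and the current attribute decides another.

```python
from dataclasses import astuple, dataclass


@dataclass(order=True)
class StockOrdered:
    name: str
    current: float = 0.0
    high: float = 0.0
    low: float = 0.0


goog = StockOrdered("GOOG", 1826.77, 1847.20, 1013.54)
aapl = StockOrdered("AAPL", 123.52, 137.98, 53.15)
goog_low = StockOrdered("GOOG", 1728.28, 1733.18, 1666.33)

# name is compared first, so "AAPL" sorts before "GOOG" regardless of price.
print(aapl < goog)

# With equal names, the current attribute breaks the tie.
print(goog_low < goog)

# The results match a comparison of the field values as plain tuples.
same_as_tuples = (aapl < goog) == (astuple(aapl) < astuple(goog))
print(same_as_tuples)

ordered_prices = [s.current for s in sorted([goog, aapl, goog_low])]
print(ordered_prices)
```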

Another interesting feature of dataclasses is frozen=True. This creates a class that's similar to a typing.NamedTuple. There are some differences in what we get as features. We'd need to use @dataclass(frozen=True, order=True) to create immutable structures that can also be compared and sorted, as named tuples can. This leads to a question of "Which is better?", which, of course, depends on the details of a given use case. We haven't explored all of the optional features of dataclasses, like initialization-only fields and the __post_init__() method. Some applications don't need all of these features, and a simple NamedTuple may be adequate.
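
Here is a minimal sketch of a frozen, ordered dataclass. StockFrozen is a hypothetical name that mirrors the StockOrdered fields, and the rejected list exists only to record which assignments were refused.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True, order=True)
class StockFrozen:
    name: str
    current: float = 0.0
    high: float = 0.0
    low: float = 0.0


f1 = StockFrozen("GOOG", 1826.77, 1847.20, 1013.54)
f2 = StockFrozen("GOOG")

# Ordering still works, as it did for StockOrdered.
print(f2 < f1)

rejected = []

# Assigning to a declared attribute is rejected...
try:
    f1.current = 1800.00
except FrozenInstanceError as ex:
    rejected.append("current")
    print(f"FrozenInstanceError: {ex}")

# ...and so is adding an undeclared one.
try:
    f1.unexpected_attribute = "not allowed"
except FrozenInstanceError as ex:
    rejected.append("unexpected_attribute")
    print(f"FrozenInstanceError: {ex}")

# With the default eq=True, frozen instances are hashable,
# so they can be set members or dictionary keys.
watchlist = {f1, f2, StockFrozen("GOOG")}
print(len(watchlist))
```

One difference from a named tuple shows up quickly: a frozen dataclass instance is not a tuple, so it can't be indexed or unpacked directly.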

There are a few other approaches. Outside the standard library, packages like attrs, pydantic, and marshmallow provide attribute definition capabilities that are, in some ways, similar to dataclasses, and some of them offer additional features. See https://jackmckew.dev/dataclasses-vs-attrs-vs-pydantic.html for a comparison.

We've looked at two ways to create classes with specific, named attributes: named tuples and dataclasses. It's often easier to start with dataclasses and add specialized methods. This can save us a bit of programming because some of the basics, like initialization, comparison, and string representations, are handled elegantly for us.
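
As an illustration of starting with a dataclass and adding specialized methods, the sketch below extends the Stock example. The trading_range property and the update() method are hypothetical helpers invented for this example; they are not part of the dataclasses module.

```python
from dataclasses import dataclass


@dataclass
class Stock:
    symbol: str
    current: float
    high: float
    low: float

    @property
    def trading_range(self) -> float:
        """Spread between the high and the low (hypothetical helper)."""
        return self.high - self.low

    def update(self, price: float) -> None:
        """Record a new price, widening high or low if needed (hypothetical helper)."""
        self.current = price
        self.high = max(self.high, price)
        self.low = min(self.low, price)


s = Stock("AAPL", 123.52, 137.98, 53.15)
s.update(140.00)
print(s)
print(round(s.trading_range, 2))
```

Because trading_range has no annotation at the class level, it isn't treated as a field, so it stays out of __init__(), the representation, and the equality comparison.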

It's time to look at Python's built-in generic collections, dict, list, and set. We'll start by exploring dictionaries.