Subtleties of Python
A good software engineer understands how crucial attention to detail is; minute details, if overlooked, can make a world of difference between a working unit and a disaster. That’s why writing clean code matters a lot—and clean code isn’t just about neat indentation and formatting; it’s about paying attention to those details that can affect production.
In this article, you’ll see a couple of short cases of problematic code in Python and how they can be improved. Please note that these are just examples and in no way must you interpret them to universally apply for real-world problems.
Mutable objects and attributes
One of the most attractive features of functional programming is immutability. Mutation is simply not allowed in functional software. However, in most applications defined with objects, the objects are prone to mutation, changing their internal state or representation. Agreed that there can be immutable objects, but such cases are few and far in between.
Take a look at the following code snippet. (Disclaimer: There’s something really, really wrong in it!)
sub¬tleties0.py (Source)
"""Mutable objects as class attributes."""
import logging
logger = logging.getLogger(__name__)
class Query:
PARAMETERS = {"limit": 100, "offset": 0}
def run_query(self, query, limit=None, offset=None):
if limit is not None:
self.PARAMETERS.update(limit=limit)
if offset is not None:
self.PARAMETERS.update(offset=offset)
return self._run(query, **self.PARAMETERS)
@staticmethod
def _run(query, limit, offset):
logger.info("running %s [%s, %s]", query, limit, offset)
It can often be pretty tempting to mutate dictionaries when using them to pass keyword arguments in order to adapt the dictionary to the function signature you want to call. However, you must always keep in mind the scope and the consequences that mutation can bring.
In the above case, the dictionary being mutated belongs to the class, thereby changing the class—this changes the default values to the values from the last update. Plus, owing to the fact that the dictionary belongs to the class, all instances, including the new ones, will carry on with this.
>>> q = Query()
>>> q.run_query("select 1")
running select 1 [100, 0]
>>> q.run_query("select 1", limit=50)
running select 1 [50, 0]
>>> q.run_query("select 1")
running select 1 [50, 0]
>>> q.PARAMETERS
{'limit': 50, 'offset': 0}
>>> new_query = Query()
>>> new_query.PARAMETERS
{'limit': 50, 'offset': 0}
As you can see, this is extremely unstable and fragile.
Here are some general recommendations to tackle this:
- Don’t change mutable objects passed by parameters to functions. Create new copies of the objects whenever possible, and return them accordingly.
- Don’t mutate class attributes.
- Try not to set mutable objects as class attributes.
However, there are exceptions to these rules; after all pragmatism beats purity. Here are some:
• For item 1, you must always consider the trade-off between memory and speed. If the object is too big (perhaps a large dictionary), running copy.deepcopy() on it will be slow, and it will take a lot of memory, so it’s probably faster to just modify it in place.
• The ex¬cep¬tion for rule [2] is when us¬ing de¬scrip¬tors, when you are count¬ing on that side effec¬t. Oth¬er than that, there should¬n’t be any rea¬son to go on such a dan¬ger¬ous path.
• Rule [3] should¬n’t be a prob¬lem if the at¬tributes are read¬-on¬ly. In that case, set¬ting dic¬tio-nar¬ies and lists as class at¬tributes might be fine, but even though you may be currently sure of its immutability, you can’t as¬sure no¬body will ever break that rule in the fu¬ture.
Iterators
Python’s iterator protocol allows you to treat an entire set of objects by their behavior, irrespective of their internal representation.
For example, consider the following:
for i in myiterable: ...
What is myiterable in the above code? It may be a list, a tuple, a dictionary, or a string, and it will still work fine. In fact, you can also rely on all methods that use this protocol:
mylist.extend(myiterable)
Unfortunately though, along with all its great advantages, there are a few demerits that accompany this amazing feature. For instance, the following will run incredibly slow:
def process_files(files_to_process, target_directory):
for file_ in files_to_process:
# ...
shutil.copy2(file_, target_directory)
Can you see what’s happening? Here, the compiler doesn’t exactly know what files_to_process is exactly (tuple, list, or dictionary).
Strings can also be iterable. Suppose you pass a single file (say, /home/ubuntu/foo). Every character is iterated over, starting with the / and then h and so on, which, in fact, slows down the program. Using a better interface can help tackle this issue. For example:
def process_files(*files_to_process, target_directory):
for file_ in files_to_process:
# ...
shutil.copy2(file_, target_directory)
In the above example, the function signature uses a cleaner interface, as it allows multiple files as arguments, thereby eliminating the issue explained previously. Moreover, it also makes target_directory to be keyword only, which is more explicit even.
Hope you enjoyed reading this article. If you want to learn more about how to refactor legacy code, you can explore Clean Code in Python by Mariano Anaya. Packed with numerous hands-on examples and problems, Clean Code in Python is a must-read for team leads, software architects, and senior engineers keen on improving legacy systems to improve efficiency and save costs.