Code Behind “Dotomorrow” — Context Manager in Python

Technical Details behind my new package!

陳家威
Jan 17, 2024

I came up with my first Python package, called Dotomorrow, while working on a data science project. Readers can check out my GitHub repo to get a basic idea of what this package does before proceeding.

Motivation

While executing a time-consuming function over a set of parameters, I realized the necessity of autosaving progress upon keyboard interruption or any other error. Re-executing everything from scratch was a heartbreaking experience.

For instance, consider this scenario:

parameters = range(1000)
results = []
for par in parameters:
    result = some_function(par)  # takes hours to execute
    results.append(result)

Or, when scraping a website without different proxies to rotate among, a sleep call must be inserted to avoid being blocked by the host:

from time import sleep
import requests

urls = [...]
results = []
for url in urls:
    sleep(5)
    rq = requests.get(url)
    content = rq.json()
    # do something with it

One solution is to save a file manually after each iteration, for example:

parameters = range(1000)
results = []
for par in parameters:
    result = some_function(par)  # takes hours to execute
    save_to_file(result)

However, the next time you execute this piece of code, you'll have to identify the parameters that have not yet been executed, so the code must be modified into something like this:

parameters = range(1000)
remaining_parameters = get_remaining_parameters(file_path, parameters)
results = []
for par in remaining_parameters:
    result = some_function(par)  # takes hours to execute
    save_to_file(result)

The pseudocode makes it look neat enough to implement before every time-consuming loop, but the actual implementation under the hood can be messy. It would be nice to have some syntactic sugar that wraps all the tedious work behind minimal modification of the original statements.

The Dotomorrow package that I wrote is a solution for this kind of task. One simply wraps the loop in a with...as... block, and the preloading and saving of parameters and results are handled automatically:

with SavedIterator("cache1", parameters) as si:
    for par in si:
        result = some_function(par)  # takes hours to execute
        si += result

In this article, I will explain how this syntax can be designed in Python. Specifically:

  1. Designing the with...as statement
  2. Iterating on si
  3. Overriding the += operator to append results

Constructor

We start by creating a class object that holds some of the data that we need:

  1. A file path for the cache
  2. The parameter list

Upon instantiation, the SavedIterator object must read the cache file (if any) and identify the parameters that were already executed and saved.

class SavedIterator:
    def __init__(self, file_path: str, parameter_space):
        self.__file_path = file_path
        self.__params = parameter_space
        self.__has_prev_result = False
        self.results = self.__load_results()
        self.__remaining_params = \
            [p for p in self.__params if p not in self.results.keys()]

    def __load_results(self):
        # read from self.__file_path
        # change self.__has_prev_result to True if file found
        ...

The __remaining_params attribute then contains all the parameters that were not executed during previous runs.
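For concreteness, here is a minimal sketch of __load_results, assuming the cache is stored as a pickled dict that maps each parameter to its result (the actual package may serialize differently):

import pickle
from pathlib import Path

class SavedIterator:
    ...

    def __load_results(self):
        if Path(self.__file_path).exists():
            self.__has_prev_result = True
            with open(self.__file_path, "rb") as f:
                return pickle.load(f)  # dict: parameter -> result
        return {}  # no cache yet, start with an empty dict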

Context Manager

A nice feature in Python is the context manager. It allows users to run setup and teardown code before and after the indented code segment. A common example is the open() function:

with open("DATA/texts.txt", "r") as f:
    # do something with the file

Behind the scenes, the file is opened on entry and f.close() is called on exit, so users don't have to bother managing these calls every time.
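Roughly speaking, the with block above is shorthand for something like:

f = open("DATA/texts.txt", "r")
try:
    # do something with the file
    ...
finally:
    f.close()  # runs even if an error is raised inside the block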

For our case, we want to

  1. Read the cache file before the loop
  2. Save to the cache file after the loop exits or is interrupted

How to Implement the Context Manager?

The fundamental way of creating a context manager is by creating a class with two specific methods:

class SavedIterator:
    # __init__ as defined in the previous block
    def __init__(self, file_path, parameter_space): ...

    # executed when entering the context manager
    def __enter__(self):
        ...

    # executed when exiting the context manager
    def __exit__(self, exc_type, exc_val, exc_tb):
        ...

In our case, we simply print out a message indicating the number of remaining parameters upon entering the context manager:

def __enter__(self):
    if self.__has_prev_result:
        print(f"Cache found in {self.__file_path}. {len(self.__remaining_params)} iterations remaining.")
    else:
        print("No cache file found. Starting from scratch.")
    return self

Returning self ensures that the object itself is bound to the variable after as. In other words, in the statement

with SavedIterator("cache", parameters) as si:

si is then the SavedIterator instance itself.

Upon exiting, we want to replace the cache file with the new data, and perhaps print a message to inform users of the current situation:

def __exit__(self, exc_type, exc_value, trace_back):
    self.__save_results()
    self.__current_param = None
    if exc_type is KeyboardInterrupt:
        print("Interrupted. File autosaved in", self.__file_path)
        return True

The __exit__ method takes three parameters. exc_type indicates the type of exception that caused the exit; in our case we are specifically interested in KeyboardInterrupt, but other errors can be detected as well. exc_value contains the exception instance, and trace_back contains the traceback of the error. These are not very important here, as we want to focus on saving the results. Note that returning True when the exception is a KeyboardInterrupt suppresses it, so the program leaves the with block cleanly instead of crashing.
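__save_results is not shown above; a minimal sketch, assuming the same pickled-dict cache as in the constructor sketch, could be:

import pickle

class SavedIterator:
    ...

    def __save_results(self):
        # overwrite the cache file with everything collected so far
        with open(self.__file_path, "wb") as f:
            pickle.dump(self.results, f)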

Once the two methods are defined, we can then use SavedIterator as a context manager:

with SavedIterator("cache1", parameters) as si:
    ...

Iteration

The next part of the syntactic sugar is integrating the si object with the for loop:

with SavedIterator("cache1", parameters) as si:
    for par in si:
        ...

This might not be intuitively the best design, but it is surely a minimalistic appearance for syntactic sugar, and I love it!

Note that, under the hood, when we run a for loop, Python calls the __iter__ method of the object being iterated to obtain an iterator, and then repeatedly pulls the next element from that iterator.

Therefore, in order to let Python "extract" elements from a SavedIterator, we need to make sure the SavedIterator class has an __iter__ method:

class SavedIterator:

    def __iter__(self):
        ...

Although it is a "method", it does not return the elements directly. Instead, it is written as a generator that yields one element each time the loop asks for the next one.

The idea is very simple: as long as there are parameters left in the __remaining_params attribute, the generator keeps yielding elements from it. Hence:

def __iter__(self):
    while len(self.__remaining_params) > 0:
        self.__current_param = self.__remaining_params.pop(0)
        yield self.__current_param

Calling pop(0) on the list removes its first element and returns it; that element is then handed to the for statement as the next value of par.
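For intuition, here is roughly what the for loop does with si under the hood (it is a throwaway name used only for this illustration):

it = iter(si)            # calls si.__iter__(), which returns a generator
while True:
    try:
        par = next(it)   # resumes the generator until its next yield
    except StopIteration:
        break            # no remaining parameters: the loop ends
    # the body of the for loop runs here with par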

The final feature is appending the result of the current iteration to the SavedIterator object.

Overriding the Plus Equal

In principle, an intuitive design could be:

with SavedIterator("cache1", parameters) as si:
    for par in si:
        result = some_function(par)  # takes hours to execute
        si.append(result)

Here, the append method adds the result to the results dict of the SavedIterator object, with the current parameter as the key:

class SavedIterator:
    ...

    def append(self, result):
        self.results[self.__current_param] = result
        return self

However, I found it neat and creative to use the += operator as an alternative to the plain append syntax. To do so, I override the __iadd__ dunder method.

In Python, a brand new class lacks the ability to do addition, subtraction, comparison, and so on. To let Python interpret something like A1 + A2 for instances of a class, we have to define the __add__ dunder method on that class. Other operators such as -, *, /, >, <, ==, >>, << and @ each correspond to their own dunder method.
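As a quick illustration (a made-up Vector class, not part of Dotomorrow):

class Vector:
    def __init__(self, x, y):
        self.x, self.y = x, y

    # called when Python evaluates v1 + v2
    def __add__(self, other):
        return Vector(self.x + other.x, self.y + other.y)

    # called when Python evaluates v1 == v2
    def __eq__(self, other):
        return self.x == other.x and self.y == other.y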

For the += operator, the dunder method turns out to be __iadd__. Hence we define:

class SavedIterator:
    def __iadd__(self, result):
        self.append(result)
        return self

Remember to return self: the statement si += result rebinds si to whatever __iadd__ returns (roughly si = si.__iadd__(result)), so returning self ensures that the si variable still refers to the same SavedIterator instance afterwards.
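Putting the pieces together, a toy run could look like this (slow_square and the cache name "square_cache" are made up for illustration):

from time import sleep

def slow_square(x):
    sleep(60)  # stand-in for a long computation
    return x * x

with SavedIterator("square_cache", range(10)) as si:
    for par in si:
        si += slow_square(par)

If this run is stopped with Ctrl+C, the partial results are saved; running the same block again picks up from the first unfinished parameter.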

Conclusion

Dotomorrow does not require many difficult programming skills, but it does require a solid understanding of how Python works. In the future, I'll add other features, such as support for nested loops or caching the files somewhere else.

If you find this project interesting, feel free to share your thoughts and modifications on GitHub!

