Code Behind “Dotomorrow” — Context Manager in Python
I came up with my first Python package, called Dotomorrow, while working on a data science project. Readers can check out my GitHub repo to get a basic idea of what this package does before proceeding.
Motivation
While executing a time-consuming function over a set of parameters, I realized how necessary it is to autosave progress when a keyboard interrupt or any other error occurs. Re-executing everything from scratch was a heartbreaking experience.
For instance, consider a scenario like this:
parameters = range(1000)
results = []
for par in parameters:
    result = some_function(par)  # takes hours to execute
    results.append(result)
Or, when scraping a website without different proxies to rotate through, a sleep call must be added to avoid being blocked by the host:
from time import sleep
import requests

urls = [...]
results = []
for url in urls:
    sleep(5)
    rq = requests.get(url)
    content = rq.json()
    # do something with it
One solution is to save a file manually after each iteration, for example:
parameters = range(1000)
results = []
for par in parameters:
    result = some_function(par)  # takes hours to execute
    save_to_file(result)
However, the next time you execute this piece of code, you'll have to identify the parameters that have not yet been executed, so the code must be modified into something like this:
parameters = range(1000)
remaining_parameters = get_remaining_parameters(file_path, parameters)
results = []
for par in remaining_parameters:
    result = some_function(par)  # takes hours to execute
    save_to_file(result)
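Such a get_remaining_parameters helper might look roughly like the sketch below. This is a hypothetical implementation that assumes save_to_file stores finished work as a pickled dict keyed by parameter; it is only meant to show how much plumbing the manual approach needs.
import os
import pickle

def get_remaining_parameters(file_path, parameters):
    # Hypothetical sketch: load previously saved work (if any) and keep
    # only the parameters that have no saved result yet.
    if not os.path.exists(file_path):
        return list(parameters)
    with open(file_path, "rb") as f:
        done = pickle.load(f)  # assumed format: {parameter: result}
    return [p for p in parameters if p not in done]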
This pattern looks neat enough to bolt onto every time-consuming loop, but as the sketch above suggests, the actual implementation under the hood can get messy. It would be nice to have syntactic sugar that wraps all the tedious work with minimal modification to the original statements.
The Dotomorrow package that I wrote is a solution for this kind of task. One simply wraps the loop in a with ... as ... block, and the preloading and saving of parameters and results are handled automatically:
with SavedIterator("cache1", parameters) as si:
    for par in si:
        result = some_function(par)  # takes hours to execute
        si += result
In this article, I will explain how this syntax can be designed in Python. Specifically:
- Designing the with ... as ... statement
- Iterating over si
- Overriding the += operator to append results
Constructor
We start by creating a class that holds some of the data we need:
- A file path for the cache
- The parameter list
Upon instantiation, the SavedIterator object must read the cache file (if any) and identify the parameters that were already executed and saved in it:
class SavedIterator:
    def __init__(self, file_path: str, parameter_space):
        self.__file_path = file_path
        self.__params = parameter_space
        self.__has_prev_result = False
        self.results = self.__load_results()
        self.__remaining_params = \
            [p for p in self.__params if p not in self.results.keys()]

    def __load_results(self):
        # read from self.__file_path
        # change self.__has_prev_result to True if file found
        ...
The __remaining_params attribute then contains all the parameters that were not yet executed in previous runs.
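As a concrete illustration, __load_results could be implemented with pickle. This is only a minimal sketch under the assumption that the cache is a pickled dict of {parameter: result}; the actual package may persist the cache differently.
import os
import pickle

class SavedIterator:
    # ... __init__ as above ...

    def __load_results(self):
        # Sketch: if the cache file exists, remember that previous results
        # were found and return the saved {parameter: result} dict;
        # otherwise start from an empty dict.
        if os.path.exists(self.__file_path):
            self.__has_prev_result = True
            with open(self.__file_path, "rb") as f:
                return pickle.load(f)
        return {}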
Context Manager
A nice feature in Python is the context manager. It lets users run setup code automatically before the indented code block and cleanup code after it. A common example is the open() function:
with open("DATA/texts.txt", "r") as f:
    # do something with the file
Opening the file and calling f.close() are handled behind the scenes, so users don't have to manage these calls every time.
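Conceptually, the with block above behaves roughly like the following try/finally pattern:
f = open("DATA/texts.txt", "r")
try:
    # do something with the file
    ...
finally:
    f.close()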
For our case, we want to
- Read the cache file before the loop
- Save to the cache file after the loop finishes or is interrupted
How to Implement the Context Manager?
The most fundamental way to create a context manager is to define a class with two specific methods:
class SavedIterator:
    # see previous block
    def __init__(self, ...): ...

    # executed when entering the context manager
    def __enter__(self):
        ...

    # executed after exiting the context manager
    def __exit__(self, exc_type, exc_val, exc_tb):
        ...
In our case, we simply print a message indicating the number of remaining parameters upon entering the context manager:
def __enter__(self):
    if self.__has_prev_result:
        print(f"Cache found in {self.__file_path}. {len(self.__remaining_params)} iterations remaining.")
    else:
        print("No cache file found. Starting from scratch.")
    return self
Returning self ensures that the object itself gets bound to the variable after as. In other words, in the statement
with SavedIterator("cache", parameters) as si:
si refers to the SavedIterator instance.
Upon exiting, we want to replace the cache file with the new data, and perhaps print some text to let users know what happened:
def __exit__(self, exc_type, exc_value, trace_back):
    self.__save_results()
    self.__current_param = None
    if exc_type is KeyboardInterrupt:
        print("Interrupted. File autosaved in", self.__file_path)
        return True
The __exit__ method takes three parameters. exc_type indicates the type of exception that caused the exit; in our case we are specifically interested in KeyboardInterrupt, but other errors can be detected as well. exc_value contains the exception itself, and trace_back contains its traceback. These are not very important here, as we mainly want to focus on saving the results.
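For completeness, __save_results (a method of SavedIterator) could simply pickle the results dict back to the cache file. Again, this is only a sketch under the same pickle assumption as __load_results above; the package may store the cache differently.
import pickle

def __save_results(self):
    # Sketch: dump the accumulated {parameter: result} dict to the cache file.
    with open(self.__file_path, "wb") as f:
        pickle.dump(self.results, f)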
Once the two methods are defined, we can then use SavedIterator as a context manager:
with SavedIterator("cache1", parameters) as si:
    ...
Iteration
The next piece of syntactic sugar is integrating the si object with the for loop:
with SavedIterator("cache1", parameters) as si:
    for par in si:
        ...
This might not be the most intuitive design, but it is certainly a minimalistic piece of syntactic sugar, and I love it!
Note that under the hood, when a for loop runs, Python calls the __iter__ method of the object being iterated to obtain an iterator, and then pulls elements from that iterator one by one.
Therefore, in order to let Python “extract” elements from a SavedIterator, we need to make sure the SavedIterator class has an __iter__ method:
class SavedIterator:
    def __iter__(self):
        ...
Although __iter__ is a “method”, here it should not simply “return” a value. Instead, we write it as a generator that “yields” one element each time the loop asks for the next one.
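As a quick refresher on generators (a toy example, independent of Dotomorrow):
def count_down(n):
    # A generator function: each "yield" hands one value back to the loop,
    # and execution resumes here when the next value is requested.
    while n > 0:
        yield n
        n -= 1

for x in count_down(3):
    print(x)  # prints 3, then 2, then 1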
The idea is very simple: as long as there are parameters left in the __remaining_params attribute, the __iter__ method should keep yielding elements from __remaining_params. Hence:
def __iter__(self):
    while len(self.__remaining_params) > 0:
        self.__current_param = self.__remaining_params.pop(0)
        yield self.__current_param
Calling pop(0) on the list removes its first element and returns it; that element is then handed to the for statement as the next value.
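A tiny standalone illustration of pop(0):
params = [10, 20, 30]
first = params.pop(0)
print(first)   # 10
print(params)  # [20, 30]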
The final feature is appending the result of the current iteration to the SavedIterator object.
Overriding the Plus Equal
In principle, an intuitive design could be:
with SavedIterator("cache1", parameters) as si:
    for par in si:
        result = some_function(par)  # takes hours to execute
        si.append(result)
where the append method adds the result to a dict on the SavedIterator object, with the current parameter as the key:
class SavedIterator:
    ...
    def append(self, result):
        self.results[self.__current_param] = result
        return self
However, I found it neat and creative to use the += operator as an alternative to the boring append syntax. Therefore, I override the __iadd__ method.
In Python, a brand-new class lacks the ability to do addition, subtraction, comparison, and so on out of the box. To let Python interpret something like A1 + A2 for class instances, we have to define the __add__ dunder method on the class. Other operators such as -, *, /, >, <, ==, >>, << and @ each correspond to their own dunder method.
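As a quick illustration of the general mechanism (a toy example, not part of Dotomorrow):
class Money:
    def __init__(self, amount):
        self.amount = amount

    # Called when Python evaluates m1 + m2
    def __add__(self, other):
        return Money(self.amount + other.amount)

m1, m2 = Money(3), Money(4)
print((m1 + m2).amount)  # 7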
For the += operator, the dunder method turns out to be __iadd__. Hence we define:
class SavedIterator:
    def __iadd__(self, result):
        self.append(result)
        return self
Remember to return self: the statement si += result rebinds si to whatever __iadd__ returns (and only falls back to si = si + result when __iadd__ is not defined), so returning self ensures that the si variable still refers to the same SavedIterator instance.
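A small demonstration of this rebinding behaviour (toy example):
class Box:
    def __init__(self):
        self.items = []

    def __iadd__(self, item):
        self.items.append(item)
        return self  # keep the name bound to the same instance

b = Box()
before = id(b)
b += "hello"
print(id(b) == before)  # True: b is still the same object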
Conclusion
Dotomorrow does not require many difficult programming skills, but it does require a solid understanding of how Python works. In the future, I'll add other features, such as support for nested loops or caching the file somewhere else.
If you find this project interesting, feel free to share your thoughts and modifications on GitHub!