Catching exceptions with multi-processing
Fabien
fabien.maussion at gmail.com
Fri Jun 19 10:01:02 EDT 2015
Folks,
I am developing a tool which works on individual entities (glaciers) and
does a lot of operations on them. There are many tasks to do, one after
another, and each task follows the same interface:
def task_1(path_to_glacier_dir):
    # open file1 in path_to_glacier_dir
    # do stuff
    if dont_work:
        raise RuntimeError("didnt work")
    # write file2 in path_to_glacier_dir
This way, the tasks can be run in parallel very easily:
import multiprocessing as mp

pool = mp.Pool(4)
dirs = [list_of_dirs]
pool.map(task_1, dirs, chunksize=1)
pool.map(task_2, dirs, chunksize=1)
pool.map(task_3, dirs, chunksize=1)
... and so forth. I tested the tool on about a hundred glaciers, but now
it has to run for thousands of them. There are going to be errors, some
of them even expected for special outliers. What I would like the tool
to do is, in case of error, write the identifier of the problematic
glacier somewhere, along with the error encountered and more info if
possible. Because of multiprocessing, I can't write to a shared file, so
I thought that each process should write a unique "error file" in a
dedicated directory.
What I don't know, however, is how to do this at minimal cost and in a
generic way for all tasks. Also, task_2 should not be run if task_1
threw an error. Sometimes (for debugging), I'd rather keep the normal
behavior of raising the error and stopping the program. Do I have to
wrap all tasks in a "try: except:" block? How do I switch between the
two behaviors? All the solutions I could think of (like the decorator
sketched below) look quite ugly to me, and it seems that this is a
general problem that someone cleverer than me has solved before ;-)
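The best I came up with so far is a decorator along these lines (again
just a sketch; the RAISE_ERRORS flag and the ERROR file name are made
up), but it still feels clumsy:

import os
import functools
import traceback

RAISE_ERRORS = False  # set to True when debugging

def entity_task(task_func):
    # catch errors, log them in the glacier dir, and skip the task
    # entirely if a previous task already failed for this glacier
    @functools.wraps(task_func)
    def wrapper(path_to_glacier_dir):
        err_file = os.path.join(path_to_glacier_dir, 'ERROR')
        if os.path.exists(err_file):
            return  # an earlier task already failed for this glacier
        try:
            task_func(path_to_glacier_dir)
        except Exception:
            if RAISE_ERRORS:
                raise
            with open(err_file, 'a') as f:
                f.write(task_func.__name__ + '\n')
                f.write(traceback.format_exc())
    return wrapper

Each task would then be written as before but decorated with
@entity_task, and the pool.map calls would stay unchanged.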
Thanks,
Fabien