[Matplotlib-users] Running matplotlib on massively parallel compute resources

Kevin Buckley kevin.buckley.pawsey.org.au at gmail.com
Wed Jan 15 22:41:31 EST 2020


We've recently seen an issue with someone running multiple instances
of jobs on our supercomputer, all of which have a matplotlib component
that thus runs on the compute nodes, rather than as part of any
post-processing on our ancillary services.

Some of these jobs ended up hanging and, in a number of cases, we have
observed that the hanging process is what we believe to be the
matplotlib-spawned

   fc-list --format=%{file}\n

Is there anything in the way that matplotlib is written that might
give rise to race conditions around access to the per-user font cache,
or to other matplotlib data, while it is being created?

Furthermore, is there a way that our users could define a per-job font
cache directory, by using the job-ID, and thereby explicitly avoid
any inter-job interference resulting from their "massively parallel"
matplotlib invocations?
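
To make the per-job idea concrete, here is a minimal sketch of the kind
of thing a user script might do. It relies on matplotlib's MPLCONFIGDIR
environment variable, which sets the directory matplotlib uses for its
own configuration and cache files. The SLURM_JOB_ID variable, the
"mpl-cache-" prefix and the helper name per_job_cache_dir are
assumptions made for illustration only; substitute whatever job-ID
variable your scheduler provides.

```python
import os
import tempfile


def per_job_cache_dir(base=None, env=os.environ):
    """Create and return a cache directory unique to this job.

    Hypothetical helper: SLURM_JOB_ID is an assumed scheduler variable;
    the process ID is used as a fallback when it is not set.
    """
    job_id = env.get("SLURM_JOB_ID") or "pid{}".format(os.getpid())
    base = base or env.get("TMPDIR") or tempfile.gettempdir()
    path = os.path.join(base, "mpl-cache-{}".format(job_id))
    os.makedirs(path, exist_ok=True)  # safe if the directory already exists
    return path


# This must happen before matplotlib is first imported, because
# matplotlib reads MPLCONFIGDIR when it looks up its cache directory.
os.environ["MPLCONFIGDIR"] = per_job_cache_dir()

# After this point the script would continue as usual, for example:
#   import matplotlib
#   matplotlib.use("Agg")   # non-interactive backend for compute nodes
#   import matplotlib.pyplot as plt
```

Note that this only relocates matplotlib's own files. Whether it also
avoids a hang inside fc-list, which consults fontconfig's separate
cache, is not established by this sketch.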

Here's hoping that matplotlib is the cause and, if so, that there's an
easy solution for those who know how to use matplotlib.

