Your code reads some data, processes it, and uses too much memory. To reduce memory usage, you need to figure out:

1. Where peak memory usage occurs, also known as the high-water mark.
2. What code was responsible for allocating the memory that was present at that peak moment.

That's exactly what Fil will help you find.
Fil is an open source memory profiler for data processing applications written in Python, with native support for Jupyter. It is designed to be high-performance and easy to use. At the moment it only runs on Linux and macOS.
v0.11 includes performance improvements and less intrusive behavior under Jupyter.
Fil vs. other Python memory tools
There are two distinct patterns of Python usage, each with its own source of memory problems.
In a long-running server, memory usage can grow indefinitely due to memory leaks. That is, some memory is not being freed.
* If the issue is in Python code, tools like `tracemalloc` (https://docs.python.org/3/library/tracemalloc.html) and Pympler (https://pypi.org/project/Pympler/) can tell you which objects are leaking and what is preventing them from being freed.
* If you're leaking memory in C code, you can use tools like Valgrind (https://valgrind.org/).

Fil, however, is not aimed at memory leaks, but at the other use case: data processing applications. These applications load in data, process it somehow, and then finish running.
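As a sketch of the leak-hunting workflow `tracemalloc` supports, here is a minimal standard-library example; the `leaky` function and its ever-growing cache are invented for illustration:

```python
import tracemalloc

def leaky(cache, n):
    # Simulate a leak: objects accumulate in a long-lived cache
    # and are never removed, so memory grows on every call.
    cache.append([0] * n)

tracemalloc.start()
cache = []
for _ in range(100):
    leaky(cache, 10_000)

snapshot = tracemalloc.take_snapshot()
# Group live allocations by source line; the cache.append line dominates,
# pointing straight at the code holding on to memory.
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```

Note that `tracemalloc` only sees allocations made through Python's memory APIs, which is why C-level leaks call for a tool like Valgrind instead.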
The problem with these applications is that they can, on purpose or by mistake, allocate huge amounts of memory. It might get freed soon after, but if you allocate 16GB RAM and only have 8GB in your computer, the lack of leaks doesn't help you.
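This peak-versus-final distinction can be demonstrated even with the standard library's `tracemalloc`, which reports peak as well as current traced memory; the `process` function below is a made-up example of a large temporary allocation:

```python
import tracemalloc

tracemalloc.start()

def process():
    # Large intermediate allocation (~8 MB of list storage),
    # freed as soon as the function returns.
    big = [0] * 1_000_000
    return len(big)

result = process()
current, peak = tracemalloc.get_traced_memory()
# Peak is far above what is still allocated after process() returns,
# even though nothing leaked: no leaks, yet you still needed the RAM.
print(f"current={current} bytes, peak={peak} bytes")
```

Fil goes further than this by also attributing the peak to the responsible code, including allocations made outside Python's memory APIs.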
Fil will therefore tell you, in an easy-to-understand way:
1. Where peak memory usage is, also known as the high-water mark.
2. What code was responsible for allocating the memory that was present at that peak moment.
3. This includes allocations made by C/Fortran/C++/whatever extensions that don't use Python's memory allocation APIs (`tracemalloc` only tracks Python's memory APIs).