[New-bugs-announce] [issue30323] concurrent.futures.Executor.map() consumes all memory when big generators are used

Klamann report at bugs.python.org
Tue May 9 17:37:02 EDT 2017


New submission from Klamann:

The Executor's map() function accepts a function and an iterable that supplies the arguments for each call to that function. This iterable could be a generator, and as such it could reference data that won't fit into memory.

The behaviour I would expect is that the Executor requests the next element from the iterable whenever a worker (thread or process) is ready to make the next function call.

But what actually happens is that the entire iterable is consumed into a list as soon as map() is called, so any underlying generator runs to exhaustion and loads all referenced data into memory. Here's where the list gets built from the iterable:
https://github.com/python/cpython/blob/3.6/Lib/concurrent/futures/_base.py#L548
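
The eager consumption can also be measured rather than read off printed output. The following is a minimal sketch, not part of the tracker's test suite; the names `consumed`, `generate` and `count_consumed_by_map` are made up for illustration. It counts how many inputs have been pulled from a generator by the time map() returns, before any result has been requested.

```python
from concurrent.futures import ThreadPoolExecutor

consumed = []  # records every input the generator has produced so far


def generate(n):
    """Yield 0..n-1, noting each value as it is handed out."""
    for i in range(n):
        consumed.append(i)
        yield i


def count_consumed_by_map(n=10):
    """Return how many inputs map() pulled before any result was requested."""
    consumed.clear()
    with ThreadPoolExecutor(max_workers=2) as executor:
        # The returned result iterator is deliberately never advanced.
        executor.map(abs, generate(n))
        return len(consumed)
```

With lazy consumption one would expect this count to stay near max_workers; with the behaviour described above it equals n.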

The way I see it, there's no reason to convert the iterable to a list in the map function (or any other place in the Executor). Just replacing the list comprehension with a generator expression would probably fix that.
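
As an illustration of the expected behaviour, here is a rough sketch of a lazy variant written outside the library. It is an assumption-laden workaround, not the actual fix: `lazy_map` and its `max_in_flight` parameter are hypothetical names that do not exist in concurrent.futures. It keeps a bounded window of submitted calls instead of a plain generator expression, because a generator expression on its own would probably submit each call only when its result is requested, which would serialize the work.

```python
import itertools
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def lazy_map(executor, fn, iterable, max_in_flight):
    """Hypothetical helper: yield fn(arg) for each arg, in input order,
    keeping at most max_in_flight calls submitted ahead of the consumer."""
    it = iter(iterable)
    # Prime the window: pull and submit only the first max_in_flight inputs.
    pending = deque(
        executor.submit(fn, arg) for arg in itertools.islice(it, max_in_flight)
    )
    while pending:
        # Block until the oldest submitted call has finished.
        result = pending.popleft().result()
        # Refill the window with one more input, if any remain.
        for arg in itertools.islice(it, 1):
            pending.append(executor.submit(fn, arg))
        yield result
```

Because this is a generator function, nothing is submitted until the caller starts iterating, and the input iterable is never read more than max_in_flight items ahead of the results already handed out. A real fix inside the Executor would also have to deal with the timeout and chunksize arguments, which this sketch ignores.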


Here's an example that illustrates the issue:

    from concurrent.futures import ThreadPoolExecutor
    import time
    
    def generate():
        for i in range(10):
            print("generating input", i)
            yield i
    
    def work(i):
        print("working on input", i)
        time.sleep(1)
    
    with ThreadPoolExecutor(max_workers=2) as executor:
        generator = generate()
        executor.map(work, generator)

The output is:

    generating input 0
    working on input 0
    generating input 1
    working on input 1
    generating input 2
    generating input 3
    generating input 4
    generating input 5
    generating input 6
    generating input 7
    generating input 8
    generating input 9
    working on input 2
    working on input 3
    working on input 4
    working on input 5
    working on input 6
    working on input 7
    working on input 8
    working on input 9

Ideally, the lines should alternate, but currently all input is generated immediately.

----------
messages: 293353
nosy: Klamann
priority: normal
severity: normal
status: open
title: concurrent.futures.Executor.map() consumes all memory when big generators are used
type: resource usage
versions: Python 3.3, Python 3.4, Python 3.5, Python 3.6, Python 3.7

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue30323>
_______________________________________

