[Python-Dev] futures API
Brian Quinlan
brian at sweetapp.com
Fri Dec 10 18:41:52 CET 2010
Oops. I accidentally replied off-list:
On Dec 10, 2010, at 5:36 AM, Thomas Nagy wrote:
> --- On Thu, 12/9/10, Brian Quinlan wrote:
>> On Dec 9, 2010, at 4:26 AM, Thomas Nagy wrote:
>>
>>> I am looking forward to replacing a piece of code
>>> (http://code.google.com/p/waf/source/browse/trunk/waflib/Runner.py#86)
>>> by the futures module which was announced in python 3.2
>>> beta. I am a bit stuck with it, so I have a few questions
>>> about the futures:
>>>
>>> 1. Is the futures API frozen?
>>
>> Yes.
>>
>>> 2. How hard would it be to return the tasks processed
>>> in an output queue to process/consume the results while they
>>> are returned? The code does not seem to be very open for
>>> monkey patching.
>>
>> You can associate a callback with a submitted future. That
>> callback could add the future to your queue.
>
> Ok, it works. I was thinking the object was cleaned up immediately
> after it was used.
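For the archives, here is roughly what I had in mind (a minimal sketch; the `work` function and queue names are just illustrative):

```python
import queue
import concurrent.futures

def work(n):
    return n * n

out_q = queue.Queue()
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as ex:
    for i in range(10):
        f = ex.submit(work, i)
        # The callback receives the completed future and hands
        # it to a consumer via the queue.
        f.add_done_callback(out_q.put)

# Drain the queue; each entry is a finished future.
results = sorted(out_q.get().result() for _ in range(10))
print(results)
```

The future is kept alive by the queue until the consumer is done with it, which is why it is not cleaned up immediately.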
>
>>> 3. How hard would it be to add new tasks dynamically
>>> (after a task is executed) and have the futures object never
>>> complete?
>>
>> I'm not sure that I understand your question. You can
>> submit new work to an Executor at any time until it is
>> shut down, and a work item can take as long to complete
>> as you want. If you are contemplating tasks that never
>> complete then maybe you would be better off just
>> scheduling a thread.
>>
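To illustrate the dynamic case: a completed task's callback can submit follow-up work to the same executor. A rough sketch (the names here are just for illustration):

```python
import concurrent.futures
import threading

ex = concurrent.futures.ThreadPoolExecutor(max_workers=2)
results = []
done = threading.Event()

def task(n):
    return n + 1

def chain(future):
    # Runs when a task finishes; may schedule more work.
    value = future.result()
    results.append(value)
    if value < 5:
        ex.submit(task, value).add_done_callback(chain)
    else:
        done.set()

ex.submit(task, 0).add_done_callback(chain)
done.wait()      # block until the chain of tasks has finished
ex.shutdown()
print(results)
```

Note that the main thread waits on an event rather than calling shutdown() right away, since shutdown() rejects any further submissions.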
>>> 4. Is there a performance evaluation of the futures
>>> code (execution overhead)?
>>
>> No. Scott Dial did make some performance improvements so he
>> might have a handle on its overhead.
>
> Ok.
>
> I have a long-running process which may use executors with
> different max_workers counts. I think it is not too far-fetched to
> create a new futures object each time. Yet, execution becomes
> slower after each call, for example with http://freehackers.org/~tnagy/futures_test.py
> :
>
> """
> import concurrent.futures
> from queue import Queue
> import datetime
>
> class counter(object):
> def __init__(self, fut):
> self.fut = fut
>
> def run(self):
> def look_busy(num, obj):
> tot = 0
> for x in range(num):
> tot += x
> obj.out_q.put(tot)
>
> start = datetime.datetime.utcnow()
> self.count = 0
> self.out_q = Queue(0)
> for x in range(1000):
> self.count += 1
> self.fut.submit(look_busy, self.count, self)
>
> while self.count:
> self.count -= 1
> self.out_q.get()
>
> delta = datetime.datetime.utcnow() - start
> print(delta.total_seconds())
>
> fut = concurrent.futures.ThreadPoolExecutor(max_workers=20)
> for x in range(100):
> # comment the following line
> fut = concurrent.futures.ThreadPoolExecutor(max_workers=20)
> c = counter(fut)
> c.run()
> """
>
> The runtime grows after each step:
> 0.216451
> 0.225186
> 0.223725
> 0.222274
> 0.230964
> 0.240531
> 0.24137
> 0.252393
> 0.249948
> 0.257153
> ...
>
> Is there a mistake in this piece of code?
There is no mistake that I can see but I suspect that the circular
references that you are building are causing the ThreadPoolExecutor to
take a long time to be collected. Try adding:
c = counter(fut)
c.run()
+ fut.shutdown()
Even if that fixes your problem, I still don't fully understand these
numbers because I would expect the runtime to fall after a while as
ThreadPoolExecutors are collected.
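An alternative to calling shutdown() explicitly is to use the executor as a context manager, which shuts it down when the block exits (a sketch; `square` is just an example task):

```python
import concurrent.futures

def square(n):
    return n * n

# shutdown(wait=True) is called automatically when the with-block
# exits, so the worker threads are joined before the executor is
# discarded and a new one is created.
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as ex:
    futures = [ex.submit(square, n) for n in range(5)]
    results = [f.result() for f in futures]
print(results)
```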
Cheers,
Brian
> Thanks,
> Thomas