On 16 Jan 2010, at 00:56, Anh Hai Trinh wrote:
I'm not sure that I'd agree with the simpler API part though :-)
I was referring to your old API. Still, we are both obviously very biased here :-p
For sure. I'm definitely used to looking at Future-style code so I find the model intuitive.
Does ThreadPool use some sort of balancing strategy if poolsize is set to < len(URLs)?
Yes, of course! Otherwise it wouldn't really qualify as a pool.
"retrieve" seems to take multiple url arguments.
Correct. `retrieve` is simply a generator that retrieves URLs sequentially; the ThreadPool distributes the input stream so that each worker gets an iterator over its workload.
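For illustration, that distribution scheme can be sketched as follows (all names here are hypothetical stand-ins; the real `stream` library differs in detail). Each worker thread consumes lazily from one shared, locked iterator, so a faster worker simply pulls its next item sooner:

```python
import threading

def retrieve(urls):
    # Stand-in for the real `retrieve` generator: instead of
    # downloading, it just tags each URL (for illustration only).
    for url in urls:
        yield (url, "content of " + url)

def thread_pool(func, items, poolsize=4):
    """Distribute one input iterator among `poolsize` worker threads.

    Each worker runs `func` over a view of the shared input stream,
    so balancing is automatic: workers pull items as they finish.
    """
    it = iter(items)
    lock = threading.Lock()
    results = []

    def shared():
        # An iterator view over the common input stream; the lock
        # makes advancing it safe from several threads at once.
        while True:
            with lock:
                try:
                    item = next(it)
                except StopIteration:
                    return
            yield item

    def worker():
        for out in func(shared()):
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(poolsize)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

urls = ["http://a", "http://b", "http://c", "http://d", "http://e"]
fetched = thread_pool(retrieve, urls, poolsize=2)
```

Because the workers share one stream, no URL is processed twice and none is skipped, but the outputs arrive in whatever order the workers finish.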
That's a neat idea - it saves you the overhead of a function call.
If delicate job control is necessary, an Executor can be used. It is implemented on top of the pool, and offers submit(*items) which returns job ids to be used for cancel() and status(). Jobs can be submitted and canceled concurrently.
What type is each "item" supposed to be?
Whatever your iterator-processing function is supposed to process. The URLs example can be written using an Executor as:
e = Executor(ThreadPool, retrieve)
e.submit(*URLs)
e.close()
print list(e.result)
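As a rough illustration of the submit/cancel/status idea, here is a toy Executor built on a plain thread pool (all names and details hypothetical; the real `stream` Executor wraps its ThreadPool and processes iterators rather than single items):

```python
import itertools
import queue
import threading

class Executor:
    """Toy job-control layer: submit() returns job ids usable
    with cancel() and status(); results arrive out of order."""

    def __init__(self, func, poolsize=4):
        self._func = func
        self._ids = itertools.count()
        self._state = {}    # job id -> 'waiting'|'running'|'done'|'cancelled'
        self._lock = threading.Lock()
        self._inq = queue.Queue()
        self.result = queue.Queue()   # output stream
        self._threads = [threading.Thread(target=self._worker)
                         for _ in range(poolsize)]
        for t in self._threads:
            t.start()

    def _worker(self):
        while True:
            jobid, item = self._inq.get()
            if jobid is None:          # sentinel: shut down
                return
            with self._lock:
                if self._state[jobid] == 'cancelled':
                    continue           # skip cancelled jobs
                self._state[jobid] = 'running'
            out = self._func(item)
            with self._lock:
                self._state[jobid] = 'done'
            self.result.put(out)

    def submit(self, *items):
        ids = []
        with self._lock:
            for item in items:
                jobid = next(self._ids)
                self._state[jobid] = 'waiting'
                self._inq.put((jobid, item))
                ids.append(jobid)
        return ids

    def cancel(self, jobid):
        # Only jobs still waiting can be cancelled.
        with self._lock:
            if self._state.get(jobid) == 'waiting':
                self._state[jobid] = 'cancelled'
                return True
            return False

    def status(self, jobid):
        with self._lock:
            return self._state[jobid]

    def close(self):
        for _ in self._threads:
            self._inq.put((None, None))
        for t in self._threads:
            t.join()

e = Executor(lambda url: url.upper(), poolsize=2)
ids = e.submit("http://a", "http://b")
e.close()
outputs = [e.result.get() for _ in ids]
```

Since submit() and cancel() only touch shared state under one lock, they can be called concurrently from several threads, matching the behaviour described above.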
There are two common scenarios where I have seen Future-like things used:

1. Do the same operation on different data, e.g. copy some local files to several remote servers.

2. Do several different operations on different data, e.g. parallelizing code like this:

db = setup_database(host, port)
data = parse_big_xml_file(request.body)
save_data_in_db(data, db)

I'm trying to get a handle on how streams accommodates the second case. With futures, I would write something like this:

db_future = executor.submit(setup_database, host, port)
data_future = executor.submit(parse_big_xml_file, request.body)
# Maybe do something else here.
wait(
    [db_future, data_future],
    timeout=10,
    # If either function raises then we can't complete the operation so
    # there is no reason to make the user wait.
    return_when=FIRST_EXCEPTION)
db = db_future.result(timeout=0)
data = data_future.result(timeout=0)
save_data_in_db(data, db)

Cheers,
Brian
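For reference, a self-contained version of that futures snippet, using the stdlib `concurrent.futures` API and stand-in functions (the three operations are hypothetical placeholders):

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_EXCEPTION

# Hypothetical stand-ins for the real operations in the example.
def setup_database(host, port):
    return {"host": host, "port": port}

def parse_big_xml_file(body):
    return body.split()

def save_data_in_db(data, db):
    db["rows"] = data
    return db

with ThreadPoolExecutor(max_workers=2) as executor:
    # Two *different* operations running concurrently (scenario 2).
    db_future = executor.submit(setup_database, "localhost", 5432)
    data_future = executor.submit(parse_big_xml_file, "a b c")
    # Stop waiting as soon as either task raises an exception.
    done, not_done = wait([db_future, data_future],
                          timeout=10, return_when=FIRST_EXCEPTION)
    db = db_future.result(timeout=0)
    data = data_future.result(timeout=0)
    saved = save_data_in_db(data, db)
```

The `timeout=0` calls are safe here because wait() has already confirmed both futures are done, so result() returns immediately.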
Can I wait on several items?
Do you mean wait for several particular input values to be completed? As of this moment, yes, but rather inefficiently. I had not considered it a useful feature, especially when taking a wholesale, list-processing view: a worker pool processes its input stream _out_of_order_. If you just want to wait for several particular items, it means you need their outputs _in_order_, so why would you want to use a worker pool in the first place?
However, I'd be happy to implement something like Executor.submit(*items, wait=True).
Cheers,
aht

_______________________________________________
stdlib-sig mailing list
stdlib-sig@python.org
http://mail.python.org/mailman/listinfo/stdlib-sig