On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan <brian@sweetapp.com> wrote:

On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote:

Where's the current version of the PEP?

http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.txt

On Sun, Feb 21, 2010 at 1:47 AM, Brian Quinlan <brian@sweetapp.com> wrote:

On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote:

* I'd like users to be able to write Executors besides the simple
ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable
that, could you document what the subclassing interface for Executor
looks like? That is, what code do user-written Executors need to
include?

I can do that.

I don't think it should include direct access to
future._state like ThreadPoolExecutor uses, if at all possible.

Would it be reasonable to make Future an ABC, make a _Future that subclasses
it for internal usage, and let other Executor subclasses define their own
Futures?

What interface are you proposing for the Future ABC? It'll need to
support wait() and as_completed() from non-library Futures. I wouldn't
mind making the type just a duck-type (it probably wouldn't even need
an ABC), although I'd like to give people trying to implement their
own Executors as much help as possible. I'd assumed that giving Future
some public hooks would be easier than fixing the wait() interface,
but I could be wrong.

See below.

* Could you specify in what circumstances a pure computational
Future-based program may deadlock? (Ideally, that would be "never".)
Your current implementation includes two such deadlocks, for which
I've attached a test.

* Do you want to make calling Executor.shutdown(wait=True) from within
the same Executor 1) detect the problem and raise an exception, 2)
deadlock, 3) have unspecified behavior, or 4) wait for all other threads
and then let the current one continue?

What about a note saying that using any futures functions or methods from
inside a scheduled call is likely to lead to deadlock unless care is taken?

Jesse pointed out that one of the first things people try to do when
using concurrency libraries is to use them inside themselves.
I've also tried to use a futures library that forbade nested use
('cause I wrote it), and it was a real pain.

You can use the API from within Executor-invoked functions - you just have to be careful.

It's the job of the PEP (and later the docs) to explain exactly what care
is needed. Or were you asking if I was ok with adding that explanation to
the PEP? I think that explanation is the minimum requirement (that's what I
meant by "Could you specify in what circumstances a pure computational
Future-based program may deadlock?"), but it would be better if it could
never deadlock, which is achievable by stealing work.
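
For concreteness, here's a minimal sketch of the kind of nesting that
deadlocks today (assuming the draft is importable as "futures" and the
constructor keyword ends up as max_workers; the names are only
illustrative):

from futures import ThreadPoolExecutor  # assuming the draft package name

executor = ThreadPoolExecutor(max_workers=1)

def outer():
    # The pool's only worker is busy running outer(), so the inner call can
    # never start and result() blocks forever. With work stealing, outer()
    # would just run the inner call itself.
    inner = executor.submit(pow, 2, 10)
    return inner.result()

outer_future = executor.submit(outer)
outer_future.result()  # deadlocks with the current implementation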
It should be easy enough to detect that the caller of
Executor.shutdown is one of the Executor's threads or processes, but I
wouldn't mind making the obviously incorrect "wait for my own
completion" case either deadlock or throw an exception, and it would make
sense to give Executor implementors their choice of which to do.
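
To illustrate the detection option, here's a rough sketch
(DeadlockGuardMixin and _worker_threads are made-up names, not part of the
proposal):

import threading

class DeadlockGuardMixin(object):
    # Hypothetical mixin: assumes the concrete executor records its worker
    # threads in self._worker_threads (a set of threading.Thread objects).
    _worker_threads = frozenset()

    def shutdown(self, wait=True):
        if wait and threading.current_thread() in self._worker_threads:
            # Waiting for our own completion can never succeed, so fail
            # loudly instead of deadlocking.
            raise RuntimeError('shutdown(wait=True) called from a worker thread')
        self._do_shutdown(wait)

    def _do_shutdown(self, wait):
        # A concrete executor would stop accepting work and join or
        # terminate its workers here.
        pass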

* This is a nit, but I think that the parameter names for
ThreadPoolExecutor and ProcessPoolExecutor should be the same so
people can parametrize their code on those constructors. Right now
they're "max_threads" and "max_processes", respectively. I might
suggest "max_workers".

I'm not sure that I like that. In general, consolidating the constructors
for executors is not going to be possible.

In general, yes, but in this case they're the same, and we should try
to avoid gratuitous differences.

num_threads and num_processes are more explicit than num_workers, but I don't really care, so I changed it.

Thanks. 
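
(For what it's worth, this is the kind of parametrization I had in mind,
assuming the shared keyword ends up being max_workers:)

def run_all(executor_cls, calls, workers=4):
    # Works with either pool type because both constructors accept the
    # same keyword argument.
    executor = executor_cls(max_workers=workers)
    return [executor.submit(fn) for fn in calls]

# run_all(ThreadPoolExecutor, calls) and run_all(ProcessPoolExecutor, calls)
# are now interchangeable at the call site.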

* I'd like users to be able to write Executors besides the simple
ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable
that, could you document what the subclassing interface for Executor
looks like? That is, what code do user-written Executors need to
include? I don't think it should include direct access to
future._state like ThreadPoolExecutor uses, if at all possible.

One of the difficulties here is:
1. I don't want to commit to the internal implementation of Futures

Yep, that's why I'd like to avoid requiring them to have direct access to
the internal variables.

2. it might be hard to make it clear which methods are public to users and
which methods are public to executor implementors

One way to do it would be to create another type for implementors and
pass it to the Future constructor.
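
Something like this, with made-up names (the _set_* hooks on Future are
assumptions, not existing API):

class FutureSetter(object):
    # Hypothetical implementor-facing handle: the executor keeps the setter,
    # users only ever see the Future itself.
    def __init__(self, future):
        self._future = future

    def set_result(self, result):
        self._future._set_result(result)        # assumed private hook

    def set_exception(self, exception):
        self._future._set_exception(exception)  # assumed private hook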

If we change the future interface like so:

class Future(object):
    # Existing public methods
    ...

    # For executors only
    def set_result(self, result):
        ...

    def set_exception(self, exception):
        ...

    def check_cancel_and_notify(self):
        # Returns True if the Future was cancelled and
        # notifies anyone who cares, i.e. waiters for
        # wait() and as_completed().
        ...
 
 
Then an executor implementor need only implement:

def submit(self, fn, *args, **kwargs):

With the logic to actually execute fn(*args, **kwargs) and update the returned future, of course.

Thoughts?

Could you write up the submit() implementations you're thinking of? That kind of interface extension seems right.
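
Something along these lines is what I'm imagining, assuming Future grows the
hooks you sketched above (the executor class itself is made up):

import threading
from futures import Future  # assuming the PEP's Future gains the hooks above

class OneThreadPerCallExecutor(object):
    # Hypothetical executor: spawns one new thread per submitted call. A real
    # executor would hand the work to a pool, a process, or a remote machine.
    def submit(self, fn, *args, **kwargs):
        future = Future()

        def run():
            # Give callers a chance to cancel before the work starts.
            if future.check_cancel_and_notify():
                return
            try:
                future.set_result(fn(*args, **kwargs))
            except BaseException as e:
                future.set_exception(e)

        threading.Thread(target=run).start()
        return future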
 
* Could you specify in what circumstances a pure computational
Future-based program may deadlock? (Ideally, that would be "never".)
Your current implementation includes two such deadlocks, for which
I've attached a test.

Thanks for the tests but I wasn't planning on changing this behavior. I
don't really like the idea of using the calling thread to perform the wait
because:
1. not all executors will be able to implement that behavior

Why not?

What if my executor sends the data to a remote cluster for execution and running it locally isn't feasible?

If the executor can't send itself across the network, you're fine since it'll be impossible to create cycles. If the executor can add threads dynamically when it notices that it's not using enough of the CPU, it's also fine since you remove the limited resource. If the executor can be copied and cannot add threads, then it sent the data one way somehow, so it should be able to send the data the other way to execute locally. It _is_ possible to run out of memory or stack space. Is that what you're worried about?

Thread pools can implement it,

Do you have a strategy in mind that would let you detect arbitrary deadlocks in threaded futures?

Yes, AFAIK work stealing suffices for systems made up only of futures and executors.  Non-future blocking objects can reintroduce deadlocks, but I believe futures alone can't.
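
Roughly, the idea is that a blocked result() call keeps running other
pending work items on the calling thread. A standalone sketch with made-up
names (future_done and work_queue are assumptions, not the draft's
internals):

import queue

def wait_with_stealing(future_done, work_queue):
    # future_done: threading.Event that is set when our own future finishes.
    # work_queue: queue.Queue of zero-argument callables, each of which runs
    # one not-yet-started call and marks its future as finished.
    while not future_done.is_set():
        try:
            work_item = work_queue.get_nowait()
        except queue.Empty:
            future_done.wait(0.01)  # nothing to steal right now; poll briefly
        else:
            work_item()             # run someone else's pending call inline

Because a blocked caller keeps making progress on other queued calls, a
cycle of calls that are all waiting on each other can't form.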
 

Cheers,
Brian

and process pools make it
impossible to create cycles, so they also can't deadlock.

2. it can only be made to work if no wait time is specified

With a wait time you have to avoid stealing work, but a timed wait is also
guaranteed to return eventually, so it's fine.