[Python-ideas] micro-threading PEP proposal (long) -- take 2!

Wed Aug 27 20:12:38 CEST 2008

Thank you for your response.  I have written up Python-style pseudo
code for both the C-level code and the Python-level code and posted them
as two separate posts.  I changed the subject somewhat on them, so they
don't show up on the same thread...  :-(

Here is some specific feedback on your questions.

Antoine Pitrou wrote:
> Probably. However, since the core of your proposal is itself far from trivial, I
> suggest you concentrate on it in this PEP; the higher-level constructs can be
> deferred ( :-)) to another PEP.
>   
I had considered that.  But the core by itself accomplishes nothing, 
except to serve as a foundation for some kind of higher-level 
constructs, so I put them together.  I guess having separate PEPs allows 
them to evolve more independently.  (I'm new to this PEP process).

If I split them, do I keep posting updated versions on python-ideas?  Or
do I just accumulate the changes offline and post the completed PEP much
later?
>   
>> #. An addition of non_blocking modes of accessing files, sockets, time.sleep
>>    and other functions that may block.  It is not clear yet exactly what these
>>    will look like.  The possibilities are:
>>
>>    - Add an argument to the object creation functions to specify blocking or
>>      non-blocking.
>>    - Add an operation to change the blocking mode after the object has been
>>      created.
>>    - Add new non-blocking versions of the methods on the objects that may
>>      block (e.g., read_d/write_d/send_d/recv_d/sleep_d).
>>    - Some combination of these.
>>     
>
> Sounds ok. FWIW, the py3k IO stack is supposed to be ready for non-blocking IO,
> but this possibility is almost completely untested as of yet.
>   
Good to know.  I'll have to look at this.
>   
>> #. Micro_thread objects.  Each of these will have a re-usable C deferred
>>    object attached to it, since each micro_thread can only be suspended
>>    waiting for one thing at a time.  The current micro_thread would be stored
>>    within a C global variable, much like ``_PyThreadState_Current``.
>>     
>
> By "global", you mean "thread-local", no? That is, there is (at most) one
> currently running micro-thread per OS-level thread.
>   
Yes!
>   
>>    There are three usage scenarios, aided by three different functions to
>>    create micro-threads:
>>     
>
> I suggest you fold those usage scenarios into one simple primitive that launches
> a single micro-thread and provides a way to wait for its result (using a
> CDeferred I suppose?). Higher-level stuff ("start_in_parallel") does not seem
> critical for the usefulness of the PEP.
>   
I have a single micro_thread class with a couple of optional arguments
that affect its operation, so you may be right.  I have included the
higher-level stuff ("start_in_parallel") in the Python-level pseudo code
to give everybody a feel for what's involved.
>   
>>       This final scenario uses *micro_pipes* to allow threads to cooperatively
>>       solve problems (much like unix pipes)::
>>     
>
> What is the added value of "micro pipes" compared to, e.g., a standard Python
> list or deque? Are they non-blocking?
>   
Micro_pipes connect two micro_threads, much like unix pipes join two 
unix processes.  Each thread will suspend if the other thread isn't 
ready.  The micro_pipes use the C_deferreds to suspend the thread and 
allow other threads to run.

So micro_pipes don't store a sequence of values (like lists or deques), 
but pass individual values on from one thread to another.  The 
implementation proposed in the Python level pseudo code only stores one 
value and will block the writer when it tries to write a second value 
before the reader has read the first value.  This buffer size of one 
could be expanded, but I've been working on the premise that this should 
be kept as simple as possible for a first out, and then allowed to grow 
after more experience is gained with it.  I've seen many software 
projects (and I'm guilty of this myself) where they include all kinds of 
stuff that really isn't that useful.  And, once released, these things 
are hard to take back.  So I'm consciously trying to keep the first out 
to a bare minimum of features that can grow later.
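
As a rough illustration of the one-value behaviour described above, here
is a sketch using plain Python generators in place of micro_threads and
C deferreds.  Everything in it (``OneSlotPipe``, ``writer``, ``reader``,
``run``) is a made-up name for illustration, not the API from the pseudo
code posts, and a bare ``yield`` stands in for "suspend and let other
threads run".

```python
from collections import deque


class OneSlotPipe:
    """Hypothetical single-value pipe; not the proposed micro_pipe API."""

    _EMPTY = object()

    def __init__(self):
        self._slot = self._EMPTY

    def is_full(self):
        return self._slot is not self._EMPTY

    def put(self, value):
        # Suspend (yield) for as long as the previous value is unread.
        while self.is_full():
            yield
        self._slot = value

    def get(self):
        # Suspend (yield) for as long as nothing has been written.
        while not self.is_full():
            yield
        value, self._slot = self._slot, self._EMPTY
        return value


def writer(pipe, values, log):
    for v in values:
        yield from pipe.put(v)
        log.append(("put", v))


def reader(pipe, count, log):
    for _ in range(count):
        v = yield from pipe.get()
        log.append(("got", v))


def run(*threads):
    """Toy round-robin scheduler: resume each generator until all finish."""
    ready = deque(threads)
    while ready:
        t = ready.popleft()
        try:
            next(t)
        except StopIteration:
            continue  # this thread is done; do not reschedule it
        ready.append(t)


log = []
pipe = OneSlotPipe()
run(writer(pipe, [1, 2, 3], log), reader(pipe, 3, log))
```

With a buffer of one, the log strictly alternates between a ``put`` and
the matching ``got``: the writer cannot get ahead of the reader by more
than a single value.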
>   
>>    - ``close()`` to cause a ``StopIteration`` on the ``__next__`` call.
>>      A ``put`` done after a ``close`` silently terminates the micro_thread
>>      doing the ``put`` (in case the receiving side closes the micro_pipe).
>>     
>
> Silencing this sounds like a bad idea.
>   
Yes, I think "silently" means raising a MicroThreadExit exception in the 
``put`` and then silently ignoring it when it is finally re-raised by 
the top function of the thread (thus, terminating the thread, but 
allowing clean code to run on the way down).
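
A small sketch of that behaviour, under the assumption that the top
function of each thread is a wrapper that catches the exit exception.
``ClosedPipe``, ``body`` and ``run_micro_thread`` are hypothetical names
used only for this example; only the name ``MicroThreadExit`` comes from
the paragraph above, and its real definition may differ.

```python
class MicroThreadExit(Exception):
    """Stand-in for the exception named above; details are assumed."""


class ClosedPipe:
    """Hypothetical pipe whose receiving side has already closed it."""

    def put(self, value):
        raise MicroThreadExit("pipe closed by reader")


def body(pipe, events):
    try:
        pipe.put("data")
        events.append("after put")  # never reached once the pipe is closed
    finally:
        events.append("cleanup ran")  # cleanup still runs on the way down


def run_micro_thread(func, *args):
    """Top function of the thread: ignore MicroThreadExit and nothing else."""
    try:
        func(*args)
    except MicroThreadExit:
        return "terminated quietly"
    return "finished"


events = []
status = run_micro_thread(body, ClosedPipe(), events)
```

Any other exception raised in ``body`` would still propagate out of
``run_micro_thread``; only the exit signal is swallowed.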
>   
>>    So each micro_thread may have a *stdout* micro_pipe assigned to them and
>>    may also be assigned a *stdin* micro_pipe (some other micro_thread's stdout
>>    micro_pipe).
>>     
>
> Hmm, is it really necessary? Shouldn't micro-threads just create their own pipes
> when they need them? The stdin/stdout analogy is only meaningful in certain
> types of workloads.
>   
They seem necessary to handle exception situations.  For example, when a
reader thread on a pipe dies with an exception, how is the writer thread
notified?  What mechanism knows that this pipe was being read by the
errant thread so that it will never be read from again?  Lacking some 
kind of mechanism like this may mean that the writer thread is suspended 
forever.  And the same applies in reverse if the writer thread dies.  
The reader is left hanging forever.  So the pipes need to be "attached" 
to the threads so that an exception in one thread can also affect other 
interested threads.
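
To show why the attachment matters, here is a toy sketch (again with
generators standing in for micro_threads).  The scheduler is told which
pipes belong to each thread, so when a thread dies with an exception it
can mark those pipes closed and the thread on the other end wakes up
instead of waiting forever.  ``Pipe``, ``PipeClosed``, ``run`` and the
two thread functions are all hypothetical names for this example, not
part of the proposal.

```python
from collections import deque


class PipeClosed(Exception):
    """Hypothetical signal delivered to the surviving end of a pipe."""


class Pipe:
    def __init__(self):
        self.closed = False


def failing_reader(pipe):
    yield  # pretend to wait for data once
    raise ValueError("reader bug")


def patient_writer(pipe, log):
    try:
        while True:
            if pipe.closed:
                raise PipeClosed()
            yield  # suspended, waiting for the reader to take a value
    except PipeClosed:
        log.append("writer woke up: pipe closed")


def run(threads):
    """threads is a list of (generator, attached_pipes) pairs."""
    ready = deque(threads)
    errors = []
    while ready:
        gen, pipes = ready.popleft()
        try:
            next(gen)
        except StopIteration:
            continue
        except Exception as exc:
            errors.append(exc)
            # Knowing which pipes were attached to the dead thread is
            # what lets the scheduler notify the other end.
            for p in pipes:
                p.closed = True
            continue
        ready.append((gen, pipes))
    return errors


log = []
pipe = Pipe()
errors = run([(patient_writer(pipe, log), [pipe]),
              (failing_reader(pipe), [pipe])])
```

If the scheduler had no record of which pipes the failing reader was
using, ``patient_writer`` would stay suspended and ``run`` would never
return.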
>   
>> ``PyDeferred_CDeferred`` is written as a new exception type for use by the
>> C code to defer execution.  This is a subclass of ``NotImplementedError``.
>> Instances are not raised as a normal exception (e.g., with
>> ``PyErr_SetObject``), but by calling ``PyNotifier_Defer`` (described in the
>> Notifier_ section, below).  This registers the ``PyDeferred_CDeferred``
>> associated with the currently running micro_thread as the current error object,
>>     
>
> I'm not sure I understand this right. Does this mean there is a single,
> pre-constructed CDeferred object for each micro-thread? If yes, then this
> deviates slightly from the Twisted model where many deferreds can be created
> dynamically, chained together etc.
>   
Yes, a single deferred for each micro-thread.  And yes, this differs
somewhat from the Twisted model.  But, again, this helps to "connect the
dots" between threads for exception propagation.  I think that it will 
also give slightly better performance because fewer memory allocations 
are required.
>   
>> One peculiar thing about the stored callbacks is that they're not really a
>> queue.  When the C deferred is first used and has no saved callbacks,
>> the callbacks are saved in straight FIFO manner.  Let's say that four
>> callbacks are saved in this order: ``D'``, ``C'``, ``B'``, ``A'`` (meaning
>> that ``A`` called ``B``, called ``C``, called ``D`` which deferred):
>>     
>
> In this example, can you give the C pseudo-code and the equivalent Twisted
> Python (pseudo-)code?
>   
Do you mean the pseudo code of the deferred implementation, or the 
pseudo code for using the deferreds?
>
> Last point: you should try to get some Twisted guys involved in the writing of
> the PEP if you want it to succeed.
>   
Good suggestion!  I was hoping that some might show up here, but ...  I 
guess I need to go looking for them!

Thanks!

-bruce