[Python-ideas] micro-threading PEP proposal (long) -- take 2!

Collin Winter collinw at gmail.com
Sat Aug 30 05:58:35 CEST 2008


On Mon, Aug 25, 2008 at 9:48 AM, Bruce Frederiksen <dangyogi at gmail.com> wrote:
[snip]
> I don't imagine that this PEP represents an /easy/ way to solve this
> problem, but do imagine that it is the /right/ way to solve it.  Other
> similar proposals have been made in past years that looked at easier ways
> out.  These have all been rejected.  But I don't think that there are really
> any easy ways out that are robust solutions, and so I offer this one.  If I
> am wrong, and the reason that the prior proposals were rejected is due to a
> lack of need, rather than a lack of robustness, then this proposal should
> also be rejected.  This might be the case if, for example, all Python
> programs end up being unavoidably CPU bound so that micro-threading would
> provide little real benefit.

It all depends on what you're doing. If you're waiting on a lot of
RPCs to complete and doing light-weight operations to process the
responses, then you're probably fine with micro-threads (unless, of
course, those RPC responses are themselves pretty big and require a
lot of deserialization work, in which case, micro-threads will hurt
more than they help).

> Motivation
> ==========
>
> The popularity of the Twisted project has demonstrated the need for a
> micro-threading alternative to the standard Posix thread_ [#thread-module]_
> and threading_ [#threading-module]_ packages.

It in no way demonstrates that. I would say that the popularity of
Twisted indicates that "a micro-threading alternative to the
standard...threading packages" can survive and indeed thrive outside
of the standard library. If you feel that Twisted's popularity does
indeed demonstrate something in this area, please back up that
assertion.

> Micro-threading allows large
> numbers (1000's) of simultaneous connections to Python servers, as well
> as fan-outs to large numbers of downstream connections.
>
> The advantages to the Twisted approach over Posix threads are:
>
> #. much less memory is required per thread
> #. faster thread creation
> #. faster context switching (I'm guessing on this one, is this really true?)

That you don't know is, frankly, not reassuring.

> #. synchronization between threads is easier because there is no preemption,
>  making it much easier to write critical sections of code.
>
> The disadvantages are:
>
> #. the Python developer must write his/her program in an event driven style
> #. the approach can not be used with standard Python code that wasn't
>  written in this event driven style
> #. the approach does not take advantage of multiple processor architectures
> #. since there is no preemption, a long running micro-thread will starve
>  other micro-threads

By long-running, you mean "non-yielding", right? Don't CPU-intensive
operations generally fall into this category? Combined with the first
two disadvantages, this means that a developer using this system has
to vet all libraries they might want to use (and all libraries in
their transitive dependency closure), looking for places that might
destabilize the ability of micro-threads to cooperatively yield. That
sounds incredibly error-prone, and like a painstaking waste of
developer time.
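
To make that concrete, here's a toy cooperative scheduler built on
nothing but plain generators (this is my own sketch, not anything from
the proposal): a task that does its work without yielding holds up
every other task until it has completely finished.

from collections import deque

def run(tasks):
    # Round-robin over generator-based tasks; no preemption.
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)           # runs until the task yields (or finishes)
        except StopIteration:
            continue             # task finished; drop it
        queue.append(task)       # otherwise put it back in the rotation

def chatty(name):
    for i in range(3):
        print("%s %d" % (name, i))
        yield                    # cooperative: give the others a turn

def cpu_hog():
    # Never yields while computing -- exactly the "long running
    # micro-thread" case.
    total = sum(i * i for i in range(10**6))
    print("hog done: %d" % total)
    yield

# The hog's entire computation runs before either chatty task gets its
# second turn.
run([chatty("a"), chatty("b"), cpu_hog()])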

> This PEP attempts to retain all of the advantages that Twisted has
> demonstrated,

Please don't assume that everyone reading your PEP is familiar with Twisted.

> and to resolve the first two disadvantages to make the
> advantages accessible to all Python programs, including legacy programs
> not written in the Twisted style.  This should make it very easy for legacy
> programs like WSGI apps, Django and TurboGears to reap the benefits of
> Twisted.

So you say, but I see nothing in this entire PEP (and I'll freely
admit I started skimming it after page five or so) that specifically
references these disadvantages or demonstrates how they're being
solved.

> This PEP does not address the last two disadvantages, and thus also has
> these disadvantages itself.

Starvation is a pretty big disadvantage to simply gloss over.

>  In addition, the current built-in ``iter`` and ``next`` functions would be
>  modified so that they may be called with no arguments.  In this case, they
>  would use the current micro_thread's *stdin* pipe as their argument.

I don't understand this. Please explain in more detail why adding this
new (and unexpected) functionality to iter() and next() is desirable
as opposed to adding new functions/methods.
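
To illustrate what I mean, here's a toy contrast of the two API styles
-- implicit context versus an explicit receiver. None of the names
below come from the PEP; they are stand-ins I made up purely for
illustration.

# A list iterator stands in for "the current micro_thread's stdin pipe".
_current_pipe = iter([1, 2, 3])

def next_from_current():
    # The implicit style: no argument, the data source is hidden context.
    return next(_current_pipe)

class Pipe(object):
    # The explicit style: the source is spelled out at the call site.
    def __init__(self, items):
        self._it = iter(items)
    def next(self):
        return next(self._it)

print(next_from_current())   # 1 -- but *which* pipe? The reader has to know.
pipe = Pipe([10, 20, 30])
print(pipe.next())           # 10 -- the receiver is explicit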

> C Deferred
> ----------
>
> ``PyDeferred_CDeferred`` is written as a new exception type for use by the
> C code to defer execution.  This is a subclass of ``NotImplementedError``.

Why is this a subclass of NotImplementedError and not a direct
subclass of Exception? This is an odd choice of parent class.

> Instances are not raised as a normal exception (e.g., with
> ``PyErr_SetObject``), but by calling ``PyNotifier_Defer`` (described in the
> Notifier_ section, below).  This registers the ``PyDeferred_CDeferred``
> associated with the currently running micro_thread as the current error
> object, but also readies it for its primary job -- deferring execution.
> As an exception, it creates its own error message, if needed, which is
> "Deferred execution not yet implemented by %s" % c_function_name.

And what happens if I use PyErr_SetObject() instead of this new
function? Is a TypeError raised?

> ``PyErr_ExceptionMatches`` may be used with these.  This allows them to be
> treated as exceptions by non micro-threading aware (*unmodified*) C
> functions.

So it's possible for non-micro-threading aware code to simply swallow
these new exceptions? That seems...unwise.
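
To spell out the failure mode I'm worried about (a pure-Python
stand-in, not the PEP's actual C types): any pre-existing except clause
broad enough to catch NotImplementedError silently eats the deferral,
and the scheduler never finds out the micro-thread wanted to suspend.

# Stand-in for the proposed PyDeferred_CDeferred, which the PEP makes a
# subclass of NotImplementedError.
class Deferred(NotImplementedError):
    pass

def blocking_read():
    # A micro-threading-aware primitive would "raise" this to ask the
    # scheduler to suspend the current micro-thread.
    raise Deferred("Deferred execution not yet implemented by blocking_read")

def legacy_caller():
    # Perfectly ordinary pre-existing code, written with no knowledge
    # of micro-threads:
    try:
        return blocking_read()
    except NotImplementedError:
        return None    # oops: the deferral never reaches the scheduler

print(legacy_caller())   # prints None; the micro-thread is never suspended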

Collin Winter


