Hello
There's a pattern I use all the time: filtering out some elements of a
list and cleaning the remaining ones in the same move.
For example, if I have a multi-line text where I want to:
- keep non-empty lines
- clean non-empty lines
I do:
>>> text = """
... this is a multi-line text\t
...
... \t\twith
...
... multiple lines."""
>>> [l.strip() for l in text.split('\n') if l.strip() != '']
['this is a multi-line text', 'with', 'multiple lines.']
It is not optimal, because I call strip() twice. I could use ifilter then
imap, or even a real loop, but I want my simple, concise list
comprehension! And I couldn't find a simple way to express it.
The pattern can be summarized generically like this:
[transform(e) for e in seq if some_test(transform(e))]
So what about using the 'as' keyword to extend list comprehensions and
avoid calling transform() twice?
Could be:
[transform(e) as transformed for e in seq if some_test(transformed)]
In my use case I would simply have to write:
[l.strip() as stripped for l in text.split('\n') if stripped != '']
Which seems to me clear and concise.
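For comparison, here is a minimal sketch of how the same result can already be obtained with a single strip() call per line, by feeding the comprehension from an inner generator expression. The names `stripped` and `s` are only illustrative and are not part of the proposal.

```python
text = """
this is a multi-line text\t

\t\twith

multiple lines."""

# The inner generator expression strips each line exactly once; the outer
# comprehension then filters on the already-stripped value.
stripped = (line.strip() for line in text.split('\n'))
result = [s for s in stripped if s != '']
print(result)
```

This keeps the single-call property, at the cost of nesting that the proposed 'as' form would avoid.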
Regards,
Tarek
--
Tarek Ziadé | Association AfPy | www.afpy.org
Blog FR | http://programmation-python.org
Blog EN | http://tarekziade.wordpress.com/
Hello,
I have written a PEP, and I was warned by none other than Mr. Van Rossum
himself that it would be controversial at best. Well, here goes anyway.
First, let me introduce myself. I am an aerospace engineer, and I am using
Python to develop a research prototype of a conflict alerting aid for air
traffic controllers. It is intended to replace the current legacy system
which has been operational since the 1970s. I have also developed a free
Python package to represent physical scalars in a unique manner that can be
as efficient as built-in numeric types. I am using it extensively in my
conflict alerting aid. You can read about it in the current edition of The
Python Papers at http://pythonpapers.org or on my website at
http://RussP.us/scalar.htm .
Now to the PEP. Let me start by saying that I fully understand the history
and controversy regarding the explicit use of "self" in Python. I am not
going to say that it was a mistake, nor am I going to say that the first
argument of a class instance method should not refer to the instance for
which it is called. What I will say is that I think Python syntax could be
significantly simplified with the simple little convention that I am
proposing. All I ask is that you carefully read my proposal before you
reply. Thank you.
Russ
There are several different blocks of code you could tack onto a loop (I've
deliberately chosen somewhat unusual words to express these here):
for x in items:
    # body
interstitially:
    # things to do between loop iterations
    # (executed after each iteration in the loop when there is a next value)
subsequently:
    # things to do after the last element of the loop is processed
    # (when the loop is not exited by break)
contrariwise:
    # things to do if the list was empty
For example:
result = ""
for x in items:
    result += str(x)
interstitially:
    result += ", "
contrariwise:
    result = "no data"
When I first learned that Python had an 'else' clause on loops, I assumed it
meant 'contrariwise'. I was surprised that it actually meant 'subsequently'.
To be more clear, contrariwise is essentially equivalent to:
empty = True
for x in items:
    empty = False
    # body
if empty:
    # do contrariwise code
and interstitially is essentially equivalent to:
first = True
for x2 in items:
    if not first:
        # do interstitial code
    first = False
    x = x2
    # body
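As a runnable illustration, the "no data" example can be written in today's Python by combining the two equivalents above. This is only a sketch: the function name `join_items` is made up for the example, and a single flag serves both as the "first iteration" marker and as the "loop was empty" marker.

```python
def join_items(items):
    """Join items with ", ", or return "no data" for an empty iterable."""
    result = ""
    first = True  # also tells us afterwards whether the loop ever ran
    for x in items:
        if not first:
            result += ", "  # plays the role of the 'interstitially' block
        first = False
        result += str(x)  # the loop body
    if first:
        result = "no data"  # plays the role of the 'contrariwise' block
    return result


print(join_items([1, 2, 3]))
print(join_items([]))
```

The separator is added before each element after the first rather than after each element that has a successor, which gives the same result here without having to look ahead in the iterable.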
I think these are common/useful paradigms. I'm curious what others think.
--- Bruce
Here's one way to test if the loop you just came out of was empty:
for item in range(0):
    print item
else:
    if "item" not in locals():
        print "Empty list."
Obviously, this only works if "item" has not been used in the current
scope before. If anyone knows of a more efficient way to check if a
variable name has been used besides using locals(), let me know.
Another way to do it is if you know that some particular value, such
as None, will never appear in your list-like-object you can set item
to that value in advance:
item = None
for item in range(0):
    print item
else:
    if item is None:
        print "Empty list."
Again, this has the limitation that it only works if you can guarantee
that your guard value won't be used by the loop. The advantage is that
you don't have to create locals() and then do a __contains__ on it,
which presumably takes longer than a simple assignment and identity
check.
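One possible way around the guard-value limitation, offered here as an assumption rather than as part of the suggestion above, is to use a freshly created object() as the guard. No element of the iterable can be identical to it, so even None is handled correctly. The names `_MISSING` and `describe` are hypothetical.

```python
_MISSING = object()  # hypothetical guard; nothing in the data can be this object


def describe(iterable):
    """Report whether the loop body ever ran, using a unique guard value."""
    item = _MISSING
    for item in iterable:
        pass  # the real loop body would go here
    if item is _MISSING:
        return "Empty list."
    return "Last item: %r" % (item,)


print(describe(range(0)))
print(describe([None]))
```

The check is still a simple assignment plus an identity test, as in the None version.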
-- Carl Johnson
It was suggested that I rearrange the micro-threading PEP proposal to
place the juicy Python stuff up front.
So I've done this here. And now that I see that people are back from
the toils of creating 3.0b3 and starting to comment again on
python-ideas, it seems like a good time to repost this! (I guess my
first post was really bad timing..., sorry about that!)
I ask that you provide feedback. I have no direct need for this, so
don't really have a horse in the race. But it was an idea that I
thought might be very useful to the Python community, seeing the
emphasis on web servers, so am making an effort here to run it up the
flagpole... If this ends up going into a trial version, I am prepared
to help considerably with the implementation. If you don't think that
Python needs such silliness, that's OK and I'd like to hear that too (it
will mean a lot less work for me! ;-) ).
I don't imagine that this PEP represents an /easy/ way to solve this
problem, but do imagine that it is the /right/ way to solve it. Other
similar proposals have been made in past years that looked at easier
ways out. These have all been rejected. But I don't think that there
are really any easy ways out that are robust solutions, and so I offer
this one. If I am wrong, and the reason that the prior proposals were
rejected is due to a lack of need, rather than a lack of robustness,
then this proposal should also be rejected. This might be the case if,
for example, all Python programs end up being unavoidably CPU bound so
that micro-threading would provide little real benefit.
If there /is/ a perceived need for this, then I am sure that this PEP
would benefit from your TLC and other ideas!
If you read the previous version, the only changes here are a little
more specificity in the Python section.
Thank you for your attention to this!
-bruce
Abstract
========
This PEP adds micro-threading (or `green threads`_) at the C level so that
micro-threading is built in and can be used with very little coding effort
at the Python level.
The implementation is quite similar to the Twisted_ [#twisted-fn]_
Deferred_/Reactor_ model, but applied at the C level by extending the
`C API`_ [#c_api]_ slightly. Doing this provides the Twisted
capabilities to Python, but without requiring the Python programmer to code
in the Twisted event driven style. Thus, legacy Python code would gain the
benefits that Twisted provides with very little modification.
Burying the event driven mechanism in the C level also makes the same
benefits available to Python GUI interface tools so that the Python
programmers don't have to deal with event driven programming there either.
This capability is also used to provide some of the features that
`Stackless Python`_ [#stackless]_ provides, such as microthreads and
channels (here, called micro_pipes).
.. _Twisted: http://twistedmatrix.com/trac/
.. _Deferred: http://twistedmatrix.com/projects/core/documentation/howto/defer.html
.. _Reactor: http://twistedmatrix.com/projects/core/documentation/howto/reactor-basics.h…
.. _C API: http://docs.python.org/api/api.html
.. _green threads: http://en.wikipedia.org/wiki/Green_threads
Motivation
==========
The popularity of the Twisted project has demonstrated the need for a
micro-threading alternative to the standard Posix thread_ [#thread-module]_
and threading_ [#threading-module]_ packages. Micro-threading allows large
numbers (thousands) of simultaneous connections to Python servers, as well
as fan-outs to large numbers of downstream connections.
The advantages to the Twisted approach over Posix threads are:
#. much less memory is required per thread
#. faster thread creation
#. faster context switching (I'm guessing on this one, is this really true?)
#. synchronization between threads is easier because there is no preemption,
making it much easier to write critical sections of code.
The disadvantages are:
#. the Python developer must write his/her program in an event driven style
#. the approach can not be used with standard Python code that wasn't
written in this event driven style
#. the approach does not take advantage of multiple processor architectures
#. since there is no preemption, a long running micro-thread will starve
other micro-threads
This PEP attempts to retain all of the advantages that Twisted has
demonstrated, and to resolve the first two disadvantages to make the
advantages accessible to all Python programs, including legacy programs
not written in the Twisted style. This should make it very easy for legacy
programs like WSGI apps, Django and TurboGears to reap the benefits of
Twisted.
Another example of an event driven mechanism is GUI/windows events. This
PEP also makes it easy for Python GUI interface toolkits (like wxPython
and PyQt) to hide the GUI/windows event driven style of programming from
the Python programmer. For example, you would no longer need to use modal
dialog boxes just to make the programming easier.
This PEP does not address the last two disadvantages, and thus also has
these disadvantages itself.
The primary inspiration for this PEP comes from the Twisted_ [#twisted-fn]_
project.
If the C level deals with the Deferred objects, then the Python level
wouldn't have to. And if that is the case, this would greatly lower the bar
for Python programmers desiring the benefits that Twisted provides and make
those benefits available to all Python programmers essentially for free.
The secondary inspiration was to treat the Deferreds as a special case of
exceptions, which are already designed to unwind the C stack. This lets us
take a more piecemeal approach to implementing the PEP at the C level,
because an unmodified C function used in a situation where its execution
would have to be deferred is gracefully caught as a standard exception. In
addition, this exception can report the name of the unmodified C function in
its message. So we don't need to change *everything* that might be affected
on a first roll out.
It also adds deferred processing without adding additional checks after each
C function call to see whether to defer execution. The check that is already
being done for exceptions doubles as a check for deferred processing.
Finally, once Python has this deferred mechanism in place at the C level,
many things become quite easy at the Python level. This includes full
micro-threading, micro-pipes between micro-threads, new-style generators
that can delegate responsibility for generating values to called functions
without having to intervene between their caller and the called function,
and parallel execution constructs (``parallel_map``).
It is expected that many more devices of this kind will be easily
implementable once the underlying deferred mechanism is in place.
Specification of Python Layer Enhancements
==========================================
Fortunately, at the Python level, the programmer does not see the underlying
`C deferred`_, `reactor function`_, or notifier_ objects. The Python
programmer will see three things:
#. The addition of non-blocking modes of accessing files, sockets, time.sleep
and other functions that may block. It is not clear yet exactly what these
will look like. The possibilities are:
- Add an argument to the object creation functions to specify blocking or
non-blocking.
- Add an operation to change the blocking mode after the object has been
created.
- Add new non-blocking versions of the methods on the objects that may
block (e.g., read_d/write_d/send_d/recv_d/sleep_d).
- Some combination of these.
If an object is used in blocking mode, then all micro-threads (within its
Posix thread_) will block. So the Python programmer must set non-blocking
mode on these objects as a first step towards taking advantage of
micro-threading.
It may also be useful to add a locking capability to files and sockets so
that code (like traceback.print_exception) that outputs several lines can
prevent other output from being intermingled with it.
#. Micro_thread objects. Each of these will have a re-usable C deferred
object attached to it, since each micro_thread can only be suspended
waiting for one thing at a time. The current micro_thread would be stored
within a C global variable, much like ``_PyThreadState_Current``. If the
Python programmer isn't interested in micro_threading, micro_threads can be
safely ignored (like posix threads, you get one for free, but don't have to
be aware of it). If the programmer *is* interested in micro-threading,
then s/he must create additional micro_threads. Each micro-thread would be
created to run a single Python function. When that function returns, the
micro-thread is finished.
There are three usage scenarios, aided by three different functions to
create micro-threads:
#. Create a micro-thread to do something, without regard to the final
value returned from *function*. An example here would be a web server
that has a top-level ``socket.accept`` loop that runs a
``handle_client`` function in a separate micro_thread on each new
connection. Once launched, the ``socket.accept`` thread is no longer
interested in the ``handle_client`` threads.
In this case, the normal return value of the ``handle_client`` function
can be discarded. But what should be done with exceptions that are not
caught in the child threads?
Therefore, this style of use would allow a top-level exception handler
for the new thread::
    start_and_forget(function, *args,
                     exception_handler=traceback.print_exception,
                     **kws)
The parent thread does not need to do any kind of *wait* after the child
thread is complete. The child will either complete normally and go away
silently (with any final return value ignored), or raise an uncaught
exception, which is passed to the indicated exception_handler, and then
go away without further ado.
#. Create micro_threads to run multiple long-running *functions* in
parallel where the final return value from each *function* is needed by
the parent thread::
    thread = start_in_parallel(function, *args, **kws)
In this case, the parent thread is expected to do a *thread.wait()*
when it is ready for the return value of the function. Thus, completed
micro_threads will form zombie threads until their parents retrieve
their final return values (much like unix processes).
On doing the *wait*, an uncaught exception in the child micro_thread is
re-raised in the parent micro_thread.
It might be nice, for example, to have a ``parallel_map`` function that
will create a micro_thread for each element of its *iterable* argument
in order to run the mapping function on all of them in parallel and then
return an iterable of the waited-for results.
#. In the above examples, the child micro_threads are completely
independent of each other -- i.e., they don't communicate with each
other except for child threads returning a final value to their parents.
This final scenario uses *micro_pipes* to allow threads to cooperatively
solve problems (much like unix pipes)::
    pipe = generate(function, *args, **kws)
These micro_threads have a micro_pipe associated with them (called
*stdout*). When a micro_thread is finished it goes away silently (and
the final return value from the *function* is ignored).
The pipe looks like a normal Python iterator, but is designed to be read
by a different micro-thread than the one generating the values.
Uncaught exceptions in the micro_thread generating the values are
propagated through the micro_pipe to the micro_pipe's reader.
#. Micro_pipes. Micro_pipes are one-way pipes that allow synchronized
communication between micro_threads.
The protocol for the receiving side of the pipe is simply the standard
Python iterator protocol. Thus, for example, they can be directly used
in ``for`` statements.
The sending side has these methods:
- ``put(object)`` to send *object* to the receiving side (retrieved with
the ``__next__`` method).
- ``take_from(iterable)`` to send a series of objects to the receiving
side.
- ``close()`` to cause a ``StopIteration`` on the ``__next__`` call.
A ``put`` done after a ``close`` silently terminates the micro_thread
doing the ``put`` (in case the receiving side closes the micro_pipe).
Micro_pipes are automatically associated with micro_threads, making it less
likely to hang the program:
>>> pipe = micro_pipe()
>>> next(pipe)  # hangs the program! No micro_thread created to feed pipe...
So each micro_thread may have a *stdout* micro_pipe assigned to it and
may also be assigned a *stdin* micro_pipe (some other micro_thread's stdout
micro_pipe). When the micro_thread terminates, it automatically calls
``close`` on its stdin and stdout micro_pipes.
To easily access the stdout micro_pipe of the current micro_thread, new
``put`` and ``take_from`` built-in functions are provided::
    put(object)
    take_from(iterable)
In addition, the current built-in ``iter`` and ``next`` functions would be
modified so that they may be called with no arguments. In this case, they
would use the current micro_thread's *stdin* pipe as their argument.
Micro_pipes let us write generator functions in a new way by having the
generator do ``put(object)`` rather than ``yield object``. In this case,
the generator function has no ``yield`` statement, so is not treated
specially by the compiler. Basically this means that calling a new-style
generator does not automatically create a new micro_thread (sort of what
calling an old-style generator does).
The ``put(object)`` does the same thing as ``yield object``, but
allows the generator to share the micro_pipe with other new-style
generator functions (by simply calling them) and old-style generators (or
any iterable) by calling ``take_from`` on them. This lets the generator
delegate to other generators without having to get involved with passing
the results back to its caller.
For example, a generator to output all the odd numbers from 1 up to (but not
including) n::
    def odd(n):
        take_from(range(1, n, 2))
These "new-style" generators would have to be run in their own
micro_thread:
>>> pipe = generate(odd, 100)
>>> # now pipe is an iterable representing the generator:
>>> print(tuple(pipe))
The generator is then not restricted to having its own micro_thread. It
could also be used as a helper by other generators from the other
generator's micro_thread without having to create additional micro-threads
or doing "bucket brigades" to yield values from the helper back to the
other generator's caller. For example::
    def even(n):
        take_from(range(2, n, 2))
    def odd_even(n):
        odd(n)
        even(n)
At this point ``generate`` could be called on any of these three generators
(``odd``, ``even`` or ``odd_even``).
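To make the intended behaviour of ``generate``, ``put`` and ``take_from`` concrete, here is a rough emulation in current Python. It is only a sketch under several assumptions: it uses ordinary OS threads and a ``queue.Queue`` instead of micro_threads, it does not propagate exceptions through the pipe, and the names ``MicroPipe``, ``_DONE`` and ``_current`` are invented for the example. None of this is the C-level mechanism the PEP proposes.

```python
import queue
import threading

_DONE = object()              # end-of-stream marker (an assumption of this sketch)
_current = threading.local()  # per-thread slot holding the worker's stdout pipe


class MicroPipe:
    """One-way pipe whose receiving side is a plain iterator."""

    def __init__(self):
        self._queue = queue.Queue(maxsize=1)  # the writer waits for the reader
        self._exhausted = False

    def put(self, obj):
        self._queue.put(obj)

    def close(self):
        self._queue.put(_DONE)

    def __iter__(self):
        return self

    def __next__(self):
        if self._exhausted:
            raise StopIteration
        obj = self._queue.get()
        if obj is _DONE:
            self._exhausted = True
            raise StopIteration
        return obj


def put(obj):
    """Send obj to the stdout pipe of the calling worker."""
    _current.stdout.put(obj)


def take_from(iterable):
    """Send every element of iterable to the current stdout pipe."""
    for obj in iterable:
        put(obj)


def generate(function, *args, **kws):
    """Run function in a worker thread and return its stdout pipe."""
    pipe = MicroPipe()

    def run():
        _current.stdout = pipe
        try:
            function(*args, **kws)
        finally:
            pipe.close()  # closing ends the reader's iteration

    threading.Thread(target=run, daemon=True).start()
    return pipe


def odd(n):
    take_from(range(1, n, 2))


def even(n):
    take_from(range(2, n, 2))


def odd_even(n):
    odd(n)   # both helpers write to the same pipe, with no re-yielding
    even(n)


print(tuple(generate(odd_even, 10)))
```

Because ``odd`` and ``even`` find the pipe through the current worker rather than through an argument, ``odd_even`` can delegate to them by simply calling them, which is the point the text makes about avoiding "bucket brigades".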
Specification of C Layer Enhancements
=====================================
This is where most of the work of implementing this PEP lies. These are the
underlying mechanisms that make the whole thing "tick".
Basically, there are two parts: a C Deferred, which micro-thread aware C
functions use to be put to sleep and avoid blocking, and a Reactor, which
wakes the Deferreds back up when the event that they are waiting for occurs.
This is very similar in concept to the Twisted Deferred and Reactor, just
done at the C level so that Python programmers don't have to deal with them.
C Deferred
----------
``PyDeferred_CDeferred`` is written as a new exception type for use by the
C code to defer execution. This is a subclass of ``NotImplementedError``.
Instances are not raised as a normal exception (e.g., with
``PyErr_SetObject``), but by calling ``PyNotifier_Defer`` (described in the
Notifier_ section, below). This registers the ``PyDeferred_CDeferred``
associated with the currently running micro_thread as the current error
object, but also readies it for its primary job -- deferring execution. As
an exception, it creates its own error message, if needed, which is
"Deferred execution not yet implemented by %s" % c_function_name.
``PyErr_ExceptionMatches`` may be used with these. This allows them to be
treated as exceptions by non micro-threading aware (*unmodified*) C
functions.
But these C deferred objects serve as special indicators that are treated
differently than normal exceptions by micro-threading aware (*modified*)
C code. Modified C functions do this by calling ``PyDeferred_AddCallback``,
or explicitly checking ``PyErr_ExceptionMatches(PyDeferred_CDeferred)``
after receiving an error return status from a called function.
``PyDeferred_CDeferred`` instances offer the following methods (in addition
to the normal exception methods):
- ``int PyDeferred_AddCallbackEx(PyObject *deferred, const char *caller_name,
const char *called_name,
PyObject *(*callback_fn)(PyObject *returned_object, void *state),
void *state)``
- The *caller_name* and *called_name* are case sensitive. The *called_name*
must match exactly the *caller_name* used by the called function when it
dealt with this *deferred*. If the names are different, the *deferred*
knows that an intervening unmodified C function was called. This is what
triggers it to then act like an exception.
The *called_name* must be ``NULL`` when called by the function that
executed the ``PyNotifier_Defer`` to initiate the deferring process.
- The *callback_fn* will be called with the ``PyObject`` of the results of
the prior registered callback_fn. An exception is passed to
*callback_fn* by setting the exception and passing ``NULL`` (just like
returning an exception from a C function). In the case that the
*deferred* initially accepts some *callback_fns* after a
``PyNotifier_Defer`` is done, and then later has to reject them (because
of encountering the exception case, above), it will pass itself again,
now acting like an exception, to all of these new callback_fns to allow
them to clean up. It then returns 0 to continue to be treated as an
exception (see the explanation for ``PyDeferred_Callback``, below).
- The *callback_fn* is always guaranteed to be called exactly once at some
point in the future. It will be passed the same *state* value as was
passed with it to ``PyDeferred_AddCallback``. It is up to the
*callback_fn* to deal with the memory management of this *state* object.
- The *callback_fn* may be ``NULL`` if no callback is required. But in
this case ``PyDeferred_AddCallback`` must still be called to notify the
*deferred* that the C function is micro-threading aware.
- This returns 0 if it fails (is acting like an exception), 1 otherwise.
If it fails, the caller should do any needed clean up because the caller
won't be resumed by the *deferred* (i.e., *callback_fn* will not be
called).
- ``int PyDeferred_AddCallback(const char *caller_name,
const char *called_name,
PyObject *(*callback_fn)(PyObject *returned_object, void *state),
void *state)``
- Same as ``PyDeferred_AddCallbackEx``, except that the deferred object is
taken from the *value* object returned by ``PyErr_Fetch``. If the *type*
returned by ``PyErr_Fetch`` is not ``PyDeferred_CDeferred``, 0 is returned.
Thus, this function can be called after any exception and then other
standard exception processing done if 0 is returned (including checking
for other kinds of exceptions).
- ``int PyDeferred_IsExceptionEx(PyObject *deferred)``
- Returns 1 if *deferred* is in exception mode, 0 otherwise.
- ``int PyDeferred_IsException(void)``
- Same as ``PyDeferred_IsExceptionEx``, except that the deferred object is
taken from the *value* object returned by ``PyErr_Fetch``. If the *type*
returned by ``PyErr_Fetch`` is not ``PyDeferred_CDeferred``, 1 is returned.
Thus, this function can be called after any exception and then other
standard exception processing done if 1 is returned (including checking
for other kinds of exceptions).
- ``int PyDeferred_Callback(PyObject *deferred, PyObject *returned_object)``
- This is called by the `reactor function`_ to resume execution of a
micro_thread after the *deferred* has been scheduled with
``PyReactor_Schedule`` or ``PyReactor_ScheduleException``.
- This calls the callback_fn sequence stored in *deferred* passing
*returned_object* to the first registered callback_fn, and each
callback_fn's returned ``PyObject`` to the next registered callback_fn.
- To signal an exception to the callbacks, first set the error indicator
(e.g. with ``PyErr_SetString``) and then call ``PyDeferred_Callback``
passing ``NULL`` as the *returned_object* (just like returning ``NULL``
from a C function to signal an exception).
- If a callback_fn wants to defer execution, this same *deferred* object
will be used by ``PyNotifier_Defer`` (since the callback_fn is running in
the same micro_thread). The *deferred* keeps the newly added callback_fns
in the proper sequence relative to the existing callback_fns that have not
yet been executed (described below). When *deferred* is returned from a
callback_fn, no further callback_fns are called.
Note that this check is also done on the starting *returned_object*, so
that if this *deferred* exception is passed in, then none of its
callback_fns are executed and it simply returns.
- If a callback_fn defers, a final check is done to see if its name was the
last one registered by a ``PyDeferred_AddCallback`` call. If not, and if
this *deferred* has not already been set into exception mode, the
*deferred* sets itself into exception mode and raises itself through the
entire callback_fn sequence. This should end up terminating the
micro_thread.
- If a callback_fn starts to defer (by calling ``PyNotifier_Defer``) and then
later raises some other exception, the *deferred* will know that it's been
activated but not returned as the final error object by the callback_fn.
In this case, the *deferred* raises a ``SystemError`` attaching the other
exception to it as its ``__cause__`` and runs this through all new
callback_fns that were added subsequent to the ``PyNotifier_Defer``. The
``SystemError`` exception is then cleared and the other exception
reestablished (it will have the *deferred* as its ``__context__``). The
other exception is then passed to the remaining callback_fns to terminate
the micro_thread.
- If no callback_fn defers, then the micro_thread is finished executing.
The results of the last callback_fn are treated as the final result of the
micro_thread. If the micro_thread has an ``exception_handler``, the
``exception_handler`` is used on the final exception (if there is one) and
the micro_thread is deleted.
If the micro_thread has no ``exception_handler``, the final return value
(or exception) is stored in the micro_thread and the micro_thread is
converted into a zombie state. This will also result in a ``close``
being done on the micro_thread's stdout micro_pipe.
- Returns 0 on error, 1 otherwise. Note that an error from the final
callback_fn does not cause a 0 to be returned here. Only if
``PyDeferred_Callback`` itself has a problem that it can't deal with is
0 returned.
Each micro_thread has its own C deferred object associated with it. This is
possible because each micro_thread may only be suspended for one thing at a
time. This also allows us to re-use C deferreds and, through the following
trick, means that we don't need a lot of C deferred instances when a
micro_thread is deferred many times at different points in the call stack.
One peculiar thing about the stored callbacks is that they're not really a
queue. When the C deferred is first used and has no saved callbacks,
the callbacks are saved in straight FIFO manner. Let's say that four
callbacks are saved in this order: ``D'``, ``C'``, ``B'``, ``A'`` (meaning
that ``A`` called ``B``, called ``C``, called ``D`` which deferred):
- after ``D'`` is added, the queue looks like: ``D'``
- after ``C'`` is added, the queue looks like: ``D'``, ``C'``
- after ``B'`` is added, the queue looks like: ``D'``, ``C'``, ``B'``
- after ``A'`` is added, the queue looks like: ``D'``, ``C'``, ``B'``, ``A'``
Upon resumption, ``D'`` is called, then ``C'`` is called. ``C'`` then calls
``E`` which calls ``F`` which now wants to defer execution again. ``B'`` and
``A'`` are still in the deferred's callback queue. When ``F'``, then ``E'``
then ``C''`` are pushed, they go in front of the callbacks still present
from the last defer:
- after ``F'`` is added, the queue looks like: ``F'``, ``B'``, ``A'``
- after ``E'`` is added, the queue looks like: ``F'``, ``E'``, ``B'``, ``A'``
- after ``C''`` is added, the queue looks like: ``F'``, ``E'``, ``C''``, ``B'``, ``A'``
These callback functions are basically a reflection of the C stack at the
point the micro_thread is deferred.
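The ordering rule can be modelled in a few lines of Python. This is a toy sketch of the ordering only, with strings standing in for the C callbacks; the class ``CallbackStore`` and its method names are hypothetical and do not correspond to any function in the proposed C API.

```python
class CallbackStore:
    """Toy model of a deferred's callback ordering (names only)."""

    def __init__(self):
        self._callbacks = []
        self._insert_at = 0  # position where the next added callback goes

    def start_defer(self):
        # A new defer begins: its callbacks go in front of any leftovers.
        self._insert_at = 0

    def add(self, name):
        # Callbacks of one defer keep their own FIFO order among themselves.
        self._callbacks.insert(self._insert_at, name)
        self._insert_at += 1

    def pop(self):
        # Resumption takes the callback at the front.
        return self._callbacks.pop(0)

    def pending(self):
        return list(self._callbacks)


store = CallbackStore()
store.start_defer()                      # D defers
for name in ["D'", "C'", "B'", "A'"]:
    store.add(name)
store.pop()                              # D' runs
store.pop()                              # C' runs, calls E, which calls F
store.start_defer()                      # F defers
for name in ["F'", "E'", "C''"]:
    store.add(name)
print(store.pending())
```

The final state lists the three new callbacks first, followed by the two left over from the first defer, matching the sequence described above.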
Reactor Design
--------------
The Reactor design is divided into two levels:
- The top level `reactor function`_. There is only one long running
invocation of this function per standard Posix thread_.
- A list of Notifiers_. Each of these knows how to check for a different
type of external event, such as a file being ready for IO, a signal
having been received, or a GUI/windows event.
.. _Notifiers: Notifier_
Reactor Function
''''''''''''''''
There is a reactor function instance for each Posix thread. All instances
share the same set of ``NotifierList``, ``TimedWaitSeconds`` and
``EventCheckingThreshold`` parameters.
The master ``NotifierList`` is a list of classes that are instantiated when
the reactor function is created. This list is maintained in descending
``PyNotifier_Priority`` order.
The reactor function pops (deferred, returned_object) pairs, doing
``PyDeferred_Callback`` on each, until either the ``EventCheckingThreshold``
number of deferreds have been popped, or there are no more deferreds
scheduled.
It then runs its copy of the ``NotifierList`` to give each notifier_ a chance
to poll for its events. If there are then still no deferreds scheduled, it
goes to each notifier in turn asking it to do a ``PyNotifier_TimedWait`` for
``TimedWaitSeconds`` until one returns 1. Then it polls the
remaining notifiers again and goes back to running scheduled deferreds.
If there is only one notifier, a ``PyNotifier_WaitForever`` is used, rather
than first polling with ``PyNotifier_Poll`` and then
``PyNotifier_TimedWait``.
If all but one notifier returns -1 on the initial poll pass (such that only
one notifier has any deferreds), a ``PyNotifier_WaitForever`` is used on that
notifier on the second pass rather than ``PyNotifier_TimedWait``.
If all notifiers return -1 on the initial poll pass and there are no deferreds
scheduled, the reactor function is done and returns to terminate its Posix
thread.
The reactor function also manages a list of timers for the notifiers. It
calls ``PyNotifier_Timeout`` each time a timer pops.
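The control flow of the reactor function can be pictured with a simplified Python model. This is a sketch under stated assumptions, not the proposed C implementation: plain callables stand in for deferreds, the timed-wait and wait-forever steps are replaced by simply polling again, timers and error returns are left out, and ``run_reactor`` and ``CountdownNotifier`` are names invented for the example.

```python
from collections import deque


class CountdownNotifier:
    """Hypothetical notifier whose event arrives after a number of polls."""

    def __init__(self, polls_needed, callback, value):
        self.polls_left = polls_needed
        self.pending = (callback, value)

    def poll(self, scheduled):
        if self.pending is None:
            return -1  # nothing is waiting on this notifier
        self.polls_left -= 1
        if self.polls_left <= 0:
            scheduled.append(self.pending)  # the event happened: schedule it
            self.pending = None
        return 1


def run_reactor(scheduled, notifiers, event_checking_threshold):
    """Run (callback, value) pairs until nothing is scheduled or waiting."""
    results = []
    while True:
        # Run scheduled callbacks, at most event_checking_threshold per pass.
        for _ in range(event_checking_threshold):
            if not scheduled:
                break
            callback, value = scheduled.popleft()
            results.append(callback(value))
        # Give every notifier a chance to poll for its events.
        statuses = [notifier.poll(scheduled) for notifier in notifiers]
        # Done when no notifier has waiters and nothing is scheduled.
        if not scheduled and all(status == -1 for status in statuses):
            return results


scheduled = deque([(str.upper, "a"), (str.upper, "b")])
notifier = CountdownNotifier(2, str.upper, "c")
print(run_reactor(scheduled, [notifier], event_checking_threshold=1))
```

With a threshold of 1 the notifier is polled after every callback, so the event it delivers is scheduled and run before the loop decides that there is nothing left to do.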
The following functions use the reactor function for the current Posix
thread.
- ``int PyReactor_Schedule(PyObject *deferred, PyObject *returned_object)``
- Returns 0 on error, 1 otherwise.
- ``int PyReactor_ScheduleException(PyObject *deferred,
PyObject *exc_type, PyObject *exc_value, PyObject *exc_traceback)``
- Returns 0 on error, 1 otherwise.
- ``int PyReactor_Run(void)``
- At least one ``PyReactor_Schedule`` must be done first, or
``PyReactor_Run`` will return immediately.
- This only returns when there is nothing left to do.
- Returns 0 on error, 1 otherwise.
- ``int PyReactor_SetTimer(PyObject *notifier, PyObject *deferred,
double seconds)``
- Returns 0 on error, 1 otherwise.
- ``int PyReactor_ClearTimer(PyObject *notifier, PyObject *deferred)``
- Returns 0 on error, 1 otherwise.
These functions apply globally to all reactor functions (all Posix threads):
- ``int PyReactor_AddNotifier(PyObject *notifier_class)``
- The *notifier_class* is added to the NotifierList in proper priority
order.
- The same NotifierList is used by all reactor functions (all Posix
threads).
- Returns 0 on error, 1 otherwise.
- ``int PyReactor_RemoveNotifier(PyObject *notifier_class)``
- The *notifier_class* is removed from the NotifierList.
- Returns 0 on error, 1 otherwise. It is an error if the *notifier_class*
was not in the NotifierList.
- ``int PyReactor_SetEventCheckingThreshold(long num_continues)``
- Returns 0 on error, 1 otherwise.
- ``int PyReactor_SetTimedWaitSeconds(double seconds)``
- Returns 0 on error, 1 otherwise.
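As a rough Python model of the NotifierList bookkeeping described for ``PyReactor_AddNotifier`` and ``PyReactor_RemoveNotifier``: the class name ``NotifierList``, its methods, and the ``priority`` attribute (standing in for ``PyNotifier_Priority``) are all hypothetical. It is also an assumption that "proper priority order" means highest priority first, and that notifiers with equal priority keep their insertion order.

```python
class NotifierList:
    """Hypothetical model of the shared NotifierList."""

    def __init__(self):
        self._items = []  # assumed ordering: highest priority first

    def add(self, notifier_class):
        """Insert notifier_class in priority order; returns 1 like the C call."""
        priority = notifier_class.priority
        index = 0
        # Skip past entries with higher or equal priority (stable insertion).
        while index < len(self._items) and self._items[index].priority >= priority:
            index += 1
        self._items.insert(index, notifier_class)
        return 1

    def remove(self, notifier_class):
        """Remove notifier_class; returns 0 if it was not in the list."""
        try:
            self._items.remove(notifier_class)
        except ValueError:
            return 0
        return 1

    def items(self):
        """Return a copy of the current ordering."""
        return list(self._items)
```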
Notifier
''''''''
Each notifier knows how to check for a different kind of event. The notifiers
must release the GIL prior to suspending the Posix thread.
- ``int PyNotifier_Priority(PyObject *notifier_class)``
- Returns the priority of this *notifier_class* (-1 for error). Higher
numbers have higher priorities.
- ``int PyNotifier_RegisterDeferred(PyObject *notifier, PyObject *deferred,
PyObject *wait_reason, double max_wait_seconds)``
- *Max_wait_seconds* of 0.0 means no time limit. Otherwise, register
*deferred* with ``PyReactor_SetTimer`` (above).
- Adds *deferred* to the list of waiting objects, for *wait_reason*.
- The meaning of *wait_reason* is determined by the notifier. It can be
used, for example, to indicate whether to wait for input or output on a
file.
- Returns 0 on error, 1 otherwise.
- ``void PyNotifier_Defer(PyObject *notifier, PyObject *wait_reason,
double max_wait_seconds)``
- Passes the deferred of the current micro_thread to
``PyNotifier_RegisterDeferred``, and then raises the deferred as an
exception. *Wait_reason* and *max_wait_seconds* are passed on to
``PyNotifier_RegisterDeferred``.
- This function has no return value. It always generates an exception.
- ``int PyNotifier_Poll(PyObject *notifier)``
- Poll for events and schedule the appropriate ``PyDeferred_CDeferreds``.
Do not cause the process to be put to sleep. Return -1 if no deferreds
are waiting for events, 0 on error, 1 on success (whether or not any
events were discovered).
- ``int PyNotifier_TimedWait(PyObject *notifier, double seconds)``
- Wait for events and schedule the appropriate deferreds. Do not cause the
Posix thread to be put to sleep for more than the indicated number
of *seconds*. Return -2 if *notifier* is not capable of doing timed
sleeps, -1 if no deferreds are waiting for events, 0 on error, 1 on
success (whether or not any events were discovered). Return a 1 if
the wait was terminated due to the process having received a signal.
- If *notifier* is not capable of doing timed waits, it should still do a
poll and should still return -1 if no deferreds are waiting for events.
- ``int PyNotifier_WaitForever(PyObject *notifier)``
- Suspend the process until an event occurs and schedule the appropriate
deferreds. The process may be put to sleep indefinitely.
Return -1 if no deferreds are waiting for events, 0 on error, 1 on success
(whether or not any ``PyDeferred_CDeferreds`` were scheduled).
Return a 1 if the wait was terminated due to the process having received
a signal.
- ``int PyNotifier_Timeout(PyObject *notifier, PyObject *deferred)``
- Called by `reactor function`_ when the timer set by
``PyReactor_SetTimer`` expires.
- Deregisters *deferred*.
- Passes a ``TimeoutException`` to *deferred* using
``PyDeferred_Callback``.
- Return 0 on error, 1 otherwise.
- ``int PyNotifier_DeregisterDeferred(PyObject *notifier,
PyObject *deferred, PyObject *returned_object)``
- Deregisters *deferred*.
- Passes *returned_object* to *deferred* using ``PyDeferred_Callback``.
- *Returned_object* may be ``NULL`` to indicate an exception to the
callbacks.
- Returns 0 on error, 1 otherwise.
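The register, deregister and timeout steps listed above can be modelled in a few lines of Python. This is a toy sketch, not the C API: ``ToyDeferred`` and ``ToyNotifier`` are hypothetical stand-ins, the timeout is delivered as a plain exception instance passed to the callback, and treating the deregistration of an unknown deferred as an error is an assumption.

```python
class TimeoutException(Exception):
    """Stand-in for the TimeoutException named in this section."""


class ToyDeferred:
    """Hypothetical stand-in for a C deferred; records what it receives."""

    def __init__(self):
        self.results = []

    def callback(self, obj):
        self.results.append(obj)


class ToyNotifier:
    """Hypothetical model of a notifier's list of waiting deferreds."""

    def __init__(self):
        self.waiting = {}  # deferred -> wait_reason

    def register_deferred(self, deferred, wait_reason):
        self.waiting[deferred] = wait_reason
        return 1

    def deregister_deferred(self, deferred, returned_object):
        if deferred not in self.waiting:
            return 0  # assumption: unknown deferred is reported as an error
        del self.waiting[deferred]
        deferred.callback(returned_object)
        return 1

    def timeout(self, deferred):
        # Deregister, then hand the deferred a TimeoutException.
        return self.deregister_deferred(deferred, TimeoutException("timed out"))

    def poll(self):
        # -1 when nothing is waiting, 1 otherwise (events are not modelled).
        return 1 if self.waiting else -1
```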
Open Questions
==============
#. How are tracebacks handled?
#. Do we:
#. Treat each Python-to-Python call as a separate C call, with its own
callback_fn?
#. Only register one callback_fn for each continuous string of
Python-to-Python calls and then process them iteratively rather than
recursively in the callback_fn (but not in the original calls)? or
#. Treat Python-to-Python calls iteratively both in the original calls
and in the callback_fn?
#. How is process termination handled?
- I guess we can keep a list of micro_threads and terminate each of them.
There's a question of whether to allow the micro_threads to complete or
to abort them mid-stream. Kind of like a unix shutdown. Maybe two kinds
of process termination?
#. How does this impact the debugger/profiler/sys.settrace?
#. Should functions (C and Python) that may defer be indicated with some
naming convention (e.g., ends in '_d') to make it easier for programmers
to avoid them within their critical sections of code (in terms of
synchronization)?
#. Do we really need to expose micro_pipes to the Python programmer as
anything more than iterables, or can we just use the built-in ``put`` and
``take_from`` functions?
Rationale
=========
Impact on Other Python Implementations
--------------------------------------
The core pieces of this approach (the C deferred, the reactor function and
the notifiers) are not exposed to the Python level. This leaves their
implementation open
so that other implementations of Python (e.g., Jython_ [#jython-project]_,
IronPython_ [#ironpy]_ and PyPy_ [#pypy_project]_) are not constrained by
the choices made for CPython.
Also, the interfaces to the new Python-level objects (micro_threads,
micro_pipes) are kept to a minimum thus hiding design decisions made within
the underlying implementation so as not to unduly constrain other Python
implementations that wish to support compatible features.
Other Approaches
----------------
Here's a brief comparison to other approaches to micro-threading in Python:
- `Stackless Python`_ [#stackless]_
- As near as I can tell, stackless went through two incarnations:
#. The first incarnation involved an implementation of Frame continuations
which were then used to provide the rest of the stackless functionality.
- A new ``Py_UnwindToken`` was created to unwind the stack. This is
similar to the new ``PyDeferred_CDeferred`` proposed in this PEP,
except that ``Py_UnwindToken`` is treated as a special case of a
normal ``PyObject`` return value, while the ``PyDeferred_CDeferred``
is treated as a special case of a normal exception.
It's not clear whether C functions are exposed to this special value.
So either C functions can't be unwound, or unmodified C functions may
behave strangely. There is mention of trouble if a C function calls
a Python function. I also saw no mention of being able to defer
execution rather than block the whole program.
This PEP treats the requests to defer as special exceptions, which
are already designed to unwind the C stack.
- Another difference between the two styles of continuations is that
the stackless continuation is designed to be able to be continued
multiple times. In other words, you can continue the execution of
the program from the point the continuation was made as many times
as you wish, passing different seed values each time.
The ``PyDeferred_CDeferred`` described in this PEP (like the Twisted
Deferred) is designed to be continued only once.
- The stackless approach provides a Python-level continuation
mechanism (at the Frame level) that only makes Python functions
continuable. It provides no way for C functions to register
continuations so that C functions can be unwound from the stack
and later continued (other than those related to the byte code
interpreter).
In contrast, this PEP proposes a C-level continuation mechanism
very similar to the Twisted Deferred. Each C function registers a
callback to be run when the deferred is continued. From this
perspective, the byte code interpreter is just another C function.
#. The second incarnation involved a way of hacking the underlying C
stack to copy it and later restore it as a means of continuing the
execution.
- This doesn't appear to be portable to different CPU/C Compiler
configurations.
- This doesn't deal with other global state (global/static variables,
file pointers, etc) that may also be used by this saved stack.
- In contrast, this PEP uses a single C stack and makes no assumptions
about the underlying C stack implementation. It is completely
portable to any CPU/C compiler configuration.
- `py.magic.greenlet: Lightweight concurrent programming`_ [#greenlets]_
This takes its implementation from the second incarnation of stackless and
copies the C stack for re-use. It has the same portability questions that
the second incarnation of stackless does.
It does not include a reactor component, though one could be written for it.
- `Implementing "weightless threads" with Python generators`_ [#weightless]_
- This requires you to code each thread as a generator. The generator
executes a ``yield`` to relinquish control.
- It's not clear how this scales. It seems that to pause in a lower
Python function, it and all intermediate functions must be generators.
- python-safethread_ [#safethread]_
- This is an alternate implementation to thread_ that adds monitors to
mutable types, adds deadlock detection, improves exception propagation
across threads and program finalization, and removes the GIL. As
such, it is not a "micro" threading approach, though by removing the GIL
it may be able to better utilize multiple processor configurations
than the approach proposed in this PEP.
- `Sandboxed Threads in Python`_ [#sandboxed-threads]_
- Another alternate implementation to thread_, this one only shares
immutable objects between threads, modifying the reference counting
system to avoid synchronization issues with the reference count for
shared objects. Again, not a "micro" threading approach, but perhaps
also better with multiple processors.
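To make the generator-based "weightless threads" entry in the list above concrete, here is a minimal sketch of that style of scheduling. It is not taken from the cited article: ``worker`` and ``run_round_robin`` are hypothetical names, and the scheduler is a plain round-robin queue. Each thread is a generator that executes a bare ``yield`` to give up control.

```python
from collections import deque


def worker(name, steps, log):
    """A 'thread' written as a generator; each yield relinquishes control."""
    for i in range(steps):
        log.append((name, i))
        yield


def run_round_robin(threads):
    """Resume each generator in turn until all of them have finished."""
    queue = deque(threads)
    while queue:
        thread = queue.popleft()
        try:
            next(thread)
        except StopIteration:
            continue  # this thread is done; do not requeue it
        queue.append(thread)


log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
# log == [("a", 0), ("b", 0), ("a", 1), ("b", 1), ("b", 2)]
```

The scaling concern raised in that entry shows up here as well: a helper called from ``worker`` could only pause the thread if the helper were itself a generator and ``worker`` forwarded its yields.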
.. _Jython: http://www.jython.org/Project/
.. _IronPython:
http://www.codeplex.com/Wiki/View.aspx?ProjectName=IronPython
.. _PyPy: http://codespeak.net/pypy/dist/pypy/doc/home.html
.. _Implementing "weightless threads" with Python generators:
http://www.ibm.com/developerworks/library/l-pythrd.html
.. _python-safethread: https://launchpad.net/python-safethread
.. _Sandboxed Threads in Python:
http://mail.python.org/pipermail/python-dev/2005-October/057082.html
.. _Stackless Python: http://www.stackless.com/
.. _thread: http://docs.python.org/lib/module-thread.html
.. _threading: http://docs.python.org/lib/module-threading.html
.. _`py.magic.greenlet: Lightweight concurrent programming`:
http://codespeak.net/py/dist/greenlet.html
Backwards Compatibility
=======================
This PEP doesn't break any existing code. Existing code just won't take
advantage of any of the new features.
But there are two possible problem areas:
#. Python code uses micro-threading, but then causes an unmodified C function
to call a modified C function which tries to defer execution.
In this case an exception will be generated stating that this C function
needs to be converted before the program will work.
#. Python code originally written in a single-threaded environment is now used
in a micro-threaded environment. The old code was not written taking
synchronization issues into account, which may cause problems if the old
code calls a function which defers in the middle of a critical section.
This could cause very strange behavior, but can't result in any C-level
errors (e.g., segmentation violation).
This old code would have to be fixed to run with the new features. I
expect that this will not be a frequent problem as these interruptions can
only occur at a few places (where functions that defer are called).
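The second problem area can be illustrated with ordinary generators standing in for micro-threads. This is only an analogy, assuming that a deferring call behaves like the ``yield`` below; ``deposit`` and ``finish`` are hypothetical names. Two updates that each read, defer, and then write back lose one of the updates.

```python
def deposit(account, amount):
    """Read-modify-write with a switch point in the middle."""
    balance = account["balance"]            # read
    yield                                   # a deferring call would switch here
    account["balance"] = balance + amount   # write back a possibly stale value


def finish(thread):
    """Run a generator to completion."""
    for _ in thread:
        pass


account = {"balance": 100}
first = deposit(account, 10)
second = deposit(account, 20)
next(first)     # first reads 100, then defers
next(second)    # second also reads 100, then defers
finish(first)   # writes 110
finish(second)  # writes 120, overwriting the first update
# account["balance"] is now 120 rather than 130
```

Nothing crashes, which matches the claim above that the result is strange behavior rather than a C-level error.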
References
==========
.. [#twisted-fn] Twisted, Twisted Matrix Labs
(http://twistedmatrix.com/trac/)
.. [#c_api] Python/C API Reference Manual, Rossum
(http://docs.python.org/api/api.html)
.. [#stackless] Stackless Python, Tismer
(http://www.stackless.com/)
.. [#thread-module] thread -- Multiple threads of control
(http://docs.python.org/lib/module-thread.html)
.. [#threading-module] threading -- Higher-level threading interface
(http://docs.python.org/lib/module-threading.html)
.. [#jython-project] The Jython Project
(http://www.jython.org/Project/)
.. [#ironpy] IronPython
(http://www.codeplex.com/Wiki/View.aspx?ProjectName=IronPython)
.. [#pypy_project] PyPy[home]
(http://codespeak.net/pypy/dist/pypy/doc/home.html)
.. [#greenlets] py.magic.greenlet: Lightweight concurrent programming
(http://codespeak.net/py/dist/greenlet.html)
.. [#weightless] Charming Python: Implementing "weightless threads" with
Python generators, Mertz
(http://www.ibm.com/developerworks/library/l-pythrd.html)
.. [#safethread] Threading extensions to the Python Language,
(https://launchpad.net/python-safethread)
.. [#sandboxed-threads] Sandboxed Threads in Python, Olsen
(http://mail.python.org/pipermail/python-dev/2005-October/057082.html)
Copyright
=========
This document has been placed in the public domain.
Hi,
It sometimes happens that I'm only interested in whether an expression raises
an exception, not in its return value. This might look something like
    try:
        check_status()
    except IOError:
        cleanup()
which could be written more simply as
    if check_status() raises IOError:
        cleanup()
Also, instead of the inconvenient
    self.assertRaises(ZeroDivisionError, lambda: 1/0)
one could just write
    assert 1/0 raises ZeroDivisionError
Something like this would especially be useful for those of us who
aren't fans of the unittest framework. Alternatively, just the
assert--raises form could be permitted.
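For comparison, something close to this can be approximated in current Python with a small helper. This is only a sketch: ``raises`` is a hypothetical function name, not an existing builtin, and unlike the proposed syntax it needs the expression wrapped in a callable so that evaluation can be delayed.

```python
def raises(exc_type, func, *args, **kwargs):
    """Return True if calling func raises exc_type, False if it returns."""
    try:
        func(*args, **kwargs)
    except exc_type:
        return True
    return False


assert raises(ZeroDivisionError, lambda: 1 / 0)
assert not raises(ZeroDivisionError, lambda: 1 / 1)
```

Exceptions of other types still propagate to the caller, as they would in the ``try``/``except IOError`` form above.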
Thoughts?
Fredrik
Here is the Python level pseudo code (for the micro_threads themselves
and the micro_pipes).
-bruce
On Mon, Aug 25, 2008 at 12:48 PM, Bruce Frederiksen <dangyogi(a)gmail.com> wrote:
> [...]
I've written up the C level code for this PEP in a python pseudo code
to hopefully make it more clear. I have left the explanations out to
keep this shorter. Refer to the PEP for the explanations.
I will also post the python level code for the PEP as a separate post.
-bruce
------------- cut here -------------
On Sun, Aug 24, 2008 at 9:39 PM, Brett Cannon <brett(a)python.org> wrote:
> On Sun, Aug 24, 2008 at 7:25 PM, Russ Paielli <russ.paielli(a)gmail.com>
> wrote:
>
> > Think about this way: it's 80% less clutter. I am a compulsive minimalist,
> > and one of the reasons I like Python is because it minimizes clutter. I
> > probably let clutter bother me more than I should. I really appreciate the
> > lack of semicolons all over the place. Some would call that trivial, but I
> > call it significant.
>
> You call it clutter, I call it information. We have kept 'self'
> explicit for a reason; it's self-documenting.
>
First, "self." conveys no more "information" than the "$" I am proposing,
but it requires five times as many characters. If that's not clutter, I
don't know what is.
Second, "self." actually conveys *less* information than "$", because its
meaning depends on whether or not the first formal argument was actually
"self".
For the record, I gladly concede that I probably don't know as much about
Python as most of the other people on this mailing list (as I wrote earlier,
I am an aerospace engineer). But I also sense that many Python experts have
a blind spot on this matter for some reason. I guess these folks are just so
used to seeing "self." everywhere that it has burned itself into their brain
to the point that they don't see it as the clutter that it is.