Or Changing the Version Comparison Module in Distutils (again)
We've discussed a bit on distutils-sig about allowing
http://semver.org/ versions in Python packages. Ronald's suggestion to
replace - with ~ as a filename-part separator made me think of it again,
because semver also uses the popular - character.
The gist of semver:
Major.Minor.Patch (always 3 numbers)
And the big feature: no non-lexicographical sorting.
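To make the ordering concrete, here is a minimal sort-key sketch following the final SemVer 2.0.0 precedence rules (earlier drafts of the spec differed in places). The name semver_key is hypothetical and this is not distlib's API. Pre-release identifiers are compared either numerically or in plain ASCII order, so there are no special words like 'dev' or 'post' to remember.

```python
import re

# Core version, optional pre-release after '-', optional build metadata after '+'.
_SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)"
    r"(?:-([0-9A-Za-z.-]+))?"
    r"(?:\+([0-9A-Za-z.-]+))?$"
)


def semver_key(version):
    """Hypothetical sort key following SemVer 2.0.0 precedence (no leading-zero checks)."""
    m = _SEMVER.match(version)
    if m is None:
        raise ValueError("not a semver string: %r" % version)
    major, minor, patch, pre, _build = m.groups()  # build metadata is ignored
    core = (int(major), int(minor), int(patch))
    if pre is None:
        return core + ((1,),)  # a release sorts after all of its pre-releases
    idents = []
    for ident in pre.split("."):
        if ident.isdigit():
            idents.append((0, int(ident), ""))  # numeric: compared as numbers, sorts first
        else:
            idents.append((1, 0, ident))  # alphanumeric: plain ASCII comparison
    return core + ((0, tuple(idents)),)


versions = ["1.0.0", "1.0.0-rc.1", "1.0.0-beta.11", "1.0.0-alpha",
            "1.0.0-beta.2", "1.0.0-alpha.beta", "1.0.0-alpha.1", "1.0.0-beta"]
print(sorted(versions, key=semver_key))
```

A shorter identifier list sorts first when all preceding identifiers are equal, which falls out of Python's tuple comparison, so "1.0.0-beta" lands before "1.0.0-beta.2".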
Right now, setuptools replaces every run of characters other than letters,
digits, and dots in versions (and in project names) with a single dash (-).
This would have to change to at least allow +, and the regexp for
recognizing an installed dist would have to allow the plus as well. Current
setuptools handling:
import re
def safe_version(version):  # from setuptools' pkg_resources
    version = version.replace(' ', '.')
    return re.sub('[^A-Za-z0-9.]+', '-', version)
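For illustration: the expression above turns '1.0.0+build.5' into '1.0.0-build.5', which then reads like a pre-release. A hypothetical variant that keeps '+' only needs it added to the character class. The regexp that recognizes installed dists would still need the matching change, which is not shown here.

```python
import re


def safe_version_semver(version):
    """Hypothetical variant of the setuptools rule that leaves '+' alone."""
    version = version.replace(' ', '.')             # spaces still become dots
    return re.sub('[^A-Za-z0-9.+]+', '-', version)  # '+' added to the kept set


print(safe_version_semver("1.0.0+build.5"))  # 1.0.0+build.5
print(safe_version_semver("1.0 beta"))       # 1.0.beta
```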
Semver would be an upgrade from the existing conventions because it is easy
to remember (no special-case sorting for forgettable words 'dev' and
'post'), because you can legally put Mercurial revision numbers in your
package's pre- or post- version, and because the meaning of Major, Minor,
Patch is defined. For compatibility I think it would be enough to say you
could not mix semver and PEP-386 in the same major.minor.patch release.
Vinay Sajip's distlib has some experimental support for semver.
On Tue, Oct 30, 2012 at 12:46 PM, Richard Oudkerk <shibturn(a)gmail.com> wrote:
> The difference in speed between AF_INET sockets and pipes on Windows is much
> bigger than the difference between AF_INET sockets and pipes on Unix.
> (Who knows, maybe it is just my firewall which is causing the slowdown...)
Here's another unscientific benchmark: I wrote a stupid "http" server
(stupider than echosvr.py actually) that accepts HTTP requests and
responds with the shortest possible "200 Ok" response. This should
provide an adequate benchmark of how fast the event loop, scheduler,
and transport are at accepting and closing connections (and reading
and writing small amounts). On my linux box at work, over localhost,
it seems I can handle 10K requests (sent using 'ab' over localhost) in
1.6 seconds. Is that good or bad? The box has insane amounts of memory
and 12 cores (?) and rates at around 115K pystones.
(I tried to repro this on my Mac, but I am running into problems,
perhaps due to system limits.)
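For readers who want to try something similar today, here is a sketch of such a server written against asyncio, the stdlib module that grew out of the tulip prototype. It is not Guido's benchmark code: the response bytes, the handler name and the port are assumptions, and it serves one request per connection.

```python
import asyncio

# Shortest reasonable reply; HTTP/1.0 so the client expects the server to close.
RESPONSE = b"HTTP/1.0 200 Ok\r\nContent-Length: 0\r\n\r\n"


async def handle(reader, writer):
    try:
        await reader.readuntil(b"\r\n\r\n")  # read and ignore the request head
        writer.write(RESPONSE)
        await writer.drain()
    except (asyncio.IncompleteReadError, ConnectionError):
        pass  # client went away before sending a full request
    finally:
        writer.close()  # one request per connection, like 'ab' without -k


async def main(host="127.0.0.1", port=8080):
    server = await asyncio.start_server(handle, host, port)
    async with server:
        await server.serve_forever()
```

Start it with asyncio.run(main()) and drive it with, e.g., ab -n 10000 http://127.0.0.1:8080/. For scale, 10K requests in 1.6 seconds works out to about 6,250 requests per second.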
--Guido van Rossum (python.org/~guido)
First of all, sorry for the lengthy email. I've really tried to make it concise
and I hope I didn't fail entirely. To begin with, I want to describe the
framework my company has been working on for several years, on which we have
successfully deployed several web applications that are used by thousands of
users today. It has survived multiple architecture reviews and redesigns, so I
believe its design is worth introducing here.
Secondly, I'm really glad that many advanced Python developers find that using
"yield" is viable for an async API, and that it may even be "The Right Way",
because when we started working on our framework that sounded nuts (and it
still does...)
I'll describe only the core functionality here, not touching on the message bus
& dispatch, protocol design, IO layers, etc. If anyone is interested, I can
start another thread.
The very core of the system is the Scheduler. I prefer to call it "Scheduler"
rather than "Reactor" or something else, because it's not just an event loop.
It loops over micro-threads, where a micro-thread is a primitive that holds a
pointer to the currently running or suspended task. A Task can be anything, from
a coroutine to a Sleep command. A Task may be suspended while waiting for IO, a
lock primitive, a timeout, or something else. You can even write programs that
are not IO-bound at all.
To the code. So when you have::

    def foo():
        bar_value = yield bar()

defined, and then executed, 'foo' will send a Task object (wrapped around 'bar')
to the scheduler, so that it will be executed in foo's micro-thread.
And because we return a Task, we can also do::
or even (similar to coroutines with Futures)::

    bar_value_promise = yield bar().with_timeout(1).async()
    bar_value = yield bar_value_promise
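A minimal sketch of the basic mechanism, under loudly stated assumptions: plain generators stand in for Task objects, there is no IO, timeout or Sleep support, exceptions are not routed between tasks, and the names Scheduler and MicroThread are hypothetical rather than the framework's. A micro-thread keeps a stack of generators; yielding a generator pushes it, so 'bar' runs inside foo's micro-thread and its return value is sent back into 'foo'. It needs Python 3.3+ for return values from generators.

```python
from collections import deque
from types import GeneratorType


class MicroThread:
    """Hypothetical micro-thread: a stack of generators; the top one is the current task."""

    def __init__(self, task):
        self.stack = [task]
        self.value = None   # value sent into the current task when it resumes
        self.result = None  # return value of the outermost task, once finished


class Scheduler:
    """Hypothetical round-robin scheduler over micro-threads (no IO, no timeouts)."""

    def __init__(self):
        self.ready = deque()

    def spawn(self, task):
        thread = MicroThread(task)
        self.ready.append(thread)
        return thread

    def run(self):
        while self.ready:
            thread = self.ready.popleft()
            task = thread.stack[-1]
            try:
                command = task.send(thread.value)
            except StopIteration as stop:
                # Current task finished: pop it and hand its return value to
                # the task that yielded it, or record it as the thread's result.
                thread.stack.pop()
                thread.value = stop.value
                if thread.stack:
                    self.ready.append(thread)
                else:
                    thread.result = stop.value
                continue
            if isinstance(command, GeneratorType):
                # Yielding a sub-coroutine runs it inside the *same* micro-thread.
                thread.stack.append(command)
            # Any other yielded value just means "reschedule me".
            thread.value = None
            self.ready.append(thread)


def bar():
    yield  # let other micro-threads run once
    return 42


def foo():
    bar_value = yield bar()
    return bar_value + 1


sched = Scheduler()
thread = sched.spawn(foo())
sched.run()
print(thread.result)  # 43
```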
So far there is nothing new. The need for something "new" emerged when we started
to use it in "real world" applications. Suppose you have some ORM and the
following piece of code::

    topics = FE.select([
        ...
    ]).filter(FE.publication_date < FE.publication_date.now(),
              FE.category == self.category)

    for topic in topics:
        ...
Everything is lazily loaded, so a DB query here can run at virtually any point:
when you iterate (objects are pre-fetched), when you access an attribute that
wasn't marked to be loaded, etc. The thing is that there is no way to express
all of those semantics with 'yield'. There is no 'for yield' statement, and there
is no pretty way of resolving an attribute with 'yield'.
So even if you decide to rewrite everything around you from scratch with 'yield'
support, you still can't make a nice Python API for some problems.
Another problem is that "writing everything from scratch" thing. Nobody wants it.
We always want to reuse; nobody wants to write an SMTP client from scratch when
there is a decent one available right in the stdlib.
So the solution was simple: incorporate greenlets.
With greenlets we got a 'yield_()' function that can be called from any coroutine,
and from the framework user's point of view it is the same as the 'yield' statement.
Now we were able to create a green-socket object that looks like a plain stdlib
socket, and to fix our ORM. With its help we were also able to wrap lots and lots
of existing libraries in a nice 'yield'-style design, without rewriting their code.
In the end we have a hybrid approach. For 95% of the code we use explicit 'yields',
and for the remaining 5%, well, we know that when we use the ORM it may do some
implicit 'yields', but that's OK.
Now, by adopting greenlets, we gained a whole new set of optimization strategies.
For instance, we can transparently substitute 'yield' statements with 'yield_'
calls by messing with opcodes, and by tweaking 'yield_' and reusing 'Task'
objects we can achieve near regular-Python-call performance, but with tight
control over our coroutines & micro-threads. And when PyPy finally adds support
for Python 3, STM and JIT-able continulets, it will be very interesting to see
how we can improve performance even further.
The whole point of this text was to show that a pure 'yield' approach will not
work. Moreover, I don't think it's time to pronounce "The Right Way" of 'yielding'
and 'yield-fromming'. There are so many ways of doing it: with a @coroutine
decorator, plain generators, futures and Tasks, and perhaps more. And I honestly
don't know which one is the right one.
What we really need now (and I think Guido has already mentioned this) is a
callback-based (Deferreds, Futures, plain callbacks) design that is easy to
plug into any coroutine framework. It has to be low-level and simple.
Sort of WSGI for async frameworks ;)
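A rough sketch of what "plug-and-play" could mean at the callback level: a generator-based coroutine can be driven by anything that offers add_done_callback(), result() and exception(). Here concurrent.futures.Future is used only because it is in the stdlib and has those methods; run_coroutine is a hypothetical helper, not any framework's API.

```python
from concurrent.futures import Future


def run_coroutine(gen):
    """Hypothetical driver: run a generator that yields Futures, resuming it
    from each Future's done-callback. Note that concurrent.futures logs and
    swallows exceptions raised inside callbacks."""
    def step(value=None, exc=None):
        try:
            fut = gen.throw(exc) if exc is not None else gen.send(value)
        except StopIteration:
            return  # the coroutine finished

        def on_done(f):
            error = f.exception()
            if error is None:
                step(f.result())
            else:
                step(exc=error)

        fut.add_done_callback(on_done)

    step()


received = []
f = Future()


def coro():
    value = yield f
    received.append(value)


run_coroutine(coro())
f.set_result(5)
print(received)  # [5]
```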
We also need to work on the stdlib, so that it is easy to inject a custom socket
into any object. Ideally, by passing it in the constructor (as everybody hates
With all that said, I'd be happy to dedicate a fair amount of my time to help
with the design and implementation.
To save people scrolling to get to the interesting parts, I'll lead with the links:
Detailed write-up: https://bitbucket.org/stevedower/tulip/wiki/Proposal
Source code: https://bitbucket.org/stevedower/tulip/src
(Yes, I renamed my repo after the code name was selected. That would have been far too much of a coincidence.)
Practically all of the details are in the write-up linked first, so anything that's not there is either something I didn't think of or something I decided is unimportant right now (for example, the optimal way to wait for ten thousand sockets simultaneously on every different platform).
There's a reimplemented Future class in the code which is not essential, but it is drastically simplified from concurrent.futures.Future (CFF). It can't be directly replaced by CFF, but only because CFF requires more state management than the rest of the implementation performs ("set_running_or_notify_cancel"). CFF also includes cancellation, for which I've proposed a different mechanism.
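To illustrate what "drastically simplified" can mean, here is a hypothetical minimal future with only two states (pending and finished), no "running" state, no cancellation, and a result() that never blocks. It is a sketch, not the Future class from the linked repository.

```python
class Future:
    """Hypothetical minimal future: pending -> finished only, no 'running'
    state, no cancellation, and result() never blocks."""

    _PENDING = object()

    def __init__(self):
        self._result = self._PENDING
        self._exception = None
        self._callbacks = []

    def done(self):
        return self._result is not self._PENDING or self._exception is not None

    def result(self):
        if self._exception is not None:
            raise self._exception
        if self._result is self._PENDING:
            raise RuntimeError("result is not available yet")
        return self._result

    def add_done_callback(self, fn):
        # Run immediately if already finished, otherwise when it finishes.
        if self.done():
            fn(self)
        else:
            self._callbacks.append(fn)

    def set_result(self, value):
        self._finish(value, None)

    def set_exception(self, exc):
        self._finish(self._PENDING, exc)

    def _finish(self, value, exc):
        if self.done():
            raise RuntimeError("future is already finished")
        self._result, self._exception = value, exc
        callbacks, self._callbacks = self._callbacks, []
        for fn in callbacks:
            fn(self)


f = Future()
f.add_done_callback(lambda fut: print("got", fut.result()))
f.set_result(7)  # prints "got 7"
```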
For the sake of a quick example, I've modified Guido's main.doit function (http://code.google.com/p/tulip/source/browse/main.py) to how it could be written with my proposal (apologies if I've butchered it, but I think it should behave the same):
import socket
def doit():
    TIMEOUT = 2
    cs = CancellationSource()
    tasks = set()
    task1 = urlfetch('localhost', 8080, path='/', cancel_source=cs)
    tasks.add(task1)
    task2 = urlfetch('127.0.0.1', 8080, path='/home', cancel_source=cs)
    tasks.add(task2)
    task3 = urlfetch('python.org', 80, path='/', cancel_source=cs)
    tasks.add(task3)
    task4 = urlfetch('xkcd.com', ssl=True, path='/', af=socket.AF_INET, cancel_source=cs)
    tasks.add(task4)
    ## for t in tasks: t.start() # tasks start as soon as they are called - this function does not exist
    yield delay(0.2) # I believe this is equivalent to scheduling.with_timeout(0.2, ...)?
    winners = [t.result() for t in tasks if t.done()]
    print('And the winners are:', [w for w in winners])
    results = []  # This 'wait all' loop could easily be a helper function
    for t in tasks: # Unfortunately, [(yield t) for t in tasks] does not work :(
        results.append((yield t))
    print('And the players were:', [r for r in results])
This is untested code, and has a few differences. I don't have task names, so it will print the returned value from urlfetch (a tuple of (host, port, path, status, len(data), time_taken)). The cancellation approach is quite different, but IMO far more likely to avoid the finally-related issues discussed in other threads.
However, I want to emphasise that unless you are already familiar with this exact style, it is near impossible to guess exactly what is going on from this little sample. Please read the write-up before assuming what is or is not possible with this approach.
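The 'wait all' helper mentioned in the code comment can be pulled out with PEP 380's 'yield from' (Python 3.3+). That also sidesteps the comprehension problem: in Python 3 a comprehension runs in its own scope, so a yield inside it does not suspend doit() itself (and since Python 3.8 it is a SyntaxError). The names wait_all, doit_tail and run_with are hypothetical; run_with is only a stand-in for the real scheduler.

```python
def wait_all(tasks):
    """Yield each task in turn and collect whatever the scheduler sends back."""
    results = []
    for t in tasks:
        results.append((yield t))
    return results


def doit_tail(tasks):
    # The end of doit(), with the loop moved into the helper.
    results = yield from wait_all(tasks)
    print('And the players were:', results)
    return results


def run_with(gen, resolve):
    """Hypothetical stand-in for the scheduler: answer each yielded task with resolve(task)."""
    try:
        task = gen.send(None)
        while True:
            task = gen.send(resolve(task))
    except StopIteration as stop:
        return stop.value


run_with(doit_tail(["a", "b"]), str.upper)  # prints: And the players were: ['A', 'B']
```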
In trying to reduce the repetition of option arguments to Python scripts, I
basically needed to give the program some structure that could be
automatically mangled for use in
a) the getopt() call
b) the -h (give call usage) option in the program
c) Synopsis subheading in the man page
d) Options subheading in the man page
rather than having to keep all four in sync just because someone wanted a "-j" option.
Because this requires generating the man page programmatically, Sphinx, pydoc
et al. haven't really been of any use: as far as I've been able to tell they are
YAML (Yet Another Markup Language), not really able to reflect runtime changes
in the generated documentation. I may have missed how, however...
So I used a dictionary and wrote a program to generate man pages from that
dictionary, and included function calls to derive the four repetitions above
from one structure, rather similar to what you need for argparse.
A dictionary allowed me to check the ordering, existence and allow optional
and updatable sections to be used in man page writing. It also gave me a
reason to use docstrings in public functions.
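This is not the poster's attached program; it is a minimal sketch of the same idea under stated assumptions. One dictionary feeds (a) the getopt() specs, (b) the -h usage line, and (c)/(d) the SYNOPSIS and OPTIONS sections of a troff man page. OPTIONS, getopt_spec, usage and man_page are hypothetical names.

```python
import getopt

# Hypothetical single source of truth: short letter -> long name, argument
# placeholder (None for flags) and help text. Dicts keep insertion order.
OPTIONS = {
    "h": {"long": "help", "arg": None, "help": "show this help and exit"},
    "j": {"long": "jobs", "arg": "N", "help": "run N jobs in parallel"},
    "v": {"long": "verbose", "arg": None, "help": "print more detail"},
}


def getopt_spec(options):
    """(a) The short/long specs that getopt.getopt() expects."""
    short = "".join(k + (":" if o["arg"] else "") for k, o in options.items())
    longs = [o["long"] + ("=" if o["arg"] else "") for o in options.values()]
    return short, longs


def usage(prog, options):
    """(b) One-line usage text for -h."""
    parts = ["[-%s%s]" % (k, " " + o["arg"] if o["arg"] else "")
             for k, o in options.items()]
    return "usage: %s %s" % (prog, " ".join(parts))


def man_page(prog, summary, options):
    """(c) SYNOPSIS and (d) OPTIONS sections as minimal troff (man macros)."""
    out = [".TH %s 1" % prog.upper(), ".SH NAME", "%s \\- %s" % (prog, summary),
           ".SH SYNOPSIS", ".B %s" % prog]
    for k, o in options.items():
        arg = " \\fI%s\\fR" % o["arg"] if o["arg"] else ""
        out.append("[\\fB\\-%s\\fR%s]" % (k, arg))
    out.append(".SH OPTIONS")
    for k, o in options.items():
        arg = " \\fI%s\\fR" % o["arg"] if o["arg"] else ""
        out += [".TP", "\\fB\\-%s\\fR, \\fB\\-\\-%s\\fR%s" % (k, o["long"], arg), o["help"]]
    return "\n".join(out) + "\n"


short, longs = getopt_spec(OPTIONS)
opts, args = getopt.getopt(["-j", "4", "notes.txt"], short, longs)
print(opts, args)               # [('-j', '4')] ['notes.txt']
print(usage("prog", OPTIONS))   # usage: prog [-h] [-j N] [-v]
```

Adding a "-j"-style option then means touching one dictionary entry instead of four places.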
I know man pages are passé and GUIs generally don't bother at all, but it
still seems to me that the lack of a core Python utility to express a man page,
and to use that construction programmatically both to define the program and
to describe it, is a large gap.
Making man pages easier to write would be enough, but I also think that if
newcomers could see some utility in writing documentation inside the programs,
they would do so more readily. And this learnt behaviour is useful elsewhere.
The attached program (if it appears!) is my solution, basically baby python.
It still has one redundant repetition because getopt() does it that way. And
it has some possibly silly but useful markup based on the basic python data
types (e.g. it displays a list differently from a scalar string).
It is meant to illustrate what I felt was not possible with Python as-is, and to
see whether there is a way to make this work redundant.
There are a few other people out there who have had to roll-their-own answer
to the same problems. They solved it slightly differently and didn't include an
ability to enforce "good practice" in man page creation which I think is
So I do feel there is room for python to stop us flailing around trying to find
our own solution.
Is there agreement from others?
I think it would be a good idea if Python tracebacks could be translated
into languages other than English - and it would set a good example.
For example, using French as my default local language, instead of
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
I might get something like
Suivi d'erreur (appel le plus récent en dernier) :
Fichier "<stdin>", à la ligne 1, dans <module>
ZeroDivisionError: division entière ou modulo par zéro
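Something close to this can be approximated today with a custom sys.excepthook. The sketch below is only an illustration under assumptions: the French strings are hard-coded (a real feature would more likely use gettext catalogs), and only the traceback's framing is translated, so the exception message itself stays in English.

```python
import re
import sys
import traceback

# Hypothetical translations of the traceback framing lines.
HEADER_FR = "Suivi d'erreur (appel le plus récent en dernier) :"
FRAME_FR = r'  Fichier "\1", à la ligne \2, dans \3'


def localize_traceback(exc_type, exc, tb):
    """Format a traceback like the interpreter does, then translate its framing."""
    text = "".join(traceback.format_exception(exc_type, exc, tb))
    text = text.replace("Traceback (most recent call last):", HEADER_FR)
    return re.sub(r'^  File "(.*)", line (\d+), in (.*)$', FRAME_FR, text, flags=re.M)


def french_excepthook(exc_type, exc, tb):
    sys.stderr.write(localize_traceback(exc_type, exc, tb))
```

Installing it is one line, sys.excepthook = french_excepthook; uncaught exceptions are then reported with the translated framing.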
Would it be a good idea to have a built-in list of strings containing
the reserved identifiers of Python, such as 'assert', 'import', etc.?
The reason I think this would be useful is that whenever I write a
class with user defined methods I always have to exclude the reserved
keywords. So for instance myinstance.mymethod( ) is okay but
myinstance.assert( ) is not. In these cases I use the convention
myinstance._assert( ), etc. In order to test for these cases I hard
code the keywords in a list and test from there. I take the list of
keywords from http://docs.python.org/reference/lexical_analysis.html#keywords
But what if these change in the future?
So if I had a built-in list containing all the keywords of the given
interpreter version, my life would be that much easier.
What do you think?
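For what it's worth, the standard library's keyword module already exposes kwlist and iskeyword() for the running interpreter, so the hard-coded list can be avoided. method_name_for is a hypothetical helper applying the '_' convention described above.

```python
import keyword


def method_name_for(name):
    """Apply the '_' prefix convention only when `name` is a reserved keyword."""
    return "_" + name if keyword.iskeyword(name) else name


print(method_name_for("assert"))    # _assert
print(method_name_for("mymethod"))  # mymethod
print("import" in keyword.kwlist)   # True
```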