Mailman 3 October 2012 - Python-ideas

Allowing semver in packaging
by Daniel Holth 31 Oct '12

Or Changing the Version Comparison Module in Distutils (again)

We've discussed a bit on distutils-sig about allowing http://semver.org/ versions in Python packages. Ronald's suggestion to replace - with ~ as a filename parts separator made me think of it again, because semver also uses the popular - character.

The gist of semver:

Major.Minor.Patch (always 3 numbers)
1.0.0-prerelease.version
1.0.0+postrelease.version
1.0.0-pre+post

And the big feature: no non-lexicographical sorting.

Right now, setuptools replaces every run of non-alphanumeric characters in versions (and in project names) with a single dash (-). This would have to change to at least allow +, and the regexp for recognizing an installed dist would have to allow the plus as well.

Current setuptools handling:

    def safe_version(version):
        version = version.replace(' ','.')
        return re.sub('[^A-Za-z0-9.]+', '-', version)

Semver would be an upgrade from the existing conventions because it is easy to remember (no special-case sorting for forgettable words 'dev' and 'post'), because you can legally put Mercurial revision numbers in your package's pre- or post- version, and because the meaning of Major, Minor, Patch is defined.

For compatibility I think it would be enough to say you could not mix semver and PEP-386 in the same major.minor.patch release.

Vinay Sajip's distlib has some experimental support for semver.
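
To make the point about the plus sign concrete, here is a minimal sketch. `safe_version` repeats the setuptools code quoted in the message; `safe_version_semver` is a hypothetical variant (the name and the widened character class are assumptions, not anything setuptools ships) that simply adds `+` to the allowed characters.

```python
import re


def safe_version(version):
    """The setuptools behaviour quoted in the message above."""
    version = version.replace(' ', '.')
    return re.sub('[^A-Za-z0-9.]+', '-', version)


def safe_version_semver(version):
    """Hypothetical variant: identical, except that '+' is left alone."""
    version = version.replace(' ', '.')
    return re.sub('[^A-Za-z0-9.+]+', '-', version)
```

With the quoted function, `1.0.0+post.1` collapses to `1.0.0-post.1`, which semver would read as a pre-release; the variant leaves the string unchanged.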

Re: [Python-ideas] Async API: some code to review
by Guido van Rossum 30 Oct '12

On Tue, Oct 30, 2012 at 12:46 PM, Richard Oudkerk <shibturn(a)gmail.com> wrote:
> The difference in speed between AF_INET sockets and pipes on Windows is much
> bigger than the difference between AF_INET sockets and pipes on Unix.
>
> (Who knows, maybe it is just my firewall which is causing the slowdown...)

Here's another unscientific benchmark: I wrote a stupid "http" server (stupider than echosvr.py actually) that accepts HTTP requests and responds with the shortest possible "200 Ok" response. This should provide an adequate benchmark of how fast the event loop, scheduler, and transport are at accepting and closing connections (and reading and writing small amounts).

On my Linux box at work, it seems I can handle 10K requests (sent using 'ab' over localhost) in 1.6 seconds. Is that good or bad? The box has insane amounts of memory and 12 cores (?) and rates at around 115K pystones.

(I tried to repro this on my Mac, but I am running into problems, perhaps due to system limits.)

--
--Guido van Rossum (python.org/~guido)
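
For readers who want to try a similar measurement, the following is a rough sketch of such a minimal server. It is not the code from the message: it assumes the `asyncio` module from later Python versions, and the names `handle`, `fetch_once` and `demo` are made up for illustration.

```python
import asyncio

# Shortest plausible success response: status line plus the blank line.
RESPONSE = b"HTTP/1.0 200 OK\r\n\r\n"


async def handle(reader, writer):
    """Read one request head, answer with RESPONSE, close the connection."""
    try:
        await reader.readuntil(b"\r\n\r\n")
    except (asyncio.IncompleteReadError, asyncio.LimitOverrunError):
        pass  # answer anyway; this toy server does no validation
    writer.write(RESPONSE)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def fetch_once(host, port):
    """Tiny client used only to check that the server answers."""
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(b"GET / HTTP/1.0\r\n\r\n")
    await writer.drain()
    data = await reader.read()  # read until the server closes
    writer.close()
    await writer.wait_closed()
    return data


async def demo():
    # Port 0 asks the OS for any free port.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    data = await fetch_once("127.0.0.1", port)
    server.close()
    await server.wait_closed()
    return data
```

For a benchmark like the one described, the server would instead be bound to a fixed port and left running while a load generator such as 'ab' is pointed at it.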

Async API
by Yury Selivanov 30 Oct '12

Hello,

First of all, sorry for the lengthy email. I've really tried to make it concise and I hope that I didn't fail entirely. At the beginning I want to describe the framework my company has been working on for several years, and on which we successfully deployed several web applications that are exposed to 1000s of users today. It survived multiple architecture reviews and redesigns, so I believe its design is worth introducing here.

Secondly, I'm really glad that many advanced python developers find that use of "yield" is viable for async API, and that it even may be "The Right Way". Because when we started working on our framework that sounded nuts (and still sounds...)

The framework
=============

I'll describe here only the core functionality, not touching message bus & dispatch, protocols design, IO layers, etc. If anyone is interested, I can start another thread.

The very core of the system is Scheduler. I prefer it to be called "Scheduler", and not "Reactor" or something else, because it's not just an event loop. It loops over micro-threads, where a micro-thread is a primitive that holds a pointer to the current running/suspended task. A Task can be anything, from a coroutine to a Sleep command. A Task may be suspended because of IO waiting, a lock primitive, a timeout or something else. You can even write programs that are not IO-bound at all.

To the code. So when you have::

    @coroutine
    def foo():
        bar_value = yield bar()

defined, and then executed, 'foo' will send a Task object (wrapped around 'bar'), so that it will be executed in foo's micro-thread.

And because we return a Task, we can also do::

    yield bar().with_timeout(1)

or even (like coroutines with Futures)::

    bar_value_promise = yield bar().with_timeout(1).async()
    [some code]
    bar_value = yield bar_value_promise

So far there is nothing new. The need for something "new" emerged when we started to use it in "real world" applications. Consider you have some ORM, and the following piece of code::

    topics = FE.select([
        FE.publication_date,
        FE.body,
        FE.category,
        (FE.creator, [
            (FE.creator.subject, [
                (gpi, [
                    gpi.avatar
                ])
            ])
        ])
    ]).filter(FE.publication_date < FE.publication_date.now(),
              FE.category == self.category)

and later::

    for topic in topics:
        print(topic.avatar.bucket.path, topic.category.name)

Everything is lazily-loaded, so a DB query here can be run at virtually any point: when you iterate (it pre-fetches objects), when you access an attribute that was not requested up front, etc. The thing is that there is no way to express all those semantics with 'yield'. There is no 'for yield' statement, there is no pretty way of resolving an attribute with 'yield'. So even if you decide to write everything around you from scratch supporting 'yields', you still can't make a nice python API for some problems.

Another problem is the "writing everything from scratch" part. Nobody wants it. We always want to reuse; nobody wants to write an SMTP client from scratch when there is a decent one available right in the stdlib.

So the solution was simple. Incorporate greenlets.

With greenlets we got a 'yield_()' function, that can be called from any coroutine, and from the framework user's point of view it is the same as the 'yield' statement. Now we were able to create a green-socket object, that looks like a plain stdlib socket, and fix our ORM. With its help we also were able to wrap lots and lots of existing libraries in a nice 'yield'-style design, without rewriting their internals.

At the end - we have a hybrid approach. For 95% we use explicit 'yields', and for the remaining 5% - well, we know that when we use the ORM it may do some implicit 'yields', but that's OK.

Now, with greenlets adopted, a whole new set of optimization strategies became available. For instance, we can substitute 'yield' statements with the 'yield_' command transparently by messing with opcodes, and by tweaking 'yield_' and reusing 'Task' objects we can achieve near regular-python-call performance, but with tight control over our coroutines & micro-threads. And when PyPy finally adds support for Python 3, STM & JIT-able continulets, it would be very interesting to see how we can improve performance even further.

Conclusion
==========

The whole point of this text was to show that a pure 'yield' approach will not work. Moreover, I don't think it's time to pronounce "The Right Way" of 'yielding' and 'yield-fromming'. There are so many ways of doing that: with a @coroutine decorator, plain generators, futures and Tasks, and perhaps more. And I honestly don't know which one is the right one.

What we really need now (and I think Guido has already mentioned that) is a callback-based (Deferreds, Futures, plain callbacks) design that is easy to plug-and-play in any coroutine framework. It has to be low-level and simple. Sort of WSGI for async frameworks ;)

We also need to work on the stdlib, so that it is easy to inject a custom socket in any object. Ideally, by passing it in the constructor (as everybody hates monkey-patching.)

With all that said, I'd be happy to dedicate a fair amount of my time to help with the design and implementation.

Thank you!
Yury
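
The explicit-'yield' half of the design described above can be illustrated with a toy scheduler. This is only a sketch under assumptions: the class, its methods and the `foo`/`bar` helpers are invented here and are not the framework's API. It also has no Task objects, timeouts or IO; a micro-thread is just a stack of generators, and a bare `yield` means "let the other micro-threads run".

```python
from collections import deque
import types


class Scheduler:
    """Toy round-robin scheduler over generator-based micro-threads."""

    def __init__(self):
        self._ready = deque()   # runnable micro-threads
        self.results = {}       # micro-thread name -> return value

    def spawn(self, name, gen):
        # A micro-thread is a stack of generators plus the next value to send.
        self._ready.append((name, [gen], None))

    def run(self):
        while self._ready:
            name, stack, value = self._ready.popleft()
            try:
                yielded = stack[-1].send(value)
            except StopIteration as stop:
                stack.pop()
                if stack:
                    # Hand the return value back to the calling generator.
                    self._ready.append((name, stack, stop.value))
                else:
                    self.results[name] = stop.value
                continue
            if isinstance(yielded, types.GeneratorType):
                # 'yield bar()' runs bar inside the same micro-thread.
                stack.append(yielded)
            self._ready.append((name, stack, None))


def bar(tag, log):
    log.append(tag + ":bar-start")
    yield                        # give other micro-threads a turn
    log.append(tag + ":bar-end")
    return tag.upper()


def foo(tag, log):
    bar_value = yield bar(tag, log)
    return bar_value
```

Spawning `foo("a", log)` and `foo("b", log)` and calling `run()` interleaves the two at each bare `yield`. The lazy-loading problem from the message shows up immediately in such a design: a plain `for` loop or attribute access inside `foo` has no place to put a `yield`.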

Async API: some more code to review
by Steve Dower 30 Oct '12

To save people scrolling to get to the interesting parts, I'll lead with the links:

Detailed write-up: https://bitbucket.org/stevedower/tulip/wiki/Proposal
Source code: https://bitbucket.org/stevedower/tulip/src

(Yes, I renamed my repo after the code name was selected. That would have been far too much of a coincidence.)

Practically all of the details are in the write-up linked first, so anything that's not is either something I didn't think of or something I decided is unimportant right now (for example, the optimal way to wait for ten thousand sockets simultaneously on every different platform).

There's a reimplemented Future class in the code which is not essential, but it is drastically simplified from concurrent.futures.Future (CFF). It can't be directly replaced by CFF, but only because CFF requires more state management that the rest of the implementation does not perform ("set_running_or_notify_cancel"). CFF also includes cancellation, for which I've proposed a different mechanism.

For the sake of a quick example, I've modified Guido's main.doit function (http://code.google.com/p/tulip/source/browse/main.py) to how it could be written with my proposal (apologies if I've butchered it, but I think it should behave the same):

    @async
    def doit():
        TIMEOUT = 2
        cs = CancellationSource()
        cs.cancel_after(TIMEOUT)

        tasks = set()

        task1 = urlfetch('localhost', 8080, path='/', cancel_source=cs)
        tasks.add(task1)

        task2 = urlfetch('127.0.0.1', 8080, path='/home', cancel_source=cs)
        tasks.add(task2)

        task3 = urlfetch('python.org', 80, path='/', cancel_source=cs)
        tasks.add(task3)

        task4 = urlfetch('xkcd.com', ssl=True, path='/', af=socket.AF_INET, cancel_source=cs)
        tasks.add(task4)

        ## for t in tasks: t.start()
        # tasks start as soon as they are called - this function does not exist

        yield delay(0.2)
        # I believe this is equivalent to scheduling.with_timeout(0.2, ...)?

        winners = [t.result() for t in tasks if t.done()]
        print('And the winners are:', [w for w in winners])

        results = []
        # This 'wait all' loop could easily be a helper function
        for t in tasks:
            # Unfortunately, [(yield t) for t in tasks] does not work :(
            results.append((yield t))
        print('And the players were:', [r for r in results])
        return results

This is untested code, and has a few differences. I don't have task names, so it will print the returned value from urlfetch (a tuple of (host, port, path, status, len(data), time_taken)). The cancellation approach is quite different, but IMO far more likely to avoid the finally-related issues discussed in other threads.

However, I want to emphasise that unless you are already familiar with this exact style, it is near impossible to guess exactly what is going on from this little sample. Please read the write-up before assuming what is or is not possible with this approach.

Cheers,
Steve
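
The 'wait all' loop mentioned in the code comments can indeed be factored into a helper once `yield from` (PEP 380, Python 3.3) is available. The sketch below is an illustration only: it uses `concurrent.futures.Future` rather than the reimplemented Future from the proposal, and `run` and `make_done` are hypothetical stand-ins for a real scheduler and for real tasks.

```python
from concurrent.futures import Future


def wait_all(tasks):
    """Generator helper: yield each future in turn and collect the results."""
    results = []
    for t in tasks:
        results.append((yield t))
    return results


def run(gen):
    """Toy driver: resume the generator with the result of each yielded
    future, blocking until that future is done."""
    try:
        fut = next(gen)
        while True:
            fut = gen.send(fut.result())
    except StopIteration as stop:
        return stop.value


def make_done(value):
    """Stand-in for a task: a future that already holds its result."""
    f = Future()
    f.set_result(value)
    return f


def doit():
    tasks = [make_done(n) for n in (1, 2, 3)]
    # Delegation replaces the explicit loop in the caller.
    results = yield from wait_all(tasks)
    return results
```

A real scheduler would not block in `fut.result()`; it would attach a callback and resume the generator when the future completes. The blocking driver just keeps the example short.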

docs.python.org
by Yury Selivanov 29 Oct '12

Hi,

I remember a discussion about making docs.python.org point to the py3k docs by default. Are we still going to do that?

- Yury

Enabling man page structure for python
by Mark Hackett 29 Oct '12

In trying to reduce the repetition of option arguments to python scripts I basically needed to allow some structure to the program to be able to be automatically mangled so it could be used in

a) the getopt() call
b) the -h (give call usage) option in the program
c) Synopsis subheading in the man page
d) Options subheading in the man page

rather than having to keep all four in sync just because someone wanted a "-j" option added.

Because it requires a programmed man page creation, Sphinx, pydoc et al haven't been really of any use, since they are YAML (Yet Another Markup Language) as far as I've been able to tell, not really able to allow runtime changes to reflect in document generation. I may have missed how, however...

So I used a dictionary and wrote a program to generate man pages based on that dictionary and included function calls to automate the four repetitions above into one structure, rather similar to what you need for argparse. A dictionary allowed me to check the ordering and existence of sections, and to allow optional and updatable sections to be used in man page writing. It also gave me a reason to use docstrings in public functions.

I know man pages are passé and GUIs generally don't bother at all, but it still seems to me that a core python utility to express a man page, and to allow programmatic use of that construction both to define the program and to describe it, is still a large gap.

Making man pages easier to write would be enough, but I also think that if newcomers could see some utility in writing documentation inside the programs, they would do so more readily. And this learnt behaviour is useful elsewhere.

The attached program (if it appears!) is my solution, basically baby python. It still has one redundant repetition because getopt() does it that way. And it has some possibly silly but useful markup based on the basic python data types (e.g. it displays a list differently from a scalar string). It is meant to illustrate what I felt was not possible with python as-is, to see if there is a way to make this work redundant.

There are a few other people out there who have had to roll their own answer to the same problems. They solved it slightly differently and didn't include an ability to enforce "good practice" in man page creation, which I think is warranted. So I do feel there is room for python to stop us flailing around trying to find our own solution.

Is there agreement from others?
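
The attached program is not reproduced in the archive, so the following is only a guess at the general shape of the idea: one dictionary that drives the getopt() call, the usage line and the man page OPTIONS section. The table layout and every function name here are assumptions made for illustration. It relies on dictionaries keeping insertion order (Python 3.7+).

```python
import getopt

# Hypothetical option table: one entry per flag, in display order.
OPTIONS = {
    "h": {"arg": None, "help": "show usage and exit"},
    "j": {"arg": "JOBS", "help": "number of parallel jobs"},
    "v": {"arg": None, "help": "verbose output"},
}


def getopt_spec(options):
    """Build the short-option string for getopt.getopt(), e.g. 'hj:v'."""
    return "".join(k + (":" if o["arg"] else "") for k, o in options.items())


def synopsis(prog, options):
    """One usage line, shared by -h output and the man page SYNOPSIS."""
    parts = [prog]
    for k, o in options.items():
        parts.append("[-%s %s]" % (k, o["arg"]) if o["arg"] else "[-%s]" % k)
    return " ".join(parts)


def man_options(options):
    """Render the OPTIONS section using man(7) macros."""
    lines = [".SH OPTIONS"]
    for k, o in options.items():
        flag = "\\fB\\-%s\\fR" % k
        if o["arg"]:
            flag += " \\fI%s\\fR" % o["arg"]
        lines += [".TP", flag, o["help"]]
    return "\n".join(lines)


def parse(argv, options):
    """Parse argv with a spec derived from the same table."""
    opts, args = getopt.getopt(argv, getopt_spec(options))
    return dict(opts), args
```

Adding a "-j" style option then means adding one dictionary entry; the getopt string, the usage line and the man page text all follow from it.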

Re: [Python-ideas] Async API
by Daniel Foerster 28 Oct '12

So, are threads still an option? I feel that many of these problems with generators could be solved with threads.

i18n and Python tracebacks
by Andre Roberge 28 Oct '12

I think it would be a good idea if Python tracebacks could be translated into languages other than English - and it would set a good example.

For example, using French as my default local language, instead of

>>> 1/0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero

I might get something like

>>> 1/0
Suivi d'erreur (appel le plus récent en dernier) :
  Fichier "<stdin>", à la ligne 1, dans <module>
ZeroDivisionError: division entière ou modulo par zéro

André
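
Part of this can be approximated in user code today. The sketch below translates only the fixed framing of a traceback (the header and the "File ..., line ..., in ..." lines) by post-processing the output of `traceback.format_exception`. The French strings are taken from the message; the function names and the regular expression are assumptions for illustration. The exception message itself is produced by the interpreter and stays in English here.

```python
import re
import traceback

HEADER_EN = "Traceback (most recent call last):"
HEADER_FR = "Suivi d'erreur (appel le plus récent en dernier) :"

# Matches the per-frame location lines emitted by the traceback module.
FRAME_RE = re.compile(r'^(\s*)File "(.*)", line (\d+), in (.*)$', re.MULTILINE)


def format_exception_fr(exc):
    """Format an exception with the fixed traceback text in French."""
    text = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    text = text.replace(HEADER_EN, HEADER_FR)
    return FRAME_RE.sub(r'\1Fichier "\2", à la ligne \3, dans \4', text)


def demo():
    try:
        1 / 0
    except ZeroDivisionError as exc:
        return format_exception_fr(exc)
```

Installing a function like this as `sys.excepthook` would apply it to uncaught exceptions. A full solution along the lines of the proposal would also need translated message catalogues for the built-in exception messages.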

Re: [Python-ideas] docs.python.org
by massimo.dipierro＠gmail.com 26 Oct '12

list of reserved identifiers in program?
by Daniel Fetchinson 26 Oct '12

Hi folks,

Would it be a good idea to have a built-in list of strings containing the reserved identifiers of Python such as 'assert', 'import', etc?

The reason I think this would be useful is that whenever I write a class with user defined methods I always have to exclude the reserved keywords. So for instance myinstance.mymethod( ) is okay but myinstance.assert( ) is not. In these cases I use the convention myinstance._assert( ), etc. In order to test for these cases I hard code the keywords in a list and test from there. I take the list of keywords from http://docs.python.org/reference/lexical_analysis.html#keywords

But what if these change in the future? So if I had a built-in list containing all the keywords of the interpreter version in question, my life would be that much easier.

What do you think?

Cheers,
Daniel

--
Psss, psss, put it down! - http://www.cafepress.com/putitdown
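
For reference, the standard library already ships such a list: the `keyword` module provides `keyword.kwlist` and `keyword.iskeyword()`, and both reflect the running interpreter. The helper below is a small sketch of the underscore convention described in the message; the name `safe_method_name` is made up for illustration.

```python
import keyword


def safe_method_name(name):
    """Prefix an underscore when the name is a reserved keyword."""
    return "_" + name if keyword.iskeyword(name) else name
```

Because the list comes from the interpreter itself, it follows the language when keywords are added or removed, so no hard-coded copy is needed.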