On Oct 14, 2012, at 9:22 PM, Guido van Rossum <guido@python.org> wrote:
> On Sun, Oct 14, 2012 at 10:51 AM, Rene Nejsum <rene@stranden.com> wrote:
>> On the high level (Python) basically what you need is that the queue.get()
>> can handle:
>> 1) Python objects (as today)
>> 2) timeout (as today, maybe in milliseconds instead of seconds)
>> 3) Network (socket input/state change)
>> 4) File desc input/state change
>> 5) Other I/O changes like serial comm, etc.
>> 6) Maybe also yield-based coroutine support?
>>
>> This requires support from the underlying
>> OS -- support which is probably not there today?
>>
>> As far as I can see, having this one extended queue.get() would nicely cover
>> all high-level concurrency needs in Python.
>
> [...]
>
>> I believe a "super" queue.get() would solve all use cases.
>>
>> I have no idea on how difficult it would be to implement in
>> a cross platform manner.
>
> Hm. I know that a common (and often right!) recommendation for thread
> communication is to use the queue module. But that module is meant to
> work with threads. I think that the correct I/O primitives are more
> likely to come by looking at what Tornado and Twisted have done than
> by trying to "pimp up" the queue module -- it's good for what it does,
> but trying to add all that new functionality to it doesn't sound like
> a good fit.
You are probably right about the queue class. Maybe it should be a new class,
but I still believe it would be an excellent fit for doing concurrent stuff if Python
had a multiplexing message queue; Python is high-level enough to be able to
hide thread/select/read etc.
A while ago I implemented pyworks (bitbucket.org/raindog/pyworks), which
is a kind of Erlang implementation for Python, making objects concurrent and turning
return values into Futures, without adding much new code. Method calls are sent
asynchronously, simply by doing a standard obj.method(). obj is a proxy for the real
object, sending method() as a message to the real object, which runs in a separate
thread. The return value is a Future. So you can do

    val = obj.method()
    ... method() now runs asynchronously in the real object's thread
    ... and do some other stuff, until:
    print val

which will block waiting for the Future to complete, if it hasn't already.
It has been used in a couple of projects, making it much easier to do concurrent systems.
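For readers who haven't seen the pattern, a minimal sketch of such a proxy
(hypothetical code, not the pyworks implementation; all names here are made
up) could look like this, built from a worker thread, a mailbox queue, and
concurrent.futures.Future:

    import queue
    import threading
    from concurrent.futures import Future

    class ActorProxy:
        # Hypothetical active-object proxy: attribute access returns a
        # callable that enqueues a message and immediately returns a Future.
        def __init__(self, obj):
            self._obj = obj
            self._mailbox = queue.Queue()
            threading.Thread(target=self._run, daemon=True).start()

        def _run(self):
            # Worker loop: execute queued calls on the real object,
            # delivering the result (or exception) through the Future.
            while True:
                name, args, kwargs, fut = self._mailbox.get()
                try:
                    fut.set_result(getattr(self._obj, name)(*args, **kwargs))
                except Exception as e:
                    fut.set_exception(e)

        def __getattr__(self, name):
            def call(*args, **kwargs):
                fut = Future()
                self._mailbox.put((name, args, kwargs, fut))
                return fut  # caller only blocks when it asks for fut.result()
            return call

A caller would then write val = ActorProxy(real_obj).method(), do other
work, and finally call val.result() to wait for the answer.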
But it would be great if the object/task could wait for more kinds of events than queue.get() offers.
br
/Rene
>
> --
> --Guido van Rossum (python.org/~guido)
On Fri, Oct 12, 2012 at 10:40 PM, Blake Hyde <syrion@gmail.com> wrote:
> Is anything gained from this addition?
To give a practical answer, I could say that for newbies it's one small
confusion that could be removed from the language. You and I have been
programming for a long time, so we take it for granted that * means
multiplication, but for anyone else it's just another
weird idiosyncrasy that further alienates them from programming.
Also, I think that using * for multiplication is ugly.
>
> On Fri, Oct 12, 2012 at 4:37 PM, Ram Rachum <ram.rachum@gmail.com> wrote:
> >
> >
> > On Fri, Oct 12, 2012 at 10:34 PM, Mike Graham <mikegraham@gmail.com>
> wrote:
> >>
> >> On Fri, Oct 12, 2012 at 4:27 PM, Ram Rachum <ram.rachum@gmail.com>
> wrote:
> >> > Hi everybody,
> >> >
> >> > Today a funny thought occurred to me. Ever since I've learned to
> program
> >> > when I was a child, I've taken for granted that when programming, the
> >> > sign
> >> > used for multiplication is *. But now that I think about it, why? Now
> >> > that
> >> > we have Unicode, why not use · ?
> >> >
> >> > Do you think that we can make Python support · in addition to *?
> >> >
> >> > I can think of a couple of problems, but none of them seem like
> >> > deal-breakers:
> >> >
> >> > - Backward compatibility: Python already uses *, but I don't see a
> >> > backward
> >> > compatibility problem with supporting · additionally. Let people use
> >> > whichever they want, like spaces and tabs.
> >> > - Input methods: I personally use an IDE that could be easily set to
> >> > automatically convert * to · where appropriate and to allow manual
> input
> >> > of
> >> > ·. People on Linux can type Alt-. . Anyone else can set up a script
> >> > that'll
> >> > let them type · using whichever keyboard combination they want. I
> admit
> >> > this
> >> > is pretty annoying, but since you can always use * if you want to, I
> >> > figure
> >> > that anyone who cares enough about using · instead of * (I bet that
> >> > people
> >> > in scientific computing would like that) would be willing to take the
> >> > time
> >> > to set it up.
> >> >
> >> >
> >> > What do you think?
> >> >
> >> >
> >> > Ram
> >>
> >> Python should not expect characters that are hard for most people to
> >> type.
> >
> >
> > No one will be forced to type it. If you can't type it, use *.
> >
> >
> >>
> >> Python should not expect characters that are still hard to
> >> display on many common platforms.
> >
> >
> > We allow people to have unicode variable names, if they wish, don't we?
> So
> > why not allow them to use a unicode operator, if they wish, as a completely
> > optional thing?
> >
> >>
> >>
> >> I think you'll find strong opposition to adding any non-ASCII
> >> characters or characters that don't occur on almost all keyboards as
> >> part of the language.
> >>
> >> Mike
> >
> >
> >
[This is the third spin-off thread from "asyncore: included batteries
don't fit"]
On Thu, Oct 11, 2012 at 9:29 PM, Devin Jeanpierre
<jeanpierreda@gmail.com> wrote:
> On Thu, Oct 11, 2012 at 7:37 PM, Guido van Rossum <guido@python.org> wrote:
>> On Thu, Oct 11, 2012 at 3:42 PM, Devin Jeanpierre
>> <jeanpierreda@gmail.com> wrote:
>>> Could you be more specific? I've never heard Deferreds in particular
>>> called "arcane". They're very popular in e.g. the JS world,
>>
>> Really? Twisted is used in the JS world? Or do you just mean the
>> pervasiveness of callback style async programming?
>
> Ah, I mean Deferreds. I attended a talk earlier this year all about
> deferreds in JS, and not a single reference to Python or Twisted was
> made!
>
> These are the examples I remember mentioned in the talk:
>
> - http://api.jquery.com/category/deferred-object/ (not very twistedish
> at all, ill-liked by the speaker)
> - http://mochi.github.com/mochikit/doc/html/MochiKit/Async.html (maybe
> not a good example, mochikit tries to be "python in JS")
> - http://dojotoolkit.org/reference-guide/1.8/dojo/Deferred.html
> - https://github.com/kriskowal/q (also includes an explanation of why
> the author likes deferreds)
>
> There were a few more that the speaker mentioned, but didn't cover.
> One of his points was that the various systems of deferreds are subtly
> different, some very badly so, and that it was a mess, but that
> deferreds were still awesome. JS is a language where async programming
> is mainstream, so lots of people try to make it easier, and they all
> do it slightly differently.
Thanks for those links. I followed the kriskowal/q link and was
reminded of why Twisted's Deferreds are considered more awesome than
Futures: it's the chaining.
BUT... That's only important if callbacks are all the language lets
you do! If your baseline is this:
step1(function (value1) {
    step2(value1, function(value2) {
        step3(value2, function(value3) {
            step4(value3, function(value4) {
                // Do something with value4
            });
        });
    });
});
then of course the alternative using Deferred looks better:
Q.fcall(step1)
.then(step2)
.then(step3)
.then(step4)
.then(function (value4) {
    // Do something with value4
}, function (error) {
    // Handle any error from step1 through step4
})
.end();
(Both quoted literally from the kriskowal/q link.)
I also don't doubt that using classic Futures you can't do this -- the
chaining really matters for this style, and I presume this (modulo
unimportant API differences) is what typical Twisted code looks like.
However, Python has yield, and you can do much better (I'll write
plain yield for now, but it works the same with yield-from):
try:
    value1 = yield step1(<args>)
    value2 = yield step2(value1)
    value3 = yield step3(value2)
    value4 = yield step4(value3)
    # Do something with value4
except Exception:
    # Handle any error from step1 through step4
There's an outer function missing here, since you can't have a
toplevel yield; I think that's the same for the JS case, typically.
Also, strictly speaking the "Do something with value4" code should
probably be in an else: clause after the except handler. But that
actually leads nicely to the advantage:
This form is more flexible, since it is easier to catch different
exceptions at different points. It is also much easier to pass extra
information around. E.g. what if your flow ends up having to pass both
value1 and value2 into step3()? Sure, you can do that by making value2
a tuple (or a dict, or an object) incorporating value1 and the
original value2, but that's exactly where this style becomes
cumbersome, whereas in the yield-based form, such things can remain
simple local variables. All in all I find it more readable.
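For concreteness, here is the same flow with the outer function and the
else: clause spelled out (a sketch; @task stands for a hypothetical driver
decorator like the one invented later in this message, and flow/args are
placeholder names):

    @task
    def flow(args):
        try:
            value1 = yield step1(args)
            value2 = yield step2(value1)
            value3 = yield step3(value2)
            value4 = yield step4(value3)
        except Exception:
            ...  # handle any error from step1 through step4
        else:
            ...  # do something with value4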
In the past, when I pointed this out to Twisted aficionados, the
responses usually were a mix of "sure, if you like that style, we got
it covered, Twisted has inlineCallbacks," and "but that only works for
the simple cases, for the real stuff you still need Deferreds." But
that really sounds to me like Twisted people just liking what they've
got and not wanting to change. Which I understand -- I don't want to
change either. But I also observe that a lot of people find bare
Twisted-with-Deferreds too hard to grok, so they use Tornado instead,
or they build a layer on top of either (like Monocle), or they go a
completely different route and use greenlets/gevent instead -- and get
amazing performance and productivity that way too, even though they
know it's monkey-patching their asses off...
So, in the end, for Python 3.4 and beyond, I want to promote a style
that mixes simple callbacks (perhaps augmented with simple Futures)
and generator-based coroutines (either PEP 342, yield/send-based, or
PEP 380 yield-from-based). I'm looking to Twisted for the best
reactors (see other thread). But for transport/protocol
implementations I think that generators/coroutines offer a cleaner,
better interface than incorporating Deferred.
I hope that the path forward for Twisted will be simple enough: it
should be possible to hook Deferred into the simpler callback APIs
(perhaps a new implementation using some form of adaptation, but
keeping the interface the same). In a sense, the greenlet/gevent crowd
will be the biggest losers, since they currently write async code
without either callbacks or yield, using microthreads instead. I
wouldn't want to have to start putting yield back everywhere into that
code. But the stdlib will still support yield-free blocking calls
(even if under the hood some of these use yield/send-based or
yield-from-based coroutines) so the monkey-patching tradition can
continue.
>> That's one of the
>> things I am desperately trying to keep out of Python, I find that
>> style unreadable and unmanageable (whenever I click on a button in a
>> website and nothing happens I know someone has a bug in their
>> callbacks). I understand you feel different; but I feel the general
>> sentiment is that callback-based async programming is even harder than
>> multi-threaded programming (and nobody is claiming that threads are
>> easy :-).
>
> :S
>
> There are (at least?) four different styles of asynchronous
> computation used in Twisted, and you seem to be confused as to which
> ones I'm talking about.
>
> 1. Explicit callbacks:
>
> For example, reactor.callLater(t, lambda: print("woo hoo"))
I actually like this, as it's a lowest-common-denominator approach
which everyone can easily adapt to their purposes. See the thread I
started about reactors.
> 2. Method dispatch callbacks:
>
> Similar to the above, the reactor or somebody has a handle on your
> object, and calls methods that you've defined when events happen
> e.g. IProtocol's dataReceived method
While I'm sure it's expedient and captures certain common patterns
well, I like this the least of all -- calling fixed methods on an
object sounds like a step back; it smells of the old Java way (before
it had some equivalent of anonymous functions), and of asyncore, which
(nearly) everybody agrees is kind of bad due to its insistence that
you subclass its classes. (Notice how subclassing as the prevalent
approach to structuring your code has gotten into a lot of discredit
since 1996.)
> 3. Deferred callbacks:
>
> When you ask for something to be done, it's set up, and you get an
> object back, which you can add a pipeline of callbacks to that will be
> called whenever whatever happens
> e.g. twisted.internet.threads.deferToThread(print,
> "x").addCallback(print, "x was printed in some other thread!")
Discussed above.
> 4. Generator coroutines
>
> These are a syntactic wrapper around deferreds. If you yield a
> deferred, you will be sent the result if the deferred succeeds, or an
> exception if the deferred fails.
> e.g. examples from previous message
Seeing them as syntactic sugar for Deferreds is one way of looking at
it; no doubt this is how they're seen in the Twisted community because
Deferreds are older and more entrenched. But there's no requirement
that an architecture has to have Deferreds in order to use generator
coroutines -- simple Futures will do just fine, and Greg Ewing has
shown that using yield-from you can even do without those. (But he
does use simple, explicit callbacks at the lowest level of his
system.)
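For readers who haven't seen Greg Ewing's approach, the core mechanism (a
toy sketch, not his actual scheduler) is that sub-coroutines are invoked
with yield from and their return values come back as the value of the
yield-from expression (PEP 380), with only the lowest level yielding to
the driver:

    def sub(x):
        # Lowest level: a bare yield is where a real scheduler would
        # suspend this task until some event fires.
        yield
        return x * 2

    def main():
        a = yield from sub(10)
        b = yield from sub(a)
        return b

    def run(coro):
        # Toy driver: resume until completion; a real one would wait
        # for events at each suspension point.
        try:
            while True:
                next(coro)
        except StopIteration as stop:
            return stop.value

    print(run(main()))  # -> 40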
> I don't see a reason for the first to exist at all, the second one is
> kind of nice in some circumstances (see below), but perhaps overused.
>
> I feel like you're railing on the first and second when I'm talking
> about the third and fourth. I could be wrong.
I think you're wrong -- I was (and am) most concerned about the
perceived complexity of the API offered by, and the typical looks of
code using, Deferreds (i.e., #3).
>>> and possibly elsewhere. Moreover, they're extremely similar to futures, so
>>> if one is arcane so is the other.
>>
>> I love Futures, they represent a nice simple programming model. But I
>> especially love that you can write async code using Futures and
>> yield-based coroutines (what you call inlineCallbacks) and never have
>> to write an explicit callback function. Ever.
>
> The reason explicit non-deferred callbacks are involved in Twisted is
> because of situations in which deferreds are not present, because of
> past history in Twisted. It is not at all a limitation of deferreds or
> something futures are better at, best as I'm aware.
>
> (In case that's what you're getting at.)
I don't think I was. It's clear to me (now) that Futures are simpler
than Deferreds -- and I like Futures better because of it, because for
the complex cases I would much rather use generator coroutines than
Deferreds.
> Anyway, one big issue is that generator coroutines can't really
> effectively replace callbacks everywhere. Consider the GUI button
> example you gave. How do you write that as a coroutine?
>
> I can see it being written like this:
>
> def mycoroutine(gui):
>     while True:
>         clickevent = yield gui.mybutton1.on_click()
>         # handle clickevent
>
> But that's probably worse than using callbacks.
I touched on this briefly in the reactor thread. Basically, GUI
callbacks are often level-triggered rather than edge-triggered, and
IIUC Deferreds are not great for that either; and in a few cases where
edge-triggered coding makes sense I *would* like to use a generator
coroutine.
>>> Neither is clearly better or more obvious than the other. If anything
>>> I generally find deferred composition more useful than deferred
>>> tee-ing, so I feel like composition is the correct base operator, but
>>> you could pick another.
>>
>> If you're writing long complicated chains of callbacks that benefit
>> from these features, IMO you are already doing it wrong. I understand
>> that this is a matter of style where I won't be able to convince you.
>> But style is important to me, so let's agree to disagree.
[In a follow-up to yourself, you quoted starting from this point and
appended "Nevermind that whole segment." I'm keeping it in here just
for context of the thread.]
> This is more than a matter of style, so at least for now I'd like to
> hold off on calling it even.
>
> In my day to day silly, synchronous, python code, I do lots of
> synchronous requests. For example, it's not unreasonable for me to
> want to load two different files from disk, or make several database
> interactions, etc. If I want to make this asynchronous, I have to find
> a way to execute multiple things that could hypothetically block, at
> the same time. If I can't do that easily, then the asynchronous
> solution has failed, because its entire purpose is to do everything
> that I do synchronously, except without blocking the main thread.
>
> Here's an example with lots of synchronous requests in Django:
>
> def view_paste(request, filekey):
>     try:
>         fileinfo = Pastes.objects.get(key=filekey)
>     except DoesNotExist:
>         t = loader.get_template('pastebin/error.html')
>         return HttpResponse(t.render(Context(dict(error='File does not exist'))))
>
>     f = open(fileinfo.filename)
>     fcontents = f.read()
>     t = loader.get_template('pastebin/paste.html')
>     return HttpResponse(t.render(Context(dict(file=fcontents))))
>
> How many blocking requests are there? Lots. This is, in a word, a
> long, complicated chain of synchronous requests. This is also very
> similar to what actual django code might look like in some
> circumstances. Even if we might think this is unreasonable, some
> subset of alteration of this is reasonable. Certainly we should be
> able to, say, load multiple (!) objects from the database, and open
> the template (possibly from disk), all potentially-blocking
> operations.
>
> This is inherently a long, complicated chain of requests, whether we
> implement it asynchronously or synchronously, or use Deferreds or
> Futures, or write it in Java or Python. Some parts can be done at any
> time before the end (loader.get_template(...)), some need to be done
> in a certain order, and there's branching depending on what happens in
> different cases. In order to even write this code _at all_, we need a
> way to chain these IO actions together. If we can't chain them
> together, we can't produce that final synthesis of results at the end.
[This is here you write "Ugh, just realized way after the fact that of
course you meant callbacks, not composition. I feel dumb. Nevermind
that whole segment."]
I'd like to come back to that Django example though. You are implying
that there are some opportunities for concurrency here, and I agree,
assuming we believe disk I/O is slow enough to be worth making
asynchronous. (In App Engine it's not, and we can't anyway, but in
other contexts I agree that it would be bad if a slow disk seek were
to hold up all processing -- not to mention that it might really be
NFS...)
The potentially async operations I see are:
(1) fileinfo = Pastes.objects.get(key=filekey) # I assume this is
some kind of database query
(2) loader.get_template('pastebin/error.html')
(3) f = open(fileinfo.filename) # depends on (1)
(4) fcontents = f.read() # depends on (3)
(5) loader.get_template('pastebin/paste.html')
How would you code that using Twisted Deferreds?
Using Futures and generator coroutines, I would do it as follows. I'm
hypothesizing that for every blocking API foo() there is a
corresponding non-blocking API foo_async() with the same call
signature, and returning a Future whose result is what the synchronous
API returns (and raises what the synchronous call would raise, if
there's an error). These are the conventions I use in NDB. I'm also
inventing a @task decorator.
@task
def view_paste_async(request, filekey):
    # Create Futures -- no yields!
    f1 = Pastes.objects.get_async(key=filekey)  # This won't raise
    f2 = loader.get_template_async('pastebin/error.html')
    f3 = loader.get_template_async('pastebin/paste.html')

    try:
        fileinfo = yield f1
    except DoesNotExist:
        t = yield f2
        return HttpResponse(t.render(Context(dict(error='File does not exist'))))

    f = yield open_async(fileinfo.filename)
    fcontents = yield f.read_async()
    t = yield f3
    return HttpResponse(t.render(Context(dict(file=fcontents))))
You could easily decide not to bother loading the error template
asynchronously (assuming most requests don't fail), and you could move
the creation of f3 below the try/except. But you get the idea. Even if
you do everything serially, inserting the yields and _async calls
would make this more parallelizable without the use of threads. (If
you were using threads, all this would be moot of course -- but then
your limit on requests being handled concurrently probably goes way
down.)
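For concreteness, here is one way the invented @task decorator could be
implemented (a minimal sketch under the conventions above, not NDB's
actual code; it drives the generator by hooking each yielded Future's
completion, and deep chains would recurse rather than trampoline):

    from concurrent.futures import Future

    def task(func):
        # Hypothetical decorator: calling the wrapped generator function
        # returns a Future for its eventual return value.
        def wrapper(*args, **kwargs):
            result = Future()
            gen = func(*args, **kwargs)

            def step(value=None, exc=None):
                try:
                    if exc is not None:
                        fut = gen.throw(exc)
                    else:
                        fut = gen.send(value)
                except StopIteration as stop:
                    # PEP 380: 'return x' in a generator raises
                    # StopIteration(x).
                    result.set_result(getattr(stop, 'value', None))
                    return
                except Exception as e:
                    result.set_exception(e)
                    return

                # Resume the generator when the yielded Future completes.
                def on_done(f):
                    if f.exception() is None:
                        step(value=f.result())
                    else:
                        step(exc=f.exception())
                fut.add_done_callback(on_done)

            step()
            return result
        return wrapper

A caller would then write fut = view_paste_async(request, filekey) and
attach a done-callback (or yield fut from another task) to consume the
response.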
> We _need_ a pipeline or something computationally equivalent or more
> powerful. Results from past "deferred computations" need to be passed
> forward into future "deferred computations", in order to implement
> this at all.
Yeah, and I think that a single generator using multiple yields is the
ideal pipeline to me (see my example near the top based on
kriskowal/q).
> This is not a style issue, this is an issue of needing to be able to
> solve problems that involve more than one computation where the
> results of every computation matters somewhere. It's just that in this
> case, some of the computations are computed asynchronously.
And I think generators do this very well.
>> I am totally open to learning from Twisted's experience. I hope that
>> you are willing to share even if the end result might not look like
>> Twisted at all -- after all in Python 3.3 we have "yield from" and
>> return from a generator and many years of experience with different
>> styles of async APIs. In addition to Twisted, there's Tornado and
>> Monocle, and then there's the whole greenlets/gevent and
>> Stackless/microthreads community that we can't completely ignore. I
>> believe somewhere is an ideal async architecture, and I hope you can
>> help us discover it.
>>
>> (For example, I am very interested in Twisted's experiences writing
>> real-world performant, robust reactors.)
>
> For that stuff, you'd have to speak to the main authors of Twisted.
> I'm just a twisted user. :(
They seem to be mostly ignoring this conversation, so your standing in
as a proxy for them is much appreciated!
> In the end it really doesn't matter what API you go with. The Twisted
> people will wrap it up so that they are compatible, as far as that is
> possible.
And I want to ensure that that is possible and preferably easy, if I
can do it without introducing too many warts in the API that
non-Twisted users see and use.
> I hope I haven't detracted too much from the main thrust of the
> surrounding discussion. Futures/deferreds are a pretty big tangent, so
> sorry. I justified it to myself by figuring that it'd probably come up
> anyway, somehow, since these are useful abstractions for asynchronous
> programming.
Not at all. This has been a valuable refresher for me!
--
--Guido van Rossum (python.org/~guido)
Hello,
This PEP is a resurrection of the idea of having object-oriented
filesystem paths in the stdlib. It comes with a general API proposal
as well as a specific implementation (*). The implementation is young
and discussion is quite open.
(*) http://pypi.python.org/pypi/pathlib/
Regards
Antoine.
PS: You can all admire my ASCII-art skills.
PEP: 428
Title: The pathlib module -- object-oriented filesystem paths
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <solipsis@pitrou.net>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-July-2012
Python-Version: 3.4
Post-History:
Abstract
========
This PEP proposes the inclusion of a third-party module, `pathlib`_, in
the standard library. The inclusion is proposed under the provisional
label, as described in :pep:`411`. Therefore, API changes can be done,
either as part of the PEP process, or after acceptance in the standard
library (and until the provisional label is removed).
The aim of this library is to provide a simple hierarchy of classes to
handle filesystem paths and the common operations users do over them.
.. _`pathlib`: http://pypi.python.org/pypi/pathlib/
Related work
============
An object-oriented API for filesystem paths has already been proposed
and rejected in :pep:`355`. Several third-party implementations of the
idea of object-oriented filesystem paths exist in the wild:
* The historical `path.py module`_ by Jason Orendorff, Jason R. Coombs
and others, which provides a ``str``-subclassing ``Path`` class;
* Twisted's slightly specialized `FilePath class`_;
* An `AlternativePathClass proposal`_, subclassing ``tuple`` rather than
``str``;
* `Unipath`_, a variation on the str-subclassing approach with two public
classes, an ``AbstractPath`` class for operations which don't do I/O and a
``Path`` class for all common operations.
This proposal attempts to learn from these previous attempts and the
rejection of :pep:`355`.
.. _`path.py module`: https://github.com/jaraco/path.py
.. _`FilePath class`: http://twistedmatrix.com/documents/current/api/twisted.python.filepath.File…
.. _`AlternativePathClass proposal`: http://wiki.python.org/moin/AlternativePathClass
.. _`Unipath`: https://bitbucket.org/sluggo/unipath/overview
Why an object-oriented API
==========================
The rationale to represent filesystem paths using dedicated classes is the
same as for other kinds of stateless objects, such as dates, times or IP
addresses. Python has been slowly moving away from strictly replicating
the C language's APIs to providing better, more helpful abstractions around
all kinds of common functionality. Even if this PEP isn't accepted, it is
likely that another form of filesystem handling abstraction will be adopted
one day into the standard library.
Indeed, many people will prefer handling dates and times using the high-level
objects provided by the ``datetime`` module, rather than using numeric
timestamps and the ``time`` module API. Moreover, using a dedicated class
makes it possible to enable desirable behaviours by default, for example
the case insensitivity of Windows paths.
Proposal
========
Class hierarchy
---------------
The `pathlib`_ module implements a simple hierarchy of classes::
                       +----------+
                       |          |
              ---------| PurePath |---------
              |        |          |        |
              |        +----------+        |
              |             |              |
              |             |              |
              v             |              v
     +---------------+      |      +------------+
     |               |      |      |            |
     | PurePosixPath |      |      | PureNTPath |
     |               |      |      |            |
     +---------------+      |      +------------+
              |             v              |
              |          +------+          |
              |          |      |          |
              |   -------| Path |-------   |
              |   |      |      |      |   |
              |   |      +------+      |   |
              |   |                    |   |
              |   |                    |   |
              v   v                    v   v
        +-----------+              +--------+
        |           |              |        |
        | PosixPath |              | NTPath |
        |           |              |        |
        +-----------+              +--------+
This hierarchy divides path classes along two dimensions:
* a path class can be either pure or concrete: pure classes support only
operations that don't need to do any actual I/O, which are most path
manipulation operations; concrete classes support all the operations
of pure classes, plus operations that do I/O.
* a path class is of a given flavour according to the kind of operating
system paths it represents. `pathlib`_ implements two flavours: NT paths
for the filesystem semantics embodied in Windows systems, POSIX paths for
other systems (``os.name``'s terminology is re-used here).
Any pure class can be instantiated on any system: for example, you can
manipulate ``PurePosixPath`` objects under Windows, ``PureNTPath`` objects
under Unix, and so on. However, concrete classes can only be instantiated
on a matching system: indeed, it would be error-prone to start doing I/O
with ``NTPath`` objects under Unix, or vice-versa.
Furthermore, there are two base classes which also act as system-dependent
factories: ``PurePath`` will instantiate either a ``PurePosixPath`` or a
``PureNTPath`` depending on the operating system. Similarly, ``Path``
will instantiate either a ``PosixPath`` or a ``NTPath``.
It is expected that, in most uses, using the ``Path`` class is adequate,
which is why it has the shortest name of all.
No confusion with builtins
--------------------------
In this proposal, the path classes do not derive from a builtin type. This
contrasts with some other Path class proposals which were derived from
``str``. They also do not pretend to implement the sequence protocol:
if you want a path to act as a sequence, you have to look up a dedicated
attribute (the ``parts`` attribute).
By not passing themselves off as builtin types, the path classes minimize
the potential for confusion if they are combined by accident with genuine
builtin types.
Immutability
------------
Path objects are immutable, which makes them hashable and also prevents a
class of programming errors.
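For example, one would expect paths to be usable as dictionary keys or set
members (hypothetical session)::

>>> d = {PurePosixPath('setup.py'): 928}
>>> d[PurePosixPath('setup.py')]
928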
Sane behaviour
--------------
Little of the functionality from os.path is reused. Many os.path functions
are tied by backwards compatibility to confusing or plain wrong behaviour
(for example, the fact that ``os.path.abspath()`` simplifies ".." path
components without resolving symlinks first).
Also, using classes instead of plain strings helps make system-dependent
behaviours natural. For example, comparing and ordering Windows path
objects is case-insensitive, and path separators are automatically converted
to the platform default.
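For illustration, one would expect behaviour along these lines
(hypothetical session, following the semantics just described)::

>>> PureNTPath('C:/Windows') == PureNTPath('c:\\windows')
True
>>> PurePosixPath('a/b') == PurePosixPath('A/B')
False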
Useful notations
----------------
The API tries to provide useful notations all the while avoiding magic.
Some examples::
>>> p = Path('/home/antoine/pathlib/setup.py')
>>> p.name
'setup.py'
>>> p.ext
'.py'
>>> p.root
'/'
>>> p.parts
<PosixPath.parts: ['/', 'home', 'antoine', 'pathlib', 'setup.py']>
>>> list(p.parents())
[PosixPath('/home/antoine/pathlib'), PosixPath('/home/antoine'), PosixPath('/home'), PosixPath('/')]
>>> p.exists()
True
>>> p.st_size
928
Pure paths API
==============
The philosophy of the ``PurePath`` API is to provide a consistent array of
useful path manipulation operations, without exposing a hodge-podge of
functions like ``os.path`` does.
Definitions
-----------
First a couple of conventions:
* All paths can have a drive and a root. For POSIX paths, the drive is
always empty.
* A relative path has neither drive nor root.
* A POSIX path is absolute if it has a root. A Windows path is absolute if
it has both a drive *and* a root. A Windows UNC path (e.g.
``\\some\\share\\myfile.txt``) always has a drive and a root
(here, ``\\some\\share`` and ``\\``, respectively).
* A path which has either a drive *or* a root is said to be anchored.
Its anchor is the concatenation of the drive and root. Under POSIX,
"anchored" is the same as "absolute".
Construction and joining
------------------------
We will present construction and joining together since they expose
similar semantics.
The simplest way to construct a path is to pass it its string representation::
>>> PurePath('setup.py')
PurePosixPath('setup.py')
Extraneous path separators and ``"."`` components are eliminated::
>>> PurePath('a///b/c/./d/')
PurePosixPath('a/b/c/d')
If you pass several arguments, they will be automatically joined::
>>> PurePath('docs', 'Makefile')
PurePosixPath('docs/Makefile')
Joining semantics are similar to os.path.join, in that anchored paths ignore
the information from the previously joined components::
>>> PurePath('/etc', '/usr', 'bin')
PurePosixPath('/usr/bin')
However, with Windows paths, the drive is retained as necessary::
>>> PureNTPath('c:/foo', '/Windows')
PureNTPath('c:\\Windows')
>>> PureNTPath('c:/foo', 'd:')
PureNTPath('d:')
Calling the constructor without any argument creates a path object pointing
to the logical "current directory"::
>>> PurePosixPath()
PurePosixPath('.')
A path can be joined with another using the ``__getitem__`` operator::
>>> p = PurePosixPath('foo')
>>> p['bar']
PurePosixPath('foo/bar')
>>> p[PurePosixPath('bar')]
PurePosixPath('foo/bar')
As with constructing, multiple path components can be specified at once::
>>> p['bar/xyzzy']
PurePosixPath('foo/bar/xyzzy')
A join() method is also provided, with the same behaviour. It can serve
as a factory function::
>>> path_factory = p.join
>>> path_factory('bar')
PurePosixPath('foo/bar')
Representing
------------
To represent a path (e.g. to pass it to third-party libraries), just call
``str()`` on it::
>>> p = PurePath('/home/antoine/pathlib/setup.py')
>>> str(p)
'/home/antoine/pathlib/setup.py'
>>> p = PureNTPath('c:/windows')
>>> str(p)
'c:\\windows'
To force the string representation with forward slashes, use the ``as_posix()``
method::
>>> p.as_posix()
'c:/windows'
To get the bytes representation (which might be useful under Unix systems),
call ``bytes()`` on it, or use the ``as_bytes()`` method::
>>> bytes(p)
b'/home/antoine/pathlib/setup.py'
Properties
----------
Five simple properties are provided on every path (each can be empty)::
>>> p = PureNTPath('c:/pathlib/setup.py')
>>> p.drive
'c:'
>>> p.root
'\\'
>>> p.anchor
'c:\\'
>>> p.name
'setup.py'
>>> p.ext
'.py'
Sequence-like access
--------------------
The ``parts`` property provides read-only sequence access to a path object::
>>> p = PurePosixPath('/etc/init.d')
>>> p.parts
<PurePosixPath.parts: ['/', 'etc', 'init.d']>
Simple indexing returns the individual path component as a string, while
slicing returns a new path object constructed from the selected components::
>>> p.parts[-1]
'init.d'
>>> p.parts[:-1]
PurePosixPath('/etc')
Windows paths handle the drive and the root as a single path component::
>>> p = PureNTPath('c:/setup.py')
>>> p.parts
<PureNTPath.parts: ['c:\\', 'setup.py']>
>>> p.root
'\\'
>>> p.parts[0]
'c:\\'
(separating them would be wrong, since ``C:`` is not the parent of ``C:\\``).
The ``parent()`` method returns an ancestor of the path::
>>> p = PureNTPath('c:/python33/bin/python.exe')
>>> p.parent()
PureNTPath('c:\\python33\\bin')
>>> p.parent(2)
PureNTPath('c:\\python33')
>>> p.parent(3)
PureNTPath('c:\\')
The ``parents()`` method automates repeated invocations of ``parent()``, until
the anchor is reached::
>>> p = PureNTPath('c:/python33/bin/python.exe')
>>> for parent in p.parents(): parent
...
PureNTPath('c:\\python33\\bin')
PureNTPath('c:\\python33')
PureNTPath('c:\\')
Querying
--------
``is_relative()`` returns True if the path is relative (see definition
above), False otherwise.
``is_reserved()`` returns True if a Windows path is a reserved path such
as ``CON`` or ``NUL``. It always returns False for POSIX paths.
``match()`` matches the path against a glob pattern::
>>> PureNTPath('c:/PATHLIB/setup.py').match('c:*lib/*.PY')
True
``relative()`` returns a new relative path by stripping the drive and root::
>>> PurePosixPath('setup.py').relative()
PurePosixPath('setup.py')
>>> PurePosixPath('/setup.py').relative()
PurePosixPath('setup.py')
``relative_to()`` computes the relative difference of a path to another::
>>> PurePosixPath('/usr/bin/python').relative_to('/usr')
PurePosixPath('bin/python')
``normcase()`` returns a case-folded version of the path for NT paths::
>>> PurePosixPath('CAPS').normcase()
PurePosixPath('CAPS')
>>> PureNTPath('CAPS').normcase()
PureNTPath('caps')
Concrete paths API
==================
In addition to the operations of the pure API, concrete paths provide
additional methods which actually access the filesystem to query or mutate
information.
Constructing
------------
The classmethod ``cwd()`` creates a path object pointing to the current
working directory in absolute form::
>>> Path.cwd()
PosixPath('/home/antoine/pathlib')
File metadata
-------------
The ``stat()`` method caches and returns the file's stat() result;
``restat()`` forces refreshing of the cache. ``lstat()`` is also provided,
but doesn't have any caching behaviour::
>>> p.stat()
posix.stat_result(st_mode=33277, st_ino=7483155, st_dev=2053, st_nlink=1, st_uid=500, st_gid=500, st_size=928, st_atime=1343597970, st_mtime=1328287308, st_ctime=1343597964)
For ease of use, direct attribute access to the fields of the stat structure
is provided over the path object itself::
>>> p.st_size
928
>>> p.st_mtime
1328287308.889562
Higher-level methods help examine the kind of the file::
>>> p.exists()
True
>>> p.is_file()
True
>>> p.is_dir()
False
>>> p.is_symlink()
False
The file owner and group names (rather than numeric ids) are queried
through matching properties::
>>> p = Path('/etc/shadow')
>>> p.owner
'root'
>>> p.group
'shadow'
Path resolution
---------------
The ``resolve()`` method makes a path absolute, resolving any symlink on
the way. It is the only operation which will remove "``..``" path components.
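For illustration, one would expect behaviour along these lines
(hypothetical session, reusing the working directory from earlier
examples)::

>>> p = Path('docs/../setup.py')
>>> p.resolve()
PosixPath('/home/antoine/pathlib/setup.py')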
Directory walking
-----------------
Simple (non-recursive) directory access is done by iteration::
>>> p = Path('docs')
>>> for child in p: child
...
PosixPath('docs/conf.py')
PosixPath('docs/_templates')
PosixPath('docs/make.bat')
PosixPath('docs/index.rst')
PosixPath('docs/_build')
PosixPath('docs/_static')
PosixPath('docs/Makefile')
This allows simple filtering through list comprehensions::
>>> p = Path('.')
>>> [child for child in p if child.is_dir()]
[PosixPath('.hg'), PosixPath('docs'), PosixPath('dist'), PosixPath('__pycache__'), PosixPath('build')]
Simple and recursive globbing is also provided::
>>> for child in p.glob('**/*.py'): child
...
PosixPath('test_pathlib.py')
PosixPath('setup.py')
PosixPath('pathlib.py')
PosixPath('docs/conf.py')
PosixPath('build/lib/pathlib.py')
File opening
------------
The ``open()`` method provides a file opening API similar to the builtin
``open()`` method::
>>> p = Path('setup.py')
>>> with p.open() as f: f.readline()
...
'#!/usr/bin/env python3\n'
The ``raw_open()`` method, on the other hand, is similar to ``os.open``::
>>> fd = p.raw_open(os.O_RDONLY)
>>> os.read(fd, 15)
b'#!/usr/bin/env '
Filesystem alteration
---------------------
Several common filesystem operations are provided as methods: ``touch()``,
``mkdir()``, ``rename()``, ``replace()``, ``unlink()``, ``rmdir()``,
``chmod()``, ``lchmod()``, ``symlink_to()``. More operations could be
provided, for example some of the functionality of the shutil module.
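For illustration, typical usage might look like this (a sketch; the
method signatures are assumed to mirror their ``os`` counterparts)::

>>> d = Path('newdir')
>>> d.mkdir()
>>> f = d['newfile.txt']
>>> f.touch()
>>> f.rename('newdir/renamed.txt')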
Experimental openat() support
-----------------------------
On compatible POSIX systems, the concrete PosixPath class can take advantage
of \*at() functions (`openat()`_ and friends), and manages the bookkeeping of
open file descriptors as necessary. Support is enabled by passing the
*use_openat* argument to the constructor::
>>> p = Path(".", use_openat=True)
Then all paths constructed by navigating this path (either by iteration or
indexing) will also use the openat() family of functions. The point of using
these functions is to avoid race conditions whereby a given directory is
silently replaced with another (often a symbolic link to a sensitive system
location) between two accesses.
.. _`openat()`: http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
Copyright
=========
This document has been placed into the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:
[Hopefully this is the last spin-off thread from "asyncore: included
batteries don't fit"]
[LvH]
>> > If there's one take away idea from async-pep, it's reusable protocols.
[Guido]
>> Is there a newer version that what's on
>> http://www.python.org/dev/peps/pep-3153/ ? It seems to be missing any
>> specific proposals, after spending a lot of time giving a rationale
>> and defining some terms. The version on
>> https://github.com/lvh/async-pep doesn't seem to be any more complete.
[LvH]
> Correct.
So it's totally unfinished?
> If I had to change it today, I'd throw out consumers and producers and just
> stick to a protocol API.
>
> Do you feel that there should be less talk about rationale?
No, but I feel that there should be some actual specification. I am
also looking forward to an actual meaty bit of example code -- ISTR
you mentioned you had something, but that it was incomplete, and I
can't find the link.
>> > The PEP should probably be a number of PEPs. At first sight, it seems
>> > that this number is at least four:
>> >
>> > 1. Protocol and transport abstractions, making no mention of
>> > asynchronous IO
>> > (this is what I want 3153 to be, because it's small, manageable, and
>> > virtually everyone appears to agree it's a fantastic idea)
>>
>> But the devil is in the details. *What* specifically are you
>> proposing? How would you write a protocol handler/parser without any
>> reference to I/O? Most protocols are two-way streets -- you read some
>> stuff, and you write some stuff, then you read some more. (HTTP may be
>> the exception here, if you don't keep the connection open.)
>
> It's not that there's *no* reference to IO: it's just that that reference is
> abstracted away in data_received and the protocol's transport object, just
> like Twisted's IProtocol.
The words "data_received" don't even occur in the PEP.
>> > 2. A base reactor interface
>>
>> I agree that this should be a separate PEP. But I do think that in
>> practice there will be dependencies between the different PEPs you are
>> proposing.
>
> Absolutely.
>
>> > 3. A way of structuring callbacks: probably deferreds with a built-in
>> > inlineCallbacks for people who want to write synchronous-looking code
>> > with
>> > explicit yields for asynchronous procedures
>>
>> Your previous two ideas sound like you're not tied to backward
>> compatibility with Tornado and/or Twisted (not even via an adaptation
>> layer). Given that we're talking Python 3.4 here that's fine with me
>> (though I think we should be careful to offer a path forward for those
>> packages and their users, even if it means making changes to the
>> libraries).
>
> I'm assuming that by previous ideas you mean points 1, 2: protocol interface
> + reactor interface.
Yes.
> I don't see why twisted's IProtocol couldn't grow an adapter for stdlib
> Protocols. Ditto for Tornado. Similarly, the reactor interface could be
> *provided* (through a fairly simple translation layer) by different
> implementations, including twisted.
Right.
>> But Twisted Deferred is pretty arcane, and I would much
>> rather not use it as the basis of a forward-looking design. I'd much
>> rather see what we can mooch off PEP 3148 (Futures).
>
> I think this needs to be addressed in a separate mail, since more stuff has
> been said about deferreds in this thread.
Yes, that's in the thread with subject "The async API of the future:
Twisted and Deferreds".
>> > 4+ adapting the stdlib tools to using these new things
>>
>> We at least need to have an idea for how this could be done. We're
>> talking serious rewrites of many of our most fundamental existing
>> synchronous protocol libraries (e.g. httplib, email, possibly even
>> io.TextIOWrapper), most of which have had only scant updates even
>> through the Python 3 transition apart from complications to deal with
>> the bytes/str dichotomy.
>
> I certainly agree that this is a very large amount of work. However, it has
> obvious huge advantages in terms of code reuse. I'm not sure if I understand
> the technical barrier though. It should be quite easy to create a blocking
> API with a protocol implementation that doesn't care; just call
> data_received with all your data at once, and presto! (Since transports in
> general don't provide guarantees as to how bytes will arrive, existing
> Twisted IProtocols have to do this already anyway, and that seems to work
> fine.)
Hmm... I guess that depends on how your legacy code works. As Barry
mentioned somewhere, the email package's feedparser() is an attempt at
implementing this -- but he sounded like he has doubts that it works as-is
in an async environment.
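To make the quoted idea concrete: a blocking driver for an IO-free
protocol might look roughly like this (a sketch; the protocol/transport
method names are assumptions modeled on Twisted's IProtocol, not taken
from any PEP):

    class CollectingTransport:
        # Hypothetical minimal transport: just collects written bytes.
        def __init__(self):
            self.written = b''

        def write(self, data):
            self.written += data

    def run_blocking(protocol, request_bytes):
        # Feed the entire request to the protocol in one call, as
        # suggested above, and return whatever the protocol wrote back.
        transport = CollectingTransport()
        protocol.connection_made(transport)
        protocol.data_received(request_bytes)
        return transport.written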
However I am more worried about pull-based APIs. Take (as an extreme
example) the standard stream API for reading, especially
TextIOWrapper. I could see how we could turn the *writing* APIs async
easily enough, but I don't see how to do it for the reading end -- you
can't seriously propose to read the entire file into the buffer and
then satisfy all reads from memory.
>> > Re: forward path for existing asyncore code. I don't remember this being
>> > raised as an issue. If anything, it was mentioned in passing, and I think
>> > the answer to it was something to the tune of "asyncore's API is broken,
>> > fixing it is more important than backwards compat". Essentially I agree with
>> > Guido that the important part is an upgrade path to a good third-party
>> > library, which is the part about asyncore that REALLY sucks right now.
>>
>> I have the feeling that the main reason asyncore sucks is that it
>> requires you to subclass its Dispatcher class, which has a rather
>> treacherous interface.
>
> There's at least a few others, but sure, that's an obvious one. Many of the
> objections I can raise however don't matter if there's already an *existing
> working solution*. I mean, sure, it can't do SSL, but if you have code that
> does what you want right now, then obviously SSL isn't actually needed.
I think you mean this as an indication that providing the forward path
for existing asyncore apps shouldn't be rocket science, right? Sure, I
don't want to worry about that, I just want to make sure that we don't
*completely* paint ourselves into the wrong corner when it comes to
that.
>> > Regardless, an API upgrade is probably a good idea. I'm not sure if it
>> > should go in the first PEP: given the separation I've outlined above (which
>> > may be too spread out...), there's no obvious place to put it besides it
>> > being a new PEP.
>>
>> Aren't all your proposals API upgrades?
>
> Sorry, that was incredibly poor wording. I meant something more of an
> adapter: an upgrade path for existing asyncore code to new and shiny 3153
> code.
Yes, now it makes sense.
>> > Re base reactor interface: drawing maximally from the lessons learned in
>> > twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later,
>> > etc), asynchronous-looking name lookup, fd handling are the important
>> > parts.
>>
>> That actually sounds more concrete than I'd like a reactor interface
>> to be. In the App Engine world, there is a definite need for a
>> reactor, but it cannot talk about file descriptors at all -- all I/O
>> is defined in terms of RPC operations which have their own (several
>> layers of) async management but still need to be plugged in to user
>> code that might want to benefit from other reactor functionality such
>> as scheduling and placing a call at a certain moment in the future.
>
> I have a hard time understanding how that would work well outside of
> something like GAE. IIUC, that level of abstraction was chosen because it
> made sense for GAE (and I don't disagree), but I'm not sure it makes sense
> here.
I think I answered this in the reactors thread -- I propose an I/O
object abstraction that is not directly tied to a file descriptor, but
for which a concrete implementation can be made to support file
descriptors, and another to support App Engine RPC.
> In this example, where would eg the select/epoll/whatever calls happen? Is
> it something that calls the reactor that then in turn calls whatever?
App Engine doesn't have select/epoll/whatever, so it would have a
reactor implementation that doesn't use them. But the standard Unix
reactor would support file descriptors using select/etc.
Please respond in the reactors thread.
>> > call_every can be implemented in terms of call_later on a separate object,
>> > so I think it should be (eg twisted.internet.task.LoopingCall). One thing
>> > that is apparently forgotten about is event loop integration. The prime way
>> > of having two event loops cooperate is *NOT* "run both in parallel", it's
>> > "have one call the other". Even though not all loops support this, I think
>> > it's important to get this as part of the interface (raise an exception for
>> > all I care if it doesn't work).
>>
>> This is definitely one of the things we ought to get right. My own
>> thoughts are slightly (perhaps only cosmetically) different again:
>> ideally each event loop would have a primitive operation to tell it to
>> run for a little while, and then some other code could tie several
>> event loops together.
>
> As an API, that's pretty close to Twisted's IReactorCore.iterate, I think.
> It'd work well enough. The issue is only with event loops that don't
> cooperate so well.
Again, a topic for the reactor thread.
But I'm really hoping you'll make good on your promise of redoing
async-pep, giving some actual specifications and example code, so I
can play with it.
--
--Guido van Rossum (python.org/~guido)
(Sorry if this is in the wrong place, I'm joining the conversation and
I'm not sure where mailman will put it)
> Alternatively, yielding a future (or whatever one calls the objects
> returned by *_async()) could register *and* wait for the result. To
> register without waiting one would yield a wrapper for the future. So
> one could write
What would registering a Future do? As far as I understood it, the
plan here is that a Future was just a marker for an outstanding
request:
def callback(result):
    print("The result was", result)

def say_hello(name):
    f = Future()
    f.resolve("Hello, %s!" % name)
    return f

f = say_hello("Jeff")
f.add_callback(callback)
The outstanding request doesn't have to care about socket connections;
it's just a way to pass around a result that hasn't arrived yet. This
is pretty much the same as Deferreds/Promises, with a different name.
There's no reactor to register with here, because there doesn't need
to be one.
--
Jasper
(Sorry if this doesn't end up in the right thread in mail clients; I've
been reading this through a web UI and only just formally subscribed so
can't reply directly to the correct email.)
Code that uses generators is indeed often easier to read... but the problem
is that this isn't just a difference in syntax, it has a significant
semantic impact. Specifically, requiring yield means that you're
re-introducing context switching. In inlineCallbacks, or coroutines, or any
system that use yield as in your example above, arbitrary code may run
during the context switch, and who knows what happened to the state of the
world in the interim. True, it's an explicit context switch, unlike
threading where it can happen at any point, but it's still a context
switch, and it still increases the chance of race conditions and all the
other problems threading has. (If you're omitting yield it's even worse,
since you can't even tell anymore where the context switches are
happening.) Superficially such code is simpler (and in some cases I'm happy
to use inlineCallbacks, in particular in unit tests), but much the same way
threaded code is "simpler". If you're not very very careful, it'll work 99
times and break mysteriously the 100th.
For example, consider the following code; silly, but buggy due to the
context switch in yield allowing race conditions if any other code modifies
counter.value while getResult() is waiting for a result.
def addToCounter():
    counter.value = counter.value + (yield getResult())
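To spell out why this is racy, here is the same snippet expanded with
comments (the expansion matches Python's left-to-right evaluation order):

    def addToCounter():
        # 1. counter.value is read here, *before* the generator suspends.
        old = counter.value
        # 2. Suspension point: arbitrary other code may run now and
        #    modify counter.value while getResult() is pending.
        result = yield getResult()
        # 3. The stale value from step 1 is written back, silently
        #    discarding any update made during the suspension.
        counter.value = old + result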
In a Deferred callback, on the other hand, you know the only things that
are going to run are functions you call. In so far as it's possible, what
happens is under control of one function only. Less pretty, but no
potential race conditions:
def add(result):
    counter.value = counter.value + result

getResult().addCallback(add)
That being said, perhaps some changes to Python syntax could solve this;
Allen Short (
http://washort.twistedmatrix.com/2012/10/coroutines-reduce-readability.html)
claims to have a proposal, hopefully he'll post it soon.
On Saturday, 13 October 2012 at 19:47 +1000, Nick Coghlan wrote:
> The problem is that "Windows path" and "Posix path" aren't really
> accurate. There are a bunch of degrees of freedom, which is *exactly*
> the problem the context pattern is designed to deal with without a
> combinatorial explosion of different types or mixins.
>
> The "how is the string format determined?" aspect could be handled
> with separate methods, but how do you do case insensitive comparisons
> of paths on posix systems?
The question is: why do you want to do that?
I know there are a limited bunch of special cases where Posix filesystem
paths may be case-insensitive, but nobody really cares about them today,
and I don't expect many people to bother tomorrow. Playing with
individual parameters of path semantics sounds like a theoretical bother
more than a practical one.
A possibility would be to expose the Flavour classes, which until now
are an internal implementation detail. That would first imply better
defining their API, though. Then people could write e.g.:
class PosixCaseInsensitiveFlavour(pathlib.PosixFlavour):
    case_sensitive = False

class MyPath(pathlib.PosixPath):
    flavour = PosixCaseInsensitiveFlavour()
But I would consider it extra icing on the cake, not a requirement for a
Path API.
Regards
Antoine.
--
Software development and contracting: http://pro.pitrou.net
Hello,
Since there has been some controversy about the joining syntax used in
PEP 428 (filesystem path objects), I would like to run an informal poll
about it. Please answer with +1/+0/-0/-1 for each proposal:
- `p[q]` joins path q to path p
- `p + q` joins path q to path p
- `p / q` joins path q to path p
- `p.join(q)` joins path q to path p
(you can include a rationale if you want, but don't forget to vote :-))
Thank you
Antoine.
--
Software development and contracting: http://pro.pitrou.net
On Fri, Oct 12, 2012 at 5:54 PM, Mark Adam <dreamingforward@gmail.com> wrote:
> On Thu, Oct 11, 2012 at 8:03 PM, Steven D'Aprano <steve@pearwood.info> wrote:
>>>> I would gladly give up a small amount of speed for better control
>>>> over floats, such as whether 1/0.0 raised an exception or
>>>> returned infinity.
>>>
>>> Umm, you would be giving up a *lot* of speed. Native floating point
>>> happens right in the processor, so if you want special behavior, you'd
>>> have to take the floating point out of hardware and into "user space".
>>
>> Even in user-space, you're not giving up that much speed in practical
>> terms, at least not for my needs. The new decimal module in Python 3.3 is
>> less than a factor of 10 times slower than Python's floats, which makes it
>> pretty much instantaneous to my mind :)
>
> Hmm, well, if it's only that much slower, then we should implement
> Rationals and get rid of the issue altogether.
Now that I think of it, this issue has a strange whiff of the argument
wherefrom came the "from __future__" directive and the split that
happened between the vpython folks who needed the direct support of
float division (rendering 3-d graphics for an interpreted environment)
and the regular python crowd. Anyone else remember that?
mark