On Oct 14, 2012, at 9:22 PM, Guido van Rossum <guido@python.org> wrote:
> On Sun, Oct 14, 2012 at 10:51 AM, Rene Nejsum <rene@stranden.com> wrote:
>> On the high level (Python) basically what you need is that the queue.get()
>> can handle:
>> 1) Python objects (as today)
>> 2) timeout (as today, maybe in milliseconds instead of seconds)
>> 3) Network (socket input/state change)
>> 4) File desc input/state change
>> 5) Other I/O changes like serial comm, etc.
>> 6) Maybe also yield-based coroutine support?
>>
>> This requires support from the underlying
>> OS -- support which is probably not there today?
>>
>> As far as I can see, having this one extended queue.get() would nicely cover
>> all high-level concurrency needs in Python.
>
> [...]
>
>> I believe a "super" queue.get() would solve all use cases.
>>
>> I have no idea on how difficult it would be to implement in
>> a cross platform manner.
>
> Hm. I know that a common (and often right!) recommendation for thread
> communication is to use the queue module. But that module is meant to
> work with threads. I think that the correct I/O primitives are more
> likely to come by looking at what Tornado and Twisted have done than
> by trying to "pimp up" the queue module -- it's good for what it does,
> but trying to add all that new functionality to it doesn't sound like
> a good fit.
You are probably right about the queue class. Maybe it should be a new class,
but I still believe it would be an excellent fit for doing concurrent stuff if Python
had a multiplexing message queue; Python is high-level enough to be able to
hide thread/select/read etc.
A while ago I implemented pyworks (bitbucket.org/raindog/pyworks), which
is a kind of Erlang implementation for Python, making objects concurrent and turning
return values into Futures, without adding much new code. Method calls are sent
asynchronously, simply by doing a standard obj.method(). obj is a proxy for the real
object, sending method() as a message to the real object, which runs in a separate
thread. The return value is a Future. So you can do

    val = obj.method()
    ... method() now runs asynchronously in the real object's thread
    ... and do some other stuff, until:
    print val

which will block waiting for the Future to complete, if it hasn't already.
It has been used in a couple of projects, making it much easier to do concurrent systems.
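For readers who haven't seen the pattern, a minimal sketch of such a proxy
(hypothetical code, not the pyworks implementation; all names here are made
up) could look like this, built from a worker thread, a mailbox queue, and
concurrent.futures.Future:

    import queue
    import threading
    from concurrent.futures import Future

    class ActorProxy:
        # Hypothetical active-object proxy: attribute access returns a
        # callable that enqueues a message and immediately returns a Future.
        def __init__(self, obj):
            self._obj = obj
            self._mailbox = queue.Queue()
            threading.Thread(target=self._run, daemon=True).start()

        def _run(self):
            # Worker loop: execute queued calls on the real object,
            # delivering the result (or exception) through the Future.
            while True:
                name, args, kwargs, fut = self._mailbox.get()
                try:
                    fut.set_result(getattr(self._obj, name)(*args, **kwargs))
                except Exception as e:
                    fut.set_exception(e)

        def __getattr__(self, name):
            def call(*args, **kwargs):
                fut = Future()
                self._mailbox.put((name, args, kwargs, fut))
                return fut  # caller only blocks when it asks for fut.result()
            return call

A caller would then write val = ActorProxy(real_obj).method(), do other
work, and finally call val.result() to wait for the answer.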
But it would be great if the object/task could wait for more kinds of events than queue.get() offers.
br
/Rene
>
> --
> --Guido van Rossum (python.org/~guido)
On Fri, Oct 12, 2012 at 10:40 PM, Blake Hyde <syrion@gmail.com> wrote:
> Is anything gained from this addition?
To give a practical answer, I could say that for newbies it's one small
confusion that could be removed from the language. You and I have been
programming for a long time, so we take it for granted that * means
multiplication, but for anyone else it's just another
weird idiosyncrasy that further alienates them from programming.
Also, I think that using * for multiplication is ugly.
>
> On Fri, Oct 12, 2012 at 4:37 PM, Ram Rachum <ram.rachum@gmail.com> wrote:
> >
> >
> > On Fri, Oct 12, 2012 at 10:34 PM, Mike Graham <mikegraham@gmail.com>
> wrote:
> >>
> >> On Fri, Oct 12, 2012 at 4:27 PM, Ram Rachum <ram.rachum@gmail.com>
> wrote:
> >> > Hi everybody,
> >> >
> >> > Today a funny thought occurred to me. Ever since I've learned to
> program
> >> > when I was a child, I've taken for granted that when programming, the
> >> > sign
> >> > used for multiplication is *. But now that I think about it, why? Now
> >> > that
> >> > we have Unicode, why not use · ?
> >> >
> >> > Do you think that we can make Python support · in addition to *?
> >> >
> >> > I can think of a couple of problems, but none of them seem like
> >> > deal-breakers:
> >> >
> >> > - Backward compatibility: Python already uses *, but I don't see a
> >> > backward
> >> > compatibility problem with supporting · additionally. Let people use
> >> > whichever they want, like spaces and tabs.
> >> > - Input methods: I personally use an IDE that could be easily set to
> >> > automatically convert * to · where appropriate and to allow manual
> input
> >> > of
> >> > ·. People on Linux can type Alt-. . Anyone else can set up a script
> >> > that'll
> >> > let them type · using whichever keyboard combination they want. I
> admit
> >> > this
> >> > is pretty annoying, but since you can always use * if you want to, I
> >> > figure
> >> > that anyone who cares enough about using · instead of * (I bet that
> >> > people
> >> > in scientific computing would like that) would be willing to take the
> >> > time
> >> > to set it up.
> >> >
> >> >
> >> > What do you think?
> >> >
> >> >
> >> > Ram
> >>
> >> Python should not expect characters that are hard for most people to
> >> type.
> >
> >
> > No one will be forced to type it. If you can't type it, use *.
> >
> >
> >>
> >> Python should not expect characters that are still hard to
> >> display on many common platforms.
> >
> >
> > We allow people to have unicode variable names, if they wish, don't we?
> So
> > why not allow them to use a unicode operator, if they wish, as a completely
> > optional thing?
> >
> >>
> >>
> >> I think you'll find strong opposition to adding any non-ASCII
> >> characters or characters that don't occur on almost all keyboards as
> >> part of the language.
> >>
> >> Mike
> >
> >
> >
[This is the third spin-off thread from "asyncore: included batteries
don't fit"]
On Thu, Oct 11, 2012 at 9:29 PM, Devin Jeanpierre
<jeanpierreda@gmail.com> wrote:
> On Thu, Oct 11, 2012 at 7:37 PM, Guido van Rossum <guido@python.org> wrote:
>> On Thu, Oct 11, 2012 at 3:42 PM, Devin Jeanpierre
>> <jeanpierreda@gmail.com> wrote:
>>> Could you be more specific? I've never heard Deferreds in particular
>>> called "arcane". They're very popular in e.g. the JS world,
>>
>> Really? Twisted is used in the JS world? Or do you just mean the
>> pervasiveness of callback style async programming?
>
> Ah, I mean Deferreds. I attended a talk earlier this year all about
> deferreds in JS, and not a single reference to Python or Twisted was
> made!
>
> These are the examples I remember mentioned in the talk:
>
> - http://api.jquery.com/category/deferred-object/ (not very twistedish
> at all, ill-liked by the speaker)
> - http://mochi.github.com/mochikit/doc/html/MochiKit/Async.html (maybe
> not a good example, mochikit tries to be "python in JS")
> - http://dojotoolkit.org/reference-guide/1.8/dojo/Deferred.html
> - https://github.com/kriskowal/q (also includes an explanation of why
> the author likes deferreds)
>
> There were a few more that the speaker mentioned, but didn't cover.
> One of his points was that the various systems of deferreds are subtly
> different, some very badly so, and that it was a mess, but that
> deferreds were still awesome. JS is a language where async programming
> is mainstream, so lots of people try to make it easier, and they all
> do it slightly differently.
Thanks for those links. I followed the kriskowal/q link and was
reminded of why Twisted's Deferreds are considered more awesome than
Futures: it's the chaining.
BUT... That's only important if callbacks are all the language lets
you do! If your baseline is this:
step1(function (value1) {
    step2(value1, function(value2) {
        step3(value2, function(value3) {
            step4(value3, function(value4) {
                // Do something with value4
            });
        });
    });
});
then of course the alternative using Deferred looks better:
Q.fcall(step1)
.then(step2)
.then(step3)
.then(step4)
.then(function (value4) {
    // Do something with value4
}, function (error) {
    // Handle any error from step1 through step4
})
.end();
(Both quoted literally from the kriskowal/q link.)
I also don't doubt that using classic Futures you can't do this -- the
chaining really matters for this style, and I presume this (modulo
unimportant API differences) is what typical Twisted code looks like.
However, Python has yield, and you can do much better (I'll write
plain yield for now, but it works the same with yield-from):
try:
    value1 = yield step1(<args>)
    value2 = yield step2(value1)
    value3 = yield step3(value2)
    value4 = yield step4(value3)
    # Do something with value4
except Exception:
    # Handle any error from step1 through step4
There's an outer function missing here, since you can't have a
toplevel yield; I think that's the same for the JS case, typically.
Also, strictly speaking the "Do something with value4" code should
probably be in an else: clause after the except handler. But that
actually leads nicely to the advantage:
This form is more flexible, since it is easier to catch different
exceptions at different points. It is also much easier to pass extra
information around. E.g. what if your flow ends up having to pass both
value1 and value2 into step3()? Sure, you can do that by making value2
a tuple (or a dict, or an object) incorporating value1 and the
original value2, but that's exactly where this style becomes
cumbersome, whereas in the yield-based form, such things can remain
simple local variables. All in all I find it more readable.
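For concreteness, here is the same flow with the outer function and the
else: clause spelled out (a sketch; @task stands for a hypothetical driver
decorator like the one invented later in this message, and flow/args are
placeholder names):

    @task
    def flow(args):
        try:
            value1 = yield step1(args)
            value2 = yield step2(value1)
            value3 = yield step3(value2)
            value4 = yield step4(value3)
        except Exception:
            ...  # handle any error from step1 through step4
        else:
            ...  # do something with value4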
In the past, when I pointed this out to Twisted aficionados, the
responses usually were a mix of "sure, if you like that style, we got
it covered, Twisted has inlineCallbacks," and "but that only works for
the simple cases, for the real stuff you still need Deferreds." But
that really sounds to me like Twisted people just liking what they've
got and not wanting to change. Which I understand -- I don't want to
change either. But I also observe that a lot of people find bare
Twisted-with-Deferreds too hard to grok, so they use Tornado instead,
or they build a layer on top of either (like Monocle), or they go a
completely different route and use greenlets/gevent instead -- and get
amazing performance and productivity that way too, even though they
know it's monkey-patching their asses off...
So, in the end, for Python 3.4 and beyond, I want to promote a style
that mixes simple callbacks (perhaps augmented with simple Futures)
and generator-based coroutines (either PEP 342, yield/send-based, or
PEP 380 yield-from-based). I'm looking to Twisted for the best
reactors (see other thread). But for transport/protocol
implementations I think that generators/coroutines offer a cleaner,
better interface than incorporating Deferred.
I hope that the path forward for Twisted will be simple enough: it
should be possible to hook Deferred into the simpler callback APIs
(perhaps a new implementation using some form of adaptation, but
keeping the interface the same). In a sense, the greenlet/gevent crowd
will be the biggest losers, since they currently write async code
without either callbacks or yield, using microthreads instead. I
wouldn't want to have to start putting yield back everywhere into that
code. But the stdlib will still support yield-free blocking calls
(even if under the hood some of these use yield/send-based or
yield-from-based coroutines) so the monkey-patching tradition can
continue.
>> That's one of the
>> things I am desperately trying to keep out of Python, I find that
>> style unreadable and unmanageable (whenever I click on a button in a
>> website and nothing happens I know someone has a bug in their
>> callbacks). I understand you feel different; but I feel the general
>> sentiment is that callback-based async programming is even harder than
>> multi-threaded programming (and nobody is claiming that threads are
>> easy :-).
>
> :S
>
> There are (at least?) four different styles of asynchronous
> computation used in Twisted, and you seem to be confused as to which
> ones I'm talking about.
>
> 1. Explicit callbacks:
>
> For example, reactor.callLater(t, lambda: print("woo hoo"))
I actually like this, as it's a lowest-common-denominator approach
which everyone can easily adapt to their purposes. See the thread I
started about reactors.
> 2. Method dispatch callbacks:
>
> Similar to the above, the reactor or somebody has a handle on your
> object, and calls methods that you've defined when events happen
> e.g. IProtocol's dataReceived method
While I'm sure it's expedient and captures certain common patterns
well, I like this the least of all -- calling fixed methods on an
object sounds like a step back; it smells of the old Java way (before
it had some equivalent of anonymous functions), and of asyncore, which
(nearly) everybody agrees is kind of bad due to its insistence that
you subclass its classes. (Notice how subclassing as the prevalent
approach to structuring your code has gotten into a lot of discredit
since 1996.)
> 3. Deferred callbacks:
>
> When you ask for something to be done, it's set up, and you get an
> object back, which you can add a pipeline of callbacks to that will be
> called whenever whatever happens
> e.g. twisted.internet.threads.deferToThread(print,
> "x").addCallback(print, "x was printed in some other thread!")
Discussed above.
> 4. Generator coroutines
>
> These are a syntactic wrapper around deferreds. If you yield a
> deferred, you will be sent the result if the deferred succeeds, or an
> exception if the deferred fails.
> e.g. examples from previous message
Seeing them as syntactic sugar for Deferreds is one way of looking at
it; no doubt this is how they're seen in the Twisted community because
Deferreds are older and more entrenched. But there's no requirement
that an architecture has to have Deferreds in order to use generator
coroutines -- simple Futures will do just fine, and Greg Ewing has
shown that using yield-from you can even do without those. (But he
does use simple, explicit callbacks at the lowest level of his
system.)
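For readers who haven't seen Greg Ewing's approach, the core mechanism (a
toy sketch, not his actual scheduler) is that sub-coroutines are invoked
with yield from and their return values come back as the value of the
yield-from expression (PEP 380), with only the lowest level yielding to
the driver:

    def sub(x):
        # Lowest level: a bare yield is where a real scheduler would
        # suspend this task until some event fires.
        yield
        return x * 2

    def main():
        a = yield from sub(10)
        b = yield from sub(a)
        return b

    def run(coro):
        # Toy driver: resume until completion; a real one would wait
        # for events at each suspension point.
        try:
            while True:
                next(coro)
        except StopIteration as stop:
            return stop.value

    print(run(main()))  # -> 40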
> I don't see a reason for the first to exist at all, the second one is
> kind of nice in some circumstances (see below), but perhaps overused.
>
> I feel like you're railing on the first and second when I'm talking
> about the third and fourth. I could be wrong.
I think you're wrong -- I was (and am) most concerned about the
perceived complexity of the API offered by, and the typical looks of
code using, Deferreds (i.e., #3).
>>> and possibly elsewhere. Moreover, they're extremely similar to futures, so
>>> if one is arcane so is the other.
>>
>> I love Futures, they represent a nice simple programming model. But I
>> especially love that you can write async code using Futures and
>> yield-based coroutines (what you call inlineCallbacks) and never have
>> to write an explicit callback function. Ever.
>
> The reason explicit non-deferred callbacks are involved in Twisted is
> because of situations in which deferreds are not present, because of
> past history in Twisted. It is not at all a limitation of deferreds or
> something futures are better at, best as I'm aware.
>
> (In case that's what you're getting at.)
I don't think I was. It's clear to me (now) that Futures are simpler
than Deferreds -- and I like Futures better because of it, because for
the complex cases I would much rather use generator coroutines than
Deferreds.
> Anyway, one big issue is that generator coroutines can't really
> effectively replace callbacks everywhere. Consider the GUI button
> example you gave. How do you write that as a coroutine?
>
> I can see it being written like this:
>
> def mycoroutine(gui):
>     while True:
>         clickevent = yield gui.mybutton1.on_click()
>         # handle clickevent
>
> But that's probably worse than using callbacks.
I touched on this briefly in the reactor thread. Basically, GUI
callbacks are often level-triggered rather than edge-triggered, and
IIUC Deferreds are not great for that either; and in a few cases where
edge-triggered coding makes sense I *would* like to use a generator
coroutine.
>>> Neither is clearly better or more obvious than the other. If anything
>>> I generally find deferred composition more useful than deferred
>>> tee-ing, so I feel like composition is the correct base operator, but
>>> you could pick another.
>>
>> If you're writing long complicated chains of callbacks that benefit
>> from these features, IMO you are already doing it wrong. I understand
>> that this is a matter of style where I won't be able to convince you.
>> But style is important to me, so let's agree to disagree.
[In a follow-up to yourself, you quoted starting from this point and
appended "Nevermind that whole segment." I'm keeping it in here just
for context of the thread.]
> This is more than a matter of style, so at least for now I'd like to
> hold off on calling it even.
>
> In my day to day silly, synchronous, python code, I do lots of
> synchronous requests. For example, it's not unreasonable for me to
> want to load two different files from disk, or make several database
> interactions, etc. If I want to make this asynchronous, I have to find
> a way to execute multiple things that could hypothetically block, at
> the same time. If I can't do that easily, then the asynchronous
> solution has failed, because its entire purpose is to do everything
> that I do synchronously, except without blocking the main thread.
>
> Here's an example with lots of synchronous requests in Django:
>
> def view_paste(request, filekey):
>     try:
>         fileinfo = Pastes.objects.get(key=filekey)
>     except DoesNotExist:
>         t = loader.get_template('pastebin/error.html')
>         return HttpResponse(t.render(Context(dict(error='File does not exist'))))
>
>     f = open(fileinfo.filename)
>     fcontents = f.read()
>     t = loader.get_template('pastebin/paste.html')
>     return HttpResponse(t.render(Context(dict(file=fcontents))))
>
> How many blocking requests are there? Lots. This is, in a word, a
> long, complicated chain of synchronous requests. This is also very
> similar to what actual django code might look like in some
> circumstances. Even if we might think this is unreasonable, some
> subset of alteration of this is reasonable. Certainly we should be
> able to, say, load multiple (!) objects from the database, and open
> the template (possibly from disk), all potentially-blocking
> operations.
>
> This is inherently a long, complicated chain of requests, whether we
> implement it asynchronously or synchronously, or use Deferreds or
> Futures, or write it in Java or Python. Some parts can be done at any
> time before the end (loader.get_template(...)), some need to be done
> in a certain order, and there's branching depending on what happens in
> different cases. In order to even write this code _at all_, we need a
> way to chain these IO actions together. If we can't chain them
> together, we can't produce that final synthesis of results at the end.
[This is here you write "Ugh, just realized way after the fact that of
course you meant callbacks, not composition. I feel dumb. Nevermind
that whole segment."]
I'd like to come back to that Django example though. You are implying
that there are some opportunities for concurrency here, and I agree,
assuming we believe disk I/O is slow enough to be worth making
asynchronous. (In App Engine it's not, and we can't anyway, but in
other contexts I agree that it would be bad if a slow disk seek were
to hold up all processing -- not to mention that it might really be
NFS...)
The potentially async operations I see are:
(1) fileinfo = Pastes.objects.get(key=filekey) # I assume this is
some kind of database query
(2) loader.get_template('pastebin/error.html')
(3) f = open(fileinfo.filename) # depends on (1)
(4) fcontents = f.read() # depends on (3)
(5) loader.get_template('pastebin/paste.html')
How would you code that using Twisted Deferreds?
Using Futures and generator coroutines, I would do it as follows. I'm
hypothesizing that for every blocking API foo() there is a
corresponding non-blocking API foo_async() with the same call
signature, and returning a Future whose result is what the synchronous
API returns (and raises what the synchronous call would raise, if
there's an error). These are the conventions I use in NDB. I'm also
inventing a @task decorator.
@task
def view_paste_async(request, filekey):
    # Create Futures -- no yields!
    f1 = Pastes.objects.get_async(key=filekey)  # This won't raise
    f2 = loader.get_template_async('pastebin/error.html')
    f3 = loader.get_template_async('pastebin/paste.html')

    try:
        fileinfo = yield f1
    except DoesNotExist:
        t = yield f2
        return HttpResponse(t.render(Context(dict(error='File does not exist'))))

    f = yield open_async(fileinfo.filename)
    fcontents = yield f.read_async()
    t = yield f3
    return HttpResponse(t.render(Context(dict(file=fcontents))))
You could easily decide not to bother loading the error template
asynchronously (assuming most requests don't fail), and you could move
the creation of f3 below the try/except. But you get the idea. Even if
you do everything serially, inserting the yields and _async calls
would make this more parallelizable without the use of threads. (If
you were using threads, all this would be moot of course -- but then
your limit on requests being handled concurrently probably goes way
down.)
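For concreteness, here is one way the invented @task decorator could be
implemented (a minimal sketch under the conventions above, not NDB's
actual code; it drives the generator by hooking each yielded Future's
completion, and deep chains would recurse rather than trampoline):

    from concurrent.futures import Future

    def task(func):
        # Hypothetical decorator: calling the wrapped generator function
        # returns a Future for its eventual return value.
        def wrapper(*args, **kwargs):
            result = Future()
            gen = func(*args, **kwargs)

            def step(value=None, exc=None):
                try:
                    if exc is not None:
                        fut = gen.throw(exc)
                    else:
                        fut = gen.send(value)
                except StopIteration as stop:
                    # PEP 380: 'return x' in a generator raises
                    # StopIteration(x).
                    result.set_result(getattr(stop, 'value', None))
                    return
                except Exception as e:
                    result.set_exception(e)
                    return

                # Resume the generator when the yielded Future completes.
                def on_done(f):
                    if f.exception() is None:
                        step(value=f.result())
                    else:
                        step(exc=f.exception())
                fut.add_done_callback(on_done)

            step()
            return result
        return wrapper

A caller would then write fut = view_paste_async(request, filekey) and
attach a done-callback (or yield fut from another task) to consume the
response.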
> We _need_ a pipeline or something computationally equivalent or more
> powerful. Results from past "deferred computations" need to be passed
> forward into future "deferred computations", in order to implement
> this at all.
Yeah, and I think that a single generator using multiple yields is the
ideal pipeline to me (see my example near the top based on
kriskowal/q).
> This is not a style issue, this is an issue of needing to be able to
> solve problems that involve more than one computation where the
> results of every computation matters somewhere. It's just that in this
> case, some of the computations are computed asynchronously.
And I think generators do this very well.
>> I am totally open to learning from Twisted's experience. I hope that
>> you are willing to share even if the end result might not look like
>> Twisted at all -- after all in Python 3.3 we have "yield from" and
>> return from a generator and many years of experience with different
>> styles of async APIs. In addition to Twisted, there's Tornado and
>> Monocle, and then there's the whole greenlets/gevent and
>> Stackless/microthreads community that we can't completely ignore. I
>> believe somewhere is an ideal async architecture, and I hope you can
>> help us discover it.
>>
>> (For example, I am very interested in Twisted's experiences writing
>> real-world performant, robust reactors.)
>
> For that stuff, you'd have to speak to the main authors of Twisted.
> I'm just a twisted user. :(
They seem to be mostly ignoring this conversation, so your standing in
as a proxy for them is much appreciated!
> In the end it really doesn't matter what API you go with. The Twisted
> people will wrap it up so that they are compatible, as far as that is
> possible.
And I want to ensure that that is possible and preferably easy, if I
can do it without introducing too many warts in the API that
non-Twisted users see and use.
> I hope I haven't detracted too much from the main thrust of the
> surrounding discussion. Futures/deferreds are a pretty big tangent, so
> sorry. I justified it to myself by figuring that it'd probably come up
> anyway, somehow, since these are useful abstractions for asynchronous
> programming.
Not at all. This has been a valuable refresher for me!
--
--Guido van Rossum (python.org/~guido)
Hello,
This PEP is a resurrection of the idea of having object-oriented
filesystem paths in the stdlib. It comes with a general API proposal
as well as a specific implementation (*). The implementation is young
and discussion is quite open.
(*) http://pypi.python.org/pypi/pathlib/
Regards
Antoine.
PS: You can all admire my ASCII-art skills.
PEP: 428
Title: The pathlib module -- object-oriented filesystem paths
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <solipsis@pitrou.net>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-July-2012
Python-Version: 3.4
Post-History:
Abstract
========
This PEP proposes the inclusion of a third-party module, `pathlib`_, in
the standard library. The inclusion is proposed under the provisional
label, as described in :pep:`411`. Therefore, API changes can be done,
either as part of the PEP process, or after acceptance in the standard
library (and until the provisional label is removed).
The aim of this library is to provide a simple hierarchy of classes to
handle filesystem paths and the common operations users do over them.
.. _`pathlib`: http://pypi.python.org/pypi/pathlib/
Related work
============
An object-oriented API for filesystem paths has already been proposed
and rejected in :pep:`355`. Several third-party implementations of the
idea of object-oriented filesystem paths exist in the wild:
* The historical `path.py module`_ by Jason Orendorff, Jason R. Coombs
and others, which provides a ``str``-subclassing ``Path`` class;
* Twisted's slightly specialized `FilePath class`_;
* An `AlternativePathClass proposal`_, subclassing ``tuple`` rather than
``str``;
* `Unipath`_, a variation on the str-subclassing approach with two public
classes, an ``AbstractPath`` class for operations which don't do I/O and a
``Path`` class for all common operations.
This proposal attempts to learn from these previous attempts and the
rejection of :pep:`355`.
.. _`path.py module`: https://github.com/jaraco/path.py
.. _`FilePath class`: http://twistedmatrix.com/documents/current/api/twisted.python.filepath.File…
.. _`AlternativePathClass proposal`: http://wiki.python.org/moin/AlternativePathClass
.. _`Unipath`: https://bitbucket.org/sluggo/unipath/overview
Why an object-oriented API
==========================
The rationale to represent filesystem paths using dedicated classes is the
same as for other kinds of stateless objects, such as dates, times or IP
addresses. Python has been slowly moving away from strictly replicating
the C language's APIs to providing better, more helpful abstractions around
all kinds of common functionality. Even if this PEP isn't accepted, it is
likely that another form of filesystem handling abstraction will be adopted
one day into the standard library.
Indeed, many people will prefer handling dates and times using the high-level
objects provided by the ``datetime`` module, rather than using numeric
timestamps and the ``time`` module API. Moreover, using a dedicated class
makes it possible to enable desirable behaviours by default, for example
the case insensitivity of Windows paths.
Proposal
========
Class hierarchy
---------------
The `pathlib`_ module implements a simple hierarchy of classes::
                       +----------+
                       |          |
              ---------| PurePath |---------
              |        |          |        |
              |        +----------+        |
              |             |              |
              |             |              |
              v             |              v
     +---------------+      |      +------------+
     |               |      |      |            |
     | PurePosixPath |      |      | PureNTPath |
     |               |      |      |            |
     +---------------+      |      +------------+
              |             v              |
              |          +------+          |
              |          |      |          |
              |   -------| Path |-------   |
              |   |      |      |      |   |
              |   |      +------+      |   |
              |   |                    |   |
              |   |                    |   |
              v   v                    v   v
        +-----------+              +--------+
        |           |              |        |
        | PosixPath |              | NTPath |
        |           |              |        |
        +-----------+              +--------+
This hierarchy divides path classes along two dimensions:
* a path class can be either pure or concrete: pure classes support only
operations that don't need to do any actual I/O, which are most path
manipulation operations; concrete classes support all the operations
of pure classes, plus operations that do I/O.
* a path class is of a given flavour according to the kind of operating
system paths it represents. `pathlib`_ implements two flavours: NT paths
for the filesystem semantics embodied in Windows systems, POSIX paths for
other systems (``os.name``'s terminology is re-used here).
Any pure class can be instantiated on any system: for example, you can
manipulate ``PurePosixPath`` objects under Windows, ``PureNTPath`` objects
under Unix, and so on. However, concrete classes can only be instantiated
on a matching system: indeed, it would be error-prone to start doing I/O
with ``NTPath`` objects under Unix, or vice-versa.
Furthermore, there are two base classes which also act as system-dependent
factories: ``PurePath`` will instantiate either a ``PurePosixPath`` or a
``PureNTPath`` depending on the operating system. Similarly, ``Path``
will instantiate either a ``PosixPath`` or a ``NTPath``.
It is expected that, in most uses, using the ``Path`` class is adequate,
which is why it has the shortest name of all.
No confusion with builtins
--------------------------
In this proposal, the path classes do not derive from a builtin type. This
contrasts with some other Path class proposals which were derived from
``str``. They also do not pretend to implement the sequence protocol:
if you want a path to act as a sequence, you have to look up a dedicated
attribute (the ``parts`` attribute).
By not passing themselves off as builtin types, the path classes minimize
the potential for confusion if they are combined by accident with genuine
builtin types.
Immutability
------------
Path objects are immutable, which makes them hashable and also prevents a
class of programming errors.
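For example, one would expect paths to be usable as dictionary keys or set
members (hypothetical session)::

>>> d = {PurePosixPath('setup.py'): 928}
>>> d[PurePosixPath('setup.py')]
928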
Sane behaviour
--------------
Little of the functionality from os.path is reused. Many os.path functions
are tied by backwards compatibility to confusing or plain wrong behaviour
(for example, the fact that ``os.path.abspath()`` simplifies ".." path
components without resolving symlinks first).
Also, using classes instead of plain strings helps make system-dependent
behaviours natural. For example, comparing and ordering Windows path
objects is case-insensitive, and path separators are automatically converted
to the platform default.
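For illustration, one would expect behaviour along these lines
(hypothetical session, following the semantics just described)::

>>> PureNTPath('C:/Windows') == PureNTPath('c:\\windows')
True
>>> PurePosixPath('a/b') == PurePosixPath('A/B')
False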
Useful notations
----------------
The API tries to provide useful notations all the while avoiding magic.
Some examples::
>>> p = Path('/home/antoine/pathlib/setup.py')
>>> p.name
'setup.py'
>>> p.ext
'.py'
>>> p.root
'/'
>>> p.parts
<PosixPath.parts: ['/', 'home', 'antoine', 'pathlib', 'setup.py']>
>>> list(p.parents())
[PosixPath('/home/antoine/pathlib'), PosixPath('/home/antoine'), PosixPath('/home'), PosixPath('/')]
>>> p.exists()
True
>>> p.st_size
928
Pure paths API
==============
The philosophy of the ``PurePath`` API is to provide a consistent array of
useful path manipulation operations, without exposing a hodge-podge of
functions like ``os.path`` does.
Definitions
-----------
First a couple of conventions:
* All paths can have a drive and a root. For POSIX paths, the drive is
always empty.
* A relative path has neither drive nor root.
* A POSIX path is absolute if it has a root. A Windows path is absolute if
it has both a drive *and* a root. A Windows UNC path (e.g.
``\\some\\share\\myfile.txt``) always has a drive and a root
(here, ``\\some\\share`` and ``\\``, respectively).
* A path which has either a drive *or* a root is said to be anchored.
Its anchor is the concatenation of the drive and root. Under POSIX,
"anchored" is the same as "absolute".
Construction and joining
------------------------
We will present construction and joining together since they expose
similar semantics.
The simplest way to construct a path is to pass it its string representation::
>>> PurePath('setup.py')
PurePosixPath('setup.py')
Extraneous path separators and ``"."`` components are eliminated::
>>> PurePath('a///b/c/./d/')
PurePosixPath('a/b/c/d')
If you pass several arguments, they will be automatically joined::
>>> PurePath('docs', 'Makefile')
PurePosixPath('docs/Makefile')
Joining semantics are similar to os.path.join, in that anchored paths ignore
the information from the previously joined components::
>>> PurePath('/etc', '/usr', 'bin')
PurePosixPath('/usr/bin')
However, with Windows paths, the drive is retained as necessary::
>>> PureNTPath('c:/foo', '/Windows')
PureNTPath('c:\\Windows')
>>> PureNTPath('c:/foo', 'd:')
PureNTPath('d:')
Calling the constructor without any argument creates a path object pointing
to the logical "current directory"::
>>> PurePosixPath()
PurePosixPath('.')
A path can be joined with another using the ``__getitem__`` operator::
>>> p = PurePosixPath('foo')
>>> p['bar']
PurePosixPath('foo/bar')
>>> p[PurePosixPath('bar')]
PurePosixPath('foo/bar')
As with constructing, multiple path components can be specified at once::
>>> p['bar/xyzzy']
PurePosixPath('foo/bar/xyzzy')
A join() method is also provided, with the same behaviour. It can serve
as a factory function::
>>> path_factory = p.join
>>> path_factory('bar')
PurePosixPath('foo/bar')
Representing
------------
To represent a path (e.g. to pass it to third-party libraries), just call
``str()`` on it::
>>> p = PurePath('/home/antoine/pathlib/setup.py')
>>> str(p)
'/home/antoine/pathlib/setup.py'
>>> p = PureNTPath('c:/windows')
>>> str(p)
'c:\\windows'
To force the string representation with forward slashes, use the ``as_posix()``
method::
>>> p.as_posix()
'c:/windows'
To get the bytes representation (which might be useful under Unix systems),
call ``bytes()`` on it, or use the ``as_bytes()`` method::
>>> bytes(p)
b'/home/antoine/pathlib/setup.py'
Properties
----------
Five simple properties are provided on every path (each can be empty)::
>>> p = PureNTPath('c:/pathlib/setup.py')
>>> p.drive
'c:'
>>> p.root
'\\'
>>> p.anchor
'c:\\'
>>> p.name
'setup.py'
>>> p.ext
'.py'
Sequence-like access
--------------------
The ``parts`` property provides read-only sequence access to a path object::
>>> p = PurePosixPath('/etc/init.d')
>>> p.parts
<PurePosixPath.parts: ['/', 'etc', 'init.d']>
Simple indexing returns the individual path component as a string, while
slicing returns a new path object constructed from the selected components::
>>> p.parts[-1]
'init.d'
>>> p.parts[:-1]
PurePosixPath('/etc')
Windows paths handle the drive and the root as a single path component::
>>> p = PureNTPath('c:/setup.py')
>>> p.parts
<PureNTPath.parts: ['c:\\', 'setup.py']>
>>> p.root
'\\'
>>> p.parts[0]
'c:\\'
(separating them would be wrong, since ``C:`` is not the parent of ``C:\\``).
The ``parent()`` method returns an ancestor of the path::
>>> p = PureNTPath('c:/python33/bin/python.exe')
>>> p.parent()
PureNTPath('c:\\python33\\bin')
>>> p.parent(2)
PureNTPath('c:\\python33')
>>> p.parent(3)
PureNTPath('c:\\')
The ``parents()`` method automates repeated invocations of ``parent()``, until
the anchor is reached::
>>> p = PureNTPath('c:/python33/bin/python.exe')
>>> for parent in p.parents(): parent
...
PureNTPath('c:\\python33\\bin')
PureNTPath('c:\\python33')
PureNTPath('c:\\')
Querying
--------
``is_relative()`` returns True if the path is relative (see definition
above), False otherwise.
``is_reserved()`` returns True if a Windows path is a reserved path such
as ``CON`` or ``NUL``. It always returns False for POSIX paths.
``match()`` matches the path against a glob pattern::
>>> PureNTPath('c:/PATHLIB/setup.py').match('c:*lib/*.PY')
True
``relative()`` returns a new relative path by stripping the drive and root::
>>> PurePosixPath('setup.py').relative()
PurePosixPath('setup.py')
>>> PurePosixPath('/setup.py').relative()
PurePosixPath('setup.py')
``relative_to()`` computes the relative difference of a path to another::
>>> PurePosixPath('/usr/bin/python').relative_to('/usr')
PurePosixPath('bin/python')
``normcase()`` returns a case-folded version of the path for NT paths::
>>> PurePosixPath('CAPS').normcase()
PurePosixPath('CAPS')
>>> PureNTPath('CAPS').normcase()
PureNTPath('caps')
Concrete paths API
==================
In addition to the operations of the pure API, concrete paths provide
additional methods which actually access the filesystem to query or mutate
information.
Constructing
------------
The classmethod ``cwd()`` creates a path object pointing to the current
working directory in absolute form::
>>> Path.cwd()
PosixPath('/home/antoine/pathlib')
File metadata
-------------
The ``stat()`` method caches and returns the file's stat() result;
``restat()`` forces refreshing of the cache. ``lstat()`` is also provided,
but doesn't have any caching behaviour::
>>> p.stat()
posix.stat_result(st_mode=33277, st_ino=7483155, st_dev=2053, st_nlink=1, st_uid=500, st_gid=500, st_size=928, st_atime=1343597970, st_mtime=1328287308, st_ctime=1343597964)
For ease of use, direct attribute access to the fields of the stat structure
is provided over the path object itself::
>>> p.st_size
928
>>> p.st_mtime
1328287308.889562
Higher-level methods help examine the kind of the file::
>>> p.exists()
True
>>> p.is_file()
True
>>> p.is_dir()
False
>>> p.is_symlink()
False
The file owner and group names (rather than numeric ids) are queried
through matching properties::
>>> p = Path('/etc/shadow')
>>> p.owner
'root'
>>> p.group
'shadow'
Path resolution
---------------
The ``resolve()`` method makes a path absolute, resolving any symlink on
the way. It is the only operation which will remove "``..``" path components.
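For illustration, one would expect behaviour along these lines
(hypothetical session, reusing the working directory from earlier
examples)::

>>> p = Path('docs/../setup.py')
>>> p.resolve()
PosixPath('/home/antoine/pathlib/setup.py')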
Directory walking
-----------------
Simple (non-recursive) directory access is done by iteration::
>>> p = Path('docs')
>>> for child in p: child
...
PosixPath('docs/conf.py')
PosixPath('docs/_templates')
PosixPath('docs/make.bat')
PosixPath('docs/index.rst')
PosixPath('docs/_build')
PosixPath('docs/_static')
PosixPath('docs/Makefile')
This allows simple filtering through list comprehensions::
>>> p = Path('.')
>>> [child for child in p if child.is_dir()]
[PosixPath('.hg'), PosixPath('docs'), PosixPath('dist'), PosixPath('__pycache__'), PosixPath('build')]
Simple and recursive globbing is also provided::
>>> for child in p.glob('**/*.py'): child
...
PosixPath('test_pathlib.py')
PosixPath('setup.py')
PosixPath('pathlib.py')
PosixPath('docs/conf.py')
PosixPath('build/lib/pathlib.py')
File opening
------------
The ``open()`` method provides a file opening API similar to the builtin
``open()`` method::
>>> p = Path('setup.py')
>>> with p.open() as f: f.readline()
...
'#!/usr/bin/env python3\n'
The ``raw_open()`` method, on the other hand, is similar to ``os.open``::
>>> fd = p.raw_open(os.O_RDONLY)
>>> os.read(fd, 15)
b'#!/usr/bin/env '
Filesystem alteration
---------------------
Several common filesystem operations are provided as methods: ``touch()``,
``mkdir()``, ``rename()``, ``replace()``, ``unlink()``, ``rmdir()``,
``chmod()``, ``lchmod()``, ``symlink_to()``. More operations could be
provided, for example some of the functionality of the shutil module.
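For illustration, typical usage might look like this (a sketch; the
method signatures are assumed to mirror their ``os`` counterparts)::

>>> d = Path('newdir')
>>> d.mkdir()
>>> f = d['newfile.txt']
>>> f.touch()
>>> f.rename('newdir/renamed.txt')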
Experimental openat() support
-----------------------------
On compatible POSIX systems, the concrete PosixPath class can take advantage
of \*at() functions (`openat()`_ and friends), and manages the bookkeeping of
open file descriptors as necessary. Support is enabled by passing the
*use_openat* argument to the constructor::
>>> p = Path(".", use_openat=True)
Then all paths constructed by navigating this path (either by iteration or
indexing) will also use the openat() family of functions. The point of using
these functions is to avoid race conditions whereby a given directory is
silently replaced with another (often a symbolic link to a sensitive system
location) between two accesses.
.. _`openat()`: http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
Copyright
=========
This document has been placed into the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:
[Hopefully this is the last spin-off thread from "asyncore: included
batteries don't fit"]
[LvH]
>> > If there's one take away idea from async-pep, it's reusable protocols.
[Guido]
>> Is there a newer version that what's on
>> http://www.python.org/dev/peps/pep-3153/ ? It seems to be missing any
>> specific proposals, after spending a lot of time giving a rationale
>> and defining some terms. The version on
>> https://github.com/lvh/async-pep doesn't seem to be any more complete.
[LvH]
> Correct.
So it's totally unfinished?
> If I had to change it today, I'd throw out consumers and producers and just
> stick to a protocol API.
>
> Do you feel that there should be less talk about rationale?
No, but I feel that there should be some actual specification. I am
also looking forward to an actual meaty bit of example code -- ISTR
you mentioned you had something, but that it was incomplete, and I
can't find the link.
>> > The PEP should probably be a number of PEPs. At first sight, it seems
>> > that this number is at least four:
>> >
>> > 1. Protocol and transport abstractions, making no mention of
>> > asynchronous IO
>> > (this is what I want 3153 to be, because it's small, manageable, and
>> > virtually everyone appears to agree it's a fantastic idea)
>>
>> But the devil is in the details. *What* specifically are you
>> proposing? How would you write a protocol handler/parser without any
>> reference to I/O? Most protocols are two-way streets -- you read some
>> stuff, and you write some stuff, then you read some more. (HTTP may be
>> the exception here, if you don't keep the connection open.)
>
> It's not that there's *no* reference to IO: it's just that that reference is
> abstracted away in data_received and the protocol's transport object, just
> like Twisted's IProtocol.
The words "data_received" don't even occur in the PEP.
>> > 2. A base reactor interface
>>
>> I agree that this should be a separate PEP. But I do think that in
>> practice there will be dependencies between the different PEPs you are
>> proposing.
>
> Absolutely.
>
>> > 3. A way of structuring callbacks: probably deferreds with a built-in
>> > inlineCallbacks for people who want to write synchronous-looking code
>> > with
>> > explicit yields for asynchronous procedures
>>
>> Your previous two ideas sound like you're not tied to backward
>> compatibility with Tornado and/or Twisted (not even via an adaptation
>> layer). Given that we're talking Python 3.4 here that's fine with me
>> (though I think we should be careful to offer a path forward for those
>> packages and their users, even if it means making changes to the
>> libraries).
>
> I'm assuming that by previous ideas you mean points 1, 2: protocol interface
> + reactor interface.
Yes.
> I don't see why twisted's IProtocol couldn't grow an adapter for stdlib
> Protocols. Ditto for Tornado. Similarly, the reactor interface could be
> *provided* (through a fairly simple translation layer) by different
> implementations, including twisted.
Right.
>> But Twisted Deferred is pretty arcane, and I would much
>> rather not use it as the basis of a forward-looking design. I'd much
>> rather see what we can mooch off PEP 3148 (Futures).
>
> I think this needs to be addressed in a separate mail, since more stuff has
> been said about deferreds in this thread.
Yes, that's in the thread with subject "The async API of the future:
Twisted and Deferreds".
>> > 4+ adapting the stdlib tools to using these new things
>>
>> We at least need to have an idea for how this could be done. We're
>> talking serious rewrites of many of our most fundamental existing
>> synchronous protocol libraries (e.g. httplib, email, possibly even
>> io.TextIOWrapper), most of which have had only scant updates even
>> through the Python 3 transition apart from complications to deal with
>> the bytes/str dichotomy.
>
> I certainly agree that this is a very large amount of work. However, it has
> obvious huge advantages in terms of code reuse. I'm not sure if I understand
> the technical barrier though. It should be quite easy to create a blocking
> API with a protocol implementation that doesn't care; just call
> data_received with all your data at once, and presto! (Since transports in
> general don't provide guarantees as to how bytes will arrive, existing
> Twisted IProtocols have to do this already anyway, and that seems to work
> fine.)
Hmm... I guess that depends on how your legacy code works. As Barry
mentioned somewhere, the email package's feedparser() is an attempt at
implementing this -- but he sounded like he has doubts that it works as-is
in an async environment.
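To make the quoted idea concrete: a blocking driver for an IO-free
protocol might look roughly like this (a sketch; the protocol/transport
method names are assumptions modeled on Twisted's IProtocol, not taken
from any PEP):

    class CollectingTransport:
        # Hypothetical minimal transport: just collects written bytes.
        def __init__(self):
            self.written = b''

        def write(self, data):
            self.written += data

    def run_blocking(protocol, request_bytes):
        # Feed the entire request to the protocol in one call, as
        # suggested above, and return whatever the protocol wrote back.
        transport = CollectingTransport()
        protocol.connection_made(transport)
        protocol.data_received(request_bytes)
        return transport.written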
However I am more worried about pull-based APIs. Take (as an extreme
example) the standard stream API for reading, especially
TextIOWrapper. I could see how we could turn the *writing* APIs async
easily enough, but I don't see how to do it for the reading end -- you
can't seriously propose to read the entire file into the buffer and
then satisfy all reads from memory.
>> > Re: forward path for existing asyncore code. I don't remember this being
>> > raised as an issue. If anything, it was mentioned in passing, and I think
>> > the answer to it was something to the tune of "asyncore's API is broken,
>> > fixing it is more important than backwards compat". Essentially I agree with
>> > Guido that the important part is an upgrade path to a good third-party
>> > library, which is the part about asyncore that REALLY sucks right now.
>>
>> I have the feeling that the main reason asyncore sucks is that it
>> requires you to subclass its Dispatcher class, which has a rather
>> treacherous interface.
>
> There's at least a few others, but sure, that's an obvious one. Many of the
> objections I can raise however don't matter if there's already an *existing
> working solution*. I mean, sure, it can't do SSL, but if you have code that
> does what you want right now, then obviously SSL isn't actually needed.
I think you mean this as an indication that providing the forward path
for existing asyncore apps shouldn't be rocket science, right? Sure, I
don't want to worry about that, I just want to make sure that we don't
*completely* paint ourselves into the wrong corner when it comes to
that.
>> > Regardless, an API upgrade is probably a good idea. I'm not sure if it
>> > should go in the first PEP: given the separation I've outlined above (which
>> > may be too spread out...), there's no obvious place to put it besides it
>> > being a new PEP.
>>
>> Aren't all your proposals API upgrades?
>
> Sorry, that was incredibly poor wording. I meant something more of an
> adapter: an upgrade path for existing asyncore code to new and shiny 3153
> code.
Yes, now it makes sense.
>> > Re base reactor interface: drawing maximally from the lessons learned in
>> > twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later,
>> > etc), asynchronous-looking name lookup, fd handling are the important
>> > parts.
>>
>> That actually sounds more concrete than I'd like a reactor interface
>> to be. In the App Engine world, there is a definite need for a
>> reactor, but it cannot talk about file descriptors at all -- all I/O
>> is defined in terms of RPC operations which have their own (several
>> layers of) async management but still need to be plugged in to user
>> code that might want to benefit from other reactor functionality such
>> as scheduling and placing a call at a certain moment in the future.
>
> I have a hard time understanding how that would work well outside of
> something like GAE. IIUC, that level of abstraction was chosen because it
> made sense for GAE (and I don't disagree), but I'm not sure it makes sense
> here.
I think I answered this in the reactors thread -- I propose an I/O
object abstraction that is not directly tied to a file descriptor, but
for which a concrete implementation can be made to support file
descriptors, and another to support App Engine RPC.
> In this example, where would eg the select/epoll/whatever calls happen? Is
> it something that calls the reactor that then in turn calls whatever?
App Engine doesn't have select/epoll/whatever, so it would have a
reactor implementation that doesn't use them. But the standard Unix
reactor would support file descriptors using select/etc.
Please respond in the reactors thread.
>> > call_every can be implemented in terms of call_later on a separate object,
>> > so I think it should be (eg twisted.internet.task.LoopingCall). One thing
>> > that is apparently forgotten about is event loop integration. The prime way
>> > of having two event loops cooperate is *NOT* "run both in parallel", it's
>> > "have one call the other". Even though not all loops support this, I think
>> > it's important to get this as part of the interface (raise an exception for
>> > all I care if it doesn't work).
>>
>> This is definitely one of the things we ought to get right. My own
>> thoughts are slightly (perhaps only cosmetically) different again:
>> ideally each event loop would have a primitive operation to tell it to
>> run for a little while, and then some other code could tie several
>> event loops together.
>
> As an API, that's pretty close to Twisted's IReactorCore.iterate, I think.
> It'd work well enough. The issue is only with event loops that don't
> cooperate so well.
Again, a topic for the reactor thread.
But I'm really hoping you'll make good on your promise of redoing
async-pep, giving some actual specifications and example code, so I
can play with it.
--
--Guido van Rossum (python.org/~guido)
(Sorry if this is in the wrong place, I'm joining the conversation and
I'm not sure where mailman will put it)
> Alternatively, yielding a future (or whatever one calls the objects
> returned by *_async()) could register *and* wait for the result. To
> register without waiting one would yield a wrapper for the future. So
> one could write
What would registering a Future do? As far as I understood it, the
plan here is that a Future was just a marker for an outstanding
request:
def callback(result):
    print("The result was", result)

def say_hello(name):
    f = Future()
    f.resolve("Hello, %s!" % name)
    return f

f = say_hello("Jeff")
f.add_callback(callback)
The outstanding request doesn't have to care about socket connections;
it's just a way to pass around a result that hasn't arrived yet. This
is pretty much the same as Deferreds/Promises, with a different name.
There's no reactor to register with here, because there doesn't need
to be one.
--
Jasper
(Sorry if this doesn't end up in the right thread in mail clients; I've
been reading this through a web UI and only just formally subscribed so
can't reply directly to the correct email.)
Code that uses generators is indeed often easier to read... but the problem
is that this isn't just a difference in syntax, it has a significant
semantic impact. Specifically, requiring yield means that you're
re-introducing context switching. In inlineCallbacks, or coroutines, or any
system that use yield as in your example above, arbitrary code may run
during the context switch, and who knows what happened to the state of the
world in the interim. True, it's an explicit context switch, unlike
threading where it can happen at any point, but it's still a context
switch, and it still increases the chance of race conditions and all the
other problems threading has. (If you're omitting yield it's even worse,
since you can't even tell anymore where the context switches are
happening.) Superficially such code is simpler (and in some cases I'm happy
to use inlineCallbacks, in particular in unit tests), but much the same way
threaded code is "simpler". If you're not very very careful, it'll work 99
times and break mysteriously the 100th.
For example, consider the following code; silly, but buggy due to the
context switch in yield allowing race conditions if any other code modifies
counter.value while getResult() is waiting for a result.
def addToCounter():
    counter.value = counter.value + (yield getResult())
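To spell out why this is racy, here is the same snippet expanded with
comments (the expansion matches Python's left-to-right evaluation order):

    def addToCounter():
        # 1. counter.value is read here, *before* the generator suspends.
        old = counter.value
        # 2. Suspension point: arbitrary other code may run now and
        #    modify counter.value while getResult() is pending.
        result = yield getResult()
        # 3. The stale value from step 1 is written back, silently
        #    discarding any update made during the suspension.
        counter.value = old + result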
In a Deferred callback, on the other hand, you know the only things that
are going to run are functions you call. In so far as it's possible, what
happens is under control of one function only. Less pretty, but no
potential race conditions:
def add(result):
    counter.value = counter.value + result

getResult().addCallback(add)
That being said, perhaps some changes to Python syntax could solve this;
Allen Short (
http://washort.twistedmatrix.com/2012/10/coroutines-reduce-readability.html)
claims to have a proposal, hopefully he'll post it soon.
On Saturday, 13 October 2012 at 19:47 +1000, Nick Coghlan wrote:
> The problem is that "Windows path" and "Posix path" aren't really
> accurate. There are a bunch of degrees of freedom, which is *exactly*
> the problem the context pattern is designed to deal with without a
> combinatorial explosion of different types or mixins.
>
> The "how is the string format determined?" aspect could be handled
> with separate methods, but how do you do case insensitive comparisons
> of paths on posix systems?
The question is: why do you want to do that?
I know there are a limited bunch of special cases where Posix filesystem
paths may be case-insensitive, but nobody really cares about them today,
and I don't expect many people to bother tomorrow. Playing with
individual parameters of path semantics sounds like a theoretical bother
more than a practical one.
A possibility would be to expose the Flavour classes, which until now
are an internal implementation detail. That would first imply better
defining their API, though. Then people could write e.g.:
class PosixCaseInsensitiveFlavour(pathlib.PosixFlavour):
    case_sensitive = False

class MyPath(pathlib.PosixPath):
    flavour = PosixCaseInsensitiveFlavour()
But I would consider it extra icing on the cake, not a requirement for a
Path API.
Regards
Antoine.
--
Software development and contracting: http://pro.pitrou.net
Hello,
Since there has been some controversy about the joining syntax used in
PEP 428 (filesystem path objects), I would like to run an informal poll
about it. Please answer with +1/+0/-0/-1 for each proposal:
- `p[q]` joins path q to path p
- `p + q` joins path q to path p
- `p / q` joins path q to path p
- `p.join(q)` joins path q to path p
(you can include a rationale if you want, but don't forget to vote :-))
Thank you
Antoine.
--
Software development and contracting: http://pro.pitrou.net
On Fri, Oct 12, 2012 at 5:54 PM, Mark Adam <dreamingforward@gmail.com> wrote:
> On Thu, Oct 11, 2012 at 8:03 PM, Steven D'Aprano <steve@pearwood.info> wrote:
>>>> I would gladly give up a small amount of speed for better control
>>>> over floats, such as whether 1/0.0 raised an exception or
>>>> returned infinity.
>>>
>>> Umm, you would be giving up a *lot* of speed. Native floating point
>>> happens right in the processor, so if you want special behavior, you'd
>>> have to take the floating point out of hardware and into "user space".
>>
>> Even in user-space, you're not giving up that much speed in practical
>> terms, at least not for my needs. The new decimal module in Python 3.3 is
>> less than a factor of 10 times slower than Python's floats, which makes it
>> pretty much instantaneous to my mind :)
>
> Hmm, well, if it's only that much slower, then we should implement
> Rationals and get rid of the issue altogether.
Now that I think of it, this issue has a strange whiff of the argument
wherefrom came the "from __future__" directive and the split that
happened between the vpython folks who needed the direct support of
float division (rendering 3-d graphics for an interpreted environment)
and the regular python crowd. Anyone else remember that?
mark