From rosuav at gmail.com  Tue Jul  1 00:05:19 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 1 Jul 2014 08:05:19 +1000
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
 Empty dict)
In-Reply-To: <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
References: <DA5C0D17-2AD9-4B87-891D-C66023F8F7CB@wiggy.net>
 <CAO41-mNtEBPGkYvC6OSH_kVmzN-PEk4yrMqYWGmqBHK492CobQ@mail.gmail.com>
 <CAP7+vJ+9yjg3cdcoe2RCSTJWFHVRcrq_sQLBoPq-PMFJoMS83Q@mail.gmail.com>
 <CAN8d9g=umX8GM+A=B5XpJqXJizsahVxWbo7KELwG_mXJhR4Oyg@mail.gmail.com>
 <DBC557BA-FA33-4B72-9DD3-09276A9B8E82@gmail.com>
 <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
Message-ID: <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>

On Tue, Jul 1, 2014 at 3:18 AM,  <random832 at fastmail.us> wrote:
> On Sat, Jun 28, 2014, at 01:28, Chris Angelico wrote:
>> empty_set_literal =
>> type(lambda:0)(type((lambda:0).__code__)(0,0,0,3,67,b't\x00\x00d\x01\x00h\x00\x00\x83\x02\x00\x01d\x00\x00S',(None,"I'm
>
> If you're embedding the entire compiler (in fact, a modified one) in
> your tool, why not just output a .pyc?

I'm not; I'm calling on the normal compiler. Also, I'm not familiar
with the .pyc format, nor with any of the potential pitfalls of that
approach. But if someone wants to make an "alternative front end that
makes a .pyc file" kind of thing, they're most welcome to.

ChrisA

From abarnert at yahoo.com  Tue Jul  1 01:48:14 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 30 Jun 2014 16:48:14 -0700
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
	Empty dict)
In-Reply-To: <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
References: <DA5C0D17-2AD9-4B87-891D-C66023F8F7CB@wiggy.net>
 <CAO41-mNtEBPGkYvC6OSH_kVmzN-PEk4yrMqYWGmqBHK492CobQ@mail.gmail.com>
 <CAP7+vJ+9yjg3cdcoe2RCSTJWFHVRcrq_sQLBoPq-PMFJoMS83Q@mail.gmail.com>
 <CAN8d9g=umX8GM+A=B5XpJqXJizsahVxWbo7KELwG_mXJhR4Oyg@mail.gmail.com>
 <DBC557BA-FA33-4B72-9DD3-09276A9B8E82@gmail.com>
 <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
Message-ID: <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>

First, two quick side notes:

It might be nice if the compiler were as easy to hook as the importer. Alternatively, it might be nice if there were a way to do "inline bytecode assembly" in CPython, similar to the way you do inline assembly in many C compilers, so the answer to random's question is just "asm [('BUILD_SET', 0)]" or something similar. Either of those would make this problem trivial.

I doubt either of those would be useful often enough that anyone wants to put in the work. But then I doubt the empty-set literal would be either, so anyone who seriously wants to work on this might want to work on the inline assembly and/or hookable compiler first.

Anyway:

On Monday, June 30, 2014 3:12 PM, Chris Angelico <rosuav at gmail.com> wrote:
>On Tue, Jul 1, 2014 at 3:18 AM, <random832 at fastmail.us> wrote:

>> On Sat, Jun 28, 2014, at 01:28, Chris Angelico wrote:
>>> empty_set_literal =
>>> type(lambda:0)(type((lambda:0).__code__)(0,0,0,3,67,b't\x00\x00d\x01\x00h\x00\x00\x83\x02\x00\x01d\x00\x00S',(None,"I'm

I think it makes more sense to use types.FunctionType and types.CodeType here than to generate two extra functions for each function, even if that means you have to put an import types at the top of every munged source file.

>> If you're embedding the entire compiler (in fact, a modified one) in
>> your tool, why not just output a .pyc?
>
>I'm not; I'm calling on the normal compiler. Also, I'm not familiar
>with the .pyc format, nor with any of the potential pitfalls of that
>approach. But if someone wants to make an "alternative front end that
>makes a .pyc file" kind of thing, they're most welcome to.


The tricky bit with making a .pyc file is generating the header information: last I checked (quite a while ago, and not that deeply), that wasn't documented, and there were no helpers exported to Python.

But I think what he was suggesting is something like this: let py_compile.compile generate the .pyc file as normal, then munge the bytecode in that file, instead of compiling each function, munging its bytecode, and emitting source that creates the munged functions.


Besides being a lot less work, his version works for ∅ at top level, in class definitions, in lambda expressions, etc., not just for def statements. And it doesn't require finding and identifying all of the things to munge in a source file (which I assume you'd do bottom-up based on the ast.parse tree or something).

But either way, this still doesn't solve the big problem. Compiling a function by hand and then tweaking the bytecode is easy; doing it programmatically is more painful. You obviously need the function to compile, so you have to replace the ∅ with something else whose bytecode you can search-and-replace. But what? That something else has to be valid in an expression context (so it compiles), has to compile to a 3-byte opcode (otherwise, replacing it will screw up any jump targets that point after it), can't add any globals/constants/etc. to the list (otherwise, removing it will screw up any LOAD_FOO statements that refer to a higher-numbered foo), and can't appear anywhere in the code being compiled.

The only thing I can think of off the top of my head is to replace it with whichever of [], (), or {} doesn't appear anywhere in the code being compiled, then you can search-replace BUILD_LIST/TUPLE/MAP 0 with BUILD_SET 0. But what if all three appear in the code? Raise a SyntaxError('Cannot use all 4 kinds of empty literals in the same scope')?
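
For what it's worth, that search-and-replace is easy to sketch on a modern CPython (3.8+, where instructions are 2-byte wordcode rather than the 3-byte form discussed here, and where code objects grew a replace() method). This is only a toy: it uses [] as the placeholder and assumes every empty-list display in the function was meant to be an empty set:

```python
import dis
from types import FunctionType

def patch_empty_set(fn):
    """Return a copy of fn with every `BUILD_LIST 0` turned into `BUILD_SET 0`.

    Sketch only: assumes CPython 3.8+ wordcode and that every empty list
    display in fn is really meant to be an empty set.
    """
    raw = bytearray(fn.__code__.co_code)
    build_list = dis.opmap['BUILD_LIST']
    build_set = dis.opmap['BUILD_SET']
    # Wordcode: each instruction is an (opcode, oparg) byte pair, so
    # swapping just the opcode byte moves no offsets or jump targets.
    for i in range(0, len(raw), 2):
        if raw[i] == build_list and raw[i + 1] == 0:
            raw[i] = build_set
    patched = fn.__code__.replace(co_code=bytes(raw))
    return FunctionType(patched, fn.__globals__, fn.__name__)

empty = patch_empty_set(lambda: [])
```

Since only the opcode byte changes, no jump targets or constant indices move, which sidesteps the renumbering problems above.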

One more thing that I'm sure you thought of, but may not have thought through all the way: To make this generally useful, you can't just hardcode creating a zero-arg top-level function; you need to copy all of the code and function constructor arguments from the compiled function.

So, if the function is a closure, how do you do that? You need to pass a list of closure cell objects that bind to the appropriate co_cellvars from the current frame, and I don't think there's a way to do that from Python. So, you need to do that by bytecode-hacking the outer function in the same way, just so it can build the inner function. And, even if you could build closure cells, once you've replaced the inner function definition with a function constructor from bytecode, when the resulting code gets compiled, it won't have any cellvars anymore.
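
As it happens, building closure cells from Python did later become possible: CPython 3.8 added types.CellType. A hedged sketch of rebinding a closure by hand, which would not have worked on the interpreters current in this thread:

```python
import types

def make_adder(n):
    return lambda x: x + n

template = make_adder(0)

# Build a fresh closure cell by hand (types.CellType, CPython 3.8+) and
# re-create the inner function around the same code object, bound to the
# new cell instead of the cell from the original frame's n.
cell = types.CellType(42)
adder42 = types.FunctionType(
    template.__code__,       # code object with co_freevars == ('n',)
    template.__globals__,
    'adder42',
    template.__defaults__,
    (cell,),                 # one cell per name in co_freevars
)
```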

And going back to the top, all of these problems are why I think random's solution would be a lot easier than yours, but why my solution (first build compiler hooks or inline assembly, then use that to implement the empty set trivially) would be no harder than either (and a lot more generally useful), and also why I think this really isn't worth doing.

From ron3200 at gmail.com  Tue Jul  1 02:16:38 2014
From: ron3200 at gmail.com (Ron Adam)
Date: Mon, 30 Jun 2014 19:16:38 -0500
Subject: [Python-ideas] Special keyword denoting an infinite loop
In-Reply-To: <20140630182030.GR13014@ando>
References: <CAMPw9HRefqWwD5k11q5SLeOW3dP4mnPVn9is-fO6Z0gd7EXbMw@mail.gmail.com>
 <20140628091112.GI13014@ando> <lom41p$b7h$1@ger.gmane.org>
 <1404149077.20890.136187005.57F7B11E@webmail.messagingengine.com>
 <20140630182030.GR13014@ando>
Message-ID: <losul7$s37$1@ger.gmane.org>



On 06/30/2014 01:20 PM, Steven D'Aprano wrote:
> On Mon, Jun 30, 2014 at 01:24:37PM -0400, random832 at fastmail.us wrote:
>> On Sat, Jun 28, 2014, at 06:05, Stefan Behnel wrote:
>>> Adding a new keyword needs very serious reasoning, and that's a good
>>> thing.
> [...]
>> What about _just_ "while:" or "for:"?
>
> Why bother? Is there anything you can do with a bare "while:" that you
> can't do with "while True:"? If not, what's the point?

It looks like (in Python 3) "while 1:", "while True:", and while with a
string all generate the same bytecode: just a bare SETUP_LOOP, which is
exactly what a bare "while:" would produce. So no, it wouldn't make a bit
of difference other than saving a few keystrokes in the source code.

 >>> def L():
...    while True:
...        break
...
 >>> L()
 >>> dis(L)
   2           0 SETUP_LOOP               4 (to 7)

   3     >>    3 BREAK_LOOP
               4 JUMP_ABSOLUTE            3
         >>    7 LOAD_CONST               0 (None)
              10 RETURN_VALUE


 >>> def LL():
...    while 1:
...        break
...
 >>> dis(LL)
   2           0 SETUP_LOOP               4 (to 7)

   3     >>    3 BREAK_LOOP
               4 JUMP_ABSOLUTE            3
         >>    7 LOAD_CONST               0 (None)
              10 RETURN_VALUE


 >>> def LLL():
...     while "looping":
...         break
...
 >>> dis(LLL)
   2           0 SETUP_LOOP               4 (to 7)

   3     >>    3 BREAK_LOOP
               4 JUMP_ABSOLUTE            3
         >>    7 LOAD_CONST               0 (None)
              10 RETURN_VALUE
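
That equivalence is easy to re-check on newer interpreters too; the loop opcodes themselves have changed since 3.4, but the three spellings still compile to the same instruction stream:

```python
import dis

def w_true():
    while True:
        break

def w_one():
    while 1:
        break

def w_str():
    while "looping":
        break

# If the compiler folds all three constant tests the same way, the
# instruction streams are identical (whatever the opcodes are called
# on this version of CPython).
ops = [[ins.opname for ins in dis.get_instructions(f)]
       for f in (w_true, w_one, w_str)]
```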


Cheers, Ron


From abarnert at yahoo.com  Tue Jul  1 02:27:55 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 30 Jun 2014 17:27:55 -0700
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
	Empty dict)
In-Reply-To: <20140701001814.GA27480@leliel.pault.ag>
References: <CAP7+vJ+9yjg3cdcoe2RCSTJWFHVRcrq_sQLBoPq-PMFJoMS83Q@mail.gmail.com>
 <CAN8d9g=umX8GM+A=B5XpJqXJizsahVxWbo7KELwG_mXJhR4Oyg@mail.gmail.com>
 <DBC557BA-FA33-4B72-9DD3-09276A9B8E82@gmail.com>
 <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
 <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <20140701001814.GA27480@leliel.pault.ag>
Message-ID: <1404174475.669.YahooMailNeo@web181001.mail.ne1.yahoo.com>

On Monday, June 30, 2014 5:18 PM, Paul Tagliamonte <paultag at gmail.com> wrote:


[snip]

>Right, so, this was brought up before, but with Hylang
>(https://github.com/hylang/hy), we abuse the PEP 302 new import hooks to
>search sys.path for .hy files rather than .py files.
>
>You could do the same for your .pyu files (again, *without* the blessing
>of the core team, as this is insane), and do the mangling before passing
>it to the normal internals to turn it into bytecode / AST.
>
>Doing it this way means you won't have to futz with the compiler,
>and you can remain happy.


The reason for needing to futz with the compiler is to generate source code that actually compiles to the bytecode to build an empty set directly, instead of the bytecode to load and call the "set" global.

I agree with both you and Guido that the whole thing is silly, and set() is fine. I also agree with your implied assumption that, even if you needed an empty set literal, having it compile to the same thing as set() would be fine. But those who disagree with both, and really want an empty set literal that compiles to "BUILD_SET 0", cannot have it without futzing with the compiler. So, I'd encourage them to work on making the compiler more futzable (which surely more people would have a use for than the number of people for whom set() is intolerably slow, or unusable because they want to redefine the global).
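
The bytecode-level difference being described here is easy to see with the dis module; a small sketch comparing the call against a set display (a one-element display, since {} is a dict):

```python
import dis

# set() is an ordinary global name lookup plus a call, both of which an
# empty-set literal would avoid...
call_ops = [ins.opname for ins in dis.Bytecode(lambda: set())]

# ...while a set display compiles straight to a BUILD_SET opcode.
display_ops = [ins.opname for ins in dis.Bytecode(lambda: {1})]
```

So rebinding the global name set changes what the first lambda returns, but could never affect a true literal.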

From rosuav at gmail.com  Tue Jul  1 02:39:14 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 1 Jul 2014 10:39:14 +1000
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
 Empty dict)
In-Reply-To: <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
References: <DA5C0D17-2AD9-4B87-891D-C66023F8F7CB@wiggy.net>
 <CAO41-mNtEBPGkYvC6OSH_kVmzN-PEk4yrMqYWGmqBHK492CobQ@mail.gmail.com>
 <CAP7+vJ+9yjg3cdcoe2RCSTJWFHVRcrq_sQLBoPq-PMFJoMS83Q@mail.gmail.com>
 <CAN8d9g=umX8GM+A=B5XpJqXJizsahVxWbo7KELwG_mXJhR4Oyg@mail.gmail.com>
 <DBC557BA-FA33-4B72-9DD3-09276A9B8E82@gmail.com>
 <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
 <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
Message-ID: <CAPTjJmrvdvrnedoq0z64=7kO02xeM7W+kptOP55NeRSyEwBzpQ@mail.gmail.com>

On Tue, Jul 1, 2014 at 9:48 AM, Andrew Barnert <abarnert at yahoo.com> wrote:
> First, two quick side notes:
>
> It might be nice if the compiler were as easy to hook as the importer. Alternatively, it might be nice if there were a way to do "inline bytecode assembly" in CPython, similar to the way you do inline assembly in many C compilers, so the answer to random's question is just "asm [('BUILD_SET', 0)]" or something similar. Either of those would make this problem trivial.
>

That would be interesting, but it raises the possibility of mucking up
the stack. (Imagine if you put BUILD_SET 1 in there instead. What's it
going to make a set of? What's going to happen to the rest of the
stack? Do you REALLY want to debug that?)

Back when I did a lot of C and C++ programming, I used to make good
use of a "drop to assembly" feature. There were two broad areas where
I'd use it: either to access a CPU feature that the compiler and
library didn't offer me (like CPUID, in its early days), or to
hand-optimize some code. Then compilers got better and better, and the
first set of cases got replaced with library functions... and the
second lot ended up being no better than the compiler's output, and
potentially a lot worse - particularly because they're non-portable.
Allowing a "drop to bytecode" in CPython would have the exact same
effects, I think. Some people would use it to create an empty set,
others would use it to replace variable swapping with a marginally
faster and *almost* identical stack-based swap:

x,y = y,x
LOAD_GLOBAL y
LOAD_GLOBAL x
ROT_TWO
STORE_GLOBAL x
STORE_GLOBAL y

becomes

LOAD_GLOBAL x
LOAD_GLOBAL y
STORE_GLOBAL x
STORE_GLOBAL y

Seems fine, right? But it's a subtle change to semantics (evaluation
order), and not much benefit anyway. Plus, if it's decided that this
semantic change is safe (if it's provably not going to have any
significance), a future version of CPython would be able to make the
exact same optimization, while leaving the code readable, and portable
to other Python implementations.

So while an inline bytecode assembler might have some uses, I suspect
it'd be an attractive nuisance more than anything else.

> On Monday, June 30, 2014 3:12 PM, Chris Angelico <rosuav at gmail.com> wrote:
>>On Tue, Jul 1, 2014 at 3:18 AM,  <random832 at fastmail.us> wrote:
>
>>> On Sat, Jun 28, 2014, at 01:28, Chris Angelico wrote:
>>>> empty_set_literal =
>>>> type(lambda:0)(type((lambda:0).__code__)(0,0,0,3,67,b't\x00\x00d\x01\x00h\x00\x00\x83\x02\x00\x01d\x00\x00S',(None,"I'm
>
> I think it makes more sense to use types.FunctionType and types.CodeType here than to generate two extra functions for each function, even if that means you have to put an import types at the top of every munged source file.

Sure. This is just a proof-of-concept anyway, and it's not meant to be
good code. Either way works, I just tried to minimize name usage (and
potential name collisions).

> But I think what he was suggesting is something like this: Let py_compile.compile generate the .pyc file as normal, then munge the bytecode in that file, instead of compiling each function, munging its bytecode, and emitting source that creates the munged functions.
>
>
> Besides being a lot less work, his version works for ∅ at top level, in class definitions, in lambda expressions, etc., not just for def statements. And it doesn't require finding and identifying all of the things to munge in a source file (which I assume you'd do bottom-up based on the ast.parse tree or something).
>

Sure. But all I was doing was responding to the implied statement that
it's not possible to write a .py file that makes a function with
BUILD_SET 0 in it. Translating a .pyu directly into a .pyc is still
possible, but was not the proposal.

> But either way, this still doesn't solve the big problem. Compiling a function by hand and then tweaking the bytecode is easy; doing it programmatically is more painful. You obviously need the function to compile, so you have to replace the ∅ with something else whose bytecode you can search-and-replace. But what? That something else has to be valid in an expression context (so it compiles), has to compile to a 3-byte opcode (otherwise, replacing it will screw up any jump targets that point after it), can't add any globals/constants/etc. to the list (otherwise, removing it will screw up any LOAD_FOO statements that refer to a higher-numbered foo), and can't appear anywhere in the code being compiled.
>

What I did was put in a literal string.

https://github.com/Rosuav/shed/blob/master/empty_set.py

It uses "∅ is set()" as a marker, depending on that string not
existing in the source. (I could compile the function twice, once with
that string, and then a second time with another string; the first
compilation would show what consts it uses, and the program could then
generate an arbitrary constant which doesn't exist.) The opcode is the
right length (assuming it doesn't go for EXTENDED_ARG, which I've
never heard of; it seems to be necessary if you have more than 64K
consts/globals/locals in a function???), and the resulting function
has an unnecessary const in it. It wouldn't be hard to drop it (the
code already parses through everything; it could just go "if it's
LOAD_CONST, three options - if it's the marker, switch in a BUILD_SET,
if it's less than the marker, no change, if it's more than the marker,
decrement"), but it doesn't seem to be a problem to have an extra
const in there.

> One more thing that I'm sure you thought of, but may not have thought through all the way: To make this generally useful, you can't just hardcode creating a zero-arg top-level function; you need to copy all of the code and function constructor arguments from the compiled function.
>

It handles arguments and stuff. All the attributes of the original
function object get passed through unchanged to the resulting
function, with the exception of the bytecode, obviously.

> So, if the function is a closure, how do you do that? You need to pass a list of closure cell objects that bind to the appropriate co_cellvars from the current frame, and I don't think there's a way to do that from Python. So, you need to do that by bytecode-hacking the outer function in the same way, just so it can build the inner function. And, even if you could build closure cells, once you've replaced the inner function definition with a function constructor from bytecode, when the resulting code gets compiled, it won't have any cellvars anymore.
>

Ah, that part I've no idea about. But it wouldn't be impossible for
someone to develop that a bit further.

> And going back to the top, all of these problems are why I think random's solution would be a lot easier than yours, but why my solution (first build compiler hooks or inline assembly, then use that to implement the empty set trivially) would be no harder than either (and a lot more generally useful), and also why I think this really isn't worth doing.
>

Right. I absolutely agree with your conclusion (not worth doing), and
always have had that view. This is proof that it's kinda possible, but
still a bad idea. Now, if someone comes up with a really compelling
use-case for an empty set literal, then maybe it'd be more important;
but if that happens, CPython will probably grow an empty set literal
in ASCII somehow, and then the .pyu translation can just turn ∅ into
that.

ChrisA

From paultag at gmail.com  Tue Jul  1 02:18:14 2014
From: paultag at gmail.com (Paul Tagliamonte)
Date: Mon, 30 Jun 2014 20:18:14 -0400
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
 Empty dict)
In-Reply-To: <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
References: <CAP7+vJ+9yjg3cdcoe2RCSTJWFHVRcrq_sQLBoPq-PMFJoMS83Q@mail.gmail.com>
 <CAN8d9g=umX8GM+A=B5XpJqXJizsahVxWbo7KELwG_mXJhR4Oyg@mail.gmail.com>
 <DBC557BA-FA33-4B72-9DD3-09276A9B8E82@gmail.com>
 <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
 <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
Message-ID: <20140701001814.GA27480@leliel.pault.ag>

On Mon, Jun 30, 2014 at 04:48:14PM -0700, Andrew Barnert wrote:
> First, two quick side notes:
> 
> It might be nice if the compiler were as easy to hook as the importer. Alternatively, it might be nice if there were a way to do "inline bytecode assembly" in CPython, similar to the way you do inline assembly in many C compilers, so the answer to random's question is just "asm [('BUILD_SET', 0)]" or something similar. Either of those would make this problem trivial.
> 
> I doubt either of those would be useful often enough that anyone wants to put in the work. But then I doubt the empty-set literal would be either, so anyone who seriously wants to work on this might want to work on the inline assembly and/or hookable compiler first.

Again, to be absolutely clear here, I hate this idea. `set()` is
perfectly clear. Sorry. Had to be said before any of this.


Right, so, this was brought up before, but with Hylang
(https://github.com/hylang/hy), we abuse the PEP 302 new import hooks to
search sys.path for .hy files rather than .py files.

You could do the same for your .pyu files (again, *without* the blessing
of the core team, as this is insane), and do the mangling before passing
it to the normal internals to turn it into bytecode / AST.

Doing it this way means you won't have to futz with the compiler,
and you can remain happy.

And we like being happy.


More info:

  https://github.com/hylang/hy/blob/master/hy/importer.py
  http://slides.pault.ag/hy.html#/15
  https://www.youtube.com/watch?v=AmMaN1AokTI
  https://www.youtube.com/watch?v=ulekCWvDFVI
  http://legacy.python.org/dev/peps/pep-0302/
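
On newer interpreters the same trick is usually written against the importlib machinery that superseded the raw PEP 302 hooks. A rough, hypothetical sketch (the names PyuFinder and PyuLoader are made up for the example), whose only "mangling" is rewriting U+2205 to set() before compilation:

```python
import importlib.abc
import importlib.util
import os
import sys

class PyuLoader(importlib.abc.SourceLoader):
    """Loads .pyu source, mangling the Unicode syntax to plain Python."""

    def __init__(self, fullname, path):
        self.fullname, self.path = fullname, path

    def get_filename(self, fullname):
        return self.path

    def get_data(self, path):
        with open(path, encoding='utf-8') as fh:
            source = fh.read()
        # The simplest possible mangling: U+2205 becomes a set() call.
        # (A real tool would tokenize rather than blindly text-replace,
        # which would also rewrite string literals.)
        return source.replace('\u2205', 'set()').encode('utf-8')

class PyuFinder(importlib.abc.MetaPathFinder):
    """Searches sys.path for <name>.pyu files, much as Hy's importer does."""

    def find_spec(self, fullname, path=None, target=None):
        shortname = fullname.rsplit('.', 1)[-1]
        for entry in (path or sys.path):
            candidate = os.path.join(entry or '.', shortname + '.pyu')
            if os.path.isfile(candidate):
                loader = PyuLoader(fullname, candidate)
                return importlib.util.spec_from_file_location(
                    fullname, candidate, loader=loader)
        return None

# Every import now checks for a .pyu first; fine for a demo, slow for real use.
sys.meta_path.insert(0, PyuFinder())
```

After the sys.meta_path insertion, an ordinary import statement will pick up a matching .pyu file from sys.path and compile the mangled source.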


Again, this approach can be a bit flaky, and this particular issue might
very well cause problems for us as a community - seeing as how the
syntax is almost exactly identical.

Hylang (for what it's worth) is just a nice way for us Lisp nerds to stop
complaining as much.


Godspeed,
  Paul

-- 
#define sizeof(x) rand()
</paul>
:wq
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: Digital signature
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140630/28681101/attachment.sig>

From ncoghlan at gmail.com  Tue Jul  1 04:15:20 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 1 Jul 2014 12:15:20 +1000
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
 Empty dict)
In-Reply-To: <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
References: <DA5C0D17-2AD9-4B87-891D-C66023F8F7CB@wiggy.net>
 <CAO41-mNtEBPGkYvC6OSH_kVmzN-PEk4yrMqYWGmqBHK492CobQ@mail.gmail.com>
 <CAP7+vJ+9yjg3cdcoe2RCSTJWFHVRcrq_sQLBoPq-PMFJoMS83Q@mail.gmail.com>
 <CAN8d9g=umX8GM+A=B5XpJqXJizsahVxWbo7KELwG_mXJhR4Oyg@mail.gmail.com>
 <DBC557BA-FA33-4B72-9DD3-09276A9B8E82@gmail.com>
 <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
 <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
Message-ID: <CADiSq7cPiXK6VgUzKrCOuU=j64c0agu+pS0xuq-voR-WKj1xbg@mail.gmail.com>

On 30 Jun 2014 16:51, "Andrew Barnert" <abarnert at yahoo.com.dmarc.invalid>
wrote:
>
> First, two quick side notes:
>
> It might be nice if the compiler were as easy to hook as the
importer. Alternatively, it might be nice if there were a way to do "inline
bytecode assembly" in CPython, similar to the way you do inline assembly in
many C compilers, so the answer to random's question is just "asm
[('BUILD_SET', 0)]" or something similar. Either of those would make this
problem trivial.

Eugene Toder & Dave Malcolm have some interesting patches on the tracker to
help enhance the compiler (in particular, Dave's allowed compiler plugins
to be written in Python). Neither set of patches made it to being merge
ready, though, and they'll be rather stale at this point.

Cheers,
Nick.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140701/e7d06acd/attachment-0001.html>

From guido at python.org  Tue Jul  1 04:18:44 2014
From: guido at python.org (Guido van Rossum)
Date: Mon, 30 Jun 2014 19:18:44 -0700
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
 Empty dict)
In-Reply-To: <CADiSq7cPiXK6VgUzKrCOuU=j64c0agu+pS0xuq-voR-WKj1xbg@mail.gmail.com>
References: <DA5C0D17-2AD9-4B87-891D-C66023F8F7CB@wiggy.net>
 <CAO41-mNtEBPGkYvC6OSH_kVmzN-PEk4yrMqYWGmqBHK492CobQ@mail.gmail.com>
 <CAP7+vJ+9yjg3cdcoe2RCSTJWFHVRcrq_sQLBoPq-PMFJoMS83Q@mail.gmail.com>
 <CAN8d9g=umX8GM+A=B5XpJqXJizsahVxWbo7KELwG_mXJhR4Oyg@mail.gmail.com>
 <DBC557BA-FA33-4B72-9DD3-09276A9B8E82@gmail.com>
 <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
 <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CADiSq7cPiXK6VgUzKrCOuU=j64c0agu+pS0xuq-voR-WKj1xbg@mail.gmail.com>
Message-ID: <CAP7+vJL7Chru9Np7NT_mT1avDwTUpeV5DeLu=1GHyiO_VmGQ1w@mail.gmail.com>

Like bytecode, the compiler's workings are not part of the language spec,
and are likely to change incompatibly between versions and not work for
anything besides CPython. I don't really want to go there (cool though it
sounds for wannabe compiler hackers).


On Mon, Jun 30, 2014 at 7:15 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

>
> On 30 Jun 2014 16:51, "Andrew Barnert" <abarnert at yahoo.com.dmarc.invalid>
> wrote:
> >
> > First, two quick side notes:
> >
> > It might be nice if the compiler were as easy to hook as the
> importer. Alternatively, it might be nice if there were a way to do "inline
> bytecode assembly" in CPython, similar to the way you do inline assembly in
> many C compilers, so the answer to random's question is just "asm
> [('BUILD_SET', 0)]" or something similar. Either of those would make this
> problem trivial.
>
> Eugene Toder & Dave Malcolm have some interesting patches on the tracker
> to help enhance the compiler (in particular, Dave's allowed compiler
> plugins to be written in Python). Neither set of patches made it to being
> merge ready, though, and they'll be rather stale at this point.
>
> Cheers,
> Nick.
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140630/01c0ff4b/attachment.html>

From abarnert at yahoo.com  Tue Jul  1 10:04:37 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 1 Jul 2014 01:04:37 -0700
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
	Empty dict)
In-Reply-To: <CAPTjJmrvdvrnedoq0z64=7kO02xeM7W+kptOP55NeRSyEwBzpQ@mail.gmail.com>
References: <DA5C0D17-2AD9-4B87-891D-C66023F8F7CB@wiggy.net>
 <CAO41-mNtEBPGkYvC6OSH_kVmzN-PEk4yrMqYWGmqBHK492CobQ@mail.gmail.com>
 <CAP7+vJ+9yjg3cdcoe2RCSTJWFHVRcrq_sQLBoPq-PMFJoMS83Q@mail.gmail.com>
 <CAN8d9g=umX8GM+A=B5XpJqXJizsahVxWbo7KELwG_mXJhR4Oyg@mail.gmail.com>
 <DBC557BA-FA33-4B72-9DD3-09276A9B8E82@gmail.com>
 <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
 <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CAPTjJmrvdvrnedoq0z64=7kO02xeM7W+kptOP55NeRSyEwBzpQ@mail.gmail.com>
Message-ID: <1404201877.75807.YahooMailNeo@web181004.mail.ne1.yahoo.com>

Before I get to the reply, because I couldn't find a 3.x-compatible bytecode assembler, I slapped one together at https://github.com/abarnert/cpyasm. I think it would be reasonably possible to use this to add inline assembly to a preprocessor, but I haven't tried, because I don't have a preprocessor I actually want, and this was the fun part. :)

> On Monday, June 30, 2014 5:39 PM, Chris Angelico <rosuav at gmail.com> wrote:

> > On Tue, Jul 1, 2014 at 9:48 AM, Andrew Barnert <abarnert at yahoo.com> wrote:
>>  First, two quick side notes:
>> 
>>  It might be nice if the compiler were as easy to hook as the importer. 
> Alternatively, it might be nice if there were a way to do "inline bytecode 
> assembly" in CPython, similar to the way you do inline assembly in many C 
> compilers, so the answer to random's question is just "asm 
> [('BUILD_SET', 0)]" or something similar. Either of those would 
> make this problem trivial.
>> 
> 
> That would be interesting, but it raises the possibility of mucking up
> the stack. (Imagine if you put BUILD_SET 1 in there instead. What's it
> going to make a set of? What's going to happen to the rest of the
> stack? Do you REALLY want to debug that?)

The same thing that happens if you use bad inline assembly in C, or a bad C extension module in Python: bad things that you can't debug at source level. And yet, inline assembly in C and C extension modules in Python are still quite useful.

Of course the difference is that you can drop from the source level to the machine level pretty easily in gdb, lldb, Microsoft's debugger, etc., while you can't as easily drop from the source level to the bytecode level in pdb. (I'm not sure that wouldn't be an interesting feature to add in itself, but that's getting even farther off topic, so forget it for now.)

> Back when I did a lot of C and C++ programming, I used to make good
> use of a "drop to assembly" feature. There were two broad areas where
> I'd use it: either to access a CPU feature that the compiler and
> library didn't offer me (like CPUID, in its early days), or to
> hand-optimize some code. Then compilers got better and better, and the
> first set of cases got replaced with library functions... and the
> second lot ended up being no better than the compiler's output, and
> potentially a lot worse - particularly because they're non-portable.
> Allowing a "drop to bytecode" in CPython would have the exact same
> effects, I think.

I'll ignore the second case for the moment, because I think it's rarely if ever appropriate to Python, and just focus on the first. Those cases did not go away because CPUID got replaced with library functions. Those library functions, which are compiled with the same compiler you use for your code, have inline assembly in them. (Or, if you're on Linux, those library functions read from a device file, but the device driver, which is compiled with the same compiler you use, has inline assembly in it.) So, the compiler still needs to be able to compile it.

There are cases where that isn't true. For example, most modern compilers that care about x86 have extended the C language in some way to make it unnecessary for you to write LOCK CMPXCHG all over the place if you want to do lock-free refcounting (and, even better, they've done so in a way that also does the right thing on ARM 9 or SPARC or whatever else you care about). Or, in some cases, they've done something halfway in between, adding "compiler intrinsic functions" that look like functions, but are compiled directly into inline asm. But either way, that didn't happen until a lot of people were publishing code that used that inline assembly. Otherwise, the compiler vendors would have had no reason to believe it was necessary to add a new feature.

Plus, people still needed to keep distributing code that uses the inline asm for years, until the oldest compiler and library on every platform they support incorporated the change they needed.

And, just as you say, I think it would have the exact same effects in CPython. If we added inline bytecode asm to 3.5, and there were actually something useful to do with it, people would start doing it, and that's how we'd know that something useful was worth adding to the language, and when we added that something useful in 3.7, eventually people could start using that, and then it would be years before all of the projects that need that feature either die or require 3.7. But that's not a problem; that's inline asm working exactly as it should.

There is one good reason to reject the inline asm idea: if it's unlikely that there will be anything worth using it for (or if it might plausibly be useful, but not enough that it's worth anyone doing the work). Which I think is at least plausible, and maybe likely.

> Some people would use it to create an empty set,
> others would use it to replace variable swapping with a marginally
> faster and *almost* identical stack-based swap:

Do you really think anyone would do the latter? Seriously, what kind of code can you imagine that's too slow in CPython, not worth rewriting in C or running in PyPy or whatever, but fast enough with the rot opcode removed? And if someone really _did_ need that, I doubt they'd care much that Python 3.8 makes it unnecessary; they obviously have a specific deployment platform that has to work and that needed that last 3% speedup under 3.6.2, and they're going to need that to keep working for years.
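(For what it's worth, a tuple-assignment swap already compiles down to a stack rotation, which is easy to check with dis; in a quick sketch like this, the opcode is ROT_TWO on the 3.x of this thread, SWAP on newer CPythons.)

```python
import dis

def swap(a, b):
    # CPython compiles a two-element tuple swap down to a single
    # stack-rotation opcode already, so there is little left to
    # hand-optimize at the bytecode level.
    a, b = b, a
    return a, b

print([ins.opname for ins in dis.get_instructions(swap)])
```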

The former, maybe. Not just to allow ∅, but maybe someone would want to write a Unicode-math-ified Python dialect as an import-hook preprocessor that used inline asm among other tools. In which case… so what? That's not going to be something that people just randomly drop into their code; there will be a single project with however many users, which will be no worse for the Python community than Hylang. If their demonstration is just so cool that everyone decides we need Unicode symbols in Python core, then great. If not, and they still want to keep using it, well, a simpler preprocessor will be easier for the rest of us to understand than a ridiculously complicated one that does bytecode hackery, or than a hacked-up CPython compiler.

> So while an inline bytecode assembler might have some uses, I suspect
> it'd be an attractive nuisance more than anything else.

I honestly don't see it becoming an attractive nuisance.

I can easily see it just not getting used for anything at all, beyond people playing with the interpreter.

And now, on to your other replies:

>>  On Monday, June 30, 2014 3:12 PM, Chris Angelico <rosuav at gmail.com> wrote:
>>> On Tue, Jul 1, 2014 at 3:18 AM, <random832 at fastmail.us> wrote:
>> 
>>>>  On Sat, Jun 28, 2014, at 01:28, Chris Angelico wrote:
>>>>>  empty_set_literal =
>>>>> 
> type(lambda:0)(type((lambda:0).__code__)(0,0,0,3,67,b't\x00\x00d\x01\x00h\x00\x00\x83\x02\x00\x01d\x00\x00S',(None,"I'm
>> 
>>  I think it makes more sense to use types.FunctionType and types.CodeType 
> here than to generate two extra functions for each function, even if that means 
> you have to put an import types at the top of every munged source file.
> 
> Sure. This is just a proof-of-concept anyway, and it's not meant to be
> good code. Either way works, I just tried to minimize name usage (and
> potential name collisions).
> 
>>  But I think what he was suggesting is something like this: Let 
> py_compile.compile generate the .pyc file as normal, then munge the bytecode in 
> that file, instead of compiling each function, munging its bytecode, and 
> emitting source that creates the munged functions.
>> 
>> 
>>  Besides being a lot less work, his version works for ∅ at top level, in 
> class definitions, in lambda expressions, etc., not just for def statements. And 
> it doesn't require finding and identifying all of the things to munge in a 
> source file (which I assume you'd do bottom-up based on the ast.parse tree 
> or something).
>> 
> 
> Sure. But all I was doing was responding to the implied statement that
> it's not possible to write a .py file that makes a function with
> BUILD_SET 0 in it. Translating a .pyu directly into a .pyc is still
> possible, but was not the proposal.

Agreed, I just think it's an _easier_ proposal than yours, not a harder one (assuming you want to actually build the real thing, not just a proof of concept), which I think is why Random suggested it.

Also, again, I don't think a real project that allowed ∅ in a def but not in a lambda, class, or top-level code would be acceptable to anyone, and I don't see how your solution can be easily adapted to those cases (well, except lambda).

[snip, and everything below here condensed]

> What I did was put in a literal string…
> It uses "∅ is set()" as a marker … and the resulting function
> has an unnecessary const in it.

I assumed that leaving the unnecessary const behind was unacceptable. After all, we're talking about (hypothetical?) people who find the cost of LOAD_GLOBAL set; CALL_FUNCTION 0 to be unacceptable… But you're right that fixing up all the other LOAD_CONST bytecodes' args is a feasible way to solve that.

>>  So, if the function is a closure, how do you do that?
> Ah, that part I've no idea about. But it wouldn't be impossible for
> someone to develop that a bit further.

Not impossible, but very hard, much harder than what you've done so far.

Ultimately, I think that just backs up your larger point: This is doable, but it's going to be a lot of work, and the benefit isn't even nearly worth the cost. My point is that there are other ways to do it that would be less work and/or that would have more side benefits… but the benefit still isn't even nearly worth the cost, so who cares? :)

From abarnert at yahoo.com  Tue Jul  1 10:27:00 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 1 Jul 2014 01:27:00 -0700
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
	Empty dict)
In-Reply-To: <CAP7+vJL7Chru9Np7NT_mT1avDwTUpeV5DeLu=1GHyiO_VmGQ1w@mail.gmail.com>
References: <DA5C0D17-2AD9-4B87-891D-C66023F8F7CB@wiggy.net>
 <CAO41-mNtEBPGkYvC6OSH_kVmzN-PEk4yrMqYWGmqBHK492CobQ@mail.gmail.com>
 <CAP7+vJ+9yjg3cdcoe2RCSTJWFHVRcrq_sQLBoPq-PMFJoMS83Q@mail.gmail.com>
 <CAN8d9g=umX8GM+A=B5XpJqXJizsahVxWbo7KELwG_mXJhR4Oyg@mail.gmail.com>
 <DBC557BA-FA33-4B72-9DD3-09276A9B8E82@gmail.com>
 <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
 <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CADiSq7cPiXK6VgUzKrCOuU=j64c0agu+pS0xuq-voR-WKj1xbg@mail.gmail.com>
 <CAP7+vJL7Chru9Np7NT_mT1avDwTUpeV5DeLu=1GHyiO_VmGQ1w@mail.gmail.com>
Message-ID: <1404203220.53463.YahooMailNeo@web181002.mail.ne1.yahoo.com>

(Replies to both Guido's top-post and Nick's reply-post below.)

On Monday, June 30, 2014 7:19 PM, Guido van Rossum <guido at python.org> wrote:


>Like bytecode, the compiler's workings are not part of the language spec, and are likely to change incompatibly between versions and not work for anything besides CPython. I don't really want to go there (cool though it sounds for wannabe compiler hackers).


But CPython does expose bytecode via the dis module, parts of inspect, etc. For that matter, it exposes some of the compiler's workings (especially if you consider everything up to AST generation part of the compiler, since every step up to there is exposed, including doing the whole thing in one whack with PyCF_ONLY_AST). So, I don't see how exposing the AST-to-bytecode transformation part (or, while we're at it, the .pyc generation part) is any less portable than what's already there.
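(For instance, stopping the standard compiler at the AST stage is a one-flag affair; a minimal sketch:)

```python
import ast

# PyCF_ONLY_AST stops compile() after parsing, before bytecode generation,
# so you get an ast.Module back instead of a code object.
tree = compile("s = {1, 2}", "<demo>", "exec", flags=ast.PyCF_ONLY_AST)
print(type(tree).__name__)      # the AST node type, not a code object
print(ast.dump(tree.body[0]))   # the Assign node for s = {1, 2}
```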

That being said, I can appreciate that it would almost certainly take a lot more work, and riskier work, to do that, so the same tradeoff could easily go the other way in this case. (Not to mention that the dis module and so on are already there, while the patches Nick was talking about, much less something brand new, are obviously not.)

>On Mon, Jun 30, 2014 at 7:15 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
>>On 30 Jun 2014 16:51, "Andrew Barnert" <abarnert at yahoo.com.dmarc.invalid> wrote:
>>>
>>> First, two quick side notes:
>>>
>>> It might be nice if the compiler were as easy to hook as the importer. Alternatively, it might be nice if there were a way to do "inline bytecode assembly" in CPython, similar to the way you do inline assembly in many C compilers, so the answer to random's question is just "asm [('BUILD_SET', 0)]" or something similar. Either of those would make this problem trivial.
>>Eugene Toder & Dave Malcolm have some interesting patches on the tracker to help enhance the compiler (in particular, Dave's allowed compiler plugins to be written in Python). Neither set of patches made it to being merge ready, though, and they'll be rather stale at this point.


Thanks!

Are you referring to Dave Malcolm's patch adding a hook for an AST optimizer (in Python) right before compiling the AST to code (http://bugs.python.org/issue10399 and related)?

If so, I don't think that would actually help here. Unless it's possible to say "BUILD_SET 0" in AST, but in that case, we don't need any new compiler hooks; just use an import hook the same way MacroPy does. (Doing it without import hooks would be a little nicer, but it's not essential.)

The only patch I could find by Eugene Toder is one to reenable constant folding on -0, which I think was already committed in 3.3, and doesn't seem relevant anyway. Is there something else I should be searching for?

From rosuav at gmail.com  Tue Jul  1 10:38:37 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 1 Jul 2014 18:38:37 +1000
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
 Empty dict)
In-Reply-To: <1404201877.75807.YahooMailNeo@web181004.mail.ne1.yahoo.com>
References: <DA5C0D17-2AD9-4B87-891D-C66023F8F7CB@wiggy.net>
 <CAO41-mNtEBPGkYvC6OSH_kVmzN-PEk4yrMqYWGmqBHK492CobQ@mail.gmail.com>
 <CAP7+vJ+9yjg3cdcoe2RCSTJWFHVRcrq_sQLBoPq-PMFJoMS83Q@mail.gmail.com>
 <CAN8d9g=umX8GM+A=B5XpJqXJizsahVxWbo7KELwG_mXJhR4Oyg@mail.gmail.com>
 <DBC557BA-FA33-4B72-9DD3-09276A9B8E82@gmail.com>
 <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
 <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CAPTjJmrvdvrnedoq0z64=7kO02xeM7W+kptOP55NeRSyEwBzpQ@mail.gmail.com>
 <1404201877.75807.YahooMailNeo@web181004.mail.ne1.yahoo.com>
Message-ID: <CAPTjJmr8eabmJwuXydxoqb2xze-u8bqQ+HwwAZqayR9Rw5UXgw@mail.gmail.com>

On Tue, Jul 1, 2014 at 6:04 PM, Andrew Barnert <abarnert at yahoo.com> wrote:
>> On Monday, June 30, 2014 5:39 PM, Chris Angelico <rosuav at gmail.com> wrote:
>
>> That would be interesting, but it raises the possibility of mucking up
>> the stack. (Imagine if you put BUILD_SET 1 in there instead. What's it
>> going to make a set of? What's going to happen to the rest of the
>> stack? Do you REALLY want to debug that?)
>
> The same thing that happens if you use bad inline assembly in C, or a bad C extension module in Python?bad things that you can't debug at source level. And yet, inline assembly in C and C extension modules in Python are still quite useful.

Right, useful but it adds another set of problems. (Just out of
curiosity, what protection _is_ there for a smashed stack? I just
tried fiddling with it and didn't manage to crash stuff.)

> I'll ignore the second case for the moment, because I think it's rarely if ever appropriate to Python, and just focus on the first. Those cases did not go away because CPUID got replaced with library functions. Those library functions?which are compiled with the same compiler you use for your code?have inline assembly in them. (Or, if you're on linux, those library functions read from a device file, but the device driver, which is compiled with the same compiler you use, has inline assembly in it.) So, the compiler still needs to be able to compile it.
>

Or those library functions are written in assembly language directly.
It's entirely possible to write something that uses CPUID and doesn't
use inline assembly in a C source file. The equivalent here, I
suppose, would be hand-rolling a .pyc file.

>> Some people would use it to create an empty set,
>> others would use it to replace variable swapping with a marginally
>> faster and *almost* identical stack-based swap:
>
> Do you really think anyone would do the latter? Seriously, what kind of code can you imagine that's too slow in CPython, not worth rewriting in C or running in PyPy or whatever, but fast enough with the rot opcode removed? And if someone really _did_ need that, I doubt they'd care much that Python 3.8 makes it unnecessary; they obviously have a specific deployment platform that has to work and that needed that last 3% speedup under 3.6.2, and they're going to need that to keep working for years.
>

Hang on, you're asking two different questions there. I'll split it out:

1) Do I really think anyone *should* do this? Your subsequent comments
support this question, and the answer is resoundingly NO. CPython is
not the sort of platform on which that kind of thing is ever worth
doing. You'll get far more performance by using Cython for parts, or
in some other way improving your code, than you will by hand-tweaking
the Python bytecode.

2) Do I think anyone would, if given the ability to tweak the
bytecode, go "Ah ha!" and proudly improve on what the compiler has
done, and then brag about the performance improvement? Definitely.
Someone will. It'll make some marginal difference to a microbenchmark,
and if you don't believe that would cause people to warp their code
into utter unreadability, you clearly don't hang out on python-list
enough :)

>> So while an inline bytecode assembler might have some uses, I suspect
>> it'd be an attractive nuisance more than anything else.
>
> I honestly don't see it becoming an attractive nuisance.
>
> I can easily see it just not getting used for anything at all, beyond people playing with the interpreter.

The "attractive nuisance" part is with microbenchmarking. Code won't
materially improve, and it'll be markedly worse in
readability/maintainability and portability (although the latter
probably doesn't matter all that much; a lot of people's code will be
suboptimal on Pythons other than CPython, if only for lack of 'with'
statements around files and such), with the addition of such a
feature.

>> What I did was put in a literal string…
>> It uses "∅ is set()" as a marker … and the resulting function
>> has an unnecessary const in it.
>
> I assumed that leaving the unnecessary const behind was unacceptable. After all, we're talking about (hypothetical?) people who find the cost of LOAD_GLOBAL set; CALL_FUNCTION 0 to be unacceptable… But you're right that fixing up all the other LOAD_CONST bytecodes' args is a feasible way to solve that.

I'm not sure whether the problem is the cost of LOAD_GLOBAL followed
by CALL_FUNCTION (and, by the way, one unnecessary constant in the
function won't have anything like that cost - a bit of wasted RAM, but
not a function call), or the fact that such a style is vulnerable to
shadowing of the name 'set', which admittedly is a very useful name.
But in any case, it's quite solvable.
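(For reference, the bytecode both sides are talking about is easy to inspect; a quick sketch, noting that the name of the call opcode varies across CPython versions:)

```python
import dis

def empty():
    # With no empty-set literal syntax, this costs a global name
    # lookup plus a call at runtime, and 'set' can be shadowed.
    return set()

print([ins.opname for ins in dis.get_instructions(empty)])
```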

>>>  So, if the function is a closure, how do you do that?
>> Ah, that part I've no idea about. But it wouldn't be impossible for
>> someone to develop that a bit further.
>
> Not impossible, but very hard, much harder than what you've done so far.
>
> Ultimately, I think that just backs up your larger point: This is doable, but it's going to be a lot of work, and the benefit isn't even nearly worth the cost. My point is that there are other ways to do it that would be less work and/or that would have more side benefits? but the benefit still isn't even nearly worth the cost, so who cares? :)

Yep. Maybe someone (great, that probably means me) should write this
up into a PEP for immediate rejection or withdrawal, just to be a
document to point to - if you want an empty set literal, answer these
objections.

ChrisA

From abarnert at yahoo.com  Tue Jul  1 11:00:29 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 1 Jul 2014 02:00:29 -0700
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
	Empty dict)
In-Reply-To: <1404203220.53463.YahooMailNeo@web181002.mail.ne1.yahoo.com>
References: <DA5C0D17-2AD9-4B87-891D-C66023F8F7CB@wiggy.net>
 <CAO41-mNtEBPGkYvC6OSH_kVmzN-PEk4yrMqYWGmqBHK492CobQ@mail.gmail.com>
 <CAP7+vJ+9yjg3cdcoe2RCSTJWFHVRcrq_sQLBoPq-PMFJoMS83Q@mail.gmail.com>
 <CAN8d9g=umX8GM+A=B5XpJqXJizsahVxWbo7KELwG_mXJhR4Oyg@mail.gmail.com>
 <DBC557BA-FA33-4B72-9DD3-09276A9B8E82@gmail.com>
 <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
 <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CADiSq7cPiXK6VgUzKrCOuU=j64c0agu+pS0xuq-voR-WKj1xbg@mail.gmail.com>
 <CAP7+vJL7Chru9Np7NT_mT1avDwTUpeV5DeLu=1GHyiO_VmGQ1w@mail.gmail.com>
 <1404203220.53463.YahooMailNeo@web181002.mail.ne1.yahoo.com>
Message-ID: <1404205229.86750.YahooMailNeo@web181001.mail.ne1.yahoo.com>

On , Andrew Barnert <abarnert at yahoo.com> wrote:
> On Mon, Jun 30, 2014 at 7:15 PM, Nick Coghlan <ncoghlan at gmail.com> 
> wrote:
>> Eugene Toder & Dave Malcolm have some interesting patches on the 
>> tracker to help enhance the compiler

[snip]
> Are you referring to Dave Malcolm's patch adding a hook for an AST 
> optimizer (in Python) right before compiling the AST to code 
> (http://bugs.python.org/issue10399 and related)?
> 
> If so, I don't think that would actually help here. Unless it's possible 
> to say "BUILD_SET 0" in AST, but in that case, we don't need any 
> new compiler hooks; just use an import hook the same way MacroPy does.

I should have just tested it before saying anything:

>>> e = ast.Expression(body=ast.Set(elts=[], ctx=ast.Load(),
...                                 lineno=1, col_offset=0))
>>> c = compile(e, '<>', 'eval')
>>> dis.dis(c)
  1           0 BUILD_SET                0
              3 RETURN_VALUE

So… it is possible to say "BUILD_SET 0" in AST. Which means the easy way to do this is to wrap an import hook around this:

import ast

class FixEmptySet(ast.NodeTransformer):
    def visit_Name(self, node):
        if node.id == '_EMPTY_SET_LITERAL':
            return ast.copy_location(
                ast.Set(elts=[], ctx=ast.Load()),
                node)
        return node

def ecompile(src, fname):
    src = src.replace('∅', '_EMPTY_SET_LITERAL')
    tree = compile(src, fname, 'exec', flags=ast.PyCF_ONLY_AST)
    tree = FixEmptySet().visit(tree)
    return compile(tree, fname, 'exec')

code = ecompile('def f(): return ∅', '<>')
exec(code)
f()

That returns set(). And if you dis.dis(f), it's just BUILD_SET 0 and RETURN_VALUE.

From rosuav at gmail.com  Tue Jul  1 11:07:28 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 1 Jul 2014 19:07:28 +1000
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
 Empty dict)
In-Reply-To: <1404205229.86750.YahooMailNeo@web181001.mail.ne1.yahoo.com>
References: <DA5C0D17-2AD9-4B87-891D-C66023F8F7CB@wiggy.net>
 <CAO41-mNtEBPGkYvC6OSH_kVmzN-PEk4yrMqYWGmqBHK492CobQ@mail.gmail.com>
 <CAP7+vJ+9yjg3cdcoe2RCSTJWFHVRcrq_sQLBoPq-PMFJoMS83Q@mail.gmail.com>
 <CAN8d9g=umX8GM+A=B5XpJqXJizsahVxWbo7KELwG_mXJhR4Oyg@mail.gmail.com>
 <DBC557BA-FA33-4B72-9DD3-09276A9B8E82@gmail.com>
 <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
 <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CADiSq7cPiXK6VgUzKrCOuU=j64c0agu+pS0xuq-voR-WKj1xbg@mail.gmail.com>
 <CAP7+vJL7Chru9Np7NT_mT1avDwTUpeV5DeLu=1GHyiO_VmGQ1w@mail.gmail.com>
 <1404203220.53463.YahooMailNeo@web181002.mail.ne1.yahoo.com>
 <1404205229.86750.YahooMailNeo@web181001.mail.ne1.yahoo.com>
Message-ID: <CAPTjJmrTQiSHySjLk6Ye=H5te2jSFDBFN-+4D-2MpsKY8cwcgA@mail.gmail.com>

On Tue, Jul 1, 2014 at 7:00 PM, Andrew Barnert
<abarnert at yahoo.com.dmarc.invalid> wrote:
>     src = src.replace('∅', '_EMPTY_SET_LITERAL')

Note that this suffers from a flaw that my POC script also suffers
from: it replaces this character *anywhere*, rather than only when
it's being used as a symbol on its own. Even inside a literal string.
It might be necessary to replace it back the other way afterward,
somehow, but I'm not sure if that would work.

ChrisA

From abarnert at yahoo.com  Tue Jul  1 12:23:19 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 1 Jul 2014 03:23:19 -0700
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
	Empty dict)
In-Reply-To: <CAPTjJmrTQiSHySjLk6Ye=H5te2jSFDBFN-+4D-2MpsKY8cwcgA@mail.gmail.com>
References: <DA5C0D17-2AD9-4B87-891D-C66023F8F7CB@wiggy.net>
 <CAO41-mNtEBPGkYvC6OSH_kVmzN-PEk4yrMqYWGmqBHK492CobQ@mail.gmail.com>
 <CAP7+vJ+9yjg3cdcoe2RCSTJWFHVRcrq_sQLBoPq-PMFJoMS83Q@mail.gmail.com>
 <CAN8d9g=umX8GM+A=B5XpJqXJizsahVxWbo7KELwG_mXJhR4Oyg@mail.gmail.com>
 <DBC557BA-FA33-4B72-9DD3-09276A9B8E82@gmail.com>
 <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
 <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CADiSq7cPiXK6VgUzKrCOuU=j64c0agu+pS0xuq-voR-WKj1xbg@mail.gmail.com>
 <CAP7+vJL7Chru9Np7NT_mT1avDwTUpeV5DeLu=1GHyiO_VmGQ1w@mail.gmail.com>
 <1404203220.53463.YahooMailNeo@web181002.mail.ne1.yahoo.com>
 <1404205229.86750.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CAPTjJmrTQiSHySjLk6Ye=H5te2jSFDBFN-+4D-2MpsKY8cwcgA@mail.gmail.com>
Message-ID: <1404210199.47734.YahooMailNeo@web181004.mail.ne1.yahoo.com>

> On Tuesday, July 1, 2014 2:08 AM, Chris Angelico <rosuav at gmail.com> wrote:

> > On Tue, Jul 1, 2014 at 7:00 PM, Andrew Barnert
> 
> <abarnert at yahoo.com.dmarc.invalid> wrote:
>>     src = src.replace('∅', '_EMPTY_SET_LITERAL')
> 
> Note that this suffers from a flaw that my POC script also suffers
> from: it replaces this character *anywhere*, rather than only when
> it's being used as a symbol on its own. Even inside a literal string.
> It might be necessary to replace it back the other way afterward,
> somehow, but I'm not sure if that would work.


Yes, that's easy. Also, _EMPTY_SET_LITERAL_ itself could exist in your source (after all, it exists in my source fragment right above, right?), but that's easy too. See https://github.com/abarnert/emptyset for a slapped-together implementation that solves both those problems (except for bytes literals, but it explains how to do that). If it prints out "set() is the empty set ∅", then it worked; it successfully replaced the ∅ in your source with an empty set literal, and left the ∅ in your format string as ∅.
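The gist is to let the tokenizer decide what's inside a string. Here is a minimal sketch of the idea (the placeholder name and helper function are illustrative, not the actual emptyset code): replace everywhere first, then tokenize and undo the replacement inside STRING tokens:

```python
import io
import tokenize

PLACEHOLDER = "_EMPTY_SET_LITERAL_"  # illustrative marker name

def replace_empty_set(src):
    # Naive global replacement first...
    munged = src.replace("∅", PLACEHOLDER)
    toks = []
    for tok in tokenize.generate_tokens(io.StringIO(munged).readline):
        # ...then restore ∅ wherever it sat inside a string literal.
        if tok.type == tokenize.STRING:
            tok = tok._replace(string=tok.string.replace(PLACEHOLDER, "∅"))
        toks.append(tok)
    return tokenize.untokenize(toks)

print(replace_empty_set('x = ∅\nmsg = "the symbol is ∅"\n'))
```

The surviving placeholder names can then be turned into empty-set nodes by a NodeTransformer, as in the earlier ecompile example.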

From abarnert at yahoo.com  Tue Jul  1 12:51:18 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 1 Jul 2014 03:51:18 -0700
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
	Empty dict)
In-Reply-To: <CAPTjJmr8eabmJwuXydxoqb2xze-u8bqQ+HwwAZqayR9Rw5UXgw@mail.gmail.com>
References: <DA5C0D17-2AD9-4B87-891D-C66023F8F7CB@wiggy.net>
 <CAO41-mNtEBPGkYvC6OSH_kVmzN-PEk4yrMqYWGmqBHK492CobQ@mail.gmail.com>
 <CAP7+vJ+9yjg3cdcoe2RCSTJWFHVRcrq_sQLBoPq-PMFJoMS83Q@mail.gmail.com>
 <CAN8d9g=umX8GM+A=B5XpJqXJizsahVxWbo7KELwG_mXJhR4Oyg@mail.gmail.com>
 <DBC557BA-FA33-4B72-9DD3-09276A9B8E82@gmail.com>
 <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
 <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CAPTjJmrvdvrnedoq0z64=7kO02xeM7W+kptOP55NeRSyEwBzpQ@mail.gmail.com>
 <1404201877.75807.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CAPTjJmr8eabmJwuXydxoqb2xze-u8bqQ+HwwAZqayR9Rw5UXgw@mail.gmail.com>
Message-ID: <1404211878.45311.YahooMailNeo@web181005.mail.ne1.yahoo.com>

> On Tuesday, July 1, 2014 1:39 AM, Chris Angelico <rosuav at gmail.com> wrote:

> > On Tue, Jul 1, 2014 at 6:04 PM, Andrew Barnert <abarnert at yahoo.com> wrote:
>>>  On Monday, June 30, 2014 5:39 PM, Chris Angelico 
> <rosuav at gmail.com> wrote:
>> 
>>>  That would be interesting, but it raises the possibility of mucking up
>>>  the stack. (Imagine if you put BUILD_SET 1 in there instead. What's 
> it
>>>  going to make a set of? What's going to happen to the rest of the
>>>  stack? Do you REALLY want to debug that?)
>> 
>>  The same thing that happens if you use bad inline assembly in C, or a bad C 
> extension module in Python: bad things that you can't debug at source level. 
> And yet, inline assembly in C and C extension modules in Python are still quite 
> useful.
> 
> Right, useful but it adds another set of problems. (Just out of
> curiosity, what protection _is_ there for a smashed stack? I just
> tried fiddling with it and didn't manage to crash stuff.)

I believe there are cases where the interpreter can detect that you've gone below 0 and raise an exception, but in general there's no protection, or at least nothing you can count on.

For example, assemble this code as a complete function:

    CALL_FUNCTION 1
    RETURN_VALUE

In 3.4.1, on my Mac, I get a bus error.

But, even when you don't manage to crash the interpreter, when you just confuse it at the bytecode level, there's still no way to debug that except by dropping to gdb/lldb/etc.

>>  I'll ignore the second case for the moment, because I think it's 
> rarely if ever appropriate to Python, and just focus on the first. Those cases 
> did not go away because CPUID got replaced with library functions. Those library 
> functions, which are compiled with the same compiler you use for your code, have 
> inline assembly in them. (Or, if you're on Linux, those library functions 
> read from a device file, but the device driver, which is compiled with the same 
> compiler you use, has inline assembly in it.) So, the compiler still needs to be 
> able to compile it.
>> 
> 
> Or those library functions are written in assembly language directly.
> It's entirely possible to write something that uses CPUID and doesn't
> use inline assembly in a C source file. The equivalent here, I
> suppose, would be hand-rolling a .pyc file.

Yeah, that's entirely possible, but that's not how the Linux device driver or the FreeBSD libc wrapper do it; they use inline assembly. Why? Well, for one thing, you get the function prolog and epilog code appropriate for your compiler automatically, instead of having to write it yourself. Also, you can do nice things like cast the result to a struct that you defined in C (which could be done with, e.g., a C macro wrapping the assembly source, but that's just making things more complicated for no benefit). And you don't need to know how to configure and run an assembler alongside the C compiler to build the driver. And so on. Basically, the C versions of the exact same reasons you wouldn't want to hand-roll a .pyc file in Python…

> 2) Do I think anyone would, if given the ability to tweak the
> bytecode, go "Ah ha!" and proudly improve on what the compiler has
> done, and then brag about the performance improvement? Definitely.
> Someone will. It'll make some marginal difference to a microbenchmark,
> and if you don't believe that would cause people to warp their code
> into utter unreadability, you clearly don't hang out on python-list
> enough :)


Using ctypes to load Python.so to swap the pointers under the covers is already significantly faster, and would still be significantly faster than your optimized bytecode, and yes, people have suggested it on at least two StackOverflow questions. For that matter, you can already do exactly your optimization with a relatively simple bytecode hack, which would look a lot worse than the inline asm and have the same effect. Also, that bytecode hack could be factored out into a function, without any performance cost except a constant cost at .pyc time, while the inline asm obviously can't, another reason the inline asm (which would have to be written inline, and edited to fit the variables in question, each time) would be less of an attractive nuisance than what's already there. Sure, there may be a few people who are looking for horrible micro-optimizations like this, would know enough to figure out how to do this with inline asm, would not know how to do it
 with bytecode hacks, would not know any of the better (as in much worse, to anyone but them) alternatives, etc., but I think that number is vanishingly small.

>>>  What I did was put in a literal string…
>>>  It uses "∅ is set()" as a marker … and the resulting function
>>>  has an unnecessary const in it.
>> 
>>  I assumed that leaving the unnecessary const behind was unacceptable. After 
> all, we're talking about (hypothetical?) people who find the cost of 
> LOAD_GLOBAL set; CALL_FUNCTION 0 to be unacceptable? But you're right that 
> fixing up all the other LOAD_CONST bytecodes' args is a feasible way to 
> solve that.
> 
> I'm not sure whether the problem is the cost of LOAD_GLOBAL followed
> by CALL_FUNCTION (and, by the way, one unnecessary constant in the
> function won't have anything like that cost - a bit of wasted RAM, but
> not a function call), or the fact that such a style is vulnerable to
> shadowing of the name 'set', which admittedly is a very useful name.
> But in any case, it's quite solvable.

I realize the cost of an extra LOAD_GLOBAL is much smaller than an extra CALL_FUNCTION, it's just that I think in 99.9999% of real cases neither will make a difference, and anyone who's objecting to the latter on principle will probably also object to the former on principle?

>>>> … So, if the function is a closure, how do you do that?
>>>  Ah, that part I've no idea about. But it wouldn't be impossible 
> for
>>>  someone to develop that a bit further.
>> 
>>  Not impossible, but very hard, much harder than what you've done so 
> far.
>> 
>>  Ultimately, I think that just backs up your larger point: This is doable, 
> but it's going to be a lot of work, and the benefit isn't even nearly 
> worth the cost. My point is that there are other ways to do it that would be 
> less work and/or that would have more side benefits, but the benefit still 
> isn't even nearly worth the cost, so who cares? :)
> 
> Yep. Maybe someone (great, that probably means me) should write this
> up into a PEP for immediate rejection or withdrawal, just to be a
> document to point to - if you want an empty set literal, answer these
> objections.


I think Terry Reedy actually had a better answer: just tell people to implement it, polish it up, put it on PyPI, and come back to us when they're ready to show off their tons of users who can't live without it. Random objected that wasn't possible, in which case Terry's idea is more of a dismissal than a helpful suggestion, but I think https://github.com/abarnert/emptyset proves that it is possible, and even pretty easy.
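
As a rough illustration of what such a PyPI package has to do, here is a minimal translate-then-compile sketch. The translate function is hypothetical and deliberately naive (a real tool, like the emptyset project, would go through the tokenizer so string literals and comments are left alone):

```python
# Hypothetical minimal front end: rewrite the U+2205 EMPTY SET symbol to
# plain 'set()' source text, then hand the result to the normal compiler.

def translate(source):
    # Naive textual replacement, purely for illustration.
    return source.replace('\u2205', 'set()')

code = compile(translate('s = \u2205\n'), '<pyu>', 'exec')
ns = {}
exec(code, ns)
print(ns['s'])
```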

From kn0m0n3 at gmail.com  Tue Jul  1 16:23:16 2014
From: kn0m0n3 at gmail.com (www.leap.cc)
Date: Tue, 01 Jul 2014 09:23:16 -0500
Subject: [Python-ideas] Enigmail on cell phone?
Message-ID: <yxi7cy4y0tjh18fty6chc49y.1404224596076@email.android.com>

Anyone done this successfully?  How can I use Python for hidden Markov models in machine learning from closed-circuit cameras for public transportation to make better scheduling? Ask local universities' computer science competitions...  cheers, d.j.

Andrew Barnert <abarnert at yahoo.com.dmarc.invalid> wrote:

>> On Tuesday, July 1, 2014 1:39 AM, Chris Angelico <rosuav at gmail.com> wrote:
>
>> > On Tue, Jul 1, 2014 at 6:04 PM, Andrew Barnert <abarnert at yahoo.com> wrote:
>>>>  On Monday, June 30, 2014 5:39 PM, Chris Angelico 
>> <rosuav at gmail.com> wrote:
>>> 
>>>>  That would be interesting, but it raises the possibility of mucking up
>>>>  the stack. (Imagine if you put BUILD_SET 1 in there instead. What's 
>> it
>>>>  going to make a set of? What's going to happen to the rest of the
>>>>  stack? Do you REALLY want to debug that?)
>>> 
>>>  The same thing that happens if you use bad inline assembly in C, or a bad C 
>> extension module in Python: bad things that you can't debug at source level. 
>> And yet, inline assembly in C and C extension modules in Python are still quite 
>> useful.
>> 
>> Right, useful but it adds another set of problems. (Just out of
>> curiosity, what protection _is_ there for a smashed stack? I just
>> tried fiddling with it and didn't manage to crash stuff.)
>
>I believe there are cases where the interpreter can detect that you've gone below 0 and raise an exception, but in general there's no protection, or at least nothing you can count on.
>
>For example, assemble this code as a complete function:
>
>    CALL_FUNCTION 1
>    RETURN_VALUE
>
>In 3.4.1, on my Mac, I get a bus error.
>
>But, even when you don't manage to crash the interpreter, when you just confuse it at the bytecode level, there's still no way to debug that except by dropping to gdb/lldb/etc.
>
>>>  I'll ignore the second case for the moment, because I think it's 
>> rarely if ever appropriate to Python, and just focus on the first. Those cases 
>> did not go away because CPUID got replaced with library functions. Those library 
>> functions (which are compiled with the same compiler you use for your code) have 
>> inline assembly in them. (Or, if you're on linux, those library functions 
>> read from a device file, but the device driver, which is compiled with the same 
>> compiler you use, has inline assembly in it.) So, the compiler still needs to be 
>> able to compile it.
>>> 
>> 
>> Or those library functions are written in assembly language directly.
>> It's entirely possible to write something that uses CPUID and doesn't
>> use inline assembly in a C source file. The equivalent here, I
>> suppose, would be hand-rolling a .pyc file.
>
>Yeah, that's entirely possible, but that's not how the linux device driver or the FreeBSD libc wrapper do it; they use inline assembly. Why? Well, for one thing, you get the function prolog and epilog code appropriate for your compiler automatically, instead of having to write it yourself. Also, you can do nice things like cast the result to a struct that you defined in C (which could be done with, e.g., a C macro wrapping the assembly source, but that's just making things more complicated for no benefit). And you don't need to know how to configure and run an assembler alongside the C compiler to build the device. And so on. Basically, the C versions of the exact same reasons you wouldn't want to hand-roll a .pyc file in Python?
>
>> 2) Do I think anyone would, if given the ability to tweak the
>> bytecode, go "Ah ha!" and proudly improve on what the compiler has
>> done, and then brag about the performance improvement? Definitely.
>> Someone will. It'll make some marginal difference to a microbenchmark,
>> and if you don't believe that would cause people to warp their code
>> into utter unreadability, you clearly don't hang out on python-list
>> enough :)
>
>
>Using ctypes to load Python.so to swap the pointers under the covers is already significantly faster, and would still be significantly faster than your optimized bytecode, and yes, people have suggested it on at least two StackOverflow questions. For that matter, you can already do exactly your optimization with a relatively simple bytecode hack, which would look a lot worse than the inline asm and have the same effect. Also, that bytecode hack could be factored out into a function, without any performance cost except a constant cost at .pyc time, while the inline asm obviously can't, another reason the inline asm (which would have to be written inline, and edited to fit the variables in question, each time) would be less of an attractive nuisance than what's already there. Sure, there may be a few people who are looking for horrible micro-optimizations like this, would know enough to figure out how to do this with inline asm, would not know how to do it
> with bytecode hacks, would not know any of the better (as in much worse, to anyone but them) alternatives, etc., but I think that number is vanishingly small.
>
>>>>  What I did was put in a literal string?
>
>>>>  It uses "∅ is set()" as a marker, and the resulting function
>>>>  has an unnecessary const in it.
>>> 
>>>  I assumed that leaving the unnecessary const behind was unacceptable. After 
>> all, we're talking about (hypothetical?) people who find the cost of 
>> LOAD_GLOBAL set; CALL_FUNCTION 0 to be unacceptable? But you're right that 
>> fixing up all the other LOAD_CONST bytecodes' args is a feasible way to 
>> solve that.
>> 
>> I'm not sure whether the problem is the cost of LOAD_GLOBAL followed
>> by CALL_FUNCTION (and, by the way, one unnecessary constant in the
>> function won't have anything like that cost - a bit of wasted RAM, but
>> not a function call), or the fact that such a style is vulnerable to
>> shadowing of the name 'set', which admittedly is a very useful name.
>> But in any case, it's quite solvable.
>
>I realize the cost of an extra LOAD_GLOBAL is much smaller than an extra CALL_FUNCTION, it's just that I think in 99.9999% of real cases neither will make a difference, and anyone who's objecting to the latter on principle will probably also object to the former on principle?
>
>>>>> … So, if the function is a closure, how do you do that?
>>>>  Ah, that part I've no idea about. But it wouldn't be impossible 
>> for
>>>>  someone to develop that a bit further.
>>> 
>>>  Not impossible, but very hard, much harder than what you've done so 
>> far.
>>> 
>>>  Ultimately, I think that just backs up your larger point: This is doable, 
>> but it's going to be a lot of work, and the benefit isn't even nearly 
>> worth the cost. My point is that there are other ways to do it that would be 
>> less work and/or that would have more side benefits, but the benefit still 
>> isn't even nearly worth the cost, so who cares? :)
>> 
>> Yep. Maybe someone (great, that probably means me) should write this
>> up into a PEP for immediate rejection or withdrawal, just to be a
>> document to point to - if you want an empty set literal, answer these
>> objections.
>
>
>I think Terry Reedy actually had a better answer: just tell people to implement it, polish it up, put it on PyPI, and come back to us when they're ready to show off their tons of users who can't live without it. Random objected that wasn't possible, in which case Terry's idea is more of a dismissal than a helpful suggestion, but I think https://github.com/abarnert/emptyset proves that it is possible, and even pretty easy.
>_______________________________________________
>Python-ideas mailing list
>Python-ideas at python.org
>https://mail.python.org/mailman/listinfo/python-ideas
>Code of Conduct: http://python.org/psf/codeofconduct/

From liam.marsh.home at gmail.com  Tue Jul  1 18:31:22 2014
From: liam.marsh.home at gmail.com (Liam Marsh)
Date: Tue, 1 Jul 2014 18:31:22 +0200
Subject: [Python-ideas] error in os.popen result
Message-ID: <CACPPHzvfsjoFxjQ=hitJ=+-f1_dMXSnU+Wr9sqN9Mp4RFav1Qw@mail.gmail.com>

Hello, for a small server program, I wanted to know which ports were
occupied, using the DOS command 'netstat', so I tried this:

>>> a = os.popen('netstat')
>>> bytes(a.read())

but this occurred in the second step:

Traceback (most recent call last):
  File "<pyshell#20>", line 1, in <module>
    bytes(a.read())
  File "C:\Apps\Programmation\Python3.2\lib\encodings\cp1252.py", line 23,
in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 79:
character maps to <undefined>

How can I avoid it, and why does the Windows cmd return an
undecodable character?

Thank you.

From ncoghlan at gmail.com  Tue Jul  1 18:58:37 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 1 Jul 2014 09:58:37 -0700
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
 Empty dict)
In-Reply-To: <1404203220.53463.YahooMailNeo@web181002.mail.ne1.yahoo.com>
References: <DA5C0D17-2AD9-4B87-891D-C66023F8F7CB@wiggy.net>
 <CAO41-mNtEBPGkYvC6OSH_kVmzN-PEk4yrMqYWGmqBHK492CobQ@mail.gmail.com>
 <CAP7+vJ+9yjg3cdcoe2RCSTJWFHVRcrq_sQLBoPq-PMFJoMS83Q@mail.gmail.com>
 <CAN8d9g=umX8GM+A=B5XpJqXJizsahVxWbo7KELwG_mXJhR4Oyg@mail.gmail.com>
 <DBC557BA-FA33-4B72-9DD3-09276A9B8E82@gmail.com>
 <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
 <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CADiSq7cPiXK6VgUzKrCOuU=j64c0agu+pS0xuq-voR-WKj1xbg@mail.gmail.com>
 <CAP7+vJL7Chru9Np7NT_mT1avDwTUpeV5DeLu=1GHyiO_VmGQ1w@mail.gmail.com>
 <1404203220.53463.YahooMailNeo@web181002.mail.ne1.yahoo.com>
Message-ID: <CADiSq7fx1_aUTM+zc07UMxBNuBo=0kD=kpMMzhygqWSMAWF_LA@mail.gmail.com>

On 1 July 2014 01:27, Andrew Barnert <abarnert at yahoo.com> wrote:
> (Replies to both Guido's top-post and Nick's reply-post below.)
>
> On Monday, June 30, 2014 7:19 PM, Guido van Rossum <guido at python.org> wrote:
>
>>Like bytecode, the compiler's workings are not part of the language spec, and are likely to change incompatibly between versions and not work for anything besides CPython. I don't really want to go there (cool though it sounds for wannabe compiler hackers).
>
>
> But CPython does expose bytecode via the dis module, parts of inspect, etc. For that matter, it exposes some of the compiler's workings (especially if you consider everything up to AST generation part of the compiler, since every step up to there is exposed, including doing the whole thing in one whack with PyCF_ONLY_AST). So, I don't see how exposing the AST-to-bytecode transformation part (or, while we're at it, the .pyc generation part) is any more unportable than what's already there.
>

Note that the dis module has a "CPython implementation detail"
disclaimer, and the AST structure is deliberately exempted from the
usual backwards compatibility guarantees.

As far as hooking compilation goes,
https://docs.python.org/3/library/importlib.html#importlib.abc.InspectLoader.source_to_code
was added in 3.4 specifically to make it easier to define custom
loaders that make use of most of the existing import machinery
(including bytecode cache files), but do something different for the
source -> bytecode transformation step.
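
A minimal sketch of that hook: a loader subclass overriding source_to_code() to transform the source before handing it to the normal compiler. Installing it via a path hook (so a plain import finds the translated files) is omitted; the method is exercised directly, and the '∅'-to-set() translation and .pyu name are illustrative, not an existing project's API.

```python
from importlib.machinery import SourceFileLoader

class EmptySetLoader(SourceFileLoader):
    def source_to_code(self, data, path='<string>'):
        # Translate the hypothetical '∅' symbol, then defer to the
        # stock compilation step.
        source = data.decode('utf-8').replace('\u2205', 'set()')
        return super().source_to_code(source, path)

loader = EmptySetLoader('demo', 'demo.pyu')       # names are made up
code = loader.source_to_code('s = \u2205\n'.encode('utf-8'), 'demo.pyu')
ns = {}
exec(code, ns)
print(ns['s'])
```

Because only source_to_code is overridden, the rest of the import machinery (including bytecode cache files) keeps working unchanged.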

Cheers,
Nick.

From steve at pearwood.info  Tue Jul  1 19:04:22 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 2 Jul 2014 03:04:22 +1000
Subject: [Python-ideas] error in os.popen result
In-Reply-To: <CACPPHzvfsjoFxjQ=hitJ=+-f1_dMXSnU+Wr9sqN9Mp4RFav1Qw@mail.gmail.com>
References: <CACPPHzvfsjoFxjQ=hitJ=+-f1_dMXSnU+Wr9sqN9Mp4RFav1Qw@mail.gmail.com>
Message-ID: <20140701170422.GS13014@ando>

Hello Liam,

This is a mailing list for discussing possible future ideas for the next 
version of Python, not for general support.

I recommend that you use the python-list at python.org mailing list, also 
available via Usenet on comp.lang.python.

Good luck.




On Tue, Jul 01, 2014 at 06:31:22PM +0200, Liam Marsh wrote:
> hello, for a small server program, I wanted to know which ports were
> occuped.
> with the dos command 'netstat'
> so I tried this:
> 
> *>>>a=os.popen('netstat')*
> *>>>bytes(a.read())*
> 
> but this occured in the second step:
> 
> *Traceback (most recent call last):*
> *  File "<pyshell#20>", line 1, in <module>*
> *    bytes(a.read())*
> *  File "C:\Apps\Programmation\Python3.2\lib\encodings\cp1252.py", line 23,
> in decode*
> *    return codecs.charmap_decode(input,self.errors,decoding_table)[0]*
> *UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 79:
> character maps to <undefined>*
> 
> *how can I avoid it and why does the windows cmd does return an
>  undecodable character?*
> 
> *thank you.*


From steve at pearwood.info  Tue Jul  1 19:15:29 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 2 Jul 2014 03:15:29 +1000
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
	Empty dict)
In-Reply-To: <CADiSq7fx1_aUTM+zc07UMxBNuBo=0kD=kpMMzhygqWSMAWF_LA@mail.gmail.com>
References: <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
 <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CADiSq7cPiXK6VgUzKrCOuU=j64c0agu+pS0xuq-voR-WKj1xbg@mail.gmail.com>
 <CAP7+vJL7Chru9Np7NT_mT1avDwTUpeV5DeLu=1GHyiO_VmGQ1w@mail.gmail.com>
 <1404203220.53463.YahooMailNeo@web181002.mail.ne1.yahoo.com>
 <CADiSq7fx1_aUTM+zc07UMxBNuBo=0kD=kpMMzhygqWSMAWF_LA@mail.gmail.com>
Message-ID: <20140701171529.GT13014@ando>

On Tue, Jul 01, 2014 at 09:58:37AM -0700, Nick Coghlan wrote:
> On 1 July 2014 01:27, Andrew Barnert <abarnert at yahoo.com> wrote:

> > But CPython does expose bytecode via the dis module, parts of 
> > inspect, etc. [...]
> 
> Note that the dis module has a "CPython implementation detail"
> disclaimer, and the AST structure is deliberately exempted from the
> usual backwards compatibility guarantees.

Further to what Nick says, the *output* of dis is not expected to remain 
backwards compatible from version to version, only the dis API itself.

There's a big difference between saying "we guarantee that the dis 
module will correctly and accurately disassemble valid bytecode", and 
saying "we guarantee that this specific chunk of bytecode will do these 
things". In order to use a hypothetical asm function, you need to know 
what pseudo-assembly to write, say `asm [SPAM, EGGS]`. That means that 
SPAM and EGGS must be stable and part of the language definition. (Or at 
least part of the CPython API.) That's a big step from the current 
situation.


-- 
Steven

From steve at pearwood.info  Tue Jul  1 19:33:07 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 2 Jul 2014 03:33:07 +1000
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
	Empty dict)
In-Reply-To: <CAPTjJmr8eabmJwuXydxoqb2xze-u8bqQ+HwwAZqayR9Rw5UXgw@mail.gmail.com>
References: <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
 <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CAPTjJmrvdvrnedoq0z64=7kO02xeM7W+kptOP55NeRSyEwBzpQ@mail.gmail.com>
 <1404201877.75807.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CAPTjJmr8eabmJwuXydxoqb2xze-u8bqQ+HwwAZqayR9Rw5UXgw@mail.gmail.com>
Message-ID: <20140701173307.GU13014@ando>

On Tue, Jul 01, 2014 at 06:38:37PM +1000, Chris Angelico wrote:
[...]
> 1) Do I really think anyone *should* do this? Your subsequent comments
> support this question, and the answer is resoundingly NO. CPython is

"This" being trying to micro-optimize code by bytecode-hacking.

> not the sort of platform on which that kind of thing is ever worth
> doing. You'll get far more performance by using Cython for parts, or
> in some other way improving your code, than you will by hand-tweaking
> the Python bytecode.

I think that micro-optimization is probably the wrong reason to hack 
bytecodes. What I'm more interested in is exploring potential new 
features, or to add functionality, for example:

Adding the ability to trace individual expressions, not just lines:
http://nedbatchelder.com/blog/200804/wicked_hack_python_bytecode_tracing.html

Exploring dynamic scoping:
http://www.voidspace.org.uk/python/articles/code_blocks.shtml

A proposal from Python 2.3 days for a brand-new decorator syntax:
http://code.activestate.com/recipes/286147

A (serious!) defence of GOTO in Python:
http://www.dr-josiah.com/2012/04/python-bytecode-hacks-gotos-revisited.html

(although even Josiah doesn't suggest using COMEFROM :-)


I don't know that such bytecode manipulations should be provided in the 
standard library, and certainly not as a built-in "asm" command. But, I 
think that we ought to acknowledge that bytecode hacking has a role to 
play in the wider Python ecosystem.

I'm led to understand that in the Java community, bytecode hacking is, 
perhaps not common, but accepted as something that power users do when 
all else fails:

https://weblogs.java.net/blog/simonis/archive/2009/02/we_need_a_dirty.html


[Aside: does Python do any sort of verification of the bytecode before 
executing it, as Java does?]



-- 
Steven

From ncoghlan at gmail.com  Tue Jul  1 20:07:35 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 1 Jul 2014 11:07:35 -0700
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
 Empty dict)
In-Reply-To: <20140701173307.GU13014@ando>
References: <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
 <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CAPTjJmrvdvrnedoq0z64=7kO02xeM7W+kptOP55NeRSyEwBzpQ@mail.gmail.com>
 <1404201877.75807.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CAPTjJmr8eabmJwuXydxoqb2xze-u8bqQ+HwwAZqayR9Rw5UXgw@mail.gmail.com>
 <20140701173307.GU13014@ando>
Message-ID: <CADiSq7ekr=ShvMn1rre0jjq5duxCTFMWwmE9Pz842wfuxJaioQ@mail.gmail.com>

On 1 July 2014 10:33, Steven D'Aprano <steve at pearwood.info> wrote:
> On Tue, Jul 01, 2014 at 06:38:37PM +1000, Chris Angelico wrote:
> [...]
>> 1) Do I really think anyone *should* do this? Your subsequent comments
>> support this question, and the answer is resoundingly NO. CPython is
>
> "This" being trying to micro-optimize code by bytecode-hacking.
>
>> not the sort of platform on which that kind of thing is ever worth
>> doing. You'll get far more performance by using Cython for parts, or
>> in some other way improving your code, than you will by hand-tweaking
>> the Python bytecode.
>
> I think that micro-optimization is probably the wrong reason to hack
> bytecodes. What I'm more interested in is exploring potential new
> features, or to add functionality

https://pypi.python.org/pypi/withhacks and
https://pypi.python.org/pypi/byteplay may also be of interest to
anyone wishing to seriously tinker with what the CPython VM (as
opposed to Python-the-language) already supports. I also highly advise
working with Python 3.4, since we made some substantial improvements to the
dis module API (adding the yield from tests for 3.3 highlighted how
limited the previous API was for testing purposes, so we fixed it in a
way that made bytecode easier to work with in general).
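
A sketch of what the reworked 3.4 API looks like in practice: instructions come back as structured objects rather than printed text, which is what makes bytecode-level tests (like the 'yield from' ones) easy to write. Only structural properties are shown here, since opcode names shift between CPython versions.

```python
import dis

def gen(x):
    yield from x   # the construct whose 3.3 tests motivated the API work

# dis.Bytecode (3.4+) is iterable, yielding Instruction named tuples.
instrs = list(dis.Bytecode(gen))
names = {ins.opname for ins in instrs}
print(sorted(names))
```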

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Tue Jul  1 20:16:29 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 1 Jul 2014 11:16:29 -0700
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
 Empty dict)
In-Reply-To: <20140701173307.GU13014@ando>
References: <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
 <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CAPTjJmrvdvrnedoq0z64=7kO02xeM7W+kptOP55NeRSyEwBzpQ@mail.gmail.com>
 <1404201877.75807.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CAPTjJmr8eabmJwuXydxoqb2xze-u8bqQ+HwwAZqayR9Rw5UXgw@mail.gmail.com>
 <20140701173307.GU13014@ando>
Message-ID: <CADiSq7fet6GyH4JWkuB+mG9w7JbDBPJLdTHLPNAv4zQ3un4yFQ@mail.gmail.com>

On 1 July 2014 10:33, Steven D'Aprano <steve at pearwood.info> wrote:

> [Aside: does Python do any sort of verification of the bytecode before
> executing it, as Java does?]

Nope, it will happily attempt to execute invalid bytecode. That's
actually one of the reasons executing untrusted bytecode is even less
safe than executing untrusted source code - it's likely to be possible
to trigger segfaults that way.

There's an initial attempt at a bytecode verifier on PyPI
(https://pypi.python.org/pypi/Python-Bytecode-Verifier/), and I have a
vague recollection that Google have a bytecode verifier kicking around
somewhere, but there's nothing built in to the CPython runtime.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From python at mrabarnett.plus.com  Tue Jul  1 20:59:21 2014
From: python at mrabarnett.plus.com (MRAB)
Date: Tue, 01 Jul 2014 19:59:21 +0100
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
 Empty dict)
In-Reply-To: <CADiSq7fet6GyH4JWkuB+mG9w7JbDBPJLdTHLPNAv4zQ3un4yFQ@mail.gmail.com>
References: <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
 <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CAPTjJmrvdvrnedoq0z64=7kO02xeM7W+kptOP55NeRSyEwBzpQ@mail.gmail.com>
 <1404201877.75807.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CAPTjJmr8eabmJwuXydxoqb2xze-u8bqQ+HwwAZqayR9Rw5UXgw@mail.gmail.com>
 <20140701173307.GU13014@ando>
 <CADiSq7fet6GyH4JWkuB+mG9w7JbDBPJLdTHLPNAv4zQ3un4yFQ@mail.gmail.com>
Message-ID: <53B30509.4090808@mrabarnett.plus.com>

On 2014-07-01 19:16, Nick Coghlan wrote:
> On 1 July 2014 10:33, Steven D'Aprano <steve at pearwood.info> wrote:
>
>> [Aside: does Python do any sort of verification of the bytecode
>> before executing it, as Java does?]
>
> Nope, it will happily attempt to execute invalid bytecode. That's
> actually one of the reasons executing untrusted bytecode is even less
> safe than executing untrusted source code - it's likely to be
> possible to trigger segfaults that way.
>
> There's an initial attempt at a bytecode verifier on PyPI
> (https://pypi.python.org/pypi/Python-Bytecode-Verifier/), and I have
> a vague recollection that Google have a bytecode verifier kicking
> around somewhere, but there's nothing built in to the CPython
> runtime.
>
The re module also uses a kind of bytecode that's generated by the
Python front end and verified by the C back end. The bytecode contains
things like offsets; for example, the bytecode that starts a repeated
sequence has an offset to the corresponding bytecode that ends it, and
vice versa.

The problem with that is that the structure (i.e. the nesting) is no
longer explicit, so it's more difficult to spot misnested structures.

For the regex module, I decided that it would be easier to verify if I
kept the structure explicit by using bytecodes to indicate the start and
end of the structures. For example, a repeated sequence could be
indicated by having a structure like GREEDY_REPEAT min_count max_count
... END.

The C back end could then build the internal representation that's
actually interpreted.
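
A toy sketch of the verification this enables: with explicit start/end opcodes, checking that structures are properly nested reduces to a simple depth count. The opcode names are invented for illustration, not the actual regex module internals.

```python
def well_nested(program):
    depth = 0
    for op in program:
        if op == 'GREEDY_REPEAT':       # opens a repeated sequence
            depth += 1
        elif op == 'END':               # closes the innermost one
            if depth == 0:
                return False            # END with no matching opener
            depth -= 1
    return depth == 0                   # every opener must be closed

good = ['GREEDY_REPEAT', 'LITERAL', 'GREEDY_REPEAT', 'LITERAL', 'END', 'END']
bad = ['GREEDY_REPEAT', 'LITERAL']      # opener without its END
print(well_nested(good), well_nested(bad))
```

With offset-based encoding, by contrast, the verifier has to chase jump targets to reconstruct the same structure before it can check anything.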

From tjreedy at udel.edu  Tue Jul  1 22:04:20 2014
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 01 Jul 2014 16:04:20 -0400
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
	Empty dict)
In-Reply-To: <1404211878.45311.YahooMailNeo@web181005.mail.ne1.yahoo.com>
References: <DA5C0D17-2AD9-4B87-891D-C66023F8F7CB@wiggy.net>
 <CAP7+vJ+9yjg3cdcoe2RCSTJWFHVRcrq_sQLBoPq-PMFJoMS83Q@mail.gmail.com>
 <CAN8d9g=umX8GM+A=B5XpJqXJizsahVxWbo7KELwG_mXJhR4Oyg@mail.gmail.com>
 <DBC557BA-FA33-4B72-9DD3-09276A9B8E82@gmail.com>
 <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
 <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CAPTjJmrvdvrnedoq0z64=7kO02xeM7W+kptOP55NeRSyEwBzpQ@mail.gmail.com>
 <1404201877.75807.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CAPTjJmr8eabmJwuXydxoqb2xze-u8bqQ+HwwAZqayR9Rw5UXgw@mail.gmail.com>
 <1404211878.45311.YahooMailNeo@web181005.mail.ne1.yahoo.com>
Message-ID: <lov48e$er7$1@ger.gmane.org>

On 7/1/2014 6:51 AM, Andrew Barnert wrote:

> I think Terry Reedy actually had a better answer: just tell people to
> implement it,  polish it up, put it on PyPI, and come back to us when
> they're ready to show off their tons of users who can't live without
> it. Random objected that wasn't possible,

'Random' said something quite different. He only noted that if '∅' were 
translated to 'set()', then the resulting CPython-specific bytecode 
would continue to be "LOAD_GLOBAL (set), CALL_FUNCTION 0" rather than 
the 'optimized' "BUILD_SET 0". He also noted (objected) that there is no 
Python code that CPython currently compiles as "BUILD_SET 0". Well, it's 
unfortunate that {} is not available. If it were, there would be no 
issue, to me anyway, with using '∅'.  However, optimizing CPython 
bytecode, and compiler hooks, are completely different issues from 
translating unisym Python to standard Python that could run on any 
implementation of Python. If we thought the bytecode difference was 
important (which most do not), we could have a peephole optimizer to 
'fix' it, completely independently of the existence of '∅' or any idea 
of using it in Python code.

> in which case Terry's idea is more of a dismissal than a helpful suggestion,

My post was a dismissal of the idea of changing python itself *and* a 
suggestion of how to proceed without involving pydev.

> https://github.com/abarnert/emptyset proves that it is possible, and
> even pretty easy.

I consider producing (or at least being able to produce) a standard .py 
file that can be published outside the specialized group, run on, and 
debugged on standard interpreters to be essential to any sensible idea 
for augmented Python code (whether with unicode symbols or anything 
else, such as native-language keywords).  However, as I said before, 
this is off topic here for unicode symbols, though not on python-list.

-- 
Terry Jan Reedy



From mertz at gnosis.cx  Tue Jul  1 22:25:25 2014
From: mertz at gnosis.cx (David Mertz)
Date: Tue, 1 Jul 2014 13:25:25 -0700
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
 Empty dict)
In-Reply-To: <lov48e$er7$1@ger.gmane.org>
References: <DA5C0D17-2AD9-4B87-891D-C66023F8F7CB@wiggy.net>
 <CAP7+vJ+9yjg3cdcoe2RCSTJWFHVRcrq_sQLBoPq-PMFJoMS83Q@mail.gmail.com>
 <CAN8d9g=umX8GM+A=B5XpJqXJizsahVxWbo7KELwG_mXJhR4Oyg@mail.gmail.com>
 <DBC557BA-FA33-4B72-9DD3-09276A9B8E82@gmail.com>
 <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
 <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CAPTjJmrvdvrnedoq0z64=7kO02xeM7W+kptOP55NeRSyEwBzpQ@mail.gmail.com>
 <1404201877.75807.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CAPTjJmr8eabmJwuXydxoqb2xze-u8bqQ+HwwAZqayR9Rw5UXgw@mail.gmail.com>
 <1404211878.45311.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <lov48e$er7$1@ger.gmane.org>
Message-ID: <CAEbHw4Z35uShp5QF9ikPmDTKpLfE0hHrRkXDPXQBaLCWNZ48aw@mail.gmail.com>

Somewhere in this thread, someone mentioned
https://github.com/ehamberg/vim-cute-python (and something similar for
emacs, but I'm a vim user).  I'm not sure if this mention was a joke or
not, but I thought it looked cool and started using it.  I can't decide if
I actually find it useful or distracting, but in truth it seems to answer
the *entire* concern of anyone wanting to see an empty-set symbol (but not
to save one bytecode instruction, I admit), and also various other math
symbols that name concepts spelled in ASCII in Python.

While some hypothetical .pyu translation tool or import hook could do the
same thing, this really *does* seem like something to just do at the editor
level since there's nothing *semantic* about the new symbols, just a way
for them to look.


On Tue, Jul 1, 2014 at 1:04 PM, Terry Reedy <tjreedy at udel.edu> wrote:

> On 7/1/2014 6:51 AM, Andrew Barnert wrote:
>
>  I think Terry Reedy actually had a better answer: just tell people to
>> implement it,  polish it up, put it on PyPI, and come back to us when
>> they're ready to show off their tons of users who can't live without
>> it. Random objected that wasn't possible,
>>
>
> 'Random' said something quite different. He only noted that if '∅' were
> translated to 'set()', then the resulting CPython-specific bytecode would
> continue to be "LOAD_GLOBAL (set), CALL_FUNCTION 0" rather than the
> 'optimized' "BUILD_SET 0". He also noted (objected) that there is no Python
> code that CPython currently compiles as "BUILD_SET 0". Well, it's unfortunate
> that {} is not available. If it were, there would be no issue, to me
> anyway, of using '∅'.  However, optimizing CPython bytecode, and compiler
> hooks, are completely different issues from translating unisym Python to
> standard Python that could run on any implementation of Python. If we
> thought the bytecode difference was important (which most do not), we could
> have a peephole optimizer to 'fix' it, completely independently of the
> existence of '∅' or any idea of using it in Python code.
>
>
>  in which case Terry's idea is more of a dismissal than a helpful
>> suggestion,
>>
>
> My post was a dismissal of the idea of changing python itself *and* a
> suggestion of how to proceed without involving pydev.
>
>
>  https://github.com/abarnert/emptyset proves that it is possible, and
>> even pretty easy.
>>
>
> I consider producing (or at least being able to produce) a standard .py
> file that can be published outside the specialized group, run on, and 
> debugged on standard interpreters to be essential to any sensible idea for
> augmented Python code (whether with unicode symbols or anything else, such
> as native-language keywords).  However, as I said before, off topic here
> for unicode symbols, though not on python-list.
>
> --
> Terry Jan Reedy
>
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140701/ed871cb8/attachment-0001.html>

From abarnert at yahoo.com  Tue Jul  1 23:33:02 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 1 Jul 2014 14:33:02 -0700
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
	Empty dict)
In-Reply-To: <20140701173307.GU13014@ando>
References: <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
 <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CAPTjJmrvdvrnedoq0z64=7kO02xeM7W+kptOP55NeRSyEwBzpQ@mail.gmail.com>
 <1404201877.75807.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CAPTjJmr8eabmJwuXydxoqb2xze-u8bqQ+HwwAZqayR9Rw5UXgw@mail.gmail.com>
 <20140701173307.GU13014@ando>
Message-ID: <1404250382.2679.YahooMailNeo@web181005.mail.ne1.yahoo.com>

> On Tuesday, July 1, 2014 10:35 AM, Steven D'Aprano <steve at pearwood.info> wrote:

> I think that micro-optimization is probably the wrong reason to hack 
> bytecodes. What I'm more interested in is exploring potential new 
> features, or to add functionality, for example:
> 
> Adding the ability to trace individual expressions, not just lines:
> http://nedbatchelder.com/blog/200804/wicked_hack_python_bytecode_tracing.html
> 
> Exploring dynamic scoping:
> http://www.voidspace.org.uk/python/articles/code_blocks.shtml
> 
> A proposal from Python 2.3 days for a brand-new decorator syntax:
> http://code.activestate.com/recipes/286147
> 
> A (serious!) defence of GOTO in Python:
> http://www.dr-josiah.com/2012/04/python-bytecode-hacks-gotos-revisited.html
> 
> (although even Josiah doesn't suggest using COMEFROM :-)
> 
> 
> I don't know that such bytecode manipulations should be provided in the 
> standard library, and certainly not as a built-in "asm" command. But, 
> I 
> think that we ought to acknowledge that bytecode hacking has a role to 
> play in the wider Python ecosystem.

I think CPython provides just about the right level of support here.

The documentation, the APIs, and the helper tools for dealing with bytecode are all superb, and get better with each release. It's all more than sufficient to figure out what you're doing, and how to do it.

It might be nice if there were an assembler in the stdlib, but the format is simple enough, and the documentation complete enough, that you can write one in a couple of hours (as I did). And, honestly, I suspect a stdlib assembler wouldn't be updated fast enough; e.g., when support for Instruction objects was added to CPython's dis module in 3.4, I doubt an existing assembler would have been modified to take advantage of that, but a new one that you slap together can do so easily.

Documenting that bytecode is only supported on CPython, and can change between CPython versions, isn't a problem for anyone who's just looking to experiment with and explore ideas, rather than write production code. As your examples show, you can usually even publish your explorations for others to experiment with, granting those limitations, and maintain them for years without much headache. (Bytecode has traditionally been much more conservative than what the documentation allows; it's generally only when your hacks rely on knowing exactly what bytecode will be generated for a given Python expression that they break. But even there, with a sufficient test suite, it's usually pretty simple to adapt.)

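For instance, the Instruction objects mentioned above make that kind of inspection straightforward on 3.4+; a minimal sketch (not from the original message, and the exact opcode names vary by CPython version):

```python
import dis

def f():
    return {1, 2}  # a set display, compiled to set-building opcodes

# dis.get_instructions() yields dis.Instruction objects (added in 3.4),
# giving structured access to opname, argval, offsets, etc.
for instr in dis.get_instructions(f):
    print(instr.offset, instr.opname, instr.argrepr)
```

The same structured view is what a quickly written assembler can consume directly, instead of re-parsing the textual dis output.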
> I'm led to understand that in the Java community, bytecode hacking is,
> perhaps not common, but accepted as something that power users do when 
> all else fails:
> 
> https://weblogs.java.net/blog/simonis/archive/2009/02/we_need_a_dirty.html


Here, it sounds like you _are_ suggesting that bytecode hacking may need to be used for production code, not just for exploration. But there are some pretty big differences between Java and Python that I think are relevant here:

 * Java is designed for one specific VM, on which many other languages run; Python is designed to run on a variety of VMs, and nothing else runs on the CPython VM.
 * Java is designed to be secure first, fast second, and flexible a distant third; Python is designed to be simple and transparent first, flexible and dynamic second, and everything else a distant third. So most of what you'd want to do (including solving problems like the one in the blog) can be done with simple monkey-patching and related techniques, and you can go a lot deeper than that without getting beyond the supported, portable reflection techniques.
 * Java's VM is designed to be debuggable and optimizable; CPython's is designed to be the simplest thing that could support CPython. So, anything that's too hard to do with runtime structures is often easier at the VM level in Java, while the reverse is true in CPython.
 * Java code is often distributed and always deployed as binary files; Python almost always as source. Besides being the cause of problems like the one in this article, it also means that if you have to go below the runtime level, you don't have the intermediate steps of source and AST hacking; you have no choice but to go to the bytecode.

From abarnert at yahoo.com  Wed Jul  2 00:03:17 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 1 Jul 2014 15:03:17 -0700
Subject: [Python-ideas] .pyu nicode syntax symbols (was Re: Empty set,
	Empty dict)
In-Reply-To: <lov48e$er7$1@ger.gmane.org>
References: <DA5C0D17-2AD9-4B87-891D-C66023F8F7CB@wiggy.net>
 <CAP7+vJ+9yjg3cdcoe2RCSTJWFHVRcrq_sQLBoPq-PMFJoMS83Q@mail.gmail.com>
 <CAN8d9g=umX8GM+A=B5XpJqXJizsahVxWbo7KELwG_mXJhR4Oyg@mail.gmail.com>
 <DBC557BA-FA33-4B72-9DD3-09276A9B8E82@gmail.com>
 <CAN8d9gn2PTOwfU5WCDtugeWLpd3=ujCFBSvG263ySi1VpM9HgA@mail.gmail.com>
 <lo7dni$582$1@ger.gmane.org>
 <1403931602.14407.135458493.44CF193B@webmail.messagingengine.com>
 <CAPTjJmpKiBZWwkgzmnj53LFNCb3qn5-pQjY1xxVxAVA+Xc-jZg@mail.gmail.com>
 <1404148690.18766.136186337.62A26E9E@webmail.messagingengine.com>
 <CAPTjJmoxORHDo1bhUwO68y9n0cGSmgckcHyBPLx0BbbanR5Fug@mail.gmail.com>
 <1404172094.55099.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CAPTjJmrvdvrnedoq0z64=7kO02xeM7W+kptOP55NeRSyEwBzpQ@mail.gmail.com>
 <1404201877.75807.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CAPTjJmr8eabmJwuXydxoqb2xze-u8bqQ+HwwAZqayR9Rw5UXgw@mail.gmail.com>
 <1404211878.45311.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <lov48e$er7$1@ger.gmane.org>
Message-ID: <1404252197.45327.YahooMailNeo@web181001.mail.ne1.yahoo.com>

> On Tuesday, July 1, 2014 1:05 PM, Terry Reedy <tjreedy at udel.edu> wrote:

> > On 7/1/2014 6:51 AM, Andrew Barnert wrote:
> 
>>  I think Terry Reedy actually had a better answer: just tell people to
>>  implement it, polish it up, put it on PyPI, and come back to us when
>>  they're ready to show off their tons of users who can't live without
>>  it. Random objected that wasn't possible,
> 
> 'Random' said something quite different. He only noted that if '∅' were
> translated to 'set()', then the resulting CPython-specific bytecode
> would continue to be "LOAD_GLOBAL (set), CALL_FUNCTION 0" rather than
> the 'optimized' "BUILD_SET 0". He also noted (objected) that there is no
> Python code that CPython currently compiles as "BUILD_SET 0".

You're reading a lot into a 2-line message, but your take is that he interpreted the problem as needing to compile "BUILD_SET 0", and pointed out that there is no way to do that with a source preprocessor.

You can insist that they're two separate problems to be solved (or, maybe, not solved), and I think you're right. You just have to make that point, as you, I, and half a dozen others have done since his original post.

But meanwhile, Chris Angelico offered a solution to the problem that answers his complaint, and I offered another solution that doesn't even require bytecode hacking. That shows that even if you accept the objection, it still doesn't block anyone.

> Well, it's
> unfortunate that {} is not available. If it were, there would be no 
> issue, to me anyway, of using '∅'.  However, optimizing CPython 
> bytecode, and compiler hooks, are completely different issues from 
> translating unisym Python to standard Python that could run on any 
> implementation of Python.

First, as others have pointed out, it's not just, or even primarily, an optimization; it's also a semantic difference.

> If we thought the bytecode difference was
> important (which most do not), we could have a peephole optimizer to 
> 'fix' it, completely independently of the existence of '∅' or any idea 
> of using it in Python code.

But you can't make semantic changes in a peephole optimizer. You'd have to first change the language to document that set() may (or may not!) return an empty set even if the name set resolves to something different. While this isn't entirely unique in Python history (e.g., back when you could redefine False through various kinds of trickery, the compiler was still allowed to optimize out if False: code), it's very unusual. And nobody's going to do that for a minor optimization (if False:, besides being a potentially huge optimization, also _fixes_ a semantic problem, rather than causing one, since False was supposed to be un-redefinable, but wasn't because of various holes).

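That semantic difference is easy to demonstrate: set() looks up the name "set" at call time, while a genuine set display is built by BUILD_SET and never consults that name. A small illustrative sketch (not from the thread):

```python
import builtins

def calls_set():
    return set()      # compiles to a name lookup plus a call

def literal_set():
    return {0} - {0}  # built by set-building opcodes; the name "set" is never consulted

class Tracked(set):
    pass

orig = builtins.set
builtins.set = Tracked      # rebind the builtin name
try:
    assert type(calls_set()) is Tracked   # follows the rebinding
    assert type(literal_set()) is orig    # unaffected: a real set
finally:
    builtins.set = orig
```

A peephole pass that rewrote the first form into the second would silently change this observable behavior, which is exactly why it can't be done without changing the language spec.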
>>  in which case Terry's idea is more of a dismissal than a helpful 
> suggestion,
> 
> My post was a dismissal of the idea of changing python itself *and* a 
> suggestion of how to proceed without involving pydev.

My point is that _if_ you take Random's objection as being critical, _then_ your post dismisses the idea, even though it wasn't intended to. You can follow up in two ways: challenge his objection, or answer his objection; there were replies doing both, and if either of the two succeeds, the idea is still alive for people to take further if they want.

>>  https://github.com/abarnert/emptyset proves that it is possible, and
>>  even pretty easy.
> 
> I consider producing (or at least being able to produce) a standard .py 
> file that can be published outside the specialized group, run on, and 
> debugged on standard interpreters to be essential to any sensible idea

My approach is made up of nothing but standard .py files. Those files can be published outside a specialized group, and run and debugged on CPython 3.4+. They can also be edited by people outside that specialized group, without needing a specialized build process involving a preprocessor, just a standard Python module that they already have.

Sure, it only works on CPython, but Python 3.4, scipy, etc. also currently only work on CPython, and that doesn't prevent a large community of users from making use of them, publishing code outside a specialized group, and, most importantly for the topic at hand, coming up with suggestions that are germane to Python as a whole and taken seriously. For example, nobody suggested that PEP 465 wasn't a sensible idea because all of the sample code presented only runs on CPython; the idea itself is clearly portable, the community using such code is gigantic and mature, and that's all that matters.


Finally, I don't think anyone actually needs this feature, but I was able to whip up a proof of concept in an hour that provides it. Anyone who seriously wants to pursue it doesn't have to use my approach, much less my code; it still serves as an existence proof that what they want to do can be done, meaning they should go do it.

From stefano.borini at ferrara.linux.it  Wed Jul  2 00:36:48 2014
From: stefano.borini at ferrara.linux.it (Stefano Borini)
Date: Wed, 02 Jul 2014 00:36:48 +0200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
Message-ID: <53B33800.1030300@ferrara.linux.it>

Dear all,

after the first mailing list feedback, and further private discussion 
with Joseph Martinot-Lagarde, I drafted a first iteration of a PEP for 
keyword arguments in indexing. The document is available here.

https://github.com/stefanoborini/pep-keyword/blob/master/PEP-XXX.txt

The document is not in final form when it comes to specifications. In 
fact, it requires additional discussion about the best strategy to 
achieve the desired result. Particular attention has been devoted to 
present alternative implementation strategies, their pros and cons. I 
will examine all feedback tomorrow morning European time (in approx 10 
hrs), and apply any pull requests or comments you may have.

When the specification is finalized, or this community suggests that the 
PEP is in a form suitable for official submission despite potential open 
issues, I will submit it to the editor panel for further discussion, and 
deploy an actual implementation according to the agreed specification 
for a working test run.

I apologize for potential mistakes in the PEP drafting and submission 
process, as this is my first PEP.

Kind Regards,

Stefano Borini

From rosuav at gmail.com  Wed Jul  2 03:06:24 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 2 Jul 2014 11:06:24 +1000
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
In-Reply-To: <53B33800.1030300@ferrara.linux.it>
References: <53B33800.1030300@ferrara.linux.it>
Message-ID: <CAPTjJmqnM0eOj9ByBKkM2-eGKPrevV4SQYEbO3bK46bUq+P_qQ@mail.gmail.com>

On Wed, Jul 2, 2014 at 8:36 AM, Stefano Borini
<stefano.borini at ferrara.linux.it> wrote:
> https://github.com/stefanoborini/pep-keyword/blob/master/PEP-XXX.txt

A good start!

"""
        C0: a[1]        -> idx = 1        # integer
            a[1,2]      -> idx = (1,2)    # tuple
        C1: a[Z=3]      -> idx = {"Z": 3} # dictionary with single key
        C2. a[Z=3, R=4] -> idx = {"Z": 3, "R": 4}     #
dictionary/ordereddict [*]
                        or idx = ({"Z": 3}, {"R": 4}) # tuple of two
single-key dict [**]
...
        C5. a[1, 2, Z=3]   -> idx = (1, 2, {"Z": 3})
"""

Another possibility for the keyword arguments is a two-item tuple,
which would mean that C1 comes up as ("Z", 3) (or maybe (("Z", 3),) -
keyword arguments forcing a tuple of all args for
consistency/clarity), C2 as (("Z", 3), ("R", 4)), and C5 as (1, 2,
("Z", 3)). This would be lighter and easier to use than the tuple of
dicts, and still preserves order (unlike the regular dict); however,
it doesn't let you easily fetch up the one keyword you're interested
in, which is normally something you'd want to support for a
**kwargs-like feature:

def __getitem__(self, item, **kwargs):
    # either that, or kwargs is part of item in some way
    ret = self.base[item]
    if "precis" in kwargs: ret = ret.round(kwargs["precis"])
    return ret

To implement that with a tuple of tuples, or a tuple of dicts, you'd
have to iterate over it and check each one - much less clean code.

I would be inclined to simply state, in the PEP, that keyword
arguments in indexing are equivalent to kwargs in function calls, and
equally unordered (that is to say: if a proposal to make function call
kwargs ordered is accepted, the same consideration can be applied to
this, but otherwise they have no order). This does mean that it
doesn't fit the original use-case, but it seems very odd to start out
by saying "here, let's give indexing the option to carry keyword args,
just like with function calls", and then come back and say "oh, but
unlike function calls, they're inherently ordered and carried very
differently".

For the OP's use-case, though, it would actually be possible to abuse
slice notation. I don't remember this being mentioned, but it does
preserve order; the cost is that all the "keywords" have to be defined
as objects.

class kw: pass # because object() doesn't have attributes
def make_kw(names):
    for i in names.split():
        globals()[i] = obj = kw()
        obj.keyword_arg = i
make_kw("Z R X")

# Now you can use them in indexing
some_obj[5, Z:3]
some_obj[7, Z:3, R:4]

The parameters will arrive in the item tuple as slice objects, where
the start is a kw marker object and the stop is its value.

>>> some_obj[5, Z:3]
getitem: (5, slice(<__main__.kw object at 0x016C5E10>, 3, None))

Yes, it uses a colon rather than an equals sign, but on the flip side,
it already works :)
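A __getitem__ could then recover those slice-carried keywords with a small helper; a self-contained sketch reusing the kw-marker idea above (all names here are hypothetical):

```python
class kw:
    """Marker for keyword-like index entries, as in obj[5, Z:3]."""
    def __init__(self, name):
        self.keyword_arg = name

Z, R = kw("Z"), kw("R")

def split_item(item):
    """Split an index into positional parts and kw-style slice parts."""
    parts = item if isinstance(item, tuple) else (item,)
    args, kwargs = [], {}
    for part in parts:
        if isinstance(part, slice) and isinstance(part.start, kw):
            kwargs[part.start.keyword_arg] = part.stop
        else:
            args.append(part)
    return tuple(args), kwargs

class Probe:
    def __getitem__(self, item):
        return split_item(item)

p = Probe()
assert p[5, Z:3] == ((5,), {"Z": 3})
assert p[7, Z:3, R:4] == ((7,), {"Z": 3, "R": 4})
```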

ChrisA

From rob.cliffe at btinternet.com  Wed Jul  2 04:36:23 2014
From: rob.cliffe at btinternet.com (Rob Cliffe)
Date: Wed, 02 Jul 2014 03:36:23 +0100
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
 arguments
In-Reply-To: <53B33800.1030300@ferrara.linux.it>
References: <53B33800.1030300@ferrara.linux.it>
Message-ID: <53B37027.9030808@btinternet.com>


On 01/07/2014 23:36, Stefano Borini wrote:
> Dear all,
>
> after the first mailing list feedback, and further private discussion 
> with Joseph Martinot-Lagarde, I drafted a first iteration of a PEP for 
> keyword arguments in indexing. The document is available here.
>
> https://github.com/stefanoborini/pep-keyword/blob/master/PEP-XXX.txt

A small bit of uninformed feedback (no charge :-) ):

1) Ahem, doesn't a[3] (usually) return the *fourth* element of a ?

2)

""" Compare e.g. a[1:3, Z=2] with a.get(slice(1,3,None), Z=2). """

I think this is slightly unfair as the second form can be abbreviated to  a.get(slice(1,3), Z=2),
just as the first is an abbreviation for  a[1:3:None, Z=2].

3) You may not consider this relevant.  But as an (I believe) 
intelligent reader, though one unfamiliar with the material, I cannot 
understand what your first example

""" low_accuracy_energy = computeEnergy(molecule, BasisSet[Z=3]) """

is about, and whether it is really (conceptually) related to indexing, 
or just a slick hack.  I guess it could be anything, depending on the 
implementation of __getitem__.

Best wishes,
Rob Cliffe

From anthony at xtfx.me  Wed Jul  2 04:58:44 2014
From: anthony at xtfx.me (C Anthony Risinger)
Date: Tue, 1 Jul 2014 21:58:44 -0500
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
In-Reply-To: <CAPTjJmqnM0eOj9ByBKkM2-eGKPrevV4SQYEbO3bK46bUq+P_qQ@mail.gmail.com>
References: <53B33800.1030300@ferrara.linux.it>
 <CAPTjJmqnM0eOj9ByBKkM2-eGKPrevV4SQYEbO3bK46bUq+P_qQ@mail.gmail.com>
Message-ID: <CAGAVQTH96ftcD_1XURZsRueAtCjiD6PqUbJUaHvQSzMw52-+_Q@mail.gmail.com>

On Jul 1, 2014 8:06 PM, "Chris Angelico" <rosuav at gmail.com> wrote:
>
> [...]
>
> For the OP's use-case, though, it would actually be possible to abuse
> slice notation. I don't remember this being mentioned, but it does
> preserve order; the cost is that all the "keywords" have to be defined
> as objects.
>
> class kw: pass # because object() doesn't have attributes
> def make_kw(names):
>     for i in names.split():
>         globals()[i] = obj = kw()
>         obj.keyword_arg = i
> make_kw("Z R X")
>
> # Now you can use them in indexing
> some_obj[5, Z:3]
> some_obj[7, Z:3, R:4]
>
> The parameters will arrive in the item tuple as slice objects, where
> the start is a signature object and the stop is its value.
>
> >>> some_obj[5, Z:3]
> getitem: (5, slice(<__main__.kw object at 0x016C5E10>, 3, None))
>
> Yes, it uses a colon rather than an equals sign, but on the flip side,
> it already works :)

This works great; IIRC you can pretty much pass *anything*:

dict[{}:]
dict[AType:lambda x: x]
dict[::]
dict[:]

...don't forget extended slice possibilities :)

I've dabbled with this in custom dict implementations and it usefully
excludes all normal dicts, which quickly reject slice objects.
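The exclusion happens because slice objects are unhashable, so a plain dict raises TypeError before any lookup, while a custom mapping can intercept the slice first. A quick illustration (the SliceAware class is invented here):

```python
# Plain dicts reject slice keys: hash(slice(...)) raises TypeError.
raised = False
try:
    {}[1:2]
except TypeError:
    raised = True
assert raised

class SliceAware(dict):
    """A dict subclass that treats slice subscripts as an out-of-band channel."""
    def __getitem__(self, item):
        if isinstance(item, slice):
            return ("slice", item.start, item.stop)
        return super().__getitem__(item)

d = SliceAware(a=1)
assert d["a"] == 1
assert d[1:2] == ("slice", 1, 2)
```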

-- 

C Anthony [mobile]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140701/0efba120/attachment.html>

From ncoghlan at gmail.com  Wed Jul  2 09:06:53 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 2 Jul 2014 00:06:53 -0700
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
In-Reply-To: <53B33800.1030300@ferrara.linux.it>
References: <53B33800.1030300@ferrara.linux.it>
Message-ID: <CADiSq7dF1YSjpfo3H-EXyAomx48dRox6d1J_tGhhgK0d-K=cFA@mail.gmail.com>

On 1 July 2014 15:36, Stefano Borini <stefano.borini at ferrara.linux.it> wrote:
> Dear all,
>
> after the first mailing list feedback, and further private discussion with
> Joseph Martinot-Lagarde, I drafted a first iteration of a PEP for keyword
> arguments in indexing. The document is available here.
>
> https://github.com/stefanoborini/pep-keyword/blob/master/PEP-XXX.txt
>
> The document is not in final form when it comes to specifications. In fact,
> it requires additional discussion about the best strategy to achieve the
> desired result. Particular attention has been devoted to present alternative
> implementation strategies, their pros and cons. I will examine all feedback
> tomorrow morning European time (in approx 10 hrs), and apply any pull
> requests or comments you may have.

It's a well written PEP, but the "just use call notation instead"
argument is going to be a challenging one to overcome.

Given that part of the rationale given is that "slice(start, stop,
step)" is uglier than the "start:stop:step" permitted in an indexing
operation, the option of allowing "[start:]",
"[:stop]","[start:stop:step]", etc as dedicated slice syntax should
also be explicitly considered.

Compare:

    a.get(slice(1,3), Z=2) # today
    a.get([1:3], Z=2) # slice syntax
    a[1:3, Z=2] # PEP

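Of the three forms compared above, only the first runs today; a minimal hypothetical container showing it (the Table class and its filtering semantics are invented for illustration):

```python
class Table:
    def __init__(self, rows):
        self.rows = rows

    def get(self, idx, Z=None):
        # idx is an int or an explicit slice object, e.g. slice(1, 3)
        selected = self.rows[idx]
        if Z is not None:
            selected = [row for row in selected if row["Z"] == Z]
        return selected

a = Table([{"Z": 1}, {"Z": 2}, {"Z": 2}, {"Z": 3}])
assert a.get(slice(1, 3), Z=2) == [{"Z": 2}, {"Z": 2}]
```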
Introducing a more general slice notation would make indexing *less*
special (reducing the current "allows slice notation" special case to
"allows slice notation with the surrounding square brackets implied").

The reduction of special casing could be taken further, by allowing
the surrounding square brackets to be omitted in tuple and list
displays, just as they are in indexing operations.

I'm not saying such a proposal would necessarily be accepted - I just
see a proposal that takes an existing special case and proposes to
make it *less* special as more appealing than one that proposes to
make it even *more* special.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From nicholas.cole at gmail.com  Wed Jul  2 09:45:47 2014
From: nicholas.cole at gmail.com (Nicholas Cole)
Date: Wed, 2 Jul 2014 08:45:47 +0100
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
In-Reply-To: <CADiSq7dF1YSjpfo3H-EXyAomx48dRox6d1J_tGhhgK0d-K=cFA@mail.gmail.com>
References: <53B33800.1030300@ferrara.linux.it>
 <CADiSq7dF1YSjpfo3H-EXyAomx48dRox6d1J_tGhhgK0d-K=cFA@mail.gmail.com>
Message-ID: <CAAu18hen2sCsDkhTBRHLUtmoH4MF_9-F3GLb9hCASS_7tPc2+w@mail.gmail.com>

> It's a well written PEP, but the "just use call notation instead"
> argument is going to be a challenging one to overcome.
>

+1

The advantages the PEP suggests are very subjective ones to do with
readability.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140702/67ee578b/attachment.html>

From stefano.borini at ferrara.linux.it  Wed Jul  2 09:59:54 2014
From: stefano.borini at ferrara.linux.it (Stefano Borini)
Date: Wed, 2 Jul 2014 09:59:54 +0200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with	keyword
	arguments
In-Reply-To: <CAAu18hen2sCsDkhTBRHLUtmoH4MF_9-F3GLb9hCASS_7tPc2+w@mail.gmail.com>
References: <53B33800.1030300@ferrara.linux.it>
 <CADiSq7dF1YSjpfo3H-EXyAomx48dRox6d1J_tGhhgK0d-K=cFA@mail.gmail.com>
 <CAAu18hen2sCsDkhTBRHLUtmoH4MF_9-F3GLb9hCASS_7tPc2+w@mail.gmail.com>
Message-ID: <20140702075954.GA26500@ferrara.linux.it>

On Wed, Jul 02, 2014 at 08:45:47AM +0100, Nicholas Cole wrote:
> > It's a well written PEP, but the "just use call notation instead"
> > argument is going to be a challenging one to overcome.
> >
> 
> +1
> 
> The advantages the PEP suggests are very subjective ones to do with
> readability.

To be honest, I agree with this point of view myself. It's not _needed_.
It would be a nice additional feature, but maybe only rarely used and in very
specialized cases, and again, there are always workarounds.

Even if rejected in the long run, the PEP rationalizes and analyzes the motivations and
alternatives, and formally records why it's a "not worth it"
scenario. 

Thank you for all the feedback. I am including all the raised points in the PEP
and I'll follow up with a revised version ASAP.

Stefano Borini

From stefano.borini at ferrara.linux.it  Wed Jul  2 12:08:25 2014
From: stefano.borini at ferrara.linux.it (Stefano Borini)
Date: Wed, 2 Jul 2014 12:08:25 +0200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with	keyword
	arguments
In-Reply-To: <53B37027.9030808@btinternet.com>
References: <53B33800.1030300@ferrara.linux.it>
 <53B37027.9030808@btinternet.com>
Message-ID: <20140702100825.GA29589@ferrara.linux.it>

On Wed, Jul 02, 2014 at 03:36:23AM +0100, Rob Cliffe wrote:
> 1) Ahem, doesn't a[3] (usually) return the *fourth* element of a ?

Yes. I changed the indexes many times for consistency and that slipped
through. It used to be a[2]

>
> low_accuracy_energy = computeEnergy(molecule, BasisSet[Z=3]) """
>
> is about, and whether it is really (conceptually) related to indexing,  
> or just a slick hack.  I guess it could be anything, depending on the  
> implementation of __getitem__.

The reason for using indexing is that the BasisSet object could be internally 
represented as a numeric table, where rows are associated with individual elements 
(e.g. rows 0:5 with element 1, rows 5:8 with element 2) and each column is associated 
with a given degree of accuracy (e.g. the first column is low accuracy, the second 
is medium accuracy, etc.). You could say that users are not concerned with the
internal representation, but if they are eventually allowed to create these
basis sets in this tabular form, it makes a nice conceptual model to keep the
column <-> accuracy association explicit in the interface.
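A hypothetical sketch of that tabular model (the class, layout, and method names are invented here to make the description concrete; they are not taken from the PEP):

```python
class BasisSet:
    """Rows grouped per element; columns indexed by accuracy level."""
    def __init__(self, table, row_ranges):
        self.table = table            # 2-D list: rows x accuracy columns
        self.row_ranges = row_ranges  # atomic number Z -> (start_row, stop_row)

    def select(self, Z, accuracy=0):
        # pick the row block for element Z, then one accuracy column
        start, stop = self.row_ranges[Z]
        return [row[accuracy] for row in self.table[start:stop]]

bs = BasisSet(
    table=[[0.1, 0.11], [0.2, 0.22], [0.3, 0.33]],
    row_ranges={1: (0, 2), 2: (2, 3)},   # element 1: rows 0:2, element 2: row 2
)
assert bs.select(1, accuracy=1) == [0.11, 0.22]
assert bs.select(2) == [0.3]
```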

From xavier.combelle at gmail.com  Wed Jul  2 13:47:03 2014
From: xavier.combelle at gmail.com (Xavier Combelle)
Date: Wed, 2 Jul 2014 13:47:03 +0200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
In-Reply-To: <20140702100825.GA29589@ferrara.linux.it>
References: <53B33800.1030300@ferrara.linux.it>
 <53B37027.9030808@btinternet.com>
 <20140702100825.GA29589@ferrara.linux.it>
Message-ID: <CAEQcUJTzUTaKesYKbptHQ2MbmUQBkncbr4T44=CUeN=RpanRnA@mail.gmail.com>

in this case:

        C1: a[Z=3]      -> idx = {"Z": 3}             # P1/P2 dictionary with single key


As we can index with any object, I wonder how one could differentiate between
the call a[Z=3] and the actual a[{"Z": 3}]. Should they return the same thing?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140702/1ce1cf23/attachment.html>

From stefano.borini at ferrara.linux.it  Wed Jul  2 14:20:03 2014
From: stefano.borini at ferrara.linux.it (Stefano Borini)
Date: Wed, 2 Jul 2014 14:20:03 +0200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with	keyword
	arguments
In-Reply-To: <CAEQcUJTzUTaKesYKbptHQ2MbmUQBkncbr4T44=CUeN=RpanRnA@mail.gmail.com>
References: <53B33800.1030300@ferrara.linux.it>
 <53B37027.9030808@btinternet.com> <20140702100825.GA29589@ferrara.linux.it>
 <CAEQcUJTzUTaKesYKbptHQ2MbmUQBkncbr4T44=CUeN=RpanRnA@mail.gmail.com>
Message-ID: <20140702122003.GA2183@ferrara.linux.it>

On Wed, Jul 02, 2014 at 01:47:03PM +0200, Xavier Combelle wrote:
> in this case:
> 
>         C1: a[Z=3]      -> idx = {"Z": 3}             # P1/P2
> dictionary with single key
> 
> 
> as we can index with any object, I wonder how one could differency between
> the calls, a[z=3]
> and the actual a[{"Z":3}]. Do they should be return the same?

Indeed you can't, and if I recall correctly I wrote that somewhere. The point,
eventually, is whether such a distinction is worth considering or whether, instead,
the two cases should be handled as degenerate (equivalent) notations.

IMHO, they should be kept distinct, and this disqualifies that implementation strategy.
Too much magic would happen otherwise.
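A tiny probe makes the collision concrete: under a dict-based strategy, both spellings would deliver the identical object to __getitem__ (illustrative sketch only; a[Z=3] is of course not yet legal syntax):

```python
class Probe:
    def __getitem__(self, item):
        return item

a = Probe()
# Today only the explicit-dict spelling is legal, and __getitem__
# receives the dict itself:
assert a[{"Z": 3}] == {"Z": 3}
# Under strategy P1/P2, a[Z=3] would also arrive as {"Z": 3},
# making the two calls indistinguishable on the receiving side.
```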


-- 
------------------------------------------------------------

-----BEGIN GEEK CODE BLOCK----- 
Version: 3.12         
GCS d- s+:--- a? C++++ UL++++ P+ L++++ E--- W- N+ o K- w---
O+ M- V- PS+ PE+ Y PGP++ t+++ 5 X- R* tv+ b DI-- D+
G e h++ r+ y*
------------------------------------------------------------


From 4kir4.1i at gmail.com  Wed Jul  2 17:14:47 2014
From: 4kir4.1i at gmail.com (Akira Li)
Date: Wed, 02 Jul 2014 19:14:47 +0400
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
References: <53B33800.1030300@ferrara.linux.it>
Message-ID: <878uobn66w.fsf@gmail.com>

Stefano Borini <stefano.borini at ferrara.linux.it>
writes:

> Dear all,
>
> after the first mailing list feedback, and further private discussion
> with Joseph Martinot-Lagarde, I drafted a first iteration of a PEP for
> keyword arguments in indexing. The document is available here.
>
> https://github.com/stefanoborini/pep-keyword/blob/master/PEP-XXX.txt
>
> The document is not in final form when it comes to specifications. In
> fact, it requires additional discussion about the best strategy to
> achieve the desired result. Particular attention has been devoted to
> present alternative implementation strategies, their pros and cons. I
> will examine all feedback tomorrow morning European time (in approx 10
> hrs), and apply any pull requests or comments you may have.
>
> When the specification is finalized, or this community suggests that
> the PEP is in a form suitable for official submission despite
> potential open issues, I will submit it to the editor panel for
> further discussion, and deploy an actual implementation according to
> the agreed specification for a working test run.
>
> I apologize for potential mistakes in the PEP drafting and submission
> process, as this is my first PEP.
>

Strategy 3b: builtin named tuple

  C0. a[2] -> idx = 2; # scalar
      a[2,3] -> idx = (2, 3) # tuple
                idx[0] == 2
                idx[1] == 3
  C1. a[Z=3] -> idx = (Z=3)  # builtin named tuple (picklable, etc.)
                idx[0] == idx.Z == 3 
  C2. a[Z=3, R=2] -> idx = (Z=3, R=2)
                     idx[0] == idx.Z == 3
                     idx[1] == idx.R == 2
  C3. a[1, Z=3] -> idx = (1, Z=3) 
                   idx[0] == 1
                   idx[1] == idx.Z == 3
  C4. a[1, Z=3, R=2] -> idx = (1, Z=3, R=2)
                        idx[0] == 1
                        idx[1] == idx.Z == 3
                        idx[2] == idx.R == 2
  C5. a[1, 2, Z=3] -> idx = (1, 2, Z=3)
  C6. a[1, 2, Z=3, R=4] -> (1, 2, Z=3, R=4)
  C7. a[1, Z=3, 2, R=4] -> SyntaxError: non-keyword arg after keyword arg

Pros:

 - looks nice
 - easy to explain: a[1,b=2] is equivalent to a[(1,b=2)] like a[1,2] is
   equivalent to a[(1,2)]
 - it makes `__getitem__` *less special* if Python supports a builtin
   named tuple and/or ordered keyword args (the call syntax)

Cons:

 - Python currently has no builtin named tuple (an ordered collection of
   (optionally) named values) 
 - Python currently doesn't support ordered keyword args (it might have
   made the implementation trivial)

Note: `idx = (Z=3)` is a SyntaxError so it is safe to produce a named tuple
instead of a scalar.
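
For illustration, the C1-C4 semantics above can be approximated today with
collections.namedtuple (make_index here is just an invented helper; the real
proposal would of course need syntax support):

```python
import collections

def make_index(*args, **kwargs):
    # Approximates the proposed builtin named tuple: positional slots get
    # placeholder names (_0, _1, ...), keyword slots keep their own names.
    fields = ['_%d' % i for i in range(len(args))] + list(kwargs)
    cls = collections.namedtuple('Index', fields, rename=True)
    return cls(*args, **kwargs)

idx = make_index(1, Z=3, R=2)   # stands in for the proposed a[1, Z=3, R=2]
assert idx[0] == 1
assert idx[1] == idx.Z == 3
assert idx[2] == idx.R == 2
```

Since a namedtuple compares equal to a plain tuple, make_index(2, 3) == (2, 3)
also holds, matching case C0.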


--
Akira


From drekin at gmail.com  Wed Jul  2 18:40:43 2014
From: drekin at gmail.com (drekin at gmail.com)
Date: Wed, 02 Jul 2014 09:40:43 -0700 (PDT)
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
In-Reply-To: <53B33800.1030300@ferrara.linux.it>
Message-ID: <53b4360b.ebd3b40a.33ca.5890@mx.google.com>

Hello, just some remarks:

Ad degeneracy of notation: The case of a[Z=3] and a[{"Z": 3}] is similar to the current a[1, 2] and a[(1, 2)]. Even though one may argue that the parentheses are actually not part of tuple notation but are just needed because of syntax, it may look like a degeneracy of notation when compared to a function call: f(1, 2) is not the same thing as f((1, 2)).

Ad making dict.get() obsolete: a_dict.get(key) is still often used, and it would have to be spelled a_dict[key, default=None] in index notation.

The _n keys used in strategy 3 may be indexed from zero like list indices.

Regards, Drekin



From joseph.martinot-lagarde at m4x.org  Wed Jul  2 21:17:15 2014
From: joseph.martinot-lagarde at m4x.org (Joseph Martinot-Lagarde)
Date: Wed, 02 Jul 2014 21:17:15 +0200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
In-Reply-To: <CAAu18hen2sCsDkhTBRHLUtmoH4MF_9-F3GLb9hCASS_7tPc2+w@mail.gmail.com>
References: <53B33800.1030300@ferrara.linux.it>
 <CADiSq7dF1YSjpfo3H-EXyAomx48dRox6d1J_tGhhgK0d-K=cFA@mail.gmail.com>
 <CAAu18hen2sCsDkhTBRHLUtmoH4MF_9-F3GLb9hCASS_7tPc2+w@mail.gmail.com>
Message-ID: <53B45ABB.30802@m4x.org>

Le 02/07/2014 09:45, Nicholas Cole a écrit :
>
>     It's a well written PEP, but the "just use call notation instead"
>     argument is going to be a challenging one to overcome.
>
>
> +1
>
> The advantages the PEP suggests are very subjective ones to do with
> readability.

Well, "Readability counts" is in the zen of python !

Having recently translated a Matlab program to Python, I can assure you 
that the notational difference between call and indexing is really useful.
.get() does not look like indexing.


From timothy.c.delaney at gmail.com  Wed Jul  2 22:12:30 2014
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Thu, 3 Jul 2014 06:12:30 +1000
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
In-Reply-To: <53B33800.1030300@ferrara.linux.it>
References: <53B33800.1030300@ferrara.linux.it>
Message-ID: <CAN8CLg=t6Ut6QUg6cZrRm-tLPcLzAAhSBjLM+_oCD8G=N+AJ0A@mail.gmail.com>

On 2 July 2014 08:36, Stefano Borini <stefano.borini at ferrara.linux.it>
wrote:

> Dear all,
>
> after the first mailing list feedback, and further private discussion with
> Joseph Martinot-Lagarde, I drafted a first iteration of a PEP for keyword
> arguments in indexing. The document is available here.
>
> https://github.com/stefanoborini/pep-keyword/blob/master/PEP-XXX.txt
>
> The document is not in final form when it comes to specifications. In
> fact, it requires additional discussion about the best strategy to achieve
> the desired result. Particular attention has been devoted to present
> alternative implementation strategies, their pros and cons. I will examine
> all feedback tomorrow morning European time (in approx 10 hrs), and apply
> any pull requests or comments you may have.
>
> When the specification is finalized, or this community suggests that the
> PEP is in a form suitable for official submission despite potential open
> issues, I will submit it to the editor panel for further discussion, and
> deploy an actual implementation according to the agreed specification for a
> working test run.
>
> I apologize for potential mistakes in the PEP drafting and submission
> process, as this is my first PEP.
>

One option I don't see is to have a[b=1, c=2] be translated to
a.__getitem__((slice('b', 1, None), slice('c', 2, None))) automatically.
That completely takes care of backwards compatibility in __getitem__ (no
change at all), and also deals with your issue with abusing slice objects:

a[K=1:10:2] -> a.__getitem__(slice('K', slice(1, 10, 2)))

And using that we can have an ordered dict "literal"

import collections

class OrderedDictLiteral(object):
    def __getitem__(self, t):
        try:
            i = iter(t)
        except TypeError:
            i = (t,)

        return collections.OrderedDict((s.start, s.stop) for s in i)

odict = OrderedDictLiteral()

o = odict[a=1, b='c']
print(o)  # prints OrderedDict([('a', 1), ('b', 'c')])

On a related note, if we combined this with the idea that kwargs should be
constructed using the type of the passed dict (i.e. if you pass an
OrderedDict as **kwargs you get a new OrderedDict in the function) we could
do:

kw = OrderedDictLiteral()

def f(**kw):
    print(kw)

f('a', 'b', **kw[c='d', e=2])

always resulting in:

{'c': 'd', 'e': 2}

Tim Delaney
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140703/8205a344/attachment-0001.html>

From timothy.c.delaney at gmail.com  Wed Jul  2 22:14:00 2014
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Thu, 3 Jul 2014 06:14:00 +1000
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
In-Reply-To: <CAN8CLg=t6Ut6QUg6cZrRm-tLPcLzAAhSBjLM+_oCD8G=N+AJ0A@mail.gmail.com>
References: <53B33800.1030300@ferrara.linux.it>
 <CAN8CLg=t6Ut6QUg6cZrRm-tLPcLzAAhSBjLM+_oCD8G=N+AJ0A@mail.gmail.com>
Message-ID: <CAN8CLg=xLhbwWGBQKwNtGemZ9ZvucAbFVzbf7AFkC38HvdvNkQ@mail.gmail.com>

On 3 July 2014 06:12, Tim Delaney <timothy.c.delaney at gmail.com> wrote:
>
> a[K=1:10:2] -> a.__getitem__(slice('K', slice(1, 10, 2)))
>

Of course, that should have been:

a[K=1:10:2] -> a.__getitem__(slice('K', slice(1, 10, 2), None))

Tim Delaney
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140703/69f2b9be/attachment.html>

From stefano.borini at ferrara.linux.it  Wed Jul  2 23:29:53 2014
From: stefano.borini at ferrara.linux.it (Stefano Borini)
Date: Wed, 2 Jul 2014 23:29:53 +0200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with	keyword
	arguments
In-Reply-To: <CAN8CLg=t6Ut6QUg6cZrRm-tLPcLzAAhSBjLM+_oCD8G=N+AJ0A@mail.gmail.com>
References: <53B33800.1030300@ferrara.linux.it>
 <CAN8CLg=t6Ut6QUg6cZrRm-tLPcLzAAhSBjLM+_oCD8G=N+AJ0A@mail.gmail.com>
Message-ID: <20140702212953.GA16637@ferrara.linux.it>

On Thu, Jul 03, 2014 at 06:12:30AM +1000, Tim Delaney wrote:
> One option I don't see is to have a[b=1, c=2] be translated to
> a.__getitem__((slice('b', 1, None), slice('c', 2, None))) automatically.

it would be weird, since it's not technically a slice, but it would work.
I personally think that piggybacking on the slice would appear hackish.
One could conceivably introduce a keyword() object similar to slice(),
but then it's basically a single-item dictionary (Strategy 1) with a fancy
name.
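
As a rough sketch of that idea (all names here are invented), such a
keyword() object would just carry a key/value pair, much like slice()
carries start/stop/step:

```python
class keyword_index(object):
    # Hypothetical analogue of slice(): carries a key/value pair.
    def __init__(self, key, value):
        self.key = key
        self.value = value
    def __repr__(self):
        return 'keyword_index(%r, %r)' % (self.key, self.value)

class Demo(object):
    def __getitem__(self, item):
        # what a[b=1, c=2] might deliver under this strategy
        items = item if isinstance(item, tuple) else (item,)
        return dict((i.key, i.value) for i in items)

d = Demo()
d[keyword_index('b', 1), keyword_index('c', 2)]   # {'b': 1, 'c': 2}
```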




-- 


From timothy.c.delaney at gmail.com  Thu Jul  3 01:10:18 2014
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Thu, 3 Jul 2014 09:10:18 +1000
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
In-Reply-To: <20140702212953.GA16637@ferrara.linux.it>
References: <53B33800.1030300@ferrara.linux.it>
 <CAN8CLg=t6Ut6QUg6cZrRm-tLPcLzAAhSBjLM+_oCD8G=N+AJ0A@mail.gmail.com>
 <20140702212953.GA16637@ferrara.linux.it>
Message-ID: <CAN8CLgk+3CkjNz+CbsS26WReQrh24m4tmdvvJ0Bod7POY5BRyw@mail.gmail.com>

On 3 July 2014 07:29, Stefano Borini <stefano.borini at ferrara.linux.it>
wrote:

> On Thu, Jul 03, 2014 at 06:12:30AM +1000, Tim Delaney wrote:
> > One option I don't see is to have a[b=1, c=2] be translated to
> > a.__getitem__((slice('b', 1, None), slice('c', 2, None))) automatically.
>
> it would be weird, since it's not technically a slice, but it would work.
> I personally think that piggybacking on the slice would appear hackish.
> One could eventually think to have a keyword() object similar to slice(),
> but then it's basically a single item dictionary (Strategy 1) with a fancy
> name.


I really do think that a[b=c, d=e] should just be syntax sugar for a['b':c,
'd':e]. It's simple to explain, and gives the greatest backwards
compatibility. In particular, libraries that already abused slices in this
way will just continue to work with the new syntax.

I'd thought that a subclass of slice, with .key (= .start) and .value
(= .stop) attributes, would work, but slice isn't subclassable so it would be
a bit more difficult. That would also be backwards-compatible with existing
__getitem__ implementations that used slice, but would preclude people calling
that __getitem__ with slice syntax, which I personally don't think is
desirable. Instead, maybe recommend something like:

ordereddict = OrderedDictLiteral()  # using the definition from the previous email

class GetItemByName(object):
    def __getitem__(self, t):
        # convert the parameters to a dictionary
        d = ordereddict[t]
        return d['name']

Hmm - here's an anonymous named tuple "literal" as another example:

class AnonymousNamedTuple(object):
    def __getitem__(self, t):
        d = ordereddict[t]
        t = collections.namedtuple('_', d)
        return t(*d.values())

namedtuple = AnonymousNamedTuple()
print(namedtuple[a='b', c=1])  # _(a='b', c=1)

As you can see, I'm in favour of keeping the order of the keyword arguments
to the index - losing it would prevent things like the above.

Tim Delaney
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140703/db8fccb3/attachment.html>

From ethan at stoneleaf.us  Thu Jul  3 01:40:39 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 02 Jul 2014 16:40:39 -0700
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
 arguments
In-Reply-To: <CAN8CLgk+3CkjNz+CbsS26WReQrh24m4tmdvvJ0Bod7POY5BRyw@mail.gmail.com>
References: <53B33800.1030300@ferrara.linux.it>
 <CAN8CLg=t6Ut6QUg6cZrRm-tLPcLzAAhSBjLM+_oCD8G=N+AJ0A@mail.gmail.com>
 <20140702212953.GA16637@ferrara.linux.it>
 <CAN8CLgk+3CkjNz+CbsS26WReQrh24m4tmdvvJ0Bod7POY5BRyw@mail.gmail.com>
Message-ID: <53B49877.6090304@stoneleaf.us>

On 07/02/2014 04:10 PM, Tim Delaney wrote:
>
> I really do think that a[b=c, d=e] should just be syntax sugar for a['b':c, 'd':e]. It's simple to explain, and gives
> the greatest backwards compatibility. In particular, libraries that already abused slices in this way will just continue
> to work with the new syntax.

+0.5 for keywords in __getitem__

+1 for this version of it

~Ethan~

From bruce at leapyear.org  Thu Jul  3 09:37:45 2014
From: bruce at leapyear.org (Bruce Leban)
Date: Thu, 3 Jul 2014 00:37:45 -0700
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
In-Reply-To: <53B49877.6090304@stoneleaf.us>
References: <53B33800.1030300@ferrara.linux.it>
 <CAN8CLg=t6Ut6QUg6cZrRm-tLPcLzAAhSBjLM+_oCD8G=N+AJ0A@mail.gmail.com>
 <20140702212953.GA16637@ferrara.linux.it>
 <CAN8CLgk+3CkjNz+CbsS26WReQrh24m4tmdvvJ0Bod7POY5BRyw@mail.gmail.com>
 <53B49877.6090304@stoneleaf.us>
Message-ID: <CAGu0Anvsiqv+hzudyPg0ku=1uLGC0dfjYJa_XXnhG2PBVCe=NA@mail.gmail.com>

On Wed, Jul 2, 2014 at 4:40 PM, Ethan Furman <ethan at stoneleaf.us> wrote:

> On 07/02/2014 04:10 PM, Tim Delaney wrote:
>
>>
>> I really do think that a[b=c, d=e] should just be syntax sugar for
>> a['b':c, 'd':e]. It's simple to explain, and gives
>> the greatest backwards compatibility. In particular, libraries that
>> already abused slices in this way will just continue
>> to work with the new syntax.
>>
>
> +0.5 for keywords in __getitem__
>
> +1 for this version of it



If there weren't already abuse of slices for this purpose, would this be
the first choice? I think not. This kind of abuse makes it more likely that
there will be mysterious failures when someone tries to use keyword
indexing for objects that don't support it. In contrast, using kwargs means
you'll get an immediate meaningful exception.

Tangentially, I think the PEP can reasonably reserve the keyword argument
name 'default' for default values specifying that while __getitem__ methods
do not need to support default, they should not use that keyword for any
other purpose.

Also, the draft does not explain why you would not allow defining
__getitem__(self, idx, x=1, y=2) rather than only supporting the kwargs
form. I don't know if I think it should or shouldn't at this point, but I
definitely think it needs to be discussed and justified one way or the other.

--- Bruce
Learn how hackers think: http://j.mp/gruyere-security
https://www.linkedin.com/in/bruceleban
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140703/55ddb744/attachment-0001.html>

From stefano.borini at ferrara.linux.it  Thu Jul  3 19:00:36 2014
From: stefano.borini at ferrara.linux.it (Stefano Borini)
Date: Thu, 3 Jul 2014 19:00:36 +0200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with	keyword
	arguments
In-Reply-To: <CAN8CLgk+3CkjNz+CbsS26WReQrh24m4tmdvvJ0Bod7POY5BRyw@mail.gmail.com>
References: <53B33800.1030300@ferrara.linux.it>
 <CAN8CLg=t6Ut6QUg6cZrRm-tLPcLzAAhSBjLM+_oCD8G=N+AJ0A@mail.gmail.com>
 <20140702212953.GA16637@ferrara.linux.it>
 <CAN8CLgk+3CkjNz+CbsS26WReQrh24m4tmdvvJ0Bod7POY5BRyw@mail.gmail.com>
Message-ID: <20140703170036.GA13843@ferrara.linux.it>

On Thu, Jul 03, 2014 at 09:10:18AM +1000, Tim Delaney wrote:
> I really do think that a[b=c, d=e] should just be syntax sugar for a['b':c,
> 'd':e]. It's simple to explain, and gives the greatest backwards
> compatibility

This is indeed a point, as it looks very, very similar to the initialization of a
dictionary; however, it would definitely collide with the slice object. At the very
least, it would be confusing.

> In particular, libraries that already abused slices in this
> way will just continue to work with the new syntax.

Are there any actual examples in the wild of this behavior?


From stefano.borini at ferrara.linux.it  Thu Jul  3 19:15:09 2014
From: stefano.borini at ferrara.linux.it (Stefano Borini)
Date: Thu, 3 Jul 2014 19:15:09 +0200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with	keyword
	arguments
In-Reply-To: <53B33800.1030300@ferrara.linux.it>
References: <53B33800.1030300@ferrara.linux.it>
Message-ID: <20140703171509.GB13843@ferrara.linux.it>

On Wed, Jul 02, 2014 at 12:36:48AM +0200, Stefano Borini wrote:
> https://github.com/stefanoborini/pep-keyword/blob/master/PEP-XXX.txt

I committed and pushed the most recent changes and they are now available.
Some points have been clarified and expanded. Also, there's a new section about
C interface compatibility. Please check the diffs for tracking the changes.

Tonight I will comb the document and the thread again, further distilling the
current hot spots.


From shoyer at gmail.com  Thu Jul  3 19:57:48 2014
From: shoyer at gmail.com (Stephan Hoyer)
Date: Thu, 3 Jul 2014 10:57:48 -0700 (PDT)
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
 arguments
In-Reply-To: <CAGu0Anvsiqv+hzudyPg0ku=1uLGC0dfjYJa_XXnhG2PBVCe=NA@mail.gmail.com>
References: <53B33800.1030300@ferrara.linux.it>
 <CAN8CLg=t6Ut6QUg6cZrRm-tLPcLzAAhSBjLM+_oCD8G=N+AJ0A@mail.gmail.com>
 <20140702212953.GA16637@ferrara.linux.it>
 <CAN8CLgk+3CkjNz+CbsS26WReQrh24m4tmdvvJ0Bod7POY5BRyw@mail.gmail.com>
 <53B49877.6090304@stoneleaf.us>
 <CAGu0Anvsiqv+hzudyPg0ku=1uLGC0dfjYJa_XXnhG2PBVCe=NA@mail.gmail.com>
Message-ID: <44dd6995-1335-41aa-8ab0-8e1ae79f8818@googlegroups.com>

I don't have strong opinions about the implementation, but I am strongly 
supportive of this PEP for the second case it lists -- the ability to index 
a multi-dimensional array by axis name or label instead of position.

Why? Suppose you're working with high dimensional data, where arrays may 
have any number of axes such as time, x, y and z. I work with this sort of 
data every day, as do many scientists.

It is awkward and error prone to use the existing __getitem__ and 
__setitem__ syntax, because it's difficult to reliably keep track of axis 
order with this many indices:

a[:, :, 0:10]

vs.

a[y=0:10]

Keyword getitem syntax should be encouraged for the same reasons that 
keyword arguments are often preferable to positional arguments: it is both 
explicit (no implicit reliance on axis order), and more flexible (the same 
code will work on arrays with transposed or altered axes). This is 
particularly important because it is typical to be working with arrays that 
use some but not all the same axes.

A method does allow for an explicit (if verbose) alternative to __getitem__ 
syntax:

a.getitem(y=slice(0, 10))

But it's worse for __setitem__:

a.setitem(dict(y=slice(0, 10)), 0)

vs.

a[y=0:10] = 0
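
Until such syntax exists, a labeled-array wrapper has to route axis names
to positions itself. A minimal sketch (the class and method names are
invented, and a dummy backend that just echoes the index it receives stands
in for a real n-d array):

```python
class EchoBackend(object):
    # Stand-in for an n-d array: just returns the index it was given.
    ndim = 3
    def __getitem__(self, idx):
        return idx

class Labeled(object):
    def __init__(self, data, axes):
        self.data = data
        self.axes = list(axes)            # e.g. ['time', 'x', 'y']
    def sel(self, **indexers):
        # route axis names to positions, defaulting to ':' (slice(None))
        index = [slice(None)] * self.data.ndim
        for name, idx in indexers.items():
            index[self.axes.index(name)] = idx
        return self.data[tuple(index)]

a = Labeled(EchoBackend(), ['time', 'x', 'y'])
a.sel(y=slice(0, 10))   # equivalent to a.data[:, :, 0:10]
```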

------------

Another issue: The PEP should address whether expressions with slice 
abbreviations like the following should be valid syntax:

a[x=:, y=:5, z=::-1]

These look pretty strange (=: looks like a form of assignment), but the 
functionality would certainly be nice to support in some way.

Surrounding the indices with [] might help:

a[x=[:], y=[:5], z=[::-1]]

-------------


On Thursday, July 3, 2014 12:39:09 AM UTC-7, Bruce Leban wrote:
>
> Tangentially, I think the PEP can reasonably reserve the keyword argument 
> name 'default' for default values specifying that while __getitem__ methods 
> do not need to support default, they should not use that keyword for any 
> other purpose.
>

-1 from me. The existing get method handles this case pretty well, with 
fewer keystrokes than the keyword-only "default" index (as I think has 
already been pointed out).

In my opinion, case 1 (labeled indices for a physics DSL) and case 2 
(labeled indices to remove ambiguity) are basically the same, and the only 
use-cases that should be encouraged. Labeling tensor indices with names in 
mathematical notation is standard for precisely the same reasons that it's 
a good idea for Python.

Best,
Stephan

(note: apologies for any redundant messages, I tried sending this message 
from the Google Groups mirror before I signed up, so it didn't go out to 
the main mailing list)

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140703/50ad555c/attachment.html>

From stefano.borini at ferrara.linux.it  Thu Jul  3 20:30:59 2014
From: stefano.borini at ferrara.linux.it (Stefano Borini)
Date: Thu, 3 Jul 2014 20:30:59 +0200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with	keyword
	arguments
In-Reply-To: <20140703171509.GB13843@ferrara.linux.it>
References: <53B33800.1030300@ferrara.linux.it>
 <20140703171509.GB13843@ferrara.linux.it>
Message-ID: <20140703183059.GC13843@ferrara.linux.it>

On Thu, Jul 03, 2014 at 07:15:09PM +0200, Stefano Borini wrote:
> On Wed, Jul 02, 2014 at 12:36:48AM +0200, Stefano Borini wrote:
> > https://github.com/stefanoborini/pep-keyword/blob/master/PEP-XXX.txt
> 
> I committed and pushed the most recent changes and they are now available.
> Some points have been clarified and expanded. Also, there's a new section about
> C interface compatibility. Please check the diffs for tracking the changes.

Forgot: I also added a possibility P4 for the first strategy: keyword
(alternative name "keyindex") which was proposed in the thread.
This solution would look rather neat:

>>> a[3]
3
>>> a[3:1]
slice(3, 1, None)
>>> a[slice(3,1,None)]         # <- Note how this notation is a long and equivalent form of the
slice(3, 1, None)              #    syntactic sugar above
>>> a[z=4]                     # <- Again, note how this notation would be a syntactic sugar 
keyindex("z", 4)               #    for a[keyindex("z", 4)]
>>> a[z=1:5:2]                 # <- Supports slices too.
keyindex("z", slice(1,5,2))    #    No ambiguity with dictionaries, and C compatibility is 
                               #    straightforward
>>> keyindex("z", 4).key
"z"
                            
Another thing I observed is that the point of an indexing operation is indexing,
and a keyed _index_ is not the same thing as a keyed _option_ passed during an
indexing operation. This has been stated during the thread, but it's worth
pointing out explicitly in the PEP (it currently isn't). Using it for options
such as default would technically be a misuse, but an acceptable one for... broad
definitions of indexing.

The keyindex object could be made to implement the same interface as its value
through forwarding, so it behaves just like its value if your logic cares only
about position, and not key:

>>> keyindex("z", 4) + 1
5
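
A minimal sketch of that forwarding (the keyindex class itself is
hypothetical, and only a couple of dunder methods are forwarded here; a
full version would forward many more):

```python
class keyindex(object):
    # Hypothetical: carries a key but defers to its value elsewhere.
    def __init__(self, key, value):
        self.key = key
        self.value = value
    def __add__(self, other):
        return self.value + other
    def __index__(self):              # usable wherever an int index is
        return self.value
    def __eq__(self, other):
        return self.value == other

k = keyindex("z", 4)
k + 1                       # 5
k.key                       # "z"
[0, 1, 2, 3, 4][k]          # 4, via __index__
```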

Another rationalization: current indexing has only one degree of freedom, that
is: positioning. Add keywords and now there are two degrees of freedom: position
and key. How are these two degrees of freedom supposed to interact?



From stefano.borini at ferrara.linux.it  Thu Jul  3 21:33:56 2014
From: stefano.borini at ferrara.linux.it (Stefano Borini)
Date: Thu, 3 Jul 2014 21:33:56 +0200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with	keyword
	arguments
In-Reply-To: <44dd6995-1335-41aa-8ab0-8e1ae79f8818@googlegroups.com>
References: <53B33800.1030300@ferrara.linux.it>
 <CAN8CLg=t6Ut6QUg6cZrRm-tLPcLzAAhSBjLM+_oCD8G=N+AJ0A@mail.gmail.com>
 <20140702212953.GA16637@ferrara.linux.it>
 <CAN8CLgk+3CkjNz+CbsS26WReQrh24m4tmdvvJ0Bod7POY5BRyw@mail.gmail.com>
 <53B49877.6090304@stoneleaf.us>
 <CAGu0Anvsiqv+hzudyPg0ku=1uLGC0dfjYJa_XXnhG2PBVCe=NA@mail.gmail.com>
 <44dd6995-1335-41aa-8ab0-8e1ae79f8818@googlegroups.com>
Message-ID: <20140703193356.GD13843@ferrara.linux.it>

On Thu, Jul 03, 2014 at 10:57:48AM -0700, Stephan Hoyer wrote:
>  don't have strong opinions about the implementation, but I am strongly 
> supportive of this PEP for the second case it lists -- the ability to index 
> an a multi-dimensional array by axis name or label instead of position.

Thinking aloud: the biggest problem is that there's no way of specifying which
labels the object supports, and therefore no way of binding a specified keyword,
unless the __getitem__ signature is deeply altered.

> It is awkward and error prone to use the existing __getitem__ and 
> __setitem__ syntax, because it's difficult to reliably keep track of axis 
> order with this many indices:
> 
> a[:, :, 0:10]
> 
> vs.
> 
> a[y=0:10]

This is indeed an important use case. I should probably stress it more in the
PEP.

> Another issue: The PEP should address whether expressions with slice 
> abbreviations like the following should be valid syntax:
> 
> a[x=:, y=:5, z=::-1]

looks ugly indeed

> Surrounding the indices with [] might help:
> 
> a[x=[:], y=[:5], z=[::-1]]

better, but unusual

> -1 from me. The existing get method handles this case pretty well, with 
> fewer keystrokes than the keyword only "default" index (as I think has 
> already been pointed out).
> 
> In my opinion, case 1 (labeled indices for a physics DSL) and case 2 
> (labeled indices to removed ambiguity) are basically the same, and the only 
> use-cases that should be encouraged. Labeling tensor indices with names in 
> mathematical notation is standard for precisely the same reasons that it's 
> a good idea for Python.

Meaning dropping the use of keyword indexing for "options" use cases.


From shoyer at gmail.com  Thu Jul  3 21:43:59 2014
From: shoyer at gmail.com (Stephan Hoyer)
Date: Thu, 3 Jul 2014 12:43:59 -0700
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
In-Reply-To: <20140703193356.GD13843@ferrara.linux.it>
References: <53B33800.1030300@ferrara.linux.it>
 <CAN8CLg=t6Ut6QUg6cZrRm-tLPcLzAAhSBjLM+_oCD8G=N+AJ0A@mail.gmail.com>
 <20140702212953.GA16637@ferrara.linux.it>
 <CAN8CLgk+3CkjNz+CbsS26WReQrh24m4tmdvvJ0Bod7POY5BRyw@mail.gmail.com>
 <53B49877.6090304@stoneleaf.us>
 <CAGu0Anvsiqv+hzudyPg0ku=1uLGC0dfjYJa_XXnhG2PBVCe=NA@mail.gmail.com>
 <44dd6995-1335-41aa-8ab0-8e1ae79f8818@googlegroups.com>
 <20140703193356.GD13843@ferrara.linux.it>
Message-ID: <CAEQ_TvcyTCi9vrdzyfODMTOsLAHQMY3q17V-gOmERJUieLCNAQ@mail.gmail.com>

On Thu, Jul 3, 2014 at 12:33 PM, Stefano Borini <
stefano.borini at ferrara.linux.it> wrote:
>
> thinking aloud.
> The biggest problem is that there's no way of specifying which labels the
> object supports, and therefore no way of binding a specified keyword,
> unless
> the __getitem__ signature is deeply altered.


I don't think I follow you here. The object itself handles the __getitem__ logic
in whatever way it sees fit, and it would be up to it to raise KeyError
when an invalid label is supplied, much like the current situation with
invalid keys.

Stephan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140703/a289d8ee/attachment-0001.html>

From stefano.borini at ferrara.linux.it  Thu Jul  3 21:59:11 2014
From: stefano.borini at ferrara.linux.it (Stefano Borini)
Date: Thu, 3 Jul 2014 21:59:11 +0200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with	keyword
	arguments
In-Reply-To: <CAEQ_TvcyTCi9vrdzyfODMTOsLAHQMY3q17V-gOmERJUieLCNAQ@mail.gmail.com>
References: <53B33800.1030300@ferrara.linux.it>
 <CAN8CLg=t6Ut6QUg6cZrRm-tLPcLzAAhSBjLM+_oCD8G=N+AJ0A@mail.gmail.com>
 <20140702212953.GA16637@ferrara.linux.it>
 <CAN8CLgk+3CkjNz+CbsS26WReQrh24m4tmdvvJ0Bod7POY5BRyw@mail.gmail.com>
 <53B49877.6090304@stoneleaf.us>
 <CAGu0Anvsiqv+hzudyPg0ku=1uLGC0dfjYJa_XXnhG2PBVCe=NA@mail.gmail.com>
 <44dd6995-1335-41aa-8ab0-8e1ae79f8818@googlegroups.com>
 <20140703193356.GD13843@ferrara.linux.it>
 <CAEQ_TvcyTCi9vrdzyfODMTOsLAHQMY3q17V-gOmERJUieLCNAQ@mail.gmail.com>
Message-ID: <20140703195911.GE13843@ferrara.linux.it>

On Thu, Jul 03, 2014 at 12:43:59PM -0700, Stephan Hoyer wrote:
> On Thu, Jul 3, 2014 at 12:33 PM, Stefano Borini <
> stefano.borini at ferrara.linux.it> wrote:
> >
> > thinking aloud.
> > The biggest problem is that there's no way of specifying which labels the
> > object supports, and therefore no way of binding a specified keyword,
> > unless
> > the __getitem__ signature is deeply altered.
> 
> 
> I don't I follow you here. The object itself handles the __getitem__ logic
> in whatever way it sees fit, and it would be up to it to raise KeyError
> when an invalid label is supplied, much like the current situation with
> invalid keys.

NB: Still thinking aloud here...

True, but the problem is that in a function

def foo(x,y,z): pass

calling the following will give the exact same result

foo(1,2,3)
foo(x=1, y=2, z=3)
foo(z=3, x=1, y=2)

this happens because at function definition time you can specify the argument names.
With __getitem__ you can't express this binding; its current form precludes it:

__getitem__(self, idx)

if you use a[1,2,3], you have no way of saying that "the first index is called x", so there is
no way for these two to be equivalent in the way they are for a function call:

a[1,2,3]
a[z=3, x=1, y=2]

unless you allow __getitem__ in the form

__getitem__(self, x, y, z)

which I feel would be a wasps' nest in terms of backward compatibility, both at the 
Python and C level. I doubt this would fly.

So if you want to keep the __getitem__ signature unchanged, you will have to map labels
to positions manually inside __getitem__, a potentially complex task. Not even
strategy 3 (namedtuple) would solve this issue.
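
A sketch of that manual mapping, assuming (as in one of the PEP's proposed
strategies) that the keywords arrive packed into a plain dict; the axis
names below are invented:

```python
class Array3(object):
    # The object itself must translate labels to positions, since
    # __getitem__(self, idx) cannot declare per-axis parameter names.
    axes = ('x', 'y', 'z')
    def __getitem__(self, idx):
        if isinstance(idx, dict):
            # reorder keyword values into the positional axis order
            return tuple(idx[name] for name in self.axes)
        return idx

a = Array3()
a[1, 2, 3]                      # (1, 2, 3)
a[{'z': 3, 'x': 1, 'y': 2}]     # (1, 2, 3) as well
```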



From shoyer at gmail.com  Thu Jul  3 22:20:45 2014
From: shoyer at gmail.com (Stephan Hoyer)
Date: Thu, 3 Jul 2014 13:20:45 -0700
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
In-Reply-To: <20140703195911.GE13843@ferrara.linux.it>
References: <53B33800.1030300@ferrara.linux.it>
 <CAN8CLg=t6Ut6QUg6cZrRm-tLPcLzAAhSBjLM+_oCD8G=N+AJ0A@mail.gmail.com>
 <20140702212953.GA16637@ferrara.linux.it>
 <CAN8CLgk+3CkjNz+CbsS26WReQrh24m4tmdvvJ0Bod7POY5BRyw@mail.gmail.com>
 <53B49877.6090304@stoneleaf.us>
 <CAGu0Anvsiqv+hzudyPg0ku=1uLGC0dfjYJa_XXnhG2PBVCe=NA@mail.gmail.com>
 <44dd6995-1335-41aa-8ab0-8e1ae79f8818@googlegroups.com>
 <20140703193356.GD13843@ferrara.linux.it>
 <CAEQ_TvcyTCi9vrdzyfODMTOsLAHQMY3q17V-gOmERJUieLCNAQ@mail.gmail.com>
 <20140703195911.GE13843@ferrara.linux.it>
Message-ID: <CAEQ_Tvf0hsKYP32EVF4YGrZaJPOJydWcxtg_vqm9Q_N7tfM4SA@mail.gmail.com>

On Thu, Jul 3, 2014 at 12:59 PM, Stefano Borini <
stefano.borini at ferrara.linux.it> wrote:

> So if you want to keep __getitem__ signature unchanged, you will have to
> map labels
> to positions manuallyi inside __getitem__, a potentially complex task. Not
> even
> strategy 3 (namedtuple) would solve this issue.
>

Yes, this is true. However, in practice many implementations of labeled
arrays would have generic labeled axes, so they would need to use their own
logic to do the mapping in __getitem__ anyways.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140703/273316d5/attachment.html>

From sturla.molden at gmail.com  Thu Jul  3 22:48:06 2014
From: sturla.molden at gmail.com (Sturla Molden)
Date: Thu, 3 Jul 2014 20:48:06 +0000 (UTC)
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
References: <CAN8CLg=t6Ut6QUg6cZrRm-tLPcLzAAhSBjLM+_oCD8G=N+AJ0A@mail.gmail.com>
 <20140702212953.GA16637@ferrara.linux.it>
 <CAN8CLgk+3CkjNz+CbsS26WReQrh24m4tmdvvJ0Bod7POY5BRyw@mail.gmail.com>
 <53B49877.6090304@stoneleaf.us>
 <CAGu0Anvsiqv+hzudyPg0ku=1uLGC0dfjYJa_XXnhG2PBVCe=NA@mail.gmail.com>
 <44dd6995-1335-41aa-8ab0-8e1ae79f8818@googlegroups.com>
 <20140703193356.GD13843@ferrara.linux.it>
 <CAEQ_TvcyTCi9vrdzyfODMTOsLAHQMY3q17V-gOmERJUieLCNAQ@mail.gmail.com>
 <20140703195911.GE13843@ferrara.linux.it>
 <CAEQ_Tvf0hsKYP32EVF4YGrZaJPOJydWcxtg_vqm9Q_N7tfM4SA@mail.gmail.com>
Message-ID: <110367949426113052.835097sturla.molden-gmail.com@news.gmane.org>

Stephan Hoyer <shoyer at gmail.com> wrote:

> Yes, this is true. However, in practice many implementations of labeled
> arrays would have generic labeled axes, so they would need to use their own
> logic to do the mapping in __getitem__ anyways.

If you are thinking about Pandas, then each keyword should be allowed to
take a slice as well.

dataframe[apples=1:3, oranges=2:6]
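For what it's worth, something close to this is already expressible today by passing a dict of slice objects explicitly; a toy sketch (not pandas itself, just a stand-in container):

```python
# Toy container: emulating dataframe[apples=1:3, oranges=2:6] today by
# passing a dict that maps axis names to slice objects.
class Frame:
    def __getitem__(self, idx):
        if isinstance(idx, dict):
            # Named-axis lookup: each value is a slice over that axis.
            return {axis: (s.start, s.stop) for axis, s in idx.items()}
        return idx  # plain positional indexing, unchanged

f = Frame()
result = f[{"apples": slice(1, 3), "oranges": slice(2, 6)}]
assert result == {"apples": (1, 3), "oranges": (2, 6)}
```

The proposed syntax would essentially be sugar over this already-legal form.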


Sturla


From shoyer at gmail.com  Thu Jul  3 23:00:20 2014
From: shoyer at gmail.com (Stephan Hoyer)
Date: Thu, 3 Jul 2014 14:00:20 -0700
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
In-Reply-To: <110367949426113052.835097sturla.molden-gmail.com@news.gmane.org>
References: <CAN8CLg=t6Ut6QUg6cZrRm-tLPcLzAAhSBjLM+_oCD8G=N+AJ0A@mail.gmail.com>
 <20140702212953.GA16637@ferrara.linux.it>
 <CAN8CLgk+3CkjNz+CbsS26WReQrh24m4tmdvvJ0Bod7POY5BRyw@mail.gmail.com>
 <53B49877.6090304@stoneleaf.us>
 <CAGu0Anvsiqv+hzudyPg0ku=1uLGC0dfjYJa_XXnhG2PBVCe=NA@mail.gmail.com>
 <44dd6995-1335-41aa-8ab0-8e1ae79f8818@googlegroups.com>
 <20140703193356.GD13843@ferrara.linux.it>
 <CAEQ_TvcyTCi9vrdzyfODMTOsLAHQMY3q17V-gOmERJUieLCNAQ@mail.gmail.com>
 <20140703195911.GE13843@ferrara.linux.it>
 <CAEQ_Tvf0hsKYP32EVF4YGrZaJPOJydWcxtg_vqm9Q_N7tfM4SA@mail.gmail.com>
 <110367949426113052.835097sturla.molden-gmail.com@news.gmane.org>
Message-ID: <CAEQ_TvcTUtXNGK3rvVGPX+8H6YMM6UHqy817ODHKL61hVxN+XQ@mail.gmail.com>

On Thu, Jul 3, 2014 at 1:48 PM, Sturla Molden <sturla.molden at gmail.com>
wrote:

> Stephan Hoyer <shoyer at gmail.com> wrote:
>
> > Yes, this is true. However, in practice many implementations of labeled
> > arrays would have generic labeled axes, so they would need to use their
> own
> > logic to do the mapping in __getitem__ anyways.
>
> If you are thinking about Pandas, then each keyword should be allowed to
> take a slice as well.
>
> dataframe[apples=1:3, oranges=2:6]
>

Yes, I am indeed thinking about pandas and other similar libraries.
Supporting slices with keywords would be essential.

Stephan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140703/43c47e9b/attachment.html>

From ncoghlan at gmail.com  Thu Jul  3 23:48:10 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 3 Jul 2014 14:48:10 -0700
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
In-Reply-To: <CAEQ_TvcTUtXNGK3rvVGPX+8H6YMM6UHqy817ODHKL61hVxN+XQ@mail.gmail.com>
References: <CAN8CLg=t6Ut6QUg6cZrRm-tLPcLzAAhSBjLM+_oCD8G=N+AJ0A@mail.gmail.com>
 <20140702212953.GA16637@ferrara.linux.it>
 <CAN8CLgk+3CkjNz+CbsS26WReQrh24m4tmdvvJ0Bod7POY5BRyw@mail.gmail.com>
 <53B49877.6090304@stoneleaf.us>
 <CAGu0Anvsiqv+hzudyPg0ku=1uLGC0dfjYJa_XXnhG2PBVCe=NA@mail.gmail.com>
 <44dd6995-1335-41aa-8ab0-8e1ae79f8818@googlegroups.com>
 <20140703193356.GD13843@ferrara.linux.it>
 <CAEQ_TvcyTCi9vrdzyfODMTOsLAHQMY3q17V-gOmERJUieLCNAQ@mail.gmail.com>
 <20140703195911.GE13843@ferrara.linux.it>
 <CAEQ_Tvf0hsKYP32EVF4YGrZaJPOJydWcxtg_vqm9Q_N7tfM4SA@mail.gmail.com>
 <110367949426113052.835097sturla.molden-gmail.com@news.gmane.org>
 <CAEQ_TvcTUtXNGK3rvVGPX+8H6YMM6UHqy817ODHKL61hVxN+XQ@mail.gmail.com>
Message-ID: <CADiSq7cDo-eD0edGDGGvyMu9Hmc2d-Y0599xcq6D9XUwLgicRg@mail.gmail.com>

On 3 July 2014 14:00, Stephan Hoyer <shoyer at gmail.com> wrote:
> On Thu, Jul 3, 2014 at 1:48 PM, Sturla Molden <sturla.molden at gmail.com>
> wrote:
>>
>> Stephan Hoyer <shoyer at gmail.com> wrote:
>>
>> > Yes, this is true. However, in practice many implementations of labeled
>> > arrays would have generic labeled axes, so they would need to use their
>> > own
>> > logic to do the mapping in __getitem__ anyways.
>>
>> If you are thinking about Pandas, then each keyword should be allowed to
>> take a slice as well.
>>
>> dataframe[apples=1:3, oranges=2:6]
>
>
> Yes, I am indeed thinking about pandas and other similar libraries.
> Supporting slices with keywords would be essential.

Some more concrete pandas-based examples could definitely help make a
more compelling case. I genuinely think the hard part here is to make
the case for offering the feature *at all*, so adding a "here is
current real world pandas based code" and "here is how this PEP could
make that code more readable" example could be worthwhile.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From stefano.borini at ferrara.linux.it  Fri Jul  4 08:25:13 2014
From: stefano.borini at ferrara.linux.it (Stefano Borini)
Date: Fri, 04 Jul 2014 08:25:13 +0200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
 arguments
In-Reply-To: <CADiSq7cDo-eD0edGDGGvyMu9Hmc2d-Y0599xcq6D9XUwLgicRg@mail.gmail.com>
References: <CAN8CLg=t6Ut6QUg6cZrRm-tLPcLzAAhSBjLM+_oCD8G=N+AJ0A@mail.gmail.com>
 <20140702212953.GA16637@ferrara.linux.it>
 <CAN8CLgk+3CkjNz+CbsS26WReQrh24m4tmdvvJ0Bod7POY5BRyw@mail.gmail.com>
 <53B49877.6090304@stoneleaf.us>
 <CAGu0Anvsiqv+hzudyPg0ku=1uLGC0dfjYJa_XXnhG2PBVCe=NA@mail.gmail.com>
 <44dd6995-1335-41aa-8ab0-8e1ae79f8818@googlegroups.com>
 <20140703193356.GD13843@ferrara.linux.it>
 <CAEQ_TvcyTCi9vrdzyfODMTOsLAHQMY3q17V-gOmERJUieLCNAQ@mail.gmail.com>
 <20140703195911.GE13843@ferrara.linux.it>
 <CAEQ_Tvf0hsKYP32EVF4YGrZaJPOJydWcxtg_vqm9Q_N7tfM4SA@mail.gmail.com>
 <110367949426113052.835097sturla.molden-gmail.com@news.gmane.org>
 <CAEQ_TvcTUtXNGK3rvVGPX+8H6YMM6UHqy817ODHKL61hVxN+XQ@mail.gmail.com>
 <CADiSq7cDo-eD0edGDGGvyMu9Hmc2d-Y0599xcq6D9XUwLgicRg@mail.gmail.com>
Message-ID: <53B648C9.1090907@ferrara.linux.it>

On 7/3/14 11:48 PM, Nick Coghlan wrote:
> Some more concrete pandas-based examples could definitely help make a
> more compelling case. I genuinely think the hard part here is to make
> the case for offering the feature *at all*, so adding a "here is
> current real world pandas based code" and "here is how this PEP could
> make that code more readable" example could be worthwhile.

I agree. I will examine pandas this evening for more context.


From j.wielicki at sotecware.net  Fri Jul  4 10:21:53 2014
From: j.wielicki at sotecware.net (Jonas Wielicki)
Date: Fri, 04 Jul 2014 10:21:53 +0200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with	keyword
 arguments
In-Reply-To: <20140703183059.GC13843@ferrara.linux.it>
References: <53B33800.1030300@ferrara.linux.it>
 <20140703171509.GB13843@ferrara.linux.it>
 <20140703183059.GC13843@ferrara.linux.it>
Message-ID: <53B66421.80902@sotecware.net>

On 03.07.2014 20:30, Stefano Borini wrote:
> The keyindex object could be made to implement the same interface as its value
> through forwarding, so it can behave just as its value if your logic cares only about
> position, and not key
> 
>>>> keyindex("z", 4) + 1
> 5
> 

What about a value which has a .key attribute?

regards,
jwi


From stefano.borini at ferrara.linux.it  Fri Jul  4 11:20:50 2014
From: stefano.borini at ferrara.linux.it (Stefano Borini)
Date: Fri, 4 Jul 2014 11:20:50 +0200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with	keyword
	arguments
In-Reply-To: <53B66421.80902@sotecware.net>
References: <53B33800.1030300@ferrara.linux.it>
 <20140703171509.GB13843@ferrara.linux.it>
 <20140703183059.GC13843@ferrara.linux.it> <53B66421.80902@sotecware.net>
Message-ID: <20140704092050.GA8507@ferrara.linux.it>

On Fri, Jul 04, 2014 at 10:21:53AM +0200, Jonas Wielicki wrote:
> On 03.07.2014 20:30, Stefano Borini wrote:
> > The keyindex object could be made to implement the same interface as its value
> > through forwarding, so it can behave just as its value if your logic cares only about
> > position, and not key
> > 
> >>>> keyindex("z", 4) + 1
> > 5
> > 
> 
> What about a value which has a .key attribute?

that would have to be added, and unless you copy the passed index it would be a
side effect of __getitem__ on the passed entity, which would not be nice.


From drekin at gmail.com  Fri Jul  4 11:29:34 2014
From: drekin at gmail.com (drekin at gmail.com)
Date: Fri, 04 Jul 2014 02:29:34 -0700 (PDT)
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
In-Reply-To: <53B33800.1030300@ferrara.linux.it>
Message-ID: <53b673fe.475fc20a.2d71.5367@mx.google.com>

Just some ideas, not claiming they are good:

As already stated in the thread and also in the PEP, there are two different classes of use cases for indexing with keyword arguments: as a named index, and as an option contextual to the indexing. I think the two cases ask for different signatures. Even if I have a complex indexing scheme, the signature is (assuming Strategy 1 or 3):

def __getitem__(self, idx): ...

However if I now want to add support for default value, I would do it like:

_Empty = object()
def __getitem__(self, idx, *, default=_Empty): ...

That leads to the following strategies.

Just for the sake of completeness, maybe the easiest and also most powerful strategy would be simply copying the behaviour of a function call, with the arguments going to __getitem__ instead of __call__, while allowing the syntax sugar for slices (which would raise the question whether to allow slice literals also in function calls or even in every expression).

This strategy has two serious problems:
    1. It is not backwards compatible with the current mechanism of automatic packing of positional arguments.
    2. It is not clear how to incorporate the additional parameter of __setitem__.

This takes me to the following hybrid strategy. Both strategies 1 and 3 pack everything into one idx object, whereas strategy 2 leaves key indices in a separate kwargs parameter. The hybrid strategy takes as much as possible from the function call strategy and generalizes strategies 1, 2, and 3 at the same time.

The general signature looks like this:
def __getitem__(self, idx, *, key1, key2=default, **kwargs): ...
During the call, every provided keyword argument with a corresponding parameter is put into that parameter. If there is a **kwargs parameter, the remaining keyword arguments are put into kwargs; if not, they are somehow (strategy 1 or 3) packed into the idx parameter.

Also the additional __setitem__ argument is just added as positional argument:
def __setitem__(self, idx, value, *, key1, key2=default, **kwargs): ...
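A pure-Python emulation of that dispatch, using inspect.signature (the helper name and the packing convention are mine; the real mechanism would live in the interpreter):

```python
import inspect

def hybrid_call(getitem, obj, positional, keywords):
    # Keyword arguments matching a named parameter are bound to it;
    # leftovers are packed together with the positional part into idx.
    params = inspect.signature(getitem).parameters
    bound = {k: v for k, v in keywords.items() if k in params}
    leftover = {k: v for k, v in keywords.items() if k not in params}
    idx = positional if not leftover else (positional, leftover)
    return getitem(obj, idx, **bound)

class A:
    def __getitem__(self, idx, *, default=None):
        return idx, default

a = A()
# 'default' matches a named parameter, so it is bound directly:
assert hybrid_call(A.__getitem__, a, (1, 2), {'default': 5}) == ((1, 2), 5)
# 'Z' matches nothing, so it is packed into idx:
assert hybrid_call(A.__getitem__, a, (1,), {'Z': 3}) == (((1,), {'Z': 3}), None)
```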


Regards, Drekin



From stefano.borini at ferrara.linux.it  Fri Jul  4 17:44:30 2014
From: stefano.borini at ferrara.linux.it (Stefano Borini)
Date: Fri, 4 Jul 2014 17:44:30 +0200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with	keyword
	arguments
In-Reply-To: <53B648C9.1090907@ferrara.linux.it>
References: <CAGu0Anvsiqv+hzudyPg0ku=1uLGC0dfjYJa_XXnhG2PBVCe=NA@mail.gmail.com>
 <44dd6995-1335-41aa-8ab0-8e1ae79f8818@googlegroups.com>
 <20140703193356.GD13843@ferrara.linux.it>
 <CAEQ_TvcyTCi9vrdzyfODMTOsLAHQMY3q17V-gOmERJUieLCNAQ@mail.gmail.com>
 <20140703195911.GE13843@ferrara.linux.it>
 <CAEQ_Tvf0hsKYP32EVF4YGrZaJPOJydWcxtg_vqm9Q_N7tfM4SA@mail.gmail.com>
 <110367949426113052.835097sturla.molden-gmail.com@news.gmane.org>
 <CAEQ_TvcTUtXNGK3rvVGPX+8H6YMM6UHqy817ODHKL61hVxN+XQ@mail.gmail.com>
 <CADiSq7cDo-eD0edGDGGvyMu9Hmc2d-Y0599xcq6D9XUwLgicRg@mail.gmail.com>
 <53B648C9.1090907@ferrara.linux.it>
Message-ID: <20140704154430.GA18583@ferrara.linux.it>

On Fri, Jul 04, 2014 at 08:25:13AM +0200, Stefano Borini wrote:
> On 7/3/14 11:48 PM, Nick Coghlan wrote:
>> Some more concrete pandas-based examples could definitely help make a
>> more compelling case. I genuinely think the hard part here is to make
>> the case for offering the feature *at all*, so adding a "here is
>> current real world pandas based code" and "here is how this PEP could
>> make that code more readable" example could be worthwhile.
>
> I agree. I will examine pandas this evening for more context.


Ok, I examined pandas, and I think it solves a completely different problem:

In [27]: df.loc[:,['A','B']]
Out[27]: 
                   A         B
2013-01-01  0.469112 -0.282863
2013-01-02  1.212112 -0.173215


Pandas is naming the columns. With keyword arguments you would be naming the _axes_.


From stefano.borini at ferrara.linux.it  Fri Jul  4 20:10:51 2014
From: stefano.borini at ferrara.linux.it (Stefano Borini)
Date: Fri, 4 Jul 2014 20:10:51 +0200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with	keyword
	arguments
In-Reply-To: <53B33800.1030300@ferrara.linux.it>
References: <53B33800.1030300@ferrara.linux.it>
Message-ID: <20140704181051.GB18583@ferrara.linux.it>

On Wed, Jul 02, 2014 at 12:36:48AM +0200, Stefano Borini wrote:
> https://github.com/stefanoborini/pep-keyword/blob/master/PEP-XXX.txt

I just added a new strategy. This one cuts the problem down.

Strategy 4: Strict dictionary
-----------------------------

This strategy accepts that __getitem__ is special in accepting only one object,
and the nature of that object must be non-ambiguous in its specification of the
axes: it can be either by order, or by name. As a result of this assumption,
in presence of keyword arguments, the passed entity is a dictionary and all
labels must be specified.

    C0. a[1]; a[1,2]      -> idx = 1; idx=(1, 2)
    C1. a[Z=3]            -> idx = {"Z": 3}
    C2. a[Z=3, R=4]       -> idx = {"Z"=3, "R"=4}
    C3. a[1, Z=3]         -> raise SyntaxError
    C4. a[1, Z=3, R=4]    -> raise SyntaxError
    C5. a[1, 2, Z=3]      -> raise SyntaxError
    C6. a[1, 2, Z=3, R=4] -> raise SyntaxError
    C7. a[1, Z=3, 2, R=4] -> raise SyntaxError


Pros:
    - strong conceptual similarity between the tuple case and the dictionary case.
      In the first case, we are specifying a tuple, so we are naturally defining
      a plain set of values separated by commas. In the second, we are specifying a
      dictionary, so we are specifying a homogeneous set of key/value pairs, as
      in dict(Z=3, R=4)
    - simple and easy to parse on the __getitem__ side: if it gets a tuple, 
      determine the axes using positioning. If it gets a dictionary, use 
      the keywords.
    - C interface does not need changes.

Cons:
    - degeneracy of a[{"Z": 3, "R": 4}] with a[Z=3, R=4], but the same degeneracy exists
      for a[(2,3)] and a[2,3].
    - very strict.
    - destroys the use case a[1, 2, default=5]
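A minimal sketch of what the __getitem__ side could look like under this strategy (toy class and axis names are mine):

```python
# Strategy 4 sketch: idx is either a plain index/tuple (purely positional)
# or a dict (all axes given by name); the two are never mixed.
class Cube:
    AXES = ("Z", "R")

    def __getitem__(self, idx):
        if isinstance(idx, dict):
            # All labels must be present; KeyError if one is missing.
            return tuple(idx[axis] for axis in self.AXES)
        if not isinstance(idx, tuple):
            idx = (idx,)
        return idx  # positional case: axes determined by order

c = Cube()
assert c[{"Z": 3, "R": 4}] == (3, 4)   # what a[Z=3, R=4] would produce
assert c[3, 4] == (3, 4)
```

Under the proposal a[Z=3, R=4] would build that dict for you; the dispatch inside __getitem__ stays this simple.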



From phd at phdru.name  Fri Jul  4 20:20:18 2014
From: phd at phdru.name (Oleg Broytman)
Date: Fri, 4 Jul 2014 20:20:18 +0200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with	keyword
	arguments
In-Reply-To: <20140704181051.GB18583@ferrara.linux.it>
References: <53B33800.1030300@ferrara.linux.it>
 <20140704181051.GB18583@ferrara.linux.it>
Message-ID: <20140704182018.GA30712@phdru.name>

On Fri, Jul 04, 2014 at 08:10:51PM +0200, Stefano Borini <stefano.borini at ferrara.linux.it> wrote:
>     C1. a[Z=3]            -> idx = {"Z": 3}
>     C2. a[Z=3, R=4]       -> idx = {"Z"=3, "R"=4}

   Huh? Shouldn't it be 
C2. a[Z=3, R=4]       -> idx = {"Z": 3, "R": 4}
   ???

> Cons:
>     - degeneracy of a[{"Z": 3, "R": 4}] with a[Z=3, R=4], but the same degeneracy exists
>       for a[(2,3)] and a[2,3].

   There is no degeneration in the second case. Tuples are created by
commas, not parentheses (except for an empty tuple), hence (2,3) and 2,3
are simply the same thing. While Z=3, R=4 is far from being the same as
{"Z": 3, "R": 4}.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.

From stefano.borini at ferrara.linux.it  Fri Jul  4 20:34:24 2014
From: stefano.borini at ferrara.linux.it (Stefano Borini)
Date: Fri, 4 Jul 2014 20:34:24 +0200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with	keyword
	arguments
In-Reply-To: <20140704183322.GC18583@ferrara.linux.it>
References: <53B33800.1030300@ferrara.linux.it>
 <20140704181051.GB18583@ferrara.linux.it> <20140704182018.GA30712@phdru.name>
 <20140704183322.GC18583@ferrara.linux.it>
Message-ID: <20140704183424.GD18583@ferrara.linux.it>

On Fri, Jul 04, 2014 at 08:20:18PM +0200, Oleg Broytman wrote:
> On Fri, Jul 04, 2014 at 08:10:51PM +0200, Stefano Borini <stefano.borini at ferrara.linux.it> wrote:
> >     C1. a[Z=3]            -> idx = {"Z": 3}
> >     C2. a[Z=3, R=4]       -> idx = {"Z"=3, "R"=4}
> 
>    Huh? Shouldn't it be 
> C2. a[Z=3, R=4]       -> idx = {"Z": 3, "R": 4}

yes. typo. already fixed in the PEP

> > Cons:
> >     - degeneracy of a[{"Z": 3, "R": 4}] with a[Z=3, R=4], but the same degeneracy exists
> >       for a[(2,3)] and a[2,3].
> 
>    There is no degeneration in the second case. Tuples are created by
> commas, not parentheses (except for an empty tuple), hence (2,3) and 2,3
> are simply the same thing.

We discussed this point above in the thread, and you are of course
right in saying so. Yet it stresses the fact that no matter what you pass
inside those square brackets, it always ends up funneled into a single
object, which happens to be a tuple you just created.

> While Z=3, R=4 is far from being the same as
> {"Z": 3, "R": 4}.

but dict(Z=3, R=4) is the same as {"Z": 3, "R": 4},
exactly like tuple((2,3)) is the same as (2,3).
See the similarity? The square brackets "call a constructor"
on their content. This constructor is tuple if the entries are not
key=value pairs (except for the single index case, of course),
and dict if they are.

From phd at phdru.name  Fri Jul  4 20:39:15 2014
From: phd at phdru.name (Oleg Broytman)
Date: Fri, 4 Jul 2014 20:39:15 +0200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with	keyword
	arguments
In-Reply-To: <20140704183424.GD18583@ferrara.linux.it>
References: <53B33800.1030300@ferrara.linux.it>
 <20140704181051.GB18583@ferrara.linux.it>
 <20140704182018.GA30712@phdru.name>
 <20140704183322.GC18583@ferrara.linux.it>
 <20140704183424.GD18583@ferrara.linux.it>
Message-ID: <20140704183915.GA31861@phdru.name>

On Fri, Jul 04, 2014 at 08:34:24PM +0200, Stefano Borini <stefano.borini at ferrara.linux.it> wrote:
> On Fri, Jul 04, 2014 at 08:20:18PM +0200, Oleg Broytman wrote:
> > Z=3, R=4 is far from being the same as
> > {"Z": 3, "R": 4}.
> 
> but dict(Z=3, R=4) is the same as {"Z": 3, "R": 4}. 
> this is exactly like tuple((2,3)) is the same as (2,3)
> See the similarity? the square brackets "call a constructor"
> on its content. This constructor is tuple if entries are not
> key=values (except for the single index case, of course), 
> and dict if entries are key=values.

   I didn't like the idea from the beginning and I am still against it.

d = dict
a[d(Z=3, R=4)]

   looks good enough for me without adding any magic to the language.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.

From stefano.borini at ferrara.linux.it  Fri Jul  4 20:40:56 2014
From: stefano.borini at ferrara.linux.it (Stefano Borini)
Date: Fri, 4 Jul 2014 20:40:56 +0200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with	keyword
	arguments
In-Reply-To: <20140704183424.GD18583@ferrara.linux.it>
References: <53B33800.1030300@ferrara.linux.it>
 <20140704181051.GB18583@ferrara.linux.it> <20140704182018.GA30712@phdru.name>
 <20140704183322.GC18583@ferrara.linux.it>
 <20140704183424.GD18583@ferrara.linux.it>
Message-ID: <20140704184056.GA24625@ferrara.linux.it>

On Fri, Jul 04, 2014 at 08:34:24PM +0200, Stefano Borini wrote:
> but dict(Z=3, R=4) is the same as {"Z": 3, "R": 4}. 
> this is exactly like tuple((2,3)) is the same as (2,3)
> See the similarity? the square brackets "call a constructor"
> on its content. This constructor is tuple if entries are not
> key=values (except for the single index case, of course), 
> and dict if entries are key=values.

On this regard, one can of course do 

idx=(2,3)
print(a[idx])

idx={"x":2, "y":3}
print(a[idx])

the above syntax is already legal today, and calls back to a comment from
a previous post. keywords would just be a shorthand for it.




From alexander.belopolsky at gmail.com  Fri Jul  4 21:00:56 2014
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Fri, 4 Jul 2014 15:00:56 -0400
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
In-Reply-To: <20140704181051.GB18583@ferrara.linux.it>
References: <53B33800.1030300@ferrara.linux.it>
 <20140704181051.GB18583@ferrara.linux.it>
Message-ID: <CAP7h-xbMR7dN2Pha2b42Qpww+2Yf=m6AW8eqchiY2Ktj_gNCBA@mail.gmail.com>

On Fri, Jul 4, 2014 at 2:10 PM, Stefano Borini <
stefano.borini at ferrara.linux.it> wrote:

> I just added a new strategy. This one cuts the problem down.
>
> Strategy 4: Strict dictionary
>

Did anyone consider treating = inside [] in a similar way as : is treated
now.  One can even (re/ab)use the slice object:

a[1, 2, 5:7, Z=42] -> a.__getitem__((1, 2, slice(5, 7, None), slice('Z',
'=', 42)))

This strategy would also offer a semi-readable back-porting solution:

>>> class C:
...    def __getitem__(self, key):
...        print(key)
...
>>> c = C()
>>> c['Z':'=':42]
slice('Z', '=', 42)
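On the receiving side, such entries could be split back out roughly like this (a sketch of the idea, not part of any proposal):

```python
# Sketch: recovering keyword pairs from slice('name', '=', value) entries
# while leaving real slices and positional indices untouched.
def split_index(key):
    if not isinstance(key, tuple):
        key = (key,)
    positional, keywords = [], {}
    for entry in key:
        if isinstance(entry, slice) and entry.stop == '=':
            keywords[entry.start] = entry.step  # slice('Z', '=', 42) -> Z=42
        else:
            positional.append(entry)
    return positional, keywords

pos, kw = split_index((1, 2, slice(5, 7, None), slice('Z', '=', 42)))
assert pos == [1, 2, slice(5, 7, None)]
assert kw == {'Z': 42}
```

The cost, as noted elsewhere in the thread, is that a genuine slice whose stop happens to be '=' becomes ambiguous.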
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140704/3bea5b7e/attachment-0001.html>

From timothy.c.delaney at gmail.com  Fri Jul  4 22:10:15 2014
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Sat, 5 Jul 2014 06:10:15 +1000
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
In-Reply-To: <20140704184056.GA24625@ferrara.linux.it>
References: <53B33800.1030300@ferrara.linux.it>
 <20140704181051.GB18583@ferrara.linux.it>
 <20140704182018.GA30712@phdru.name>
 <20140704183322.GC18583@ferrara.linux.it>
 <20140704183424.GD18583@ferrara.linux.it>
 <20140704184056.GA24625@ferrara.linux.it>
Message-ID: <CAN8CLgmisVn4dhtvBr3cReX5n8O4RMxToPcauHOzMXj0tqbhug@mail.gmail.com>

1. I think you absolutely *must* address the option of purely syntactic
sugar in the PEP. It will come up on python-dev, so address it now.

    a[b, c=d, e=f:g:h]
    -> a[b, 'c':d, 'e':slice(f, g, h)]

The rationale is readability and being both backwards and forwards
compatible - existing __getitem__ designed to abuse slices will continue to
work, and __getitem__ designed to work with the new syntax will work by
abusing slices in older versions of Python.
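A sketch of a __getitem__ written against that desugaring (toy class, my names); note it would work identically whether the caller uses the new sugar or writes the slices out by hand on older Pythons:

```python
# Sketch: a __getitem__ that treats slice entries with a string .start as
# keyword arguments. Under the sugar, a[b, c=3] would arrive exactly like
# today's a[b, 'c':3], so one implementation covers both spellings.
class Table:
    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        kwargs = {k.start: k.stop for k in key
                  if isinstance(k, slice) and isinstance(k.start, str)}
        args = [k for k in key
                if not (isinstance(k, slice) and isinstance(k.start, str))]
        return args, kwargs

t = Table()
assert t['b', 'c':3] == (['b'], {'c': 3})
```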

Pandas could be cited as an example of an existing library that could
potentially benefit. It would be good if there were precise examples of
Pandas syntax that would benefit immediately, but I don't know it beyond a
cursory glance over the docs. My gut feeling from that is that if the
syntax were available Pandas might be able to use it effectively.

2. I think you're at the point that you need to pick a single option as
your preferred option, and everything else needs to be in the alternatives.


FWIW, I would vote:


+1 for syntax-sugar only (zero backwards-compatibility concerns). If I were
starting from scratch this would not be my preferred option, but I think
compatibility is important.


+0 for a keyword(key, value) parameter object i.e.

a[b, c=d, e=f:g:h]
-> a[b, keyword('c', d), keyword('e', slice(f, g, h))]

My objection is that either __getitem__ will be more complicated if you
want to support earlier versions of Python (abuse slices for earlier
versions, use keyword object for current) or imposes an additional burden
on the caller in earlier versions (need to create a keyword-equivalent
object to call with). If we were starting from scratch this would be one of
my preferred options.


-1 to any option that loses the order of the parameters (I'm strongly in
favour of bringing order to keyword arguments - let's not take a backwards
step here).


-0 to any option that doesn't allow arbitrary ordering of positional and
keyword arguments i.e. any option where the following is not legal:

a[b, c=d, e]

This is something we can do now (albeit in a fairly verbose way at times)
and I think restricting this is likely to remove options for DSLs, etc.


-0 for namedtuple (BTW you might want to mention that
collections.namedtuple() already has precedent for _X positional parameter
names)

My objection is that it's not possible to determine definitively in
__getitem__ if the call was:

a[b, c]

or

a[_0=b, _1=c]

which might be important in some use cases. The same objection would apply
to passing an OrderedDict (but that's got additional compatibility issues).

Cheers,

Tim Delaney
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140705/ba09ec93/attachment.html>

From njs at pobox.com  Fri Jul  4 22:39:00 2014
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 4 Jul 2014 21:39:00 +0100
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
In-Reply-To: <CAN8CLgmisVn4dhtvBr3cReX5n8O4RMxToPcauHOzMXj0tqbhug@mail.gmail.com>
References: <53B33800.1030300@ferrara.linux.it>
 <20140704181051.GB18583@ferrara.linux.it>
 <20140704182018.GA30712@phdru.name>
 <20140704183322.GC18583@ferrara.linux.it>
 <20140704183424.GD18583@ferrara.linux.it>
 <20140704184056.GA24625@ferrara.linux.it>
 <CAN8CLgmisVn4dhtvBr3cReX5n8O4RMxToPcauHOzMXj0tqbhug@mail.gmail.com>
Message-ID: <CAPJVwBnW9P1m7Asg7YWk7WuuQK07O+=CpRhq321WqNYZk74x8g@mail.gmail.com>

On Fri, Jul 4, 2014 at 9:10 PM, Tim Delaney <timothy.c.delaney at gmail.com> wrote:
> 1. I think you absolutely *must* address the option of purely syntactic
> sugar in the PEP. It will come up on python-dev, so address it now.
>
>     a[b, c=d, e=f:g:h]
>     -> a[b, 'c':d, 'e':slice(f, g, h)]
>
> The rationale is readability and being both backwards and forwards
> compatible - existing __getitem__ designed to abuse slices will continue to
> work, and __getitem__ designed to work with the new syntax will work by
> abusing slices in older versions of Python.

I don't know of any existing code that abuses slices in this way (so
worrying about compatibility with it seems odd?).

> Pandas could be cited as an example of an existing library that could
> potentially benefit. It would be good if there were precise examples of
> Pandas syntax that would benefit immediately, but I don't know it beyond a
> cursory glance over the docs. My gut feeling from that is that if the syntax
> were available Pandas might be able to use it effectively.

Your hack (aside from being pointlessly ugly) would actually prevent
pandas from using this feature. In pandas, slices like foo["a":"b"]
already have a meaning (i.e., take all items from the one labeled "a"
to the one labeled "b").

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

From ethan at stoneleaf.us  Fri Jul  4 22:19:02 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 04 Jul 2014 13:19:02 -0700
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
 arguments
In-Reply-To: <CAN8CLgmisVn4dhtvBr3cReX5n8O4RMxToPcauHOzMXj0tqbhug@mail.gmail.com>
References: <53B33800.1030300@ferrara.linux.it>
 <20140704181051.GB18583@ferrara.linux.it> <20140704182018.GA30712@phdru.name>
 <20140704183322.GC18583@ferrara.linux.it>
 <20140704183424.GD18583@ferrara.linux.it>
 <20140704184056.GA24625@ferrara.linux.it>
 <CAN8CLgmisVn4dhtvBr3cReX5n8O4RMxToPcauHOzMXj0tqbhug@mail.gmail.com>
Message-ID: <53B70C36.3000202@stoneleaf.us>

On 07/04/2014 01:10 PM, Tim Delaney wrote:
>
> 1. I think you absolutely *must* address the option of purely syntactic
> sugar in the PEP. It will come up on python-dev, so address it now.
>
>      a[b, c=d, e=f:g:h]
>      -> a[b, 'c':d, 'e':slice(f, g, h)]


> +1 for syntax-sugar only (zero backwards-compatibility concerns).

Also +1 for this approach.

--
~Ethan~

From timothy.c.delaney at gmail.com  Fri Jul  4 22:46:58 2014
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Sat, 5 Jul 2014 06:46:58 +1000
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
In-Reply-To: <CAPJVwBnW9P1m7Asg7YWk7WuuQK07O+=CpRhq321WqNYZk74x8g@mail.gmail.com>
References: <53B33800.1030300@ferrara.linux.it>
 <20140704181051.GB18583@ferrara.linux.it>
 <20140704182018.GA30712@phdru.name>
 <20140704183322.GC18583@ferrara.linux.it>
 <20140704183424.GD18583@ferrara.linux.it>
 <20140704184056.GA24625@ferrara.linux.it>
 <CAN8CLgmisVn4dhtvBr3cReX5n8O4RMxToPcauHOzMXj0tqbhug@mail.gmail.com>
 <CAPJVwBnW9P1m7Asg7YWk7WuuQK07O+=CpRhq321WqNYZk74x8g@mail.gmail.com>
Message-ID: <CAN8CLgkSrf3s6x8pftZKb9hdPhBweNnOUay4rwxj_Vj8QVq+2g@mail.gmail.com>

On 5 July 2014 06:39, Nathaniel Smith <njs at pobox.com> wrote:

> On Fri, Jul 4, 2014 at 9:10 PM, Tim Delaney <timothy.c.delaney at gmail.com>
> wrote:
> > 1. I think you absolutely *must* address the option of purely syntactic
> > sugar in the PEP. It will come up on python-dev, so address it now.
> >
> >     a[b, c=d, e=f:g:h]
> >     -> a[b, 'c':d, 'e':slice(f, g, h)]
> >
> > The rationale is readability and being both backwards and forwards
> > compatible - existing __getitem__ designed to abuse slices will continue
> to
> > work, and __getitem__ designed to work with the new syntax will work by
> > abusing slices in older versions of Python.
>


> pandas from using this feature. In pandas, slices like foo["a":"b"]
> already have a meaning (i.e., take all items from the one labeled "a"
> to the one labeled "b").
>

If that's the case then it should be listed as a reason in the PEP for a
change larger than syntax sugar, otherwise this important information will
be lost.

One of the first suggestions when this PEP came up was to just (ab)use
slices - people will use the syntax they have available to them.

Tim Delaney
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140705/8cec56e8/attachment.html>

From ethan at stoneleaf.us  Fri Jul  4 23:07:41 2014
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 04 Jul 2014 14:07:41 -0700
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
 arguments
In-Reply-To: <CAPJVwBnW9P1m7Asg7YWk7WuuQK07O+=CpRhq321WqNYZk74x8g@mail.gmail.com>
References: <53B33800.1030300@ferrara.linux.it>
 <20140704181051.GB18583@ferrara.linux.it> <20140704182018.GA30712@phdru.name>
 <20140704183322.GC18583@ferrara.linux.it>
 <20140704183424.GD18583@ferrara.linux.it>
 <20140704184056.GA24625@ferrara.linux.it>
 <CAN8CLgmisVn4dhtvBr3cReX5n8O4RMxToPcauHOzMXj0tqbhug@mail.gmail.com>
 <CAPJVwBnW9P1m7Asg7YWk7WuuQK07O+=CpRhq321WqNYZk74x8g@mail.gmail.com>
Message-ID: <53B7179D.8050101@stoneleaf.us>

On 07/04/2014 01:39 PM, Nathaniel Smith wrote:
> On Fri, Jul 4, 2014 at 9:10 PM, Tim Delaney wrote:
>
> Your hack (aside from being pointlessly ugly) would actually prevent
> pandas from using this feature. In pandas, slices like foo["a":"b"]
> already have a meaning (i.e., take all items from the one labeled "a"
> to the one labeled "b").

Isn't that the standard way slices are supposed to be used, though?  Instead of integers, pandas is allowing strings.  How
would pandas use the new feature?

--
~Ethan~

From timothy.c.delaney at gmail.com  Fri Jul  4 23:40:23 2014
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Sat, 5 Jul 2014 07:40:23 +1000
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
In-Reply-To: <53B7179D.8050101@stoneleaf.us>
References: <53B33800.1030300@ferrara.linux.it>
 <20140704181051.GB18583@ferrara.linux.it>
 <20140704182018.GA30712@phdru.name>
 <20140704183322.GC18583@ferrara.linux.it>
 <20140704183424.GD18583@ferrara.linux.it>
 <20140704184056.GA24625@ferrara.linux.it>
 <CAN8CLgmisVn4dhtvBr3cReX5n8O4RMxToPcauHOzMXj0tqbhug@mail.gmail.com>
 <CAPJVwBnW9P1m7Asg7YWk7WuuQK07O+=CpRhq321WqNYZk74x8g@mail.gmail.com>
 <53B7179D.8050101@stoneleaf.us>
Message-ID: <CAN8CLgm5c1o5F0q+HdyGRgjJcQiV0ejNgjhoDyYgQ75fTZ9HzQ@mail.gmail.com>

On 5 July 2014 07:07, Ethan Furman <ethan at stoneleaf.us> wrote:

> On 07/04/2014 01:39 PM, Nathaniel Smith wrote:
>
>  On Fri, Jul 4, 2014 at 9:10 PM, Tim Delaney wrote:
>>
>> Your hack (aside from being pointlessly ugly) would actually prevent
>> pandas from using this feature. In pandas, slices like foo["a":"b"]
>> already have a meaning (i.e., take all items from the one labeled "a"
>> to the one labeled "b").
>>
>
> Isn't that the standard way slices are supposed to be used though?
>  Instead of integers, pandas is allowing strings.  How would pandas use the
> new feature?
>

I think Nathaniel is saying that pandas is already using string slices in
an appropriate way (rather than abusing them), and so if this was just
syntax sugar they wouldn't be able to use the new syntax for new
functionality (since you couldn't distinguish the two).

It would be possible to make both approaches "work" by having an object
that had all of .start, .stop, .step, .key and .value (and trying
.key/.value first), but IMO that's going too far - I'd rather have a
separate object with just .key and .value to test for.

Tim Delaney

From stefano.borini at ferrara.linux.it  Fri Jul  4 23:41:44 2014
From: stefano.borini at ferrara.linux.it (Stefano Borini)
Date: Fri, 04 Jul 2014 23:41:44 +0200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
 arguments
In-Reply-To: <53B7179D.8050101@stoneleaf.us>
References: <53B33800.1030300@ferrara.linux.it>
 <20140704181051.GB18583@ferrara.linux.it> <20140704182018.GA30712@phdru.name>
 <20140704183322.GC18583@ferrara.linux.it>
 <20140704183424.GD18583@ferrara.linux.it>
 <20140704184056.GA24625@ferrara.linux.it>
 <CAN8CLgmisVn4dhtvBr3cReX5n8O4RMxToPcauHOzMXj0tqbhug@mail.gmail.com>
 <CAPJVwBnW9P1m7Asg7YWk7WuuQK07O+=CpRhq321WqNYZk74x8g@mail.gmail.com>
 <53B7179D.8050101@stoneleaf.us>
Message-ID: <53B71F98.6010309@ferrara.linux.it>

On 7/4/14 11:07 PM, Ethan Furman wrote:
> Isn't that the standard way slices are supposed to be used though?
> Instead of integers, pandas is allowing strings.  How would pandas use the
> new feature?

It would not. Pandas is using slices to select by label; adding 
keywords would allow naming the axes. These are two completely 
different use cases.

For example, one could have a table containing the temperature
with the city on one axis and the time on the other axis.

So one could have

temperature["London", 12]

Pandas would have text indexes for "London", "New York", "Chicago" and 
so on. One could say

temperature["London":"Chicago", 12]

to get the temperature of the cities between "London" and "Chicago" at noon.

The PEP would allow instead to name the axes in the query

temperature[city="London":"Chicago", hour=12]
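
To make the two use cases concrete, here is a minimal sketch. The Table
class below is hypothetical (it is not pandas); it only demonstrates the
label-based, endpoint-inclusive slicing described above, which can already
be spelled today, whereas naming the axes cannot.

```python
# Hypothetical Table class (not pandas) demonstrating label-based slicing;
# label slices include both endpoints, as pandas .loc-style slicing does.
class Table:
    def __init__(self, row_labels, col_labels, data):
        self.row_labels = list(row_labels)
        self.col_labels = list(col_labels)
        self.data = data  # list of rows

    def __getitem__(self, idx):
        rows, col = idx
        c = self.col_labels.index(col)
        if isinstance(rows, slice):
            start = self.row_labels.index(rows.start)
            stop = self.row_labels.index(rows.stop) + 1  # inclusive endpoint
            return [row[c] for row in self.data[start:stop]]
        return self.data[self.row_labels.index(rows)][c]

temperature = Table(["London", "New York", "Chicago"], [0, 12],
                    [[5, 11], [7, 14], [-2, 3]])
print(temperature["London", 12])             # 11
print(temperature["London":"Chicago", 12])   # [11, 14, 3]
```

Under the PEP, temperature[city="London":"Chicago", hour=12] would express
the same query with the axes named.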


From greg.ewing at canterbury.ac.nz  Sat Jul  5 01:05:22 2014
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 05 Jul 2014 11:05:22 +1200
Subject: [Python-ideas] PEP pre-draft: Support for indexing with	keyword
 arguments
In-Reply-To: <20140704181051.GB18583@ferrara.linux.it>
References: <53B33800.1030300@ferrara.linux.it>
 <20140704181051.GB18583@ferrara.linux.it>
Message-ID: <53B73332.8020505@canterbury.ac.nz>

Stefano Borini wrote:

> Strategy 4: Strict dictionary
> -----------------------------
> 
> in presence of keyword arguments, the passed entity is a dictionary and all
> labels must be specified.

This wouldn't solve the OP's problem, because he apparently
needs to preserve the order of the keywords.

I don't really understand what he's trying to do, but
labelling the axes doesn't seem to be it, or at least not
just that.

-- 
Greg

From paultag at gmail.com  Sat Jul  5 02:59:30 2014
From: paultag at gmail.com (Paul Tagliamonte)
Date: Fri, 4 Jul 2014 20:59:30 -0400
Subject: [Python-ideas] lazy tuple unpacking
Message-ID: <20140705005930.GA7612@leliel.pault.ag>

Given:

    >>> def g_range(n):
    ...     for y in range(n):
    ...         yield y
    ... 

I notice that:

    >>> a, b, c, *others = g_range(100)

Works great. Super useful stuff there. Looks good.

I also notice that this causes *others to consume the generator
in a greedy way.

    >>> type(others)
    <class 'list'>

And this makes me sad.

    >>> a, b, c, *others = g_range(10000000000)
    # will also make your machine very sad. Eventually resulting
    # (ok, unless you've got a really fancy bit of kit) in:
    Killed

Really, the behavior (I think) should be more similar to:

    >>> _x = g_range(1000000000000)
    >>> a = next(_x)
    >>> b = next(_x)
    >>> c = next(_x)
    >>> others = _x
    >>> 


Of course, this leads to all sorts of fun errors, like the fact you
couldn't iterate over it twice. This might not be expected. However, it
might be nice to have this behavior when you're unpacking a generator.

Thoughts?
  Paul
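
For what it's worth, the desired behaviour can be approximated today with
itertools.islice; lazy_unpack below is a hypothetical helper, not a stdlib
function.

```python
# Sketch: take the first n items eagerly, leave the rest as a lazy iterator.
from itertools import islice

def lazy_unpack(iterable, n):
    """Return the first n items, plus the still-lazy remainder."""
    it = iter(iterable)
    return list(islice(it, n)) + [it]

a, b, c, others = lazy_unpack(range(10**12), 3)
print(a, b, c)        # 0 1 2
print(next(others))   # 3
```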
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: Digital signature
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140704/0658e233/attachment.sig>

From pyideas at rebertia.com  Sat Jul  5 03:10:44 2014
From: pyideas at rebertia.com (Chris Rebert)
Date: Fri, 4 Jul 2014 18:10:44 -0700
Subject: [Python-ideas] lazy tuple unpacking
In-Reply-To: <20140705005930.GA7612@leliel.pault.ag>
References: <20140705005930.GA7612@leliel.pault.ag>
Message-ID: <CAMZYqRTSA=JHqxMuHkpRTF7yaeoEoqWaWxGe0nRQgcUFr8158Q@mail.gmail.com>

On Fri, Jul 4, 2014 at 5:59 PM, Paul Tagliamonte <paultag at gmail.com> wrote:
<snip>
> I notice that:
>
>     >>> a, b, c, *others = g_range(100)
>
> Works great. Super useful stuff there. Looks good.
>
> I also notice that this causes *others to consume the generator
> in a greedy way.
>
>     >>> type(others)
>     <class 'list'>
>
> And this makes me sad.
>
>     >>> a, b, c, *others = g_range(10000000000)
>     # will also make your machine very sad. Eventually resulting
>     # (ok, unless you've got a really fancy bit of kit) in:
>     Killed
>
> Really, the behavior (I think) should be more similar to:
>
>     >>> _x = g_range(1000000000000)
>     >>> a = next(_x)
>     >>> b = next(_x)
>     >>> c = next(_x)
>     >>> others = _x
>     >>>
>
>
> Of course, this leads to all sorts of fun errors, like the fact you
> couldn't iterate over it twice. This might not be expected. However, it
> might be nice to have this behavior when you're unpacking a generator.
>
> Thoughts?

It would mean an (IMHO undesirable) loss of consistency/symmetry,
type-wise, with other unpackings where this generator optimization
isn't possible:

Python 3.4.1 (default, May 19 2014, 13:10:29)
>>> x = [1,2,3,4,5,6,7]
>>> a,b,*c,d = x
>>> c
[3, 4, 5, 6]
>>> *e,f,g = x
>>> e
[1, 2, 3, 4, 5]

Cheers,
Chris

From paultag at gmail.com  Sat Jul  5 03:12:49 2014
From: paultag at gmail.com (Paul Tagliamonte)
Date: Fri, 4 Jul 2014 21:12:49 -0400
Subject: [Python-ideas] lazy tuple unpacking
In-Reply-To: <CAMZYqRTSA=JHqxMuHkpRTF7yaeoEoqWaWxGe0nRQgcUFr8158Q@mail.gmail.com>
References: <20140705005930.GA7612@leliel.pault.ag>
 <CAMZYqRTSA=JHqxMuHkpRTF7yaeoEoqWaWxGe0nRQgcUFr8158Q@mail.gmail.com>
Message-ID: <20140705011249.GA9902@leliel.pault.ag>

On Fri, Jul 04, 2014 at 06:10:44PM -0700, Chris Rebert wrote:
> It would mean an (IMHO undesirable) loss of consistency/symmetry,
> type-wise, with other unpackings where this generator optimization
> isn't possible:
> 
> Python 3.4.1 (default, May 19 2014, 13:10:29)
> >>> x = [1,2,3,4,5,6,7]
> >>> a,b,*c,d = x
> >>> c
> [3, 4, 5, 6]
> >>> *e,f,g = x
> >>> e
> [1, 2, 3, 4, 5]
> 
> Cheers,
> Chris

Euch, good point. This feature might just be DOA.

Thanks!
  Paul

From graffatcolmingov at gmail.com  Sat Jul  5 03:19:01 2014
From: graffatcolmingov at gmail.com (Ian Cordasco)
Date: Fri, 4 Jul 2014 20:19:01 -0500
Subject: [Python-ideas] lazy tuple unpacking
In-Reply-To: <20140705005930.GA7612@leliel.pault.ag>
References: <20140705005930.GA7612@leliel.pault.ag>
Message-ID: <CAN-Kwu0L1kSsL3fZZUV_UfoQPa68Y3i=eoSu7HvM-dHGk-Vn9w@mail.gmail.com>

On Fri, Jul 4, 2014 at 7:59 PM, Paul Tagliamonte <paultag at gmail.com> wrote:
> Given:
>
>     >>> def g_range(n):
>     ...     for y in range(n):
>     ...         yield y
>     ...
>
> I notice that:
>
>     >>> a, b, c, *others = g_range(100)
>
> Works great. Super useful stuff there. Looks good.
>
> I also notice that this causes *others to consume the generator
> in a greedy way.
>
>     >>> type(others)
>     <class 'list'>
>
> And this makes me sad.
>
>     >>> a, b, c, *others = g_range(10000000000)
>     # will also make your machine very sad. Eventually resulting
>     # (ok, unless you've got a really fancy bit of kit) in:
>     Killed
>
> Really, the behavior (I think) should be more similar to:
>
>     >>> _x = g_range(1000000000000)
>     >>> a = next(_x)
>     >>> b = next(_x)
>     >>> c = next(_x)
>     >>> others = _x
>     >>>
>
>
> Of course, this leads to all sorts of fun errors, like the fact you
> couldn't iterate over it twice. This might not be expected. However, it
> might be nice to have this behavior when you're unpacking a generator.
>
> Thoughts?

I agree that the behaviour is suboptimal, but as Chris already pointed
out it would introduce a significant inconsistency in the API of
unpacking. I'm struggling to see a *good* way of doing this. My first
instinct was that we could make something like this do what you
expect:

>>> a, b, c, others = g_range(some_really_big_number)
>>> others
<generator ...>

But this doesn't work today: Python raises a ValueError because there
are too many values to unpack. I'm also against introducing some new
syntax to add the behaviour.

From bruce at leapyear.org  Sat Jul  5 05:16:59 2014
From: bruce at leapyear.org (Bruce Leban)
Date: Fri, 4 Jul 2014 20:16:59 -0700
Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword
	arguments
In-Reply-To: <20140704181051.GB18583@ferrara.linux.it>
References: <53B33800.1030300@ferrara.linux.it>
 <20140704181051.GB18583@ferrara.linux.it>
Message-ID: <CAGu0AntwO14-qpg-Vwaw_x-VkkiGpxxPN3+7twNLiD6xzHVBjw@mail.gmail.com>

On Fri, Jul 4, 2014 at 11:10 AM, Stefano Borini <
stefano.borini at ferrara.linux.it> wrote:

> On Wed, Jul 02, 2014 at 12:36:48AM +0200, Stefano Borini wrote:
> > https://github.com/stefanoborini/pep-keyword/blob/master/PEP-XXX.txt
>
> Strategy 4: Strict dictionary
> -----------------------------
>
> This strategy accepts that __getitem__ is special in accepting only one
> object,
> and the nature of that object must be non-ambiguous in its specification
> of the
> axes: it can be either by order, or by name. As a result of this
> assumption,
> in presence of keyword arguments, the passed entity is a dictionary and all
> labels must be specified.
>

The result that "all labels must be specified" does not follow from the
assumption that the object must be unambiguous. Numbers are not valid
keyword names but are perfectly useful as index values. See below. Note
that I am not advocating for/against strategy 4, just commenting on it.

>
>     C0. a[1]; a[1,2]      -> idx = 1; idx=(1, 2)
>     C1. a[Z=3]            -> idx = {"Z": 3}
>     C2. a[Z=3, R=4]       -> idx = {"Z": 3, "R": 4}
>     C3. a[1, Z=3]         -> {0: 1, "Z": 3}
>     C4. a[1, Z=3, R=4]    ->  {0: 1, "Z": 3, "R": 4}

>     C5. a[1, 2, Z=3]      ->  {0: 1, 1: 2, "Z": 3}
>     C6. a[1, 2, Z=3, R=4] ->  {0: 1, 1: 2, "Z": 3, "R": 4}

>     C7. a[1, Z=3, 2, R=4] -> raise SyntaxError
>
Note that idx[0] would have the same value it would have in the normal
__getitem__ call, while in all cases above idx[3] would raise an exception.
It would not be the case that a[1,2] and a[x=1,y=2] would be
interchangeable as they would for function calls. That would still have to
be handled by the __getitem__ function itself. But it's fairly easy to
write a function that does that:

def extract_indexes(idx, args):
    # args is a list of tuples, either (key, default) or (key,) if no default
    result = []
    for i, arg in enumerate(args):
        if i in idx and arg[0] in idx:
            raise IndexError
        result.append(idx[i] if i in idx
                      else idx[arg[0]] if arg[0] in idx
                      else arg[1])
    return result

This raises IndexError if a key value is specified both positionally and by
name or if a missing key value does not have a default. It should also (but
does not) raise IndexError when idx contains extra keys not listed in args.
It also doesn't support unnamed (positional only) indexes. Neither of those
is difficult to add.
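
To illustrate, here is the helper applied to the strict-dictionary cases
above (restated so the example is self-contained; the args list, the axis
names Z and R, and their defaults are made up for illustration):

```python
def extract_indexes(idx, args):
    # args is a list of tuples, either (key, default) or (key,) if no default
    result = []
    for i, arg in enumerate(args):
        if i in idx and arg[0] in idx:
            raise IndexError
        result.append(idx[i] if i in idx
                      else idx[arg[0]] if arg[0] in idx
                      else arg[1])
    return result

# Hypothetical axes Z and R, both with defaults.
args = [("Z", 0), ("R", 10)]
print(extract_indexes({0: 1, "R": 4}, args))   # [1, 4]  - a[1, R=4]
print(extract_indexes({"Z": 3}, args))         # [3, 10] - a[Z=3], R defaulted
```

Passing {0: 1, "Z": 3} (the C3 case with Z also given positionally) raises
IndexError, since the first axis is specified both ways.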

--- Bruce
Learn how hackers think: http://j.mp/gruyere-security
https://www.linkedin.com/in/bruceleban

From abarnert at yahoo.com  Sat Jul  5 13:26:26 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 5 Jul 2014 04:26:26 -0700
Subject: [Python-ideas] lazy tuple unpacking
In-Reply-To: <CAMZYqRTSA=JHqxMuHkpRTF7yaeoEoqWaWxGe0nRQgcUFr8158Q@mail.gmail.com>
References: <20140705005930.GA7612@leliel.pault.ag>
 <CAMZYqRTSA=JHqxMuHkpRTF7yaeoEoqWaWxGe0nRQgcUFr8158Q@mail.gmail.com>
Message-ID: <1404559586.21453.YahooMailNeo@web181003.mail.ne1.yahoo.com>

On Friday, July 4, 2014 6:11 PM, Chris Rebert <pyideas at rebertia.com> wrote:

> On Fri, Jul 4, 2014 at 5:59 PM, Paul Tagliamonte <paultag at gmail.com> 
> wrote:
> <snip>
>>  I notice that:
>> 
>>     >>> a, b, c, *others = g_range(100)
>> 
>>  Works great. Super useful stuff there. Looks good.
>> 
>>  I also notice that this causes *others to consume the generator
>>  in a greedy way.
>> 
>>     >>> type(others)
>>     <class 'list'>
>> 
>>  And this makes me sad.
>> 
>>     >>> a, b, c, *others = g_range(10000000000)
>>     # will also make your machine very sad. Eventually resulting
>>     # (ok, unless you've got a really fancy bit of kit) in:
>>     Killed
>> 
>>  Really, the behavior (I think) should be more similar to:
>> 
>>     >>> _x = g_range(1000000000000)
>>     >>> a = next(_x)
>>     >>> b = next(_x)
>>     >>> c = next(_x)
>>     >>> others = _x
>>     >>>
>> 
>> 
>>  Of course, this leads to all sorts of fun errors, like the fact you
>>  couldn't iterate over it twice. This might not be expected. However, it
>>  might be nice to have this behavior when you're unpacking a generator.
>> 
>>  Thoughts?
> 
> It would mean an (IMHO undesirable) loss of consistency/symmetry,
> type-wise, with other unpackings where this generator optimization
> isn't possible:
> 
> Python 3.4.1 (default, May 19 2014, 13:10:29)
>>>>  x = [1,2,3,4,5,6,7]
>>>>  a,b,*c,d = x
>>>>  c
> [3, 4, 5, 6]
>>>>  *e,f,g = x
>>>>  e
> [1, 2, 3, 4, 5]


When I was experimenting with adding lazy lists (sequences that wrap an iterator and request each value on first access), I played around with PyPy (Python 2.x, not 3.x), making unpacking (as well as map, filter, etc.) return them instead of iterators. (I also tried to make generator functions and expressions return them, but I couldn't get that to work in a quick hack.)

It worked nicely, and I think it would work with the expanded unpacking in 3.x. In Paul's original example, others is a lazy list of 9999999997 elements, which will only evaluate the ones you actually ask for. In Chris's examples, c and e are fully-evaluated lazy lists of 4 or 3 elements, respectively.

But, even if this weren't a ridiculously radical change to the language, I don't think it's what you'd want.

First, an iterator over 9999999997 elements is a lot more useful than a lazy list of 9999999997 elements, because you can iterate the whole thing without running out of memory.

Second, it wouldn't help in a case like this:

    a, b, *c, d, e = range(10000000000)

To make that work, you'd need something smarter than just using a lazy list instead of an iterator.

One way to solve it is to try to keep the original type, so unpacking maps to something like this:

    try:
        a, b, c, d, e = i[0], i[1], i[2:-2], i[-2], i[-1]
    except TypeError:
        # current behavior

Then c ends up as range(2, 9999999998), which is the best possible thing you could get there.
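
A rough, runnable sketch of that idea; unpack_keep_type is a hypothetical
helper standing in for the proposed unpacking behaviour, and it assumes at
least one item is requested on each side.

```python
def unpack_keep_type(i, n_front, n_back):
    """Slice the original object when it supports indexing; otherwise
    fall back to building a list (the current unpacking behavior)."""
    try:
        front = [i[k] for k in range(n_front)]
        back = [i[k] for k in range(-n_back, 0)]
        middle = i[n_front:-n_back]
    except TypeError:
        items = list(i)
        front = items[:n_front]
        back = items[-n_back:]
        middle = items[n_front:-n_back]
    return front, middle, back

(a, b), c, (d, e) = unpack_keep_type(range(10000000000), 2, 2)
print(a, b, d, e)   # 0 1 9999999998 9999999999
print(c)            # range(2, 9999999998)
```

For a plain iterator the try branch fails with TypeError and the list
fallback applies, matching today's behaviour.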

You could take this even further by adding either a notion of bidirectional and forward-only sequences, or a notion of reversible iterables, but that's getting much farther into left field; if anyone's interested, see http://stupidpythonideas.blogspot.com/2014/07/lazy-tuple-unpacking.html for details.

From toddrjen at gmail.com  Tue Jul  8 10:30:16 2014
From: toddrjen at gmail.com (Todd)
Date: Tue, 8 Jul 2014 10:30:16 +0200
Subject: [Python-ideas] lazy tuple unpacking
In-Reply-To: <20140705005930.GA7612@leliel.pault.ag>
References: <20140705005930.GA7612@leliel.pault.ag>
Message-ID: <CAFpSVpJuDcdQD7BoqOYLCyvwJVFDO6AiybyqfibwMPh7Po08iA@mail.gmail.com>

On Sat, Jul 5, 2014 at 2:59 AM, Paul Tagliamonte <paultag at gmail.com> wrote:

> Given:
>
>     >>> def g_range(n):
>     ...     for y in range(n):
>     ...         yield y
>     ...
>
> I notice that:
>
>     >>> a, b, c, *others = g_range(100)
>
> Works great. Super useful stuff there. Looks good.
>
> I also notice that this causes *others to consume the generator
> in a greedy way.
>
>     >>> type(others)
>     <class 'list'>
>
> And this makes me sad.
>
>     >>> a, b, c, *others = g_range(10000000000)
>     # will also make your machine very sad. Eventually resulting
>     # (ok, unless you've got a really fancy bit of kit) in:
>     Killed
>
> Really, the behavior (I think) should be more similar to:
>
>     >>> _x = g_range(1000000000000)
>     >>> a = next(_x)
>     >>> b = next(_x)
>     >>> c = next(_x)
>     >>> others = _x
>     >>>
>
>
> Of course, this leads to all sorts of fun errors, like the fact you
> couldn't iterate over it twice. This might not be expected. However, it
> might be nice to have this behavior when you're unpacking a generator.
>
> Thoughts?
>   Paul
>

Besides the issues others have discussed, another issue I see here is that
you are basically copying the iterator.  In the case of this, where "gen_a"
is a generator:

>>> a, b, c, *others = gen_a

"others" should be the same as "gen_a" in the end (i.e. "others is gen_a ==
True").  This seems redundant, especially when we have the itertools "take"
recipe which can be used to retrieve the first "n" values of an iterator,
which can then be unpacked in whatever way you want.


However, there might be an alternative.  You could have something where, if
you are unpacking an iterable to N variables, you can tell it to just
unpack the first N values, and the iterable then remains at position N
(similar to "take", but integrated more deeply).  For the case of something
like a list or tuple, it will just unpack those variables and skip the
rest.  Maybe either a method like this:

>>> a, b, c = gen_a.unpack()

Or some sort of syntax to say that the remaining values should be skipped
(although I don't really know what syntax would be good here, the syntax I
am using here is probably not good):

>>> a, b, c, [] = gen_a

Of course with "take" so simple to implement, this is probably way
overkill.  I also don't know if it is even possible for the right side of
the expression to know the layout of the left in that way.
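
The "take" recipe referred to here comes from the itertools documentation;
a quick demonstration:

```python
# The itertools "take" recipe: grab the first n values, leave the iterator
# positioned just after them.
from itertools import islice

def take(n, iterable):
    "Return the first n items of the iterable as a list."
    return list(islice(iterable, n))

gen_a = iter(range(100))
a, b, c = take(3, gen_a)
print(a, b, c)       # 0 1 2
print(next(gen_a))   # 3 - gen_a remains at position 3
```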

From paultag at gmail.com  Tue Jul  8 18:17:37 2014
From: paultag at gmail.com (Paul Tagliamonte)
Date: Tue, 8 Jul 2014 12:17:37 -0400
Subject: [Python-ideas] lazy tuple unpacking
In-Reply-To: <CAFpSVpJuDcdQD7BoqOYLCyvwJVFDO6AiybyqfibwMPh7Po08iA@mail.gmail.com>
References: <20140705005930.GA7612@leliel.pault.ag>
 <CAFpSVpJuDcdQD7BoqOYLCyvwJVFDO6AiybyqfibwMPh7Po08iA@mail.gmail.com>
Message-ID: <20140708161737.GA13805@helios.pault.ag>

On Tue, Jul 08, 2014 at 10:30:16AM +0200, Todd wrote:
>    Besides the issues others have discussed, another issue I see here is that
>    you are basically copying the iterator. In the case of this, where
>    "gen_a" is a generator:
> 
>    >>> a, b, c, *others = gen_a
> 
>    "others" should be the same as "gen_a" in the end (i.e. "others is gen_a
>    == True").? This seems redundant, especially when we have the itertools
>    "take" recipe which can be used to retrieve the first "n" values of an
>    iterator, which can then be unpacked in whatever way you want.
> 
>    However, there might be an alternative. You could have something where,
>    if you are unpacking an iterable to N variables, you can tell it to just
>    unpack the first N values, and the iterable then remains at position N
>    (similar to "take", but integrated more deeply). For the case of
>    something like a list or tuple, it will just unpack those variables and
>    skip the rest.? Maybe either a method like this:
> 
>    >>> a, b, c = gen_a.unpack()
> 
>    Or some sort of syntax to say that the remaining values should be skipped
>    (although I don't really know what syntax would be good here, the syntax I
>    am using here is probably not good):
> 
>    >>> a, b, c, [] = gen_a
> 
>    Of course with "take" so simple to implement, this is probably way
>    overkill. I also don't know if it is even possible for the right side of
>    the expression to know the layout of the left in that way.

Yeah, I think all the productive ideas (thanks, Andrew and Todd) to make
this happen are mostly starting to converge on full-blown lazy lists,
which is to say, generators which are indexable, sliceable, and work from both
ends (which is to say: more than the current iterator protocol).

I totally like the idea, not sure how keen everyone will be about it.

I'm not sure I have the energy or drive to try and convince everyone on
python-dev this is a good idea, but I'd totally love to play around
with this. Anyone else?

Cheers,
  Paul

-- 
#define sizeof(x) rand()
</paul>
:wq

From abarnert at yahoo.com  Tue Jul  8 19:41:09 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 8 Jul 2014 10:41:09 -0700
Subject: [Python-ideas] lazy tuple unpacking
In-Reply-To: <CAFpSVpJuDcdQD7BoqOYLCyvwJVFDO6AiybyqfibwMPh7Po08iA@mail.gmail.com>
References: <20140705005930.GA7612@leliel.pault.ag>
 <CAFpSVpJuDcdQD7BoqOYLCyvwJVFDO6AiybyqfibwMPh7Po08iA@mail.gmail.com>
Message-ID: <1404841269.59215.YahooMailNeo@web181002.mail.ne1.yahoo.com>

On Tuesday, July 8, 2014 1:31 AM, Todd <toddrjen at gmail.com> wrote:


>Besides the issues others have discussed, another issue I see here is that you are basically copying the iterator. In the case of this, where "gen_a" is a generator:
>
>>>> a, b, c, *others = gen_a
>
>"others" should be the same as "gen_a" in the end (i.e. "others is gen_a == True").? This seems redundant, especially when we have the itertools "take" recipe which can be used to retrieve the first "n" values of an iterator, which can then be unpacked in whatever way you want.

I don't think the issue he's trying to solve is that others is gen_a, but just that gen_a is not exhausted and copied into a list. If others were a wrapper around gen_a instead, I think that would solve all of the interesting use cases. (But I don't want to put words in Paul Tagliamonte's mouth here, so make sure he confirms it before replying in depth.)

>However, there might be an alternative. You could have something where, if you are unpacking an iterable to N variables, you can tell it to just unpack the first N values, and the iterable then remains at position N (similar to "take", but integrated more deeply). For the case of something like a list or tuple, it will just unpack those variables and skip the rest. Maybe either a method like this:
>
>>>> a, b, c = gen_a.unpack()
>
>Or some sort of syntax to say that the remaining values should be skipped (although I don't really know what syntax would be good here, the syntax I am using here is probably not good):

>
>
>>>> a, b, c, [] = gen_a


I think the obvious way to write this is a bare *:

>>> a, b, *, c, d = range(10)
>>> a, b, c, d
(0, 1, 8, 9)

>Of course with "take" so simple to implement, this is probably way overkill. I also don't know if it is even possible for the right side of the expression to know the layout of the left in that way.


There was a thread a few months back about enhancing the unpacking protocol by asking the iterable itself to do the unpacking (possibly feeding it more information about what needs to be unpacked, something like an __unpack__(self, prestar_count, star_flag, poststar_count)), which would allow the flexibility you're looking for.

I don't want to repeat someone else's use cases and arguments out of my own faulty memory; if you're interested in following up, search the python-ideas archive.

Anyway, your explicit method version could be written today. Obviously it wouldn't work on generators or other arbitrary iterators, but you could write a simple wrapper that takes an iterator and returns an iterator with an unpack method, which I think would be enough for experimenting with the feature.

Meanwhile, in the case of a non-iterator iterable, what would you want to happen here? Should others end up as what's left over from the iterator created by iter(iterable)? In other words:

>>> a, b, *others = [1, 2, 3, 4, 5]
>>> others
<list_iterator at 0x12345678>
>>> next(others)
3


From abarnert at yahoo.com  Tue Jul  8 20:25:36 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 8 Jul 2014 11:25:36 -0700
Subject: [Python-ideas] lazy tuple unpacking
In-Reply-To: <20140708161737.GA13805@helios.pault.ag>
References: <20140705005930.GA7612@leliel.pault.ag>
 <CAFpSVpJuDcdQD7BoqOYLCyvwJVFDO6AiybyqfibwMPh7Po08iA@mail.gmail.com>
 <20140708161737.GA13805@helios.pault.ag>
Message-ID: <1404843936.17303.YahooMailNeo@web181003.mail.ne1.yahoo.com>

> On Tuesday, July 8, 2014 9:18 AM, Paul Tagliamonte <paultag at gmail.com> wrote:



> Yeah, I think all the productive ideas (thanks, Andrew and Todd) to make
> this happen are mostly starting to converge on full-blown lazy lists,
> which is to say, generators which are indexable, sliceable, and work from both
> ends (which is to say: more than the current iterator protocol).


I don't think that's necessarily true.

First, I think the idea of just trying to index and slice the iterable and falling back to the current behavior is at least worth thinking about. It wouldn't solve the problem for iterators, but it would for your example (range), or any other kind of sequence that knows how to slice itself in a better way than making a list.

And to take that farther, you don't necessarily need to replace iterators with lazy lists, just with some kind of sequence. A view is just as indexable, sliceable, and reusable as a lazy list, and better in many ways, and Python already has views like dict_keys built in, and NumPy already uses a similar idea for slicing.

I believe either Guido or Nick has a writeup somewhere on why list slices being new lists rather than views is a good thing, not just a historical accident we're stuck with. But that doesn't mean that having views for many of the cases we use iterators for today (including a view comprehension, a viewtools library, etc.) would necessarily be a bad idea.

And, as I mentioned, expanding the notion of sequence to include weaker notions of bidirectional-only sequence and forward-only sequence eliminates many of the other needs for iterators (but not all: a general generator function obviously can't return a reusable forward-only sequence).

If you're interest in more on this, see http://stupidpythonideas.blogspot.com/2014/07/lazy-tuple-unpacking.html and http://stupidpythonideas.blogspot.com/2014/07/swift-style-map-and-filter-views.html for some ideas.


> I totally like the idea, not sure how keen everyone will be about it.

> 
> I'm not sure I have the energy or drive to try and convince everyone on
> python-dev this is a good idea, but I'd totally love to play around
> with this. Anyone else?


Before trying to convince anyone this is a good idea, first you want to build a lazy-list library: a lazy list type, lazy-list versions of map/filter/dropwhile, etc.

It's actually pretty simple. A lazy list basically looks like this:

    class LazyList(collections.abc.Sequence):
        def __init__(self, iterable):
            self.lst, self.it = [], iter(iterable)
        def __getitem__(self, index):
            while index >= len(self.lst):
                self.lst.append(next(self.it))
            return self.lst[index]

You have to add slice support, set/del, a __len__, etc., but it's all pretty simple. The only tricky question is what to do about slicing, because you have a choice there. You could just loop over the slice and get/set/del each index, or you could return a new LazyList around islice(self), or you could do the latter if stop is None else the former.
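
Note that as written the sketch can't actually be instantiated, since
collections.abc.Sequence also requires __len__; a minimally fleshed-out
version (still just a sketch) behaves like this:

```python
# LazyList sketch with the required __len__ added; values are pulled from
# the underlying iterator only on first access.
import collections.abc

class LazyList(collections.abc.Sequence):
    def __init__(self, iterable):
        self.lst, self.it = [], iter(iterable)
    def __getitem__(self, index):
        while index >= len(self.lst):
            self.lst.append(next(self.it))
        return self.lst[index]
    def __len__(self):
        # Forces full evaluation; only sensible for finite iterables.
        self.lst.extend(self.it)
        return len(self.lst)

ll = LazyList(range(10**9))
print(ll[5])        # 5
print(len(ll.lst))  # 6 - only six values have been realized so far
```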

And then all the lazy functions just call the iterator function and wrap the result in a LazyList.

The big problem with lazy lists is that once a value is instantiated, it stays around as long as the list does. So, if you use a lazy list as an iterable, you're basically building the whole list in memory. Iterators obviously don't have that problem.


It's worth looking at Haskell and other lazy functional languages to see why they don't have that problem. Their lists are conses (singly-linked lists with tail sharing). So, making it lazy automatically means that if you just iterate L without keeping a reference, only one cons is around at a time, while if you keep a reference to L, the whole list is available in memory. That won't work for array-based lists like Python's, and I'm not sure how you'd solve that without completely changing the way iteration works in Python. (Of course you can easily implement cons lists in Python, and then make them lazy, but then they're not sequences; in particular, they're not indexable, and generally won't work with any typical Python algorithms that aren't already happy with iterators.)

From abarnert at yahoo.com  Thu Jul 17 21:53:05 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 17 Jul 2014 12:53:05 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
Message-ID: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>

tl;dr: readline and friends should take an optional sep parameter (which also means adding an iterlines method).

Recently, I was trying to add -0 support to a command-line tool, which means that it reads filenames out of stdin and/or a text file with \0 separators instead of \n.

This means that my code that looked like this:

    with open(path, encoding=sys.getfilesystemencoding()) as f:
        for filename in f:
            do_stuff(filename)

... turned into this (from memory, not the exact code):

    def resplit(chunks, sep):
        buf = b''
        for chunk in chunks:
            parts = (buf+chunk).split(sep)

            yield from parts[:-1]
            buf = parts[-1]
        if buf:
            yield buf

    with open(path, 'rb') as f:
        chunks = iter(lambda: f.read(4096), b'')
        for line in resplit(chunks, b'\0'):
            filename = line.decode(sys.getfilesystemencoding())
            do_stuff(filename)

Besides being a lot more code (and involving things that a novice might have problems reading like that two-argument iter), this also means that the file pointer is way ahead of the line that's just been iterated, I'm inefficiently buffering everything twice, etc.

The problem is that readline is hardcoded to look for b'\n' for binary files, smart-universal-newline-thingy for text files, there's no way to reuse its machinery if you want to look for something different, and there's no way to access the internals that it uses if you want to reimplement it.

While it might be possible to fix the latter problems in some generic and flexible way, that doesn't seem all that useful; really, other than changing the way readline splits, I don't think anyone wants to hook anything else about file objects. (On the other hand, people might want to hook it in more complex ways -- e.g., pass a separator function instead of a separator string? I'm probably reaching there...)

If I'm right, all that's needed is an extra sep=None keyword-only parameter to readline and friends (where None means the existing newline behavior), along with an iterlines method that's identical to __iter__ except that it has room for that new parameter.

One minor side problem: Sometimes you don't actually have a file, but some kind of file-like object. I realize that as of 3.1 or so, this is supposed to mean it actually is an io.BufferedIOBase or etc., but there are still plenty of third-party modules that just demand and/or provide "something with read(size)" or the like. In fact, that's the case with the problem I ran into above; another feature uses a third-party module to provide file-like objects for members of all kinds of uncommon archive types, and unlike zipfile, that module wasn't changed to provide io subclasses when it was ported to 3.x. So, it might be worth having adapters that make it easier (or just possible?) to wrap such a thing in the actual io interfaces. (The existing wrappers aren't adapters -- BufferedReader demands readinto(buf), not read(size); TextIOWrapper can only wrap a BufferedIOBase.) But that's really a separate issue (and the answer to that one may just be to hold firm with the "file-like object means IOBase" and eventually every library you care about will work that way, even if you occasionally have to fix it yourself).

From guido at python.org  Thu Jul 17 22:48:28 2014
From: guido at python.org (Guido van Rossum)
Date: Thu, 17 Jul 2014 13:48:28 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
Message-ID: <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>

I think it's fine to add something to stdlib that encapsulates your
example. (TBD: where?)

I don't think it is reasonable to add a new parameter to readline(),
because streams are widely implemented using duck typing -- every
implementation would have to be updated to support this.


On Thu, Jul 17, 2014 at 12:53 PM, Andrew Barnert <
abarnert at yahoo.com.dmarc.invalid> wrote:

> tl;dr: readline and friends should take an optional sep parameter (which
> also means adding an iterlines method).
>
> Recently, I was trying to add -0 support to a command-line tool, which
> means that it reads filenames out of stdin and/or a text file with \0
> separators instead of \n.
>
> This means that my code that looked like this:
>
>     with open(path, encoding=sys.getfilesystemencoding()) as f:
>         for filename in f:
>             do_stuff(filename)
>
> ... turned into this (from memory, not the exact code):
>
>     def resplit(chunks, sep):
>         buf = b''
>         for chunk in chunks:
>             parts = (buf+chunk).split(sep)
>
>             yield from parts[:-1]
>             buf = parts[-1]
>         if buf:
>             yield buf
>
>     with open(path, 'rb') as f:
>         chunks = iter(lambda: f.read(4096), b'')
>         for line in resplit(chunks, b'\0'):
>             filename = line.decode(sys.getfilesystemencoding())
>             do_stuff(filename)
>
> Besides being a lot more code (and involving things that a novice might
> have problems reading like that two-argument iter), this also means that
> the file pointer is way ahead of the line that's just been iterated, I'm
> inefficiently buffering everything twice, etc.
>
> The problem is that readline is hardcoded to look for b'\n' for binary
> files, smart-universal-newline-thingy for text files, there's no way to
> reuse its machinery if you want to look for something different, and
> there's no way to access the internals that it uses if you want to
> reimplement it.
>
> While it might be possible to fix the latter problems in some generic and
> flexible way, that doesn't seem all that useful; really, other than
> changing the way readline splits, I don't think anyone wants to hook
> anything else about file objects. (On the other hand, people might want to
> hook it in more complex ways -- e.g., pass a separator function instead of a
> separator string? I'm probably reaching there...)
>
> If I'm right, all that's needed is an extra sep=None keyword-only
> parameter to readline and friends (where None means the existing newline
> behavior), along with an iterlines method that's identical to __iter__
> except that it has room for that new parameter.
>
> One minor side problem: Sometimes you don't actually have a file, but some
> kind of file-like object. I realize that as of 3.1 or so, this is supposed to
> mean it actually is an io.BufferedIOBase or etc., but there are still
> plenty of third-party modules that just demand and/or provide "something
> with read(size)" or the like. In fact, that's the case with the problem I
> ran into above; another feature uses a third-party module to provide
> file-like objects for members of all kinds of uncommon archive types, and
> unlike zipfile, that module wasn't changed to provide io subclasses when it
> was ported to 3.x. So, it might be worth having adapters that make it
> easier (or just possible?) to wrap such a thing in the actual io
> interfaces. (The existing wrappers aren't adapters -- BufferedReader demands
> readinto(buf), not read(size); TextIOWrapper can only wrap a
> BufferedIOBase.) But that's really a separate issue (and the answer to that
> one may just be to hold firm
>  with the "file-like object means IOBase" and eventually every library you
> care about will work that way, even if you occasionally have to fix it
> yourself).
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/




-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140717/8a07586f/attachment.html>

From python at 2sn.net  Thu Jul 17 23:39:42 2014
From: python at 2sn.net (Alexander Heger)
Date: Fri, 18 Jul 2014 07:39:42 +1000
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
Message-ID: <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>

> I don't think it is reasonable to add a new parameter to readline(), because
> streams are widely implemented using duck typing -- every implementation
> would have to be updated to support this.

Could the "split" (or splitline) keyword-only parameter instead be
passed to the open function (and the __init__ of IOBase and be stored
there)?

From abarnert at yahoo.com  Thu Jul 17 23:59:29 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 17 Jul 2014 14:59:29 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
Message-ID: <1405634369.86826.YahooMailNeo@web181006.mail.ne1.yahoo.com>

On Thursday, July 17, 2014 1:48 PM, Guido van Rossum <guido at python.org> wrote:


>I think it's fine to add something to stdlib that encapsulates your example. (TBD: where?)

Good question about the where.

The resplit function seems like it could be of more general use than just this case, but I'm not sure where it belongs. Maybe itertools?

The iter(lambda: f.read(bufsize), b'') part seems too trivial to put anywhere, even just as an example in the docs -- but given that it probably looks like a magic incantation to anyone who's a Python novice (even if they're a C or JS or whatever expert), maybe it is worth putting somewhere. Maybe io.iterchunks(f, 4096)?

If so, the combination of the two into something like iterlines(f, b'\0') seems like it should go right alongside iterchunks.


However...


>I don't think it is reasonable to add a new parameter to readline()

The problem is that my code has significant problems for many use cases, and I don't think they can be solved.

Calling readline (or iterating the file) uses the underlying buffer (and stream decoder, for text files), keeps the file pointer in the same place, etc. My code doesn't, and no external code can. So, besides being less efficient, it leaves the file pointer in the wrong place (imagine using it to parse an RFC822 header then read() the body), doesn't properly decode files where the separator can be ambiguous with other bytes (try separating on '\0' in a UTF-16 file), etc.

Maybe if we had more powerful adapters or wrappers so I could just say "here's a pre-existing buffer plus a text-file-like object, now wrap that up as a real TextIOBase for me" it would be possible to write something that worked from outside without these problems, but as things stand, I don't see an answer.

Maybe put resplit in the stdlib, then just give iterlines as a 2-liner example (in the itertools recipes, or the file-I/O section of the tutorial?) where all these problems can be raised and not answered?

From abarnert at yahoo.com  Fri Jul 18 00:21:25 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 17 Jul 2014 15:21:25 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should
	be	easier
In-Reply-To: <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
Message-ID: <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>

> On Thursday, July 17, 2014 2:40 PM, Alexander Heger <python at 2sn.net> wrote:

> >>  I don't think it is reasonable to add a new parameter to readline(), 
> because
>>  streams are widely implemented using duck typing -- every implementation
>>  would have to be updated to support this.
> 
> Could the "split" (or splitline) keyword-only parameter instead be
> passed to the open function (and the __init__ of IOBase and be stored
> there)?


Good idea. It's less powerful/flexible, but probably good enough for almost all use cases. (I can't think of any file where I'd need to split part of it on \0 and the rest on \n...) Also, it means you can stick with the normal __iter__ instead of needing a separate iterlines method.

And, since open/__init__/etc. isn't part of the protocol, it's perfectly fine for the builtin open, etc., to be an example or template that's generally worth following if there's no good reason not to do so, rather than a requirement that must be followed. So, if I'm getting file-like objects handed to me by some third-party library or plugin API or whatever, and I need them to be \0-separated, in many cases the problems with resplit won't be an issue so I can just use it as a workaround, and in the remaining cases, I can request that the library/app/whatever add the sep parameter to the next iteration of the API.

So, I retract my original suggestion in favor of this one. And, separately, Guido's idea of adding the helpers (or at least resplit, plus documentation on how to write the other stuff) to the stdlib somewhere.

Thanks.

From guido at python.org  Fri Jul 18 00:37:58 2014
From: guido at python.org (Guido van Rossum)
Date: Thu, 17 Jul 2014 15:37:58 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <1405634369.86826.YahooMailNeo@web181006.mail.ne1.yahoo.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <1405634369.86826.YahooMailNeo@web181006.mail.ne1.yahoo.com>
Message-ID: <CAP7+vJKrVguc7N+-=LE9GFfVNzVKgTd1Uqm3cpPbUmoBJ9RbFg@mail.gmail.com>

On Thu, Jul 17, 2014 at 2:59 PM, Andrew Barnert <
abarnert at yahoo.com.dmarc.invalid> wrote:

> On Thursday, July 17, 2014 1:48 PM, Guido van Rossum <guido at python.org>
> wrote:
>
>
> >I think it's fine to add something to stdlib that encapsulates your
> example. (TBD: where?)
>
> Good question about the where.
>
> The resplit function seems like it could be of more general use than just
> this case, but I'm not sure where it belongs. Maybe itertools?
>
> The iter(lambda: f.read(bufsize), b'') part seems too trivial to put
> anywhere, even just as an example in the docs -- but given that it probably
> looks like a magic incantation to anyone who's a Python novice (even if
> they're a C or JS or whatever expert), maybe it is worth putting somewhere.
> Maybe io.iterchunks(f, 4096)?
>
> If so, the combination of the two into something like iterlines(f, b'\0')
> seems like it should go right alongside iterchunks.
>
>
> However...
>
>
> >I don't think it is reasonable to add a new parameter to readline()
>
> The problem is that my code has significant problems for many use cases,
> and I don't think they can be solved.
>
> Calling readline (or iterating the file) uses the underlying buffer (and
> stream decoder, for text files), keeps the file pointer in the same place,
> etc. My code doesn't, and no external code can. So, besides being less
> efficient, it leaves the file pointer in the wrong place (imagine using it
> to parse an RFC822 header then read() the body), doesn't properly decode
> files where the separator can be ambiguous with other bytes (try separating
> on '\0' in a UTF-16 file), etc.
>

You can implement a subclass of io.BufferedIOBase that wraps an instance of
io.RawIOBase (I think those are the right classes) where the wrapper adds a
readuntil(separator) method. Whichever thing then wants to read the rest of
the data should call read() on the wrapper object.

This still sounds a lot better to me than asking everyone to add a new
parameter to their readline() (and the implementation).
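A rough sketch of what such a wrapper might look like for the binary case -- readuntil is a hypothetical method name, this subclasses BufferedReader for brevity rather than wrapping a RawIOBase, and it punts on a separator that spans a buffer boundary:

```python
import io

class SeparatorReader(io.BufferedReader):
    """BufferedReader plus a readuntil(sep) method (illustrative sketch)."""

    def readuntil(self, sep=b'\n'):
        chunks = []
        while True:
            peeked = self.peek(1)   # whatever is currently buffered (b'' at EOF)
            if not peeked:
                break
            i = peeked.find(sep)
            if i >= 0:
                # Consume through the separator and stop.
                chunks.append(self.read(i + len(sep)))
                break
            # No separator in the buffered data: consume it and keep looking.
            # (A real implementation must handle sep spanning a buffer boundary.)
            chunks.append(self.read(len(peeked)))
        return b''.join(chunks)
```

Because readuntil only uses peek and read, the file position stays consistent: after reading a header record this way, a plain read() picks up exactly where the separator ended.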

Maybe if we had more powerful adapters or wrappers so I could just say
> "here's a pre-existing buffer plus a text-file-like object, now wrap that
> up as a real TextIOBase for me" it would be possible to write something
> that worked from outside without these problems, but as things stand, I
> don't see an answer.
>

You probably have to do a separate wrapper for text streams, the types and
buffering implementation are just too different.


> Maybe put resplit in the stdlib, then just give iterlines as a 2-liner
> example (in the itertools recipes, or the file-I/O section of the
> tutorial?) where all these problems can be raised and not answered?
>

(Sorry, in a hurry / terribly distracted.)

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140717/1b0ed2be/attachment.html>

From abarnert at yahoo.com  Fri Jul 18 02:04:00 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 17 Jul 2014 17:04:00 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should
	be	easier
In-Reply-To: <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com> 
Message-ID: <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>

On Thursday, July 17, 2014 3:21 PM, Andrew Barnert <abarnert at yahoo.com> wrote:



> On Thursday, July 17, 2014 2:40 PM, Alexander Heger <python at 2sn.net> wrote:

>> Could the "split" (or splitline) keyword-only
>> parameter instead be passed to the open function
>> (and the __init__ of IOBase and be stored there)?
> 
> Good idea. It's less powerful/flexible, but probably
> good enough for almost all use cases. (I can't think
> of any file where I'd need to split part of it on \0
> and the rest on \n...) Also, it means you can stick with
> the normal __iter__ instead of needing a separate
> iterlines method.

It turns out to be even simpler than I expected.

I reused the "newline" parameter of open and TextIOWrapper.__init__, adding a param of the same name to the constructors for BufferedReader, BufferedWriter, BufferedRWPair, BufferedRandom, and FileIO.

For text files, just remove the check for newline being one of the standard values and it all works. For binary files, remove the check for truthy, make open pass each Buffered* constructor newline=(newline if binary else None), make each Buffered* class store it, and change two lines in RawIOBase.readline to use it. And that's it.

(Of course you'd also want to add it to all of the stdlib cases like zipfile.ZipFile.open/zipfile.ExtZipFile.__init__, but there aren't too many of those.)

This means that the buffer underlying a text file with a non-standard newline doesn't automatically have a matching newline. I think that's a good thing ('\r\n' and '\r' would need exceptions for backward compatibility; '\0'.encode('utf-16-le') isn't a very useful thing to split on; etc.), but doing it the other way is almost as easy, and very little code will ever care.

From steve at pearwood.info  Fri Jul 18 05:21:00 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 18 Jul 2014 13:21:00 +1000
Subject: [Python-ideas] Iterating non-newline-separated files should
	be	easier
In-Reply-To: <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
Message-ID: <20140718032100.GH9112@ando>

On Thu, Jul 17, 2014 at 05:04:00PM -0700, Andrew Barnert wrote:

> It turns out to be even simpler than I expected.
> 
> I reused the "newline" parameter of open and TextIOWrapper.__init__, 
> adding a param of the same name to the constructors for 
> BufferedReader, BufferedWriter, BufferedRWPair, BufferedRandom, and 
> FileIO.
> 
> For text files, just remove the check for newline being one of the 
> standard values and it all works. For binary files, remove the check 
> for truthy, make open pass each Buffered* constructor newline=(newline 
> if binary else None), make each Buffered* class store it, and change 
> two lines in RawIOBase.readline to use it. And that's it.

All the words are in English, but I have no idea what you're actually 
saying... :-)

You seem to be talking about the implementation of the change, but what 
is the interface? Having made all these changes, how does it affect 
Python code? You have a use-case of splitting on something other than 
the standard newlines, so how does one do that? E.g. suppose I have a 
file "spam.txt" which uses NEL (Next Line, U+0085) as the end of line 
character. How would I iterate over lines in this file?


> This means that the buffer underlying a text file with a non-standard 
> newline doesn't automatically have a matching newline.

I don't understand what you mean by this.



-- 
Steven

From rosuav at gmail.com  Fri Jul 18 05:36:17 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 18 Jul 2014 13:36:17 +1000
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <20140718032100.GH9112@ando>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <20140718032100.GH9112@ando>
Message-ID: <CAPTjJmr6Y2Kc0avnrUFvmezjDZkHQfzZXPOcfg_W7MdEUWMTnA@mail.gmail.com>

On Fri, Jul 18, 2014 at 1:21 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Thu, Jul 17, 2014 at 05:04:00PM -0700, Andrew Barnert wrote:
>
>> It turns out to be even simpler than I expected.
>>
>> I reused the "newline" parameter of open and TextIOWrapper.__init__,
>> adding a param of the same name to the constructors for
>> BufferedReader, BufferedWriter, BufferedRWPair, BufferedRandom, and
>> FileIO.
>>
>> For text files, just remove the check for newline being one of the
>> standard values and it all works. For binary files, remove the check
>> for truthy, make open pass each Buffered* constructor newline=(newline
>> if binary else None), make each Buffered* class store it, and change
>> two lines in RawIOBase.readline to use it. And that's it.
>
> All the words are in English, but I have no idea what you're actually
> saying... :-)
>
> You seem to be talking about the implementation of the change, but what
> is the interface? Having made all these changes, how does it affect
> Python code? You have a use-case of splitting on something other than
> the standard newlines, so how does one do that? E.g. suppose I have a
> file "spam.txt" which uses NEL (Next Line, U+0085) as the end of line
> character. How would I iterate over lines in this file?

The way I understand it is this:

for line in open("spam.txt", newline="\u0085"):
    process(line)

If that's the case, I would be strongly in favour of this. Nice and
clean, and should break nothing; there'll be special cases for
newline=None and newline='', and the only change is that, instead of a
small number of permitted values ('\n', '\r', '\r\n'), any string (or
maybe any one-character string plus '\r\n'?) would be permitted.

Effectively, it's not "iterate over this file, divided by \0 instead
of newlines", but it's "this file uses the unusual encoding of
newline=\0, now iterate over lines in the file". Seems a smart way to
do it IMO.

ChrisA

From abarnert at yahoo.com  Fri Jul 18 06:18:08 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 17 Jul 2014 21:18:08 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should
	be	easier
In-Reply-To: <20140718032100.GH9112@ando>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <20140718032100.GH9112@ando>
Message-ID: <6C155609-776E-482D-954C-DF5F1D2AD962@yahoo.com>

On Jul 17, 2014, at 20:21, Steven D'Aprano <steve at pearwood.info> wrote:

> On Thu, Jul 17, 2014 at 05:04:00PM -0700, Andrew Barnert wrote:
> 
>> It turns out to be even simpler than I expected.
>> 
>> I reused the "newline" parameter of open and TextIOWrapper.__init__, 
>> adding a param of the same name to the constructors for 
>> BufferedReader, BufferedWriter, BufferedRWPair, BufferedRandom, and 
>> FileIO.
>> 
>> For text files, just remove the check for newline being one of the 
>> standard values and it all works. For binary files, remove the check 
>> for truthy, make open pass each Buffered* constructor newline=(newline 
>> if binary else None), make each Buffered* class store it, and change 
>> two lines in RawIOBase.readline to use it. And that's it.
> 
> All the words are in English, but I have no idea what you're actually 
> saying... :-)
> 
> You seem to be talking about the implementation of the change, but what 
> is the interface?

"I reused the newline parameter." 

My mistake was assuming that was so simple, nothing else needed to be said. But that only works if everyone went back and completely read the previous suggestions, which I realize nobody had any good reason to do.

Basically, the only change to the API is that it's no longer an error to pass arbitrary strings (or bytes, for binary mode) for newlines. The rules for how "\0" is handled are identical to the rules for "\r". There's almost nothing else to explain, but not quite--so, like an idiot, I dove into the minor nits in detail, skipping over the main point.

> Having made all these changes, how does it effect 
> Python code?

Existing legal code does not change at all. Some code that used to be an error now does something useful (see below).

> You have a use-case of splitting on something other than 
> the standard newlines, so how does one do that? E.g. suppose I have a 
> file "spam.txt" which uses NEL (Next Line, U+0085) as the end of line 
> character. How would I iterate over lines in this file?

with open("spam.txt", newline="\u0085") as f:
    for line in f:
        process(line)

>> This means that the buffer underlying a text file with a non-standard 
>> newline doesn't automatically have a matching newline.
> 
> I don't understand what you mean by this.

If you write this:

with open("spam.txt", newline="\u0085") as f:
    for line in f.buffer:

The bytes you get back will be split on b"\n", not on "\u0085".encode(locale.getdefaultencoding()). The newline parameter applies only to the text file, not its underlying binary buffer. (This is exactly the same as the current behavior--if you open a file with newline='\r' in 3.4 then iterate f.buffer, it's still going to split on b'\n', not b'\r'.)

From abarnert at yahoo.com  Fri Jul 18 06:23:05 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 17 Jul 2014 21:23:05 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <CAPTjJmr6Y2Kc0avnrUFvmezjDZkHQfzZXPOcfg_W7MdEUWMTnA@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <20140718032100.GH9112@ando>
 <CAPTjJmr6Y2Kc0avnrUFvmezjDZkHQfzZXPOcfg_W7MdEUWMTnA@mail.gmail.com>
Message-ID: <9202C31C-8D17-47CB-AB24-EB6DA7CA4553@yahoo.com>

On Jul 17, 2014, at 20:36, Chris Angelico <rosuav at gmail.com> wrote:

> On Fri, Jul 18, 2014 at 1:21 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>> You seem to be talking about the implementation of the change, but what
>> is the interface? Having made all these changes, how does it affect
>> Python code? You have a use-case of splitting on something other than
>> the standard newlines, so how does one do that? E.g. suppose I have a
>> file "spam.txt" which uses NEL (Next Line, U+0085) as the end of line
>> character. How would I iterate over lines in this file?
> 
> The way I understand it is this:
> 
> for line in open("spam.txt", newline="\u0085"):
>    process(line)
> 
> If that's the case, I would be strongly in favour of this. Nice and
> clean, and should break nothing; there'll be special cases for
> newline=None and newline='', and the only change is that, instead of a
> small number of permitted values ('\n', '\r', '\r\n'), any string (or
> maybe any one-character string plus '\r\n'?) would be permitted.
> 
> Effectively, it's not "iterate over this file, divided by \0 instead
> of newlines", but it's "this file uses the unusual encoding of
> newline=\0, now iterate over lines in the file". Seems a smart way to
> do it IMO.

Exactly. As soon as Alexander suggested it, I immediately knew it was much better than my original idea.

(Apologies for overestimating the obviousness of that.)



From abarnert at yahoo.com  Fri Jul 18 06:40:11 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 17 Jul 2014 21:40:11 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <CAP7+vJKrVguc7N+-=LE9GFfVNzVKgTd1Uqm3cpPbUmoBJ9RbFg@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <1405634369.86826.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJKrVguc7N+-=LE9GFfVNzVKgTd1Uqm3cpPbUmoBJ9RbFg@mail.gmail.com>
Message-ID: <03610AED-BF17-42BB-B62D-6D8E007B48EB@yahoo.com>

On Jul 17, 2014, at 15:37, Guido van Rossum <guido at python.org> wrote:

> On Thu, Jul 17, 2014 at 2:59 PM, Andrew Barnert <abarnert at yahoo.com.dmarc.invalid> wrote:
>> >I don't think it is reasonable to add a new parameter to readline()
>> 
>> The problem is that my code has significant problems for many use cases, and I don't think they can be solved.
>> 
>> Calling readline (or iterating the file) uses the underlying buffer (and stream decoder, for text files), keeps the file pointer in the same place, etc. My code doesn't, and no external code can. So, besides being less efficient, it leaves the file pointer in the wrong place (imagine using it to parse an RFC822 header then read() the body), doesn't properly decode files where the separator can be ambiguous with other bytes (try separating on '\0' in a UTF-16 file), etc.
> 
> You can implement a subclass of io.BufferedIOBase that wraps an instance of io.RawIOBase (I think those are the right classes) where the wrapper adds a readuntil(separator) method. Whichever thing then wants to read the rest of the data should call read() on the wrapper object.
> 
> This still sounds a lot better to me than asking everyone to add a new parameter to their readline() (and the implementation).

[snip]

> You probably have to do a separate wrapper for text streams, the types and buffering implementation are just too different.

The problem isn't needing two separate wrappers, it's that the text wrapper is effectively impossible.

For binary files, MyBufferedReader.readuntil is a slightly modified version of _pyio.RawIOBase.readline, which only needs to access the public interface of io.BufferedReader (peek and read).

For text files, however, it needs to access private information from TextIOWrapper that isn't exposed from C to Python. And, unlike BufferedReader, TextIOWrapper has no way to peek ahead, or push data back onto the buffer, or anything else usable as a workaround, so even if you wanted to try to take care of the decoding state problems manually, you can't, except by reading one character at a time.

There are also some minor problems even for binary files (e.g., MyBufferedReader(f.raw) has a different file position from f, so if you switch between them you'll end up skipping part of the file), but these won't affect most use cases; the text file problem is the big one.
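
As a concrete illustration of the binary half of this, the readuntil approach can be sketched as a BufferedReader subclass built only on the public peek()/read() interface (a sketch of the general technique, not the actual patch; it handles single-byte separators only, since a multi-byte separator could straddle a peek boundary):

```python
import io

class SeparatorReader(io.BufferedReader):
    """Hypothetical sketch: readuntil() built only on BufferedReader's
    public peek()/read() interface. Single-byte separators only; a
    multi-byte separator spanning a peek boundary would need extra care."""

    def readuntil(self, sep=b'\n'):
        chunks = []
        while True:
            buf = self.peek(1)          # look ahead without consuming
            if not buf:                 # EOF: return whatever we collected
                break
            i = buf.find(sep)
            if i >= 0:
                # consume up to and including the separator
                chunks.append(self.read(i + len(sep)))
                break
            chunks.append(self.read(len(buf)))  # consume the peeked bytes
        return b''.join(chunks)
```

For example, wrapping an in-memory stream, SeparatorReader(io.BytesIO(b'a\0bb\0ccc')).readuntil(b'\0') returns b'a\0'.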
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140717/18d1dabf/attachment-0001.html>

From guido at python.org  Fri Jul 18 06:47:06 2014
From: guido at python.org (Guido van Rossum)
Date: Thu, 17 Jul 2014 21:47:06 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <9202C31C-8D17-47CB-AB24-EB6DA7CA4553@yahoo.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <20140718032100.GH9112@ando>
 <CAPTjJmr6Y2Kc0avnrUFvmezjDZkHQfzZXPOcfg_W7MdEUWMTnA@mail.gmail.com>
 <9202C31C-8D17-47CB-AB24-EB6DA7CA4553@yahoo.com>
Message-ID: <CAP7+vJKKdoqUmmpeNXWXmNSsgNPt1UGphns_Xm1ZLduP7Ca3og@mail.gmail.com>

Well, I had to look up the newline option for open(), even though I
probably invented it. :-)

Would it still apply only to text files?

On Thursday, July 17, 2014, Andrew Barnert <abarnert at yahoo.com.dmarc.invalid>
wrote:

> On Jul 17, 2014, at 20:36, Chris Angelico <rosuav at gmail.com <javascript:;>>
> wrote:
>
> > On Fri, Jul 18, 2014 at 1:21 PM, Steven D'Aprano <steve at pearwood.info
> <javascript:;>> wrote:
> >> You seem to be talking about the implementation of the change, but what
> >> is the interface? Having made all these changes, how does it affect
> >> Python code? You have a use-case of splitting on something other than
> >> the standard newlines, so how does one do that? E.g. suppose I have a
> >> file "spam.txt" which uses NEL (Next Line, U+0085) as the end of line
> >> character. How would I iterate over lines in this file?
> >
> > The way I understand it is this:
> >
> > for line in open("spam.txt", newline="\u0085"):
> >    process(line)
> >
> > If that's the case, I would be strongly in favour of this. Nice and
> > clean, and should break nothing; there'll be special cases for
> > newline=None and newline='', and the only change is that, instead of a
> > small number of permitted values ('\n', '\r', '\r\n'), any string (or
> > maybe any one-character string plus '\r\n'?) would be permitted.
> >
> > Effectively, it's not "iterate over this file, divided by \0 instead
> > of newlines", but it's "this file uses the unusual encoding of
> > newline=\0, now iterate over lines in the file". Seems a smart way to
> > do it IMO.
>
> Exactly. As soon as Alexander suggested it, I immediately knew it was much
> better than my original idea.
>
> (Apologies for overestimating the obviousness of that.)
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org <javascript:;>
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>


-- 
--Guido van Rossum (on iPad)

From abarnert at yahoo.com  Fri Jul 18 08:26:28 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 17 Jul 2014 23:26:28 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <CAP7+vJKKdoqUmmpeNXWXmNSsgNPt1UGphns_Xm1ZLduP7Ca3og@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <20140718032100.GH9112@ando>
 <CAPTjJmr6Y2Kc0avnrUFvmezjDZkHQfzZXPOcfg_W7MdEUWMTnA@mail.gmail.com>
 <9202C31C-8D17-47CB-AB24-EB6DA7CA4553@yahoo.com>
 <CAP7+vJKKdoqUmmpeNXWXmNSsgNPt1UGphns_Xm1ZLduP7Ca3og@mail.gmail.com>
Message-ID: <92919CA1-2052-4B07-97D9-1D8A0757F117@yahoo.com>

On Jul 17, 2014, at 21:47, Guido van Rossum <guido at python.org> wrote:

> Well, I had to look up the newline option for open(), even though I probably invented it. :-)

While we're at it, I think most places in the documentation and docstrings that refer to the parameter, except open itself, call it newlines (e.g., io.IOBase.readline), and as far as I can tell it's been like that from day one, which shows just how much people pay attention to the current feature. :)

> Would it still apply only to text files?

I think it makes sense to apply to binary files as well. Splitting binary files on \0 (or, for that matter, \r\n...) is probably at least as common a use case as splitting text files.

Obviously the special treatment for "" (as a universal-newlines flag) wouldn't carry over to b"" (which might as well just be an error, although I suppose it could also mean to split on every byte, as with bytes.split?). Also, I'm not sure whether the write behavior (replacing a terminal "\n" with newline) should carry over from text to binary, or whether binary files should just ignore newline on write.

> On Thursday, July 17, 2014, Andrew Barnert <abarnert at yahoo.com.dmarc.invalid> wrote:
>> On Jul 17, 2014, at 20:36, Chris Angelico <rosuav at gmail.com> wrote:
>> 
>> > On Fri, Jul 18, 2014 at 1:21 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>> >> You seem to be talking about the implementation of the change, but what
>> >> is the interface? Having made all these changes, how does it affect
>> >> Python code? You have a use-case of splitting on something other than
>> >> the standard newlines, so how does one do that? E.g. suppose I have a
>> >> file "spam.txt" which uses NEL (Next Line, U+0085) as the end of line
>> >> character. How would I iterate over lines in this file?
>> >
>> > The way I understand it is this:
>> >
>> > for line in open("spam.txt", newline="\u0085"):
>> >    process(line)
>> >
>> > If that's the case, I would be strongly in favour of this. Nice and
>> > clean, and should break nothing; there'll be special cases for
>> > newline=None and newline='', and the only change is that, instead of a
>> > small number of permitted values ('\n', '\r', '\r\n'), any string (or
>> > maybe any one-character string plus '\r\n'?) would be permitted.
>> >
>> > Effectively, it's not "iterate over this file, divided by \0 instead
>> > of newlines", but it's "this file uses the unusual encoding of
>> > newline=\0, now iterate over lines in the file". Seems a smart way to
>> > do it IMO.
>> 
>> Exactly. As soon as Alexander suggested it, I immediately knew it was much better than my original idea.
>> 
>> (Apologies for overestimating the obviousness of that.)
>> 
>> 
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
> 
> 
> -- 
> --Guido van Rossum (on iPad)
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From wolfgang.maier at biologie.uni-freiburg.de  Fri Jul 18 13:53:48 2014
From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier)
Date: Fri, 18 Jul 2014 13:53:48 +0200
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
Message-ID: <lqb1q9$qts$1@ger.gmane.org>

On 07/18/2014 02:04 AM, Andrew Barnert wrote:
> On Thursday, July 17, 2014 3:21 PM, Andrew Barnert <abarnert at yahoo.com> wrote:
>
>
>
>>    On Thursday, July 17, 2014 2:40 PM, Alexander Heger <python at 2sn.net> wrote:
>
>>>    Could the "split" (or splitline) keyword-only
>>> parameter instead be passed to the open function
>>> (and the __init__ of IOBase and be stored there)?
>>
>> Good idea. It's less powerful/flexible, but probably
>> good enough for almost all use cases. (I can't think
>> of any file where I'd need to split part of it on \0
>> and the rest on \n?) Also, it means you can stick with
>> the normal __iter__ instead of needing a separate
>> iterlines method.
>
> It turns out to be even simpler than I expected.
>
> I reused the "newline" parameter of open and TextIOWrapper.__init__, adding a param of the same name to the constructors for BufferedReader, BufferedWriter, BufferedRWPair, BufferedRandom, and FileIO.
>
> For text files, just remove the check for newline being one of the standard values and it all works. For binary files, remove the check for truthy, make open pass each Buffered* constructor newline=(newline if binary else None), make each Buffered* class store it, and change two lines in RawIOBase.readline to use it. And that's it.
>

You are not the first one to come up with this idea and suggest 
solutions. This whole thing has been hanging around on the bug tracker 
as an unresolved issue (started by Nick Coghlan) for almost a decade:

http://bugs.python.org/issue1152248

Ever since discovering it, I've been sticking to the recipe provided by 
Douglas Alan:

http://bugs.python.org/issue1152248#msg109117

Not that I wouldn't like to see this feature ship with Python, 
but it may help to read through all aspects of the problem that have 
been discussed before.
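
For readers who don't want to chase the links, recipes of that general kind are small generator functions along roughly these lines (a generic sketch in the same spirit, not the actual code from the tracker issue):

```python
def resplit(chunks, sep):
    """Yield records from an iterable of strings, splitting on `sep` and
    keeping the separator on each record, the way file iteration keeps
    '\n'. A generic sketch of the recipe style, not the tracker code."""
    pending = ''
    for chunk in chunks:
        pending += chunk
        pieces = pending.split(sep)
        pending = pieces.pop()          # last piece may be incomplete
        for piece in pieces:
            yield piece + sep
    if pending:
        yield pending
```

It works on any iterable of strings, e.g. a file read in fixed-size chunks: resplit(iter(lambda: f.read(4096), ''), '\0').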

Best,
Wolfgang



From abarnert at yahoo.com  Fri Jul 18 18:43:26 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 18 Jul 2014 09:43:26 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <lqb1q9$qts$1@ger.gmane.org>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
Message-ID: <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>

Before responding to Wolfgang, something that occurred to me overnight: The only insurmountable problem with Guido's suggestion of "just unwrap and rewrap the raw or buffer in a subclass that adds this behavior" is that you can't write such a subclass of TextIOWrapper, because it has no way to either peek at or push back onto the buffer. So... Why not add one of those? 

Pushing back is easier to implement (since it's already there as a private method), but a bit funky, and peeking would mean it works the same way as with buffered binary files. But I'll take a look at the idiomatic way to do similar things in other languages (C stdio, C++ iostreams, etc.), and make sure that peek is actually sensible for TextIOWrapper, before arguing for it.

While we're at it, it might be nice for the peek method to be documented as an (optional, like raw, etc.?) member of the two ABCs instead of just something that one implementation happens to have, and that the mixin code will use if it happens to be present. (Binary readline uses peek if it exists, falls back to byte by byte if not.)
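
The asymmetry is easy to demonstrate with the stock io classes (checked against CPython, where peek is an implementation detail of BufferedReader rather than a documented ABC member):

```python
import io

# Buffered binary readers can look ahead without consuming data...
buf = io.BufferedReader(io.BytesIO(b'spam'))
print(hasattr(buf, 'peek'))   # True
head = buf.peek(1)            # look ahead; the stream position is unchanged
print(buf.read(4))            # b'spam' -- the peek consumed nothing

# ...but text wrappers expose no peek() (and no public push-back) at all.
txt = io.TextIOWrapper(io.BytesIO(b'spam'), encoding='utf-8')
print(hasattr(txt, 'peek'))   # False
```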

On Jul 18, 2014, at 4:53, Wolfgang Maier <wolfgang.maier at biologie.uni-freiburg.de> wrote:

> On 07/18/2014 02:04 AM, Andrew Barnert wrote:
>> On Thursday, July 17, 2014 3:21 PM, Andrew Barnert <abarnert at yahoo.com> wrote:
>> 
>> 
>> 
>>>   On Thursday, July 17, 2014 2:40 PM, Alexander Heger <python at 2sn.net> wrote:
>> 
>>>>   Could the "split" (or splitline) keyword-only
>>>> parameter instead be passed to the open function
>>>> (and the __init__ of IOBase and be stored there)?
>>> 
>>> Good idea. It's less powerful/flexible, but probably
>>> good enough for almost all use cases. (I can't think
>>> of any file where I'd need to split part of it on \0
>>> and the rest on \n?) Also, it means you can stick with
>>> the normal __iter__ instead of needing a separate
>>> iterlines method.
>> 
>> It turns out to be even simpler than I expected.
>> 
>> I reused the "newline" parameter of open and TextIOWrapper.__init__, adding a param of the same name to the constructors for BufferedReader, BufferedWriter, BufferedRWPair, BufferedRandom, and FileIO.
>> 
>> For text files, just remove the check for newline being one of the standard values and it all works. For binary files, remove the check for truthy, make open pass each Buffered* constructor newline=(newline if binary else None), make each Buffered* class store it, and change two lines in RawIOBase.readline to use it. And that's it.
> 
> You are not the first one to come up with this idea and suggest solutions. This whole thing has been hanging around on the bug tracker as an unresolved issue (started by Nick Coghlan) for almost a decade:
> 
> http://bugs.python.org/issue1152248
> 
> Ever since discovering it, I've been sticking to the recipe provided by Douglas Alan:
> 
> http://bugs.python.org/issue1152248#msg109117

Thanks.

Douglas's recipe is effectively the same as my resplit, except less general (since it consumes a file rather than any iterable), and some, but not all, of the limitations of that approach were mentioned. And R. David Murray's hack patch is basically the same as the text half of my patch. 

The discussion there is also useful, as it brings up the similar features in perl, awk, bash, etc.--all of which work by having the user change either a global or something on the file object, rather than putting it in the line-reading code, which reinforces my belief that Alexander's idea of putting the separator value in the file constructors was right, and my initially putting it in readline or a new readuntil method was wrong.

> Not that I wouldn't like to see this feature ship with Python, but it may help to read through all aspects of the problem that have been discussed before.
> 
> Best,
> Wolfgang
> 
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From ncoghlan at gmail.com  Sat Jul 19 09:10:58 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 19 Jul 2014 03:10:58 -0400
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
 <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
Message-ID: <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>

On 18 July 2014 12:43, Andrew Barnert <abarnert at yahoo.com.dmarc.invalid> wrote:
> Before responding to Wolfgang, something that occurred to me overnight: The only insurmountable problem with Guido's suggestion of "just unwrap and rewrap the raw or buffer in a subclass that adds this behavior" is that you can't write such a subclass of TextIOWrapper, because it has no way to either peek at or push back onto the buffer. So... Why not add one of those?
>
> Pushing back is easier to implement (since it's already there as a private method), but a bit funky, and peeking would mean it works the same way as with buffered binary files. But I'll take a look at the idiomatic way to do similar things in other languages (C stdio, C++ iostreams, etc.), and make sure that peek is actually sensible for TextIOWrapper, before arguing for it.
>
> While we're at it, it might be nice for the peek method to be documented as an (optional, like raw, etc.?) member of the two ABCs instead of just something that one implementation happens to have, and that the mixin code will use if it happens to be present. (Binary readline uses peek if it exists, falls back to byte by byte if not.)

Slight tangent, but this rewrapping question also arises in the
context of changing encodings on an already open stream. See
http://bugs.python.org/issue15216 for (the gory) details.

> On Jul 18, 2014, at 4:53, Wolfgang Maier <wolfgang.maier at biologie.uni-freiburg.de> wrote:
>> You are not the first one to come up with this idea and suggest solutions. This whole thing has been hanging around on the bug tracker as an unresolved issue (started by Nick Coghlan) for almost a decade:
>>
>> http://bugs.python.org/issue1152248
>>
>> Ever since discovering it, I've been sticking to the recipe provided by Douglas Alan:
>>
>> http://bugs.python.org/issue1152248#msg109117
>
> Thanks.
>
> Douglas's recipe is effectively the same as my resplit, except less general (since it consumes a file rather than any iterable), and some, but not all, of the limitations of that approach were mentioned. And R. David Murray's hack patch is basically the same as the text half of my patch.
>
> The discussion there is also useful, as it brings up the similar features in perl, awk, bash, etc.--all of which work by having the user change either a global or something on the file object, rather than putting it in the line-reading code, which reinforces my belief that Alexander's idea of putting the separator value in the file constructors was right, and my initially putting it in readline or a new readuntil method was wrong.

I still favour my proposal there to add a separate "readrecords()"
method, rather than reusing the line based iteration methods - lines
and arbitrary records *aren't* the same thing, and I don't think we'd
be doing anybody any favours by conflating them (whether we're
confusing them at the method level or at the constructor argument
level).

While, as an implementation artifact, it may be possible to get this
"easily" by abusing the existing newline parameter, that's likely to
break a lot of assumptions in *other* code, that specifically expects
newlines to refer to actual line endings. A new separate method
cleanly isolates the feature to code that wants to use it, preventing
potentially adverse and hard to debug impacts on unrelated code that
happens to receive a file object with a custom record separator
configured.

With this kind of proposal, it isn't the "what happens when it works?"
cases that worry me - it's the cases where it *fails* and someone is
stuck with figuring out what has gone wrong. A new method fails
cleanly, but changing the semantics of *existing* arguments,
attributes and methods? That doesn't fail cleanly at all, and can also
have far reaching impacts on the correctness of all sorts of
documentation.

Attempting to wedge this functionality into *existing* constructs
means *changing* a lot of expectations that are now well established
in a Python context. By contrast, adding a *new* construct,
specifically for this purpose, means nothing needs to change with
existing constructs, we don't inadvertently introduce even more
obscure corner cases in newline handling, and there's a solid
terminology hook to hang the documentation on (iteration by line vs
iteration by record - and we can also be clear that "line buffered"
really does correspond to iteration by line, and may not be available
for arbitrary record separators).

Providing this feature as a separate method also makes it possible for
the IO ABC's to provide a default implementation (along the lines of
your resplit function), that concrete implementations can optionally
override with something more optimised. Pure ducktyped cases (not
inheriting from the ABCs) will fail with a fairly obvious error
("AttributeError: 'MyCustomFileType' object has no attribute
'readrecords'" rather than something related to unknown parameter
names or illegal argument values), while those that do inherit from
the ABCs will "just work".
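
Such a default implementation might look roughly like this (a sketch only: readrecords is the proposed method, not an existing one, and a real version would live on the IO ABCs):

```python
import io

class RecordsMixin:
    """Hypothetical mixin sketching a default readrecords(), built only
    on the public read() method so concrete classes can override it
    with something more optimised."""

    def readrecords(self, sep=b'\0', chunksize=4096):
        pending = b''
        while True:
            chunk = self.read(chunksize)
            if not chunk:
                break
            pending += chunk
            pieces = pending.split(sep)
            pending = pieces.pop()      # tail may be an incomplete record
            for piece in pieces:
                yield piece + sep       # keep the separator, like readline
        if pending:
            yield pending

class RecordBytesIO(RecordsMixin, io.BytesIO):
    """Toy concrete class for demonstration purposes."""
```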

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From rosuav at gmail.com  Sat Jul 19 09:32:53 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 19 Jul 2014 17:32:53 +1000
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
 <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
Message-ID: <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>

On Sat, Jul 19, 2014 at 5:10 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> I still favour my proposal there to add a separate "readrecords()"
> method, rather than reusing the line based iteration methods - lines
> and arbitrary records *aren't* the same thing

But they might well be the same thing. Look at all the Unix commands
that usually separate output with \n, but can be told to separate with
\0 instead. If you're reading from something like that, it should be
just as easy to split on \n as on \0.
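
For instance, find . -print0 emits NUL-terminated filenames precisely because filenames may legitimately contain newlines; consuming that output is then a simple split (the bytes below are made up for illustration):

```python
# Simulated output of `find . -print0`: NUL-terminated filenames,
# one of which contains an embedded newline.
output = b'./a.txt\x00./odd\nname\x00./b.txt\x00'
names = output.split(b'\x00')[:-1]   # trailing NUL leaves an empty last piece
print(names)                         # [b'./a.txt', b'./odd\nname', b'./b.txt']
```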

ChrisA

From ncoghlan at gmail.com  Sat Jul 19 10:18:35 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 19 Jul 2014 04:18:35 -0400
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
 <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
Message-ID: <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>

On 19 July 2014 03:32, Chris Angelico <rosuav at gmail.com> wrote:
> On Sat, Jul 19, 2014 at 5:10 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> I still favour my proposal there to add a separate "readrecords()"
>> method, rather than reusing the line based iteration methods - lines
>> and arbitrary records *aren't* the same thing
>
> But they might well be the same thing. Look at all the Unix commands
> that usually separate output with \n, but can be told to separate with
> \0 instead. If you're reading from something like that, it should be
> just as easy to split on \n as on \0.

Python isn't Unix, and Python has never supported \0 as a "line
ending". Changing the meaning of existing constructs is fraught with
complexity, and should only be done when there is absolutely no
alternative. In this case, there's an alternative: a new method,
specifically for reading arbitrary records.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From steve at pearwood.info  Sat Jul 19 11:01:59 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 19 Jul 2014 19:01:59 +1000
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org> <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
Message-ID: <20140719090159.GJ9112@ando>

On Sat, Jul 19, 2014 at 04:18:35AM -0400, Nick Coghlan wrote:
> On 19 July 2014 03:32, Chris Angelico <rosuav at gmail.com> wrote:
> > On Sat, Jul 19, 2014 at 5:10 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> >> I still favour my proposal there to add a separate "readrecords()"
> >> method, rather than reusing the line based iteration methods - lines
> >> and arbitrary records *aren't* the same thing
> >
> > But they might well be the same thing. Look at all the Unix commands
> > that usually separate output with \n, but can be told to separate with
> > \0 instead. If you're reading from something like that, it should be
> > just as easy to split on \n as on \0.
> 
> Python isn't Unix, and Python has never supported \0 as a "line
> ending". Changing the meaning of existing constructs is fraught with
> complexity, and should only be done when there is absolutely no
> alternative. In this case, there's an alternative: a new method,
> specifically for reading arbitrary records.

I don't have an opinion one way or the other, but I don't quite see why 
you're worried about allowing the newline parameter to be set to some 
arbitrary separator. The best I can come up with is a scenario something 
like this:

I open a file with some record-separator

  fp = open(filename, newline="\0")

then pass it to a function:

  spam(fp)

which assumes that each chunk ends with a linefeed:

   assert next(fp).endswith('\n')


But in a case like that, the function is already buggy. I can see at 
least two problems with such an assumption:

- what if universal newlines has been turned off and you're reading
  a file created under (e.g.) classic Mac OS or RISC OS?

- what if the file contains a single line which does not end with an
  end of line character at all?

   open('/tmp/junk', 'wb').write(b"hello world!")
   next(open('/tmp/junk', 'r'))

Have I missed something?


Although I don't mind whether files grow a readrecords() method, or 
re-use the readlines() method, I'm not convinced that API decisions 
should be driven solely by the needs of programs which are already 
buggy.



-- 
Steven

From stephen at xemacs.org  Sat Jul 19 11:06:59 2014
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 19 Jul 2014 18:06:59 +0900
Subject: [Python-ideas] Iterating non-newline-separated files should
	be	easier
In-Reply-To: <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
 <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
Message-ID: <87mwc5k98s.fsf@uwakimon.sk.tsukuba.ac.jp>

Chris Angelico writes:

 > But they might well be the same thing. Look at all the Unix commands
 > that usually separate output with \n, but can be told to separate with
 > \0 instead. If you're reading from something like that, it should be
 > just as easy to split on \n as on \0.

Nick's point is more general, I think, but as a special case consider
a "multiline" record.  What's the right behavior on output from the
application if the newline convention of this particular multiline
differs from that of the rest of the output stream?  IMO this goes
beyond "consenting adults" (YMMV, of course).

Steve

From ncoghlan at gmail.com  Sat Jul 19 11:27:49 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 19 Jul 2014 05:27:49 -0400
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <20140719090159.GJ9112@ando>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
 <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <20140719090159.GJ9112@ando>
Message-ID: <CADiSq7cD-vbBe6OFm8Ko6ZX6dizF=KbKxey7gt5jvhUNX_QhvQ@mail.gmail.com>

On 19 July 2014 05:01, Steven D'Aprano <steve at pearwood.info> wrote:
> On Sat, Jul 19, 2014 at 04:18:35AM -0400, Nick Coghlan wrote:
> But in a case like that, the function is already buggy. I can see at
> least two problems with such an assumption:
>
> - what if universal newlines has been turned off and you're reading
>   a file created under (e.g.) classic Mac OS or RISC OS?

That's exactly the point though - people *do* assume "\n", and we've
gone to great lengths to make that assumption *more correct* (even
though it's still wrong sometimes).

We can't reverse course on that, and expect the outcome to make sense
to *people*. When making use of a configurable line endings feature
breaks (and it will), they're going to be confused, and the docs
likely aren't going to help much.

> - what if the file contains a single line which does not end with an
>   end of line character at all?
>
>    open('/tmp/junk', 'wb').write("hello world!")
>    next(open('/tmp/junk', 'r'))
>
> Have I missed something?
>
>
> Although I don't mind whether files grow a readrecords() method, or
> re-use the readlines() method, I'm not convinced that API decisions
> should be driven solely by the needs of programs which are already
> buggy.

It's not being driven by the needs of programs that are already buggy
- my preferences are driven by the fact that line endings and record
separators are *not the same thing*.  Thinking that they are is a
matter of confusing the conceptual data model with the implementation
of the framing at the serialisation layer. If we *do* try to treat
them as the same thing, then we have to go find *every single
reference* to line endings in the documentation and add a caveat about
it being configurable at file object creation time, so it might
actually be based on something completely arbitrary.

Line endings are *already* confusing enough that the "universal
newlines" mechanism was added to make it so that Python level code
could mostly ignore the whole "\n" vs "\r" vs "\r\n" distinction, and
just assume "\n" everywhere.

This is why I'm a fan of keeping things comparatively simple, and just
adding a new method (if we only add an iterator version) or two (if we
add a list version as well) specifically for this use case.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From p.f.moore at gmail.com  Sat Jul 19 11:30:38 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Sat, 19 Jul 2014 10:30:38 +0100
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <20140719090159.GJ9112@ando>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
 <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <20140719090159.GJ9112@ando>
Message-ID: <CACac1F8cP+2iqrTs=L6GxSYtfztzP3E0FD6v9zhHnaam1mcTcw@mail.gmail.com>

On 19 July 2014 10:01, Steven D'Aprano <steve at pearwood.info> wrote:
> I open a file with some record-separator
>
>   fp = open(filename, newline="\0")
>
> then pass it to a function:
>
>   spam(fp)
>
> which assumes that each chunk ends with a linefeed:
>
>    assert next(fp).endswith('\n')

I will often do

for line in fp:
    line = line.strip()

to remove the line ending ("record separator"). This fails if you have
an arbitrary separator. And for that matter, how would you remove an
arbitrary separator? Maybe line = line[:-1] works, but what if at some
point people ask for multi-character separators ("\n\n" for "paragraph
separated", for example - ignoring the universal newline complexities
in that).
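A small helper along the lines described above might look like this - a hypothetical sketch (strip_record_sep is not an existing API) that handles a multi-character separator and guards the empty-separator edge case:

```python
def strip_record_sep(record, sep):
    # Remove one trailing copy of an arbitrary record separator, if present.
    # The "sep and" guard matters: for sep == '' the slice record[:-0]
    # would wrongly return the empty string.
    if sep and record.endswith(sep):
        return record[:-len(sep)]
    return record

print(strip_record_sep('spam\n\n', '\n\n'))     # multi-character separator removed
print(strip_record_sep('last record', '\n\n'))  # no trailing separator: unchanged
```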

A splitrecord method still needs a means for code to remove the
record separator, of course, but the above demonstrates how reusing
line separation could break the assumptions of *current* code.

Paul

From apalala at gmail.com  Sat Jul 19 13:49:58 2014
From: apalala at gmail.com (=?UTF-8?Q?Juancarlo_A=C3=B1ez?=)
Date: Sat, 19 Jul 2014 07:19:58 -0430
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org> <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
Message-ID: <CAN1YFWt-mmGjtRNoa+L+kVqS4p3KdKgFGPc+XtfUqjZvFF=CrA@mail.gmail.com>

On Sat, Jul 19, 2014 at 3:48 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> Python isn't Unix, and Python has never supported \0 as a "line
> ending". Changing the meaning of existing constructs is fraught with
> complexity, and should only be done when there is absolutely no
> alternative. In this case, there's an alternative: a new method,
> specifically for reading arbitrary records.
>

"practicality beats purity."

http://legacy.python.org/dev/peps/pep-0020/


-- 
Juancarlo *Añez*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140719/caaacb6d/attachment.html>

From antoine at python.org  Sat Jul 19 16:55:43 2014
From: antoine at python.org (Antoine Pitrou)
Date: Sat, 19 Jul 2014 10:55:43 -0400
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <20140719090159.GJ9112@ando>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org> <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <20140719090159.GJ9112@ando>
Message-ID: <lqe0tf$mh3$1@ger.gmane.org>


Le 19/07/2014 05:01, Steven D'Aprano a écrit :
>
> I open a file with some record-separator
>
>    fp = open(filename, newline="\0")

Hmm... newline="\0" already *looks* wrong. To me, it's a hint that 
you're abusing the API.

The main advantage of it, though, is that you can use iteration in 
addition to the regular readline() (or readrecord()) method.

Regards

Antoine.



From python at mrabarnett.plus.com  Sat Jul 19 18:21:33 2014
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 19 Jul 2014 17:21:33 +0100
Subject: [Python-ideas] Iterating non-newline-separated files should be
 easier
In-Reply-To: <20140719090159.GJ9112@ando>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org> <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <20140719090159.GJ9112@ando>
Message-ID: <53CA9B0D.1070405@mrabarnett.plus.com>

On 2014-07-19 10:01, Steven D'Aprano wrote:
[snip]

> - what if universal newlines has been turned off and you're reading
>    a file created under (e.g.) classic Mac OS or RISC OS?
>
[snip]
FTR, the line ending in RISC OS is '\n'.


From guido at python.org  Sat Jul 19 22:05:32 2014
From: guido at python.org (Guido van Rossum)
Date: Sat, 19 Jul 2014 13:05:32 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <53CA9B0D.1070405@mrabarnett.plus.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org> <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <20140719090159.GJ9112@ando> <53CA9B0D.1070405@mrabarnett.plus.com>
Message-ID: <CAP7+vJJVojcc0bPJa6kLgdy6t1D3xtfXO-iCDF5y3UTAqh4kLQ@mail.gmail.com>

I don't have time for this thread.

I never meant to suggest anything that would require pushing back data into
the buffer (you must have misread me).

I don't like changing the meaning of the newline argument to open (and it
doesn't solve enough use cases any way).

I personally think it's preposterous to use \0 as a separator for text
files (nothing screams binary data like a null byte :-).

I don't think it's a big deal if a method named readline() returns a record
that doesn't end in a \n character.

I value the equivalence of __next__() and readline().

I still think you should solve this using a wrapper class (that does its
own buffering if necessary, and implements the rest of the stream protocol
for the benefit of other consumers of some of the data).

Once a suitable wrapper class has been implemented as a 3rd party module
and is in common use you may petition to have it added to the standard
library, as a separate module/class/function.

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140719/8efa3ec4/attachment.html>

From abarnert at yahoo.com  Sun Jul 20 01:28:55 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 19 Jul 2014 16:28:55 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should
	be	easier
In-Reply-To: <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org> <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com> 
Message-ID: <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>

(replies to multiple messages here)

On Saturday, July 19, 2014 1:19 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:


>On 19 July 2014 03:32, Chris Angelico <rosuav at gmail.com> wrote:
>> On Sat, Jul 19, 2014 at 5:10 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>> I still favour my proposal there to add a separate "readrecords()"
>>> method, rather than reusing the line based iteration methods - lines
>>> and arbitrary records *aren't* the same thing
>>
>> But they might well be the same thing. Look at all the Unix commands
>> that usually separate output with \n, but can be told to separate with
>> \0 instead. If you're reading from something like that, it should be
>> just as easy to split on \n as on \0.
>
>Python isn't Unix, and Python has never supported \0 as a "line
>ending".

Well, yeah, but Python is used on Unix, and it's used to write scripts that interoperate with other Unix command-line tools.

For the record, the reason this came up is that someone was trying to use one of my scripts in a pipeline with find -0, and he had no problem adapting the Perl scripts he's using to handle -0 output, but no clue how to do the same with my Python script.

In general, it's just as easy to write Unix command-line tools in Python as in Perl, and that's a good thing - it means I don't have to use Perl. But as soon as -0 comes into the mix, that's no longer true. And that's a problem.

> Changing the meaning of existing constructs is fraught with
>complexity, and should only be done when there is absolutely no
>alternative. In this case, there's an alternative: a new method,
>specifically for reading arbitrary records.

This was basically my original suggestion, so obviously I don't think it's a terrible idea. But I don't think it's as good.

First, which of these is more readable, easier for novices to figure out how to write, etc.:

    with open(path, newline='\0') as f:
        for line in f:
            handle(line.rstrip('\0'))

    with open(path) as f:
        for line in iter(lambda: f.readrecord('\0'), ''):
            handle(line.rstrip('\0'))

Second, as Guido mentioned at the start of this thread, existing file-like object types (whether they implement BufferedIOBase or TextIOBase, or just duck-type the interfaces) are not going to have the new functionality. Construction has never been part of the interface of the file-like object API; opening a real file has always looked different from opening a member file in a zip archive or making a file-like wrapper around a socket transport or whatever. But using the resulting object has always been the same. Adding a readrecord method or changing the readline interface means that's no longer true.

There might be a good argument for making the change more visible - that is, using a different parameter on the open call instead of reusing the existing newline. (And that's what Alexander originally suggested as an alternative to my readrecord idea.) That way, it's much more obvious that spam.open or eggs.makefile or whatever doesn't support alternate line endings, without having to read its documentation on what newline means. But either way, I think it should go in the open function, not the file-object API.


On Saturday, July 19, 2014 2:28 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> - my preferences are driven by the fact that line endings and record
> separators are *not the same thing*. Thinking that they are is a
> matter of confusing the conceptual data model with the implementation
> of the framing at the serialisation layer.

Yes, using lines implicitly as records can lead to confusion - but people actually do that all the time; this isn't a new problem, and it's exactly the same problem with \r\n, or even \n, as with \0. When you open up TextEdit and write a grocery list with one item on each line, those newlines are not part of the items. When you pipe the output of find to a script, the newlines are not part of the filenames. When you pipe the output of find -0 to a script, the \0 terminators are not part of the filenames.

> Line endings are *already* confusing enough that the "universal
> newlines" mechanism was added to make it so that Python level code
> could mostly ignore the whole "\n" vs "\r" vs
> "\r\n" distinction, and
> just assume "\n" everywhere.

I understand the point here. There are cases where universal newlines let you successfully ignore the confusion rather than dealing with it, and newline='\0' will not be useful in those cases.

But then newline='\r' is also never useful in those cases. The new behavior will be useful in exactly the cases where '\r' already is - no more, but no less.

> This is why I'm a fan of keeping things comparatively simple, and just
> adding a new method (if we only add an iterator version) or two (if we
> add a list version as well) specifically for this use case.

Actually, the obvious new method is neither the iterator version nor the list version, but a single-record version, readrecord. Sometimes you need readline/readrecord, and it's conceptually simpler for the user. And of course the implementation is a lot simpler; you don't need to build a new iterator object that references the file for readrecord the way you do for iterrecords. And finally, if you only have one of the two, as bad as iter(lambda: f.readrecord('\0'), '') may look to novices, next(f.iterrecords('\0')) would probably be even more confusing.

But we could also add an iterrecords, for two methods.

And as for the list-based version - well, I don't even understand why readlines still exists in 3.x (much less why the tutorial suggests it), so I'd be fine not having a readrecords, but I don't have any real objection.
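For illustration, a chunk-buffered iterrecords could be sketched roughly like this (the name and semantics are the proposed ones, not an existing file method; records keep their trailing separator, matching readline()):

```python
import io

def iterrecords(f, sep):
    # Sketch of the proposed iterrecords(): read in chunks and split on an
    # arbitrary separator. Each yielded record keeps its trailing separator,
    # to match readline() behaviour; the final record may lack one.
    buf = ''
    while True:
        chunk = f.read(8192)
        if not chunk:
            break
        buf += chunk
        parts = buf.split(sep)
        buf = parts.pop()            # last piece may be an incomplete record
        for part in parts:
            yield part + sep
    if buf:
        yield buf                    # final record, no trailing separator

records = list(iterrecords(io.StringIO('a\0b\0c'), '\0'))
```

Because the whole accumulated buffer is re-split after each chunk, a multi-character separator that straddles a chunk boundary is still found.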

On Saturday, July 19, 2014 1:06 PM, Guido van Rossum <guido at python.org> wrote:

>I never meant to suggest anything that would require pushing back data into the buffer (you must have misread me).

I get the feeling either there's a much simpler way to wrap a file object that I'm missing, or that you think there is.

In order to do the equivalent of readrecord, you have to do one of three things:

1. Read character by character, which can be incredibly slow.

2. Peek or push back on the buffer, as the io classes' readline methods do.


3. Put another buffer in front of the file, which means you have two objects both sharing the same file but with effective file pointers out of sync. And you have to reproduce all of the file-like-object API methods for your new buffered object (a lot more work, and a lot more to get wrong - effectively, it means you have to write all of BufferedReader or TextIOWrapper, but modified to wrap another buffered file instead of wrapping the lower-level thing). And no matter how you do it, it's obviously going to be less efficient.

If there's a lighter version of #3 that makes sense, I'm not seeing it. Which is probably a problem with my lack of insight, but I'd appreciate a pointer in the right direction.
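For comparison, approach 1 above - reading one character at a time - is simple to sketch but pays a Python-level read(1) call per character (readrecord here is the proposed name, not an existing method; this sketch handles single-character separators only):

```python
import io

def readrecord(f, sep='\0'):
    # Approach 1: read one character at a time until the (single-character)
    # separator or EOF. No peeking into the underlying buffer is needed,
    # but each character costs a Python-level method call.
    chars = []
    while True:
        c = f.read(1)
        if not c:            # EOF: return whatever we accumulated
            break
        chars.append(c)
        if c == sep:         # separator is kept, like readline()
            break
    return ''.join(chars)

f = io.StringIO('a\0bc\0tail')
```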

>I don't like changing the meaning of the newline argument to open (and it doesn't solve enough use cases any way).


Maybe using a different argument is a better answer. (That's what Alexander suggested originally.)

The reason both I and people on the bug thread suggested using newline instead is because the behavior you want from sep='\0' happens to be identical to the behavior you get from newline='\r', except with '\0' instead of '\r'.

And that's the best argument I have for reusing newline: someone has already worked out and documented all the implications of newline, and people have already learned them, so if we really want the same functionality, it makes sense to reuse it.

But I realize that argument only goes so far. It wasn't obvious, until I looked into it, that I wanted the exact same functionality.

>I personally think it's preposterous to use \0 as a separator for text files (nothing screams binary data like a null byte :-).

Sure, it would have been a lot better for find and friends to grow a --escape parameter instead of -0, but I think that ship has sailed.

>I don't think it's a big deal if a method named readline() returns a record that doesn't end in a \n character.
>
>I value the equivalence of __next__() and readline().
>
>I still think you should solve this using a wrapper class (that does its own buffering if necessary, and implements the rest of the stream protocol for the benefit of other consumers of some of the data).

Again, I don't see any way to do this sensibly that wouldn't be a whole lot more work than just forking the io package.

But maybe that's the answer: I can write _io2 as a fork of _io with my changes, the same for _pyio2 (for PyPy), and then the only thing left to write is a __main__ for the package that wraps up _io2/_pyio2 in the io ABCs (and re-exports those ABCs).

From ncoghlan at gmail.com  Sun Jul 20 01:49:38 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 19 Jul 2014 19:49:38 -0400
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
 <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
Message-ID: <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>

On 20 Jul 2014 09:28, "Andrew Barnert" <abarnert at yahoo.com> wrote:
>
> (replies to multiple messages here)
>
> On Saturday, July 19, 2014 1:19 AM, Nick Coghlan <ncoghlan at gmail.com>
wrote:
>
>
> >On 19 July 2014 03:32, Chris Angelico <rosuav at gmail.com> wrote:
> >> On Sat, Jul 19, 2014 at 5:10 PM, Nick Coghlan <ncoghlan at gmail.com>
wrote:
> >>> I still favour my proposal there to add a separate "readrecords()"
> >>> method, rather than reusing the line based iteration methods - lines
> >>> and arbitrary records *aren't* the same thing
> >>
> >> But they might well be the same thing. Look at all the Unix commands
> >> that usually separate output with \n, but can be told to separate with
> >> \0 instead. If you're reading from something like that, it should be
> >> just as easy to split on \n as on \0.
> >
> >Python isn't Unix, and Python has never supported \0 as a "line
> >ending".
>
> Well, yeah, but Python is used on Unix, and it's used to write scripts
that interoperate with other Unix command-line tools.
>
> For the record, the reason this came up is that someone was trying to use
one of my scripts in a pipeline with find -0, and he had no problem
adapting the Perl scripts he's using to handle -0 output, but no clue how
to do the same with my Python script.
>
> In general, it's just as easy to write Unix command-line tools in Python
as in Perl, and that's a good thing - it means I don't have to use Perl. But
as soon as -0 comes into the mix, that's no longer true. And that's a
problem.

I would find adding NULL to the potential newline set significantly less
objectionable than opening it up to arbitrary character sequences.

Adding a single possible newline character is a much simpler change, and
one likely to have far fewer odd consequences. This is especially so if
specifying NULL as the line separator is only permitted for files opened in
binary mode.

Cheers,
Nick.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140719/6295c991/attachment.html>

From rosuav at gmail.com  Sun Jul 20 01:51:26 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Sun, 20 Jul 2014 09:51:26 +1000
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
 <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
Message-ID: <CAPTjJmq=qNKY68qwConVhFX20THs0_Owf69=jSkCy0nMbpTQHw@mail.gmail.com>

On Sun, Jul 20, 2014 at 9:49 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Adding a single possible newline character is a much simpler change, and one
> likely to have far fewer odd consequences. This is especially so if
> specifying NULL as the line separator is only permitted for files opened in
> binary mode.

U+0000 is a valid Unicode character, so I'd have no objection to, for
instance, splitting a UTF-8 encoded text file on \0.
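As a workaround that already works today, such output can be read in binary mode, split on the NUL byte, and decoded afterwards - a sketch under the assumption that the names are UTF-8 encoded:

```python
import io

# Simulated "find -0"-style output: NUL-terminated names in UTF-8.
data = 'caf\u00e9\0spam\0'.encode('utf-8')

with io.BytesIO(data) as f:          # stand-in for the binary-mode pipe
    raw = f.read()

# Split on the NUL byte, drop the empty trailing piece, decode each name.
names = [part.decode('utf-8') for part in raw.split(b'\0') if part]
```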

ChrisA

From ncoghlan at gmail.com  Sun Jul 20 01:56:18 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 19 Jul 2014 19:56:18 -0400
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
 <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
Message-ID: <CADiSq7ewfVGXLdJ3K=kUNFrD_evbSt_uU6N3oBC6FYZQ2V-yBw@mail.gmail.com>

On 20 Jul 2014 09:49, "Nick Coghlan" <ncoghlan at gmail.com> wrote:
>
>
> On 20 Jul 2014 09:28, "Andrew Barnert" <abarnert at yahoo.com> wrote:
> >
> > (replies to multiple messages here)
> >
> > On Saturday, July 19, 2014 1:19 AM, Nick Coghlan <ncoghlan at gmail.com>
wrote:
> >
> >
> > >On 19 July 2014 03:32, Chris Angelico <rosuav at gmail.com> wrote:
> > >> On Sat, Jul 19, 2014 at 5:10 PM, Nick Coghlan <ncoghlan at gmail.com>
wrote:
> > >>> I still favour my proposal there to add a separate "readrecords()"
> > >>> method, rather than reusing the line based iteration methods - lines
> > >>> and arbitrary records *aren't* the same thing
> > >>
> > >> But they might well be the same thing. Look at all the Unix commands
> > >> that usually separate output with \n, but can be told to separate
with
> > >> \0 instead. If you're reading from something like that, it should be
> > >> just as easy to split on \n as on \0.
> > >
> > >Python isn't Unix, and Python has never supported \0 as a "line
> > >ending".
> >
> > Well, yeah, but Python is used on Unix, and it's used to write scripts
that interoperate with other Unix command-line tools.
> >
> > For the record, the reason this came up is that someone was trying to
use one of my scripts in a pipeline with find -0, and he had no problem
adapting the Perl scripts he's using to handle -0 output, but no clue how
to do the same with my Python script.
> >
> > In general, it's just as easy to write Unix command-line tools in
Python as in Perl, and that's a good thing - it means I don't have to use
Perl. But as soon as -0 comes into the mix, that's no longer true. And
that's a problem.
>
> I would find adding NULL to the potential newline set significantly less
objectionable than opening it up to arbitrary character sequences.
>
> Adding a single possible newline character is a much simpler change, and
one likely to have far fewer odd consequences. This is especially so if
specifying NULL as the line separator is only permitted for files opened in
binary mode.

Also, the interoperability argument is a good one, as is the analogy with
'\r'. Since this does end up touching the open() builtin and the core IO
abstractions, it will need a PEP.

As far as implementation goes, I suspect a RecordIOWrapper layered IO model
inspired by the approach used for TextIOWrapper may make sense.

Cheers,
Nick.

>
> Cheers,
> Nick.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140719/c57e5251/attachment.html>

From abarnert at yahoo.com  Sun Jul 20 02:57:14 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 19 Jul 2014 17:57:14 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>	<CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>	<CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>	<1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>	<1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>	<lqb1q9$qts$1@ger.gmane.org>	<EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>	<CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>	<CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>	<CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>	<1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
Message-ID: <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>

On Saturday, July 19, 2014 4:49 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

>On 20 Jul 2014 09:28, "Andrew Barnert" <abarnert at yahoo.com> wrote:


>> In general, it's just as easy to write Unix command-line tools in Python as in Perl, and that's a good thing - it means I don't have to use Perl. But as soon as -0 comes into the mix, that's no longer true. And that's a problem.

>I would find adding NULL to the potential newline set significantly less objectionable than opening it up to arbitrary character sequences.


>Adding a single possible newline character is a much simpler change, and one likely to have far fewer odd consequences. This is especially so if specifying NULL as the line separator is only permitted for files opened in binary mode.


But newline is only permitted for text mode. Are you suggesting that we add newline to binary mode, but the only allowed values are None (current behavior) and \0, while on text files the list of allowed values stays the same as today?

Also, would you want the same semantics for newline='\0' on binary files that newline='\r' has on text files (including newline remapping on write)?

And I'm still not sure why you think this shouldn't be allowed in text mode in the first place (especially given that you suggested the same thing for text files _only_ a few years ago).

The output of find is a list of newline-separated or \0-separated filenames, in the filesystem's encoding. Why should I be able to handle the first as a text file, but have to handle the second as a binary file and then manually decode each line?


You could argue that find -0 isn't really separating Unicode filenames with U+0000, but separating UTF-8 or Latin-1 or whatever filenames with \x00, and it's just a coincidence that they happen to match up. But it really isn't just a coincidence; it was an intentional design decision for Unicode (and UTF-8, and Latin-1) that the ASCII control characters map in the obvious way, and one that many tools and scripts take advantage of, so why shouldn't tools and scripts written in Python be able to take advantage of it?

From ncoghlan at gmail.com  Sun Jul 20 03:23:56 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 20 Jul 2014 11:23:56 +1000
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
 <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
Message-ID: <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>

On 20 July 2014 10:57, Andrew Barnert <abarnert at yahoo.com> wrote:
> On Saturday, July 19, 2014 4:49 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
>>On 20 Jul 2014 09:28, "Andrew Barnert" <abarnert at yahoo.com> wrote:
>
>
>>> In general, it's just as easy to write Unix command-line tools in Python as in Perl, and that's a good thing: it means I don't have to use Perl. But as soon as -0 comes into the mix, that's no longer true. And that's a problem.
>
>>I would find adding NULL to the potential newline set significantly less objectionable than opening it up to arbitrary character sequences.
>
>
>>Adding a single possible newline character is a much simpler change, and one likely to have far fewer odd consequences. This is especially so if specifying NULL as the line separator is only permitted for files opened in binary mode.
>
>
> But newline is only permitted for text mode. Are you suggesting that we add newline to binary mode, but the only allowed values are NULL (current behavior) and \0, while on text files the list of allowed values stays the same as today?

Actually, I temporarily forgot that newline was only handled at the
TextIOWrapper layer. All the more reason for a PEP that clearly lays
out the status quo (both Python's own newline handling and the "-0"
option for various UNIX utilities, and the way that is handled in
other scripting languages), and discusses the various options for
dealing with it (new RecordIOWrapper class with a new "open"
parameter, new methods on IO classes, new semantics on the existing
TextIOWrapper class).

If the description of the use cases is clear enough, then the "right
answer" amongst the presented alternatives (which includes "don't
change anything") may be obvious. At present, I'm genuinely unclear on
why someone would ever want to pass the "-0" option to the other UNIX
utilities, which then makes it very difficult to have a sensible
discussion on how we should address that use case in Python.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From rosuav at gmail.com  Sun Jul 20 03:31:10 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Sun, 20 Jul 2014 11:31:10 +1000
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
 <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
Message-ID: <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>

On Sun, Jul 20, 2014 at 11:23 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> At present, I'm genuinely unclear on
> why someone would ever want to pass the "-0" option to the other UNIX
> utilities, which then makes it very difficult to have a sensible
> discussion on how we should address that use case in Python.

That one's easy. What happens if you use 'find' to list files, and
those files might have \n in their names? You need another sep.

ChrisA

From ncoghlan at gmail.com  Sun Jul 20 03:40:25 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 20 Jul 2014 11:40:25 +1000
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
 <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
 <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
Message-ID: <CADiSq7cQSKgSi5ooa7NtWUOw4pDVhLqPye1oB6-5nDc3y7uejg@mail.gmail.com>

On 20 July 2014 11:31, Chris Angelico <rosuav at gmail.com> wrote:
> On Sun, Jul 20, 2014 at 11:23 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> At present, I'm genuinely unclear on
>> why someone would ever want to pass the "-0" option to the other UNIX
>> utilities, which then makes it very difficult to have a sensible
>> discussion on how we should address that use case in Python.
>
> That one's easy. What happens if you use 'find' to list files, and
> those files might have \n in their names? You need another sep.

Yes, but having a newline in a filename is sufficiently weird that I
find it hard to imagine a scenario where "fix the filenames" isn't a
better answer. Hence why I think the PEP needs to explain why the UNIX
utilities considered this use case sufficiently non-obscure to add
explicit support for it, rather than just assuming that the
obviousness of the use case can be taken for granted.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From abarnert at yahoo.com  Sun Jul 20 05:58:58 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 19 Jul 2014 20:58:58 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should
	be	easier
In-Reply-To: <CADiSq7cQSKgSi5ooa7NtWUOw4pDVhLqPye1oB6-5nDc3y7uejg@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org> <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
 <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
 <CADiSq7cQSKgSi5ooa7NtWUOw4pDVhLqPye1oB6-5nDc3y7uejg@mail.gmail.com>
Message-ID: <1405828738.93713.YahooMailNeo@web181005.mail.ne1.yahoo.com>

On Saturday, July 19, 2014 6:42 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 20 July 2014 11:31, Chris Angelico <rosuav at gmail.com> wrote:
>> On Sun, Jul 20, 2014 at 11:23 AM, Nick Coghlan <ncoghlan at gmail.com>
>> wrote:
>>> At present, I'm genuinely unclear on
>>> why someone would ever want to pass the "-0" option to the
>>> other UNIX
>>> utilities, which then makes it very difficult to have a sensible
>>> discussion on how we should address that use case in Python.
>>
>> That one's easy. What happens if you use 'find' to list files,
>> and
>> those files might have \n in their names? You need another sep.
> 
> Yes, but having a newline in a filename is sufficiently weird that I
> find it hard to imagine a scenario where "fix the filenames" isn't a
> better answer. Hence why I think the PEP needs to explain why the UNIX
> utilities considered this use case sufficiently non-obscure to add
> explicit support for it, rather than just assuming that the
> obviousness of the use case can be taken for granted.


First, why is it so odd to have newlines in filenames? It used to be pretty common on Classic Mac. Sure, they're not too common nowadays, but that's because they're illegal on DOS/Windows, and because the shell on Unix systems makes them a pain to deal with, not because there's something inherently nonsensical about the idea, any more than filenames with spaces or non-ASCII characters or >255 length.


Second, "fix the filenames" is almost _never_ a better answer. If you're publishing a program for other people to use, you want to document that it won't work on some perfectly good files, and close their bugs as "Not a bug, rename your files if you want to use my software"? If the files are on a read-only filesystem or a slow tape backup, you really want to copy the entire filesystem over just so you can run a script on it?

Also, even if "fix the filenames" were the right answer, you need to write a tool to do that, and why shouldn't it be possible to use Python for that tool? (In fact, one of the scripts I wanted this feature for is a replacement for the traditional rename tool (http://plasmasturm.org/code/rename/). I mainly wanted to let people use regular expressions without letting them run arbitrary Perl code, as rename -e does, but also, I couldn't figure out how to rename "foo" to "Foo" on a case-preserving-but-insensitive filesystem in Perl, and I know how to do it in Python.)

At any rate, there are decades of tradition behind using -print0, and that's not going to change just because Python isn't as good as other languages at dealing with it. The GNU find documentation (http://linux.die.net/man/1/find) explicitly recommends, in multiple places, using -print0 instead of -print whenever possible. (For example, in the summary near the top, "If no expression is given, the expression -print is used (but you should probably consider using -print0 instead, anyway).")


And part of the reason for that is that many other tools, like xargs, split on any whitespace, not on newlines, if not given the -0 argument. Fortunately, all of those tools know how to handle backslash escapes, but unfortunately, find doesn't know how to emit them. (Actually, frustratingly, both BSD and SysV find have the code to do it, but not in a way you can use here.) So, if you're writing a script that uses find and might get piped to anything that handles input like xargs, you have to use -print0.

And that means, if you're writing a tool that might get find piped to it, you have to handle -print0, even if you're pretty sure nobody will ever have newlines for you to deal with, because they're probably going to want to use -print0 anyway, rather than figure out how your tool deals with other whitespace.

From greg.ewing at canterbury.ac.nz  Sun Jul 20 06:16:54 2014
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 20 Jul 2014 16:16:54 +1200
Subject: [Python-ideas] Iterating non-newline-separated files should be
 easier
In-Reply-To: <CADiSq7cQSKgSi5ooa7NtWUOw4pDVhLqPye1oB6-5nDc3y7uejg@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org> <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
 <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
 <CADiSq7cQSKgSi5ooa7NtWUOw4pDVhLqPye1oB6-5nDc3y7uejg@mail.gmail.com>
Message-ID: <53CB42B6.9050100@canterbury.ac.nz>

Nick Coghlan wrote:
> having a newline in a filename is sufficiently weird that I
> find it hard to imagine a scenario where "fix the filenames" isn't a
> better answer.

In Classic MacOS, the way you gave a folder an icon
was to put it in a hidden file called "Icon\r".

-- 
Greg

From greg.ewing at canterbury.ac.nz  Sun Jul 20 06:03:20 2014
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 20 Jul 2014 16:03:20 +1200
Subject: [Python-ideas] Iterating non-newline-separated files should be
 easier
In-Reply-To: <CACac1F8cP+2iqrTs=L6GxSYtfztzP3E0FD6v9zhHnaam1mcTcw@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org> <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <20140719090159.GJ9112@ando>
 <CACac1F8cP+2iqrTs=L6GxSYtfztzP3E0FD6v9zhHnaam1mcTcw@mail.gmail.com>
Message-ID: <53CB3F88.9090709@canterbury.ac.nz>

Paul Moore wrote:
> And for that matter, how would you remove an
> arbitrary separator? Maybe line = line[:-1] works, but what if at some
> point people ask for multi-character separators

If the newline mechanism is re-used, it would
convert whatever separator is used into '\n'.

-- 
Greg

From ncoghlan at gmail.com  Sun Jul 20 07:00:15 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 20 Jul 2014 15:00:15 +1000
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <1405828738.93713.YahooMailNeo@web181005.mail.ne1.yahoo.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
 <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
 <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
 <CADiSq7cQSKgSi5ooa7NtWUOw4pDVhLqPye1oB6-5nDc3y7uejg@mail.gmail.com>
 <1405828738.93713.YahooMailNeo@web181005.mail.ne1.yahoo.com>
Message-ID: <CADiSq7ejpH6P-uSVcEhx4Th4aK8=iWkkPASjKtsa4uxeRNw2pA@mail.gmail.com>

On 20 July 2014 13:58, Andrew Barnert <abarnert at yahoo.com> wrote:
> First, why is it so odd to have newlines in filenames? It used to be pretty common on Classic Mac. Sure, they're not too common nowadays, but that's because they're illegal on DOS/Windows, and because the shell on Unix systems makes them a pain to deal with, not because there's something inherently nonsensical about the idea, any more than filenames with spaces or non-ASCII characters or >255 length.

You answered your own question: because DOS/Windows make them illegal,
and the Unix shell isn't fond of them either. I was a DOS/Windows user
for more than a decade before switching to Linux for personal use, and
in a decade of using Linux (and even going to work for a Linux
vendor), I've never encountered a filename with a newline in it. Thus
the idea that anyone *would* do such a thing, and that it would be
prevalent enough for UNIX tools to include a workaround in programs
that normally produce newline separated output is an entirely novel
concept for me. Any such file I encountered *would* be an outlier, and
I'd likely be in a position to get the offending filename fixed rather
than changing any data processing pipelines (whether written in Python
or not) to tolerate newlines in filenames (since the cost differential
between fixing one filename vs updating the data processing pipelines
would be enormous).

However, note that my attitude changed significantly once you
clarified the use case - it's clear that there *is* a use case, it's
just one that's outside my own personal experience. That's one of the
things the PEP process is for - to explain such use cases to folks
that haven't personally encountered them, and then explain why the
proposed solution addresses the use case in a way that makes sense for
the domains where the use case arises. The recent matrix
multiplication PEP was an exemplary example of the breed.

That's what I'm asking for here: a PEP that makes sense to someone
like me for whom the idea of putting a newline in a filename is
completely alien. Yes, it's technically permitted by the underlying
operating system APIs on POSIX systems, but all the affordances at
both the console and GUI level suggest "no newlines allowed". If
you're coming from a DOS/Windows background (as I did), then the idea
that a newline is technically a permitted filename character may never
even occur to you (it certainly hadn't to me, and I'd never previously
come across anything to challenge that assumption).

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From abarnert at yahoo.com  Sun Jul 20 07:02:03 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 19 Jul 2014 22:02:03 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <53CB3F88.9090709@canterbury.ac.nz>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org> <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <20140719090159.GJ9112@ando>
 <CACac1F8cP+2iqrTs=L6GxSYtfztzP3E0FD6v9zhHnaam1mcTcw@mail.gmail.com>
 <53CB3F88.9090709@canterbury.ac.nz>
Message-ID: <1405832523.20539.YahooMailNeo@web181004.mail.ne1.yahoo.com>

On Saturday, July 19, 2014 9:42 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:

> > Paul Moore wrote:
>>  And for that matter, how would you remove an
>>  arbitrary separator? Maybe line = line[:-1] works, but what if at some
>>  point people ask for multi-character separators

You already can't use line[:-1] today, because '\r\n' is already a valid value, and always has been.

And however people deal with newline='\r\n' will work for any crazy separator you can think of. Maybe line[:-len(nl)]. Maybe line.rstrip(nl) if it's appropriate (it isn't always, either for \r\n or for some arbitrary separator).
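A separator-agnostic strip along those lines might look like this (a sketch only; as noted, whether to strip at all depends on the application):

```python
def strip_sep(line, sep):
    """Drop one trailing record separator, if present (handles multi-char seps)."""
    return line[:-len(sep)] if sep and line.endswith(sep) else line

print(strip_sep("hello\r\n", "\r\n"))   # separator removed -> 'hello'
print(strip_sep("hello", "\r\n"))       # final record may lack the separator
print(strip_sep("name\0", "\0"))
```

Note that `line.rstrip(sep)` treats `sep` as a set of characters, not a suffix, which is why slicing is safer for multi-character separators.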

> If the newline mechanism is re-used, it would

> convert whatever separator is used into '\n'.


No it wouldn't.

https://docs.python.org/3/library/io.html#io.TextIOWrapper

> When reading input from the stream, if newline is None, universal newlines mode is enabled... If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.

So, making '\0' a legal value just means the '\0' line endings will be returned to the caller untranslated.

Also, remember that binary files don't do universal newline translation ever, so just letting you change the separator there wouldn't add translation.
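The untranslated behaviour quoted above is easy to check with the newline values that are legal today; a hypothetical '\0' value would presumably behave like the '\r\n' case:

```python
import io

raw = io.BytesIO(b"one\r\ntwo\r\nthree")
# newline='': universal newline detection, but endings come back untranslated
f = io.TextIOWrapper(raw, encoding="ascii", newline="")
lines_universal = f.readlines()

raw2 = io.BytesIO(b"one\r\ntwo\r\nthree")
# newline='\r\n': only '\r\n' terminates a line, also untranslated
f2 = io.TextIOWrapper(raw2, encoding="ascii", newline="\r\n")
lines_crlf = f2.readlines()

print(lines_universal)  # ['one\r\n', 'two\r\n', 'three']
print(lines_crlf)       # ['one\r\n', 'two\r\n', 'three']
```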


Of course both of those could be changed as well (although with what interface, I'm not sure?), but I don't think they should be.

From guido at python.org  Sun Jul 20 07:45:04 2014
From: guido at python.org (Guido van Rossum)
Date: Sat, 19 Jul 2014 22:45:04 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <1405832523.20539.YahooMailNeo@web181004.mail.ne1.yahoo.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org> <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <20140719090159.GJ9112@ando>
 <CACac1F8cP+2iqrTs=L6GxSYtfztzP3E0FD6v9zhHnaam1mcTcw@mail.gmail.com>
 <53CB3F88.9090709@canterbury.ac.nz>
 <1405832523.20539.YahooMailNeo@web181004.mail.ne1.yahoo.com>
Message-ID: <CAP7+vJLJGwGm0j1kwAsuj2dZaBA8YS--WX0a=tSiVpjkRc_gcA@mail.gmail.com>

If and when something is decided in this thread, can someone summarize it
to me? I don't have time to read all the lengthy arguments but I do care
about the outcome.

-- 
--Guido van Rossum (python.org/~guido)

From mertz at gnosis.cx  Sun Jul 20 07:58:53 2014
From: mertz at gnosis.cx (David Mertz)
Date: Sat, 19 Jul 2014 22:58:53 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <CADiSq7cQSKgSi5ooa7NtWUOw4pDVhLqPye1oB6-5nDc3y7uejg@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
 <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
 <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
 <CADiSq7cQSKgSi5ooa7NtWUOw4pDVhLqPye1oB6-5nDc3y7uejg@mail.gmail.com>
Message-ID: <CAEbHw4aGB-PvMV+HJnqHLuNzVnF5yCOmWff-bV0zdZiNJ81A2A@mail.gmail.com>

The pattern I use, by far, most often with the -0 option is:

    find $path -print0 | xargs -0 some_command

Embedding a '\n' in a filename might be weird, but having whitespace in
general (i.e. spaces) really isn't uncommon.  However, in this case it
doesn't really seem to matter if some_command is some_command.py.  But I
still think the null byte special delimiter is plausible for similar
pipelines.
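When the consumer of such a pipeline is a Python script rather than xargs, the splitting currently has to be done by hand in binary mode; a rough sketch (the function name is invented, and real code would likely use sys.getfilesystemencoding() rather than a fixed encoding):

```python
import sys

def split_print0(data, encoding="utf-8"):
    """Split the raw bytes produced by `find -print0` into filename strings."""
    return [name.decode(encoding) for name in data.split(b"\0") if name]

# in an actual pipeline: names = split_print0(sys.stdin.buffer.read())
print(split_print0(b"./a.txt\0./dir/b c.txt\0"))  # ['./a.txt', './dir/b c.txt']
```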


On Sat, Jul 19, 2014 at 6:40 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 20 July 2014 11:31, Chris Angelico <rosuav at gmail.com> wrote:
> > On Sun, Jul 20, 2014 at 11:23 AM, Nick Coghlan <ncoghlan at gmail.com>
> wrote:
> >> At present, I'm genuinely unclear on
> >> why someone would ever want to pass the "-0" option to the other UNIX
> >> utilities, which then makes it very difficult to have a sensible
> >> discussion on how we should address that use case in Python.
> >
> > That one's easy. What happens if you use 'find' to list files, and
> > those files might have \n in their names? You need another sep.
>
> Yes, but having a newline in a filename is sufficiently weird that I
> find it hard to imagine a scenario where "fix the filenames" isn't a
> better answer. Hence why I think the PEP needs to explain why the UNIX
> utilities considered this use case sufficiently non-obscure to add
> explicit support for it, rather than just assuming that the
> obviousness of the use case can be taken for granted.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

From wichert at wiggy.net  Sun Jul 20 09:50:10 2014
From: wichert at wiggy.net (Wichert Akkerman)
Date: Sun, 20 Jul 2014 09:50:10 +0200
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <CAP7+vJJVojcc0bPJa6kLgdy6t1D3xtfXO-iCDF5y3UTAqh4kLQ@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org> <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <20140719090159.GJ9112@ando> <53CA9B0D.1070405@mrabarnett.plus.com>
 <CAP7+vJJVojcc0bPJa6kLgdy6t1D3xtfXO-iCDF5y3UTAqh4kLQ@mail.gmail.com>
Message-ID: <F465F3B1-26FD-4E88-A944-E934AFDBB0EE@wiggy.net>


> On 19 Jul 2014, at 22:05, Guido van Rossum <guido at python.org> wrote:
> 
> I don't have time for this thread.
> 
> I never meant to suggest anything that would require pushing back data into the buffer (you must have misread me).
> 
> I don't like changing the meaning of the newline argument to open (and it doesn't solve enough use cases any way).

I see another problem with doing this by modifying the open() call: it does not work for filehandles created using other methods such as pipe() or socket(), either used directly or via subprocess. There are real-world examples of situations where that is very useful. One of them was even mentioned in this discussion: processing the output of find -print0.

Wichert.


From wichert at wiggy.net  Sun Jul 20 09:58:44 2014
From: wichert at wiggy.net (Wichert Akkerman)
Date: Sun, 20 Jul 2014 09:58:44 +0200
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <CADiSq7cQSKgSi5ooa7NtWUOw4pDVhLqPye1oB6-5nDc3y7uejg@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org> <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
 <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
 <CADiSq7cQSKgSi5ooa7NtWUOw4pDVhLqPye1oB6-5nDc3y7uejg@mail.gmail.com>
Message-ID: <29BD9830-94CE-4C46-8BC9-7AB83A9DFBDD@wiggy.net>


> On 20 Jul 2014, at 03:40, Nick Coghlan <ncoghlan at gmail.com> wrote:
> 
> On 20 July 2014 11:31, Chris Angelico <rosuav at gmail.com> wrote:
>> On Sun, Jul 20, 2014 at 11:23 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>> At present, I'm genuinely unclear on
>>> why someone would ever want to pass the "-0" option to the other UNIX
>>> utilities, which then makes it very difficult to have a sensible
>>> discussion on how we should address that use case in Python.
>> 
>> That one's easy. What happens if you use 'find' to list files, and
>> those files might have \n in their names? You need another sep.
> 
> Yes, but having a newline in a filename is sufficiently weird that I
> find it hard to imagine a scenario where "fix the filenames" isn't a
> better answer.

Because you are likely to have no control at all over what people do with filenames. Since, on POSIX at least, filenames are allowed to contain all characters other than NUL and /, you must be able to deal with that. Similarly, you must be able to deal with a mixture of filenames using different encodings, or even pure binary names.

Wichert.


From wolfgang.maier at biologie.uni-freiburg.de  Sun Jul 20 12:41:29 2014
From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier)
Date: Sun, 20 Jul 2014 12:41:29 +0200
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org> <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
Message-ID: <53CB9CD9.4060404@biologie.uni-freiburg.de>

On 19.07.2014 09:10, Nick Coghlan wrote:
>
> I still favour my proposal there to add a separate "readrecords()"
> method, rather than reusing the line based iteration methods - lines
> and arbitrary records *aren't* the same thing, and I don't think we'd
> be doing anybody any favours by conflating them (whether we're
> confusing them at the method level or at the constructor argument
> level).
>

Thinking about possible use-cases for my own work made me realize one 
thing: at least for text files, the practical distinction between 
records and lines is that records may have *internal structure based on 
newline characters*, while lines are just lines.

If a future readrecords() method would return the record as a StringIO 
or BytesIO object, this would allow nested reading of files as lines 
(with full newline processing) within records:

for record in infile.readrecords():
     for line in record:
         do_something(line)

For me, that sort of feature is a more common requirement than being 
able to retrieve single lines terminated by something else than newline 
characters.
Maybe though, it's possible to have both, a readrecords method like the 
one above and an extended set of "newline" tokens that can be passed to 
open (at least allowing "\0" seems to make sense).
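
To make the idea concrete, here is a rough sketch of such a method as a
standalone helper. The name readrecords, the chunk size, and the "\0"
default separator are all my own assumptions, not an existing io API:

```python
import io

def readrecords(infile, recsep="\0", chunksize=4096):
    """Yield each recsep-delimited record of infile as a StringIO,
    so the caller can iterate it line by line with full newline
    processing (hypothetical helper, not an existing io method)."""
    buf = ""
    for chunk in iter(lambda: infile.read(chunksize), ""):
        buf += chunk
        *records, buf = buf.split(recsep)
        for record in records:
            yield io.StringIO(record)
    if buf:  # trailing record without a final separator
        yield io.StringIO(buf)

# Nested iteration: lines inside records
data = io.StringIO("line1\nline2\0only line\0")
records = [[line for line in record] for record in readrecords(data)]
# records == [["line1\n", "line2"], ["only line"]]
```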

Best,
Wolfgang



From abarnert at yahoo.com  Sun Jul 20 13:53:01 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 20 Jul 2014 04:53:01 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <F465F3B1-26FD-4E88-A944-E934AFDBB0EE@wiggy.net>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org> <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <20140719090159.GJ9112@ando> <53CA9B0D.1070405@mrabarnett.plus.com>
 <CAP7+vJJVojcc0bPJa6kLgdy6t1D3xtfXO-iCDF5y3UTAqh4kLQ@mail.gmail.com>
 <F465F3B1-26FD-4E88-A944-E934AFDBB0EE@wiggy.net>
Message-ID: <D6C964AF-72F7-43CE-8E89-97743C2010A0@yahoo.com>

On Jul 20, 2014, at 0:50, Wichert Akkerman <wichert at wiggy.net> wrote:

> 
>> On 19 Jul 2014, at 22:05, Guido van Rossum <guido at python.org> wrote:
>> 
>> I don't have time for this thread.
>> 
>> I never meant to suggest anything that would require pushing back data into the buffer (you must have misread me).
>> 
>> I don't like changing the meaning of the newline argument to open (and it doesn't solve enough use cases any way).
> 
> I see another problem with doing this by modifying the open() call: it does not work for filehandles created using other methods such as pipe() or socket(), either used directly or via subprocess. There are real-world examples of situations where that is very useful.

A socket() is not a Python file object, doesn't have a similar API, and doesn't have a readline method.

The result of calling socket.makefile, on the other hand, is a file object--and it's created by calling open.* And I'm pretty sure socket.makefile already takes a newline argument and just passes it along, in which case it will magically work with no changes at all.**

IIRC, os.pipe() just returns a pair of fds (integers), not a file object at all. It's up to you to wrap that in a file object if you want to--which you do by passing it to the open function.

So, neither of your objections works.

There are some better examples you could have raised, however. For example, a bz2.BzipFile is created with bz2.open. And, while the file delegates to a BufferedReader or TextIOWrapper, bz2.open almost certainly validates its inputs and won't pass newline on to the BufferedReader in binary mode. So, it would have to be changed to get the benefit.

However, given that there's no way to magically make every file-like object anyone has ever written automatically grow this new functionality, having the API change on the constructors, which are not part of any API and not consistent, is better than having it on the readline method. Think about where you'd get the error in each case: before even writing your code, when you look up how BzipFile instances are created and see there's no way to pass a newline argument, or deep in your code, when you're using a file object that came from who knows where and its readline method doesn't like the standard, documented newline argument?


* Or maybe it's created by constructing a BufferedReader, BufferedWriter, BufferedRandom, or TextIOWrapper directly. I don't remember off hand. But it doesn't matter, because the suggestion is to put the new parameter in those constructors, and make open forward to them, so whether makefile calls them directly or via open, it gets the same effect.

** Unless it validates the arguments before passing them along. I looked over a few stdlib classes, and there was at least one that unnecessarily does the same validation open is going to do anyway, so obviously that needs to be removed before the class magically benefits.


In some cases (like tempfile.NamedTemporaryFile), even that isn't necessary, because the implementation just passes through all **kwargs that it doesn't want to handle to the open or constructor call.

From abarnert at yahoo.com  Sun Jul 20 13:56:28 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 20 Jul 2014 04:56:28 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <CAP7+vJLJGwGm0j1kwAsuj2dZaBA8YS--WX0a=tSiVpjkRc_gcA@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org> <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <20140719090159.GJ9112@ando>
 <CACac1F8cP+2iqrTs=L6GxSYtfztzP3E0FD6v9zhHnaam1mcTcw@mail.gmail.com>
 <53CB3F88.9090709@canterbury.ac.nz>
 <1405832523.20539.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <CAP7+vJLJGwGm0j1kwAsuj2dZaBA8YS--WX0a=tSiVpjkRc_gcA@mail.gmail.com>
Message-ID: <B18EE2BA-38C5-49A0-AE89-C6063F92221C@yahoo.com>

Per Nick's suggestion, I will write up a draft PEP, and link it to issue #1152248, which should be a lot easier to follow. If you want to wait until the first round of discussion and the corresponding update to the PEP before checking in, I'll make sure it's obvious when that's happened.

Sent from a random iPhone

On Jul 19, 2014, at 22:45, Guido van Rossum <guido at python.org> wrote:

> If and when something is decided in this thread, can someone summarize it to me? I don't have time to read all the lengthy arguments but I do care about the outcome.
> 
> -- 
> --Guido van Rossum (python.org/~guido)
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140720/aa7c375a/attachment.html>

From p.f.moore at gmail.com  Sun Jul 20 15:42:20 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 20 Jul 2014 14:42:20 +0100
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <D6C964AF-72F7-43CE-8E89-97743C2010A0@yahoo.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
 <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <20140719090159.GJ9112@ando> <53CA9B0D.1070405@mrabarnett.plus.com>
 <CAP7+vJJVojcc0bPJa6kLgdy6t1D3xtfXO-iCDF5y3UTAqh4kLQ@mail.gmail.com>
 <F465F3B1-26FD-4E88-A944-E934AFDBB0EE@wiggy.net>
 <D6C964AF-72F7-43CE-8E89-97743C2010A0@yahoo.com>
Message-ID: <CACac1F8pvGB-0SjcV2eLcYD=WSt9uALZ8mu0=h+=_a2haCJfwA@mail.gmail.com>

On 20 July 2014 12:53, Andrew Barnert <abarnert at yahoo.com.dmarc.invalid> wrote:
> There are some better examples you could have raised, however. For example, a bz2.BzipFile is created
> with bz2.open. And, while the file delegates to a BufferedReader or TextIOWrapper, bz2.open almost
> certainly validates its inputs and won't pass newline on to the BufferedReader in binary mode.
> So, it would have to be changed to get the benefit.

The most significant example is one which has been mentioned, but you
may have missed. The motivation for this proposal is to interoperate
with the -0 flag on things like the unix find command. But that is
typically used in a pipe, which means your Python program will likely
receive \0-terminated records via sys.stdin. And sys.stdin is already
opened for you - you do not have the option to specify a newline
argument.

In actual fact, I can't think of a good example (either from my own
experience, or mentioned in this thread) where I'd expect to be
reading \0-terminated records from anything *except* sys.stdin.
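
For completeness, the workaround available today is to drop down to the
underlying binary stream (sys.stdin.buffer) and split on b"\0" by hand;
a rough sketch, where the helper name and chunk size are my own
invention:

```python
import io

def iter_nul_records(stream, chunksize=4096):
    """Yield NUL-terminated records from a binary stream, decoded as
    text; a stand-in for the newline argument we can't pass to an
    already-open sys.stdin."""
    buf = b""
    for chunk in iter(lambda: stream.read(chunksize), b""):
        buf += chunk
        *records, buf = buf.split(b"\0")
        for rec in records:
            yield rec.decode()
    if buf:  # trailing record without a final NUL
        yield buf.decode()

# In real use: names = list(iter_nul_records(sys.stdin.buffer))
names = list(iter_nul_records(io.BytesIO(b"./a.txt\0./b c.txt\0")))
# names == ["./a.txt", "./b c.txt"]
```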

Paul

From clint.hepner at gmail.com  Sun Jul 20 17:11:25 2014
From: clint.hepner at gmail.com (Clint Hepner)
Date: Sun, 20 Jul 2014 11:11:25 -0400
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <CACac1F8pvGB-0SjcV2eLcYD=WSt9uALZ8mu0=h+=_a2haCJfwA@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org> <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <20140719090159.GJ9112@ando> <53CA9B0D.1070405@mrabarnett.plus.com>
 <CAP7+vJJVojcc0bPJa6kLgdy6t1D3xtfXO-iCDF5y3UTAqh4kLQ@mail.gmail.com>
 <F465F3B1-26FD-4E88-A944-E934AFDBB0EE@wiggy.net>
 <D6C964AF-72F7-43CE-8E89-97743C2010A0@yahoo.com>
 <CACac1F8pvGB-0SjcV2eLcYD=WSt9uALZ8mu0=h+=_a2haCJfwA@mail.gmail.com>
Message-ID: <2592A06E-DFFD-420A-AD13-5755B8B5BE61@gmail.com>



--
Clint

> On Jul 20, 2014, at 9:42 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> 
> In actual fact, I can't think of a good example (either from my own
> experience, or mentioned in this thread) where I'd expect to be
> reading \0-terminated records from anything *except* sys.stdin.

Named pipes and whatever is used to implement process substitution ( < <(find ... -0) ) come to mind. 

From ram.rachum at gmail.com  Mon Jul 21 00:06:33 2014
From: ram.rachum at gmail.com (Ram Rachum)
Date: Sun, 20 Jul 2014 15:06:33 -0700 (PDT)
Subject: [Python-ideas] Changing `Sequence.__contains__`
Message-ID: <a9afc6f3-8e53-44b9-978e-2e819c437896@googlegroups.com>

Why does the default `Sequence.__contains__` iterate through the items 
rather than use `.index`, which may sometimes be more efficient?

I suggest an implementation like this: 

    def __contains__(self, i):
        try: self.index(i)
        except ValueError: return False
        else: return True
        
What do you think? 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140720/571f95d5/attachment.html>

From abarnert at yahoo.com  Mon Jul 21 02:41:32 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 20 Jul 2014 17:41:32 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <CADiSq7ejpH6P-uSVcEhx4Th4aK8=iWkkPASjKtsa4uxeRNw2pA@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>	<CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>	<CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>	<1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>	<1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>	<lqb1q9$qts$1@ger.gmane.org>	<EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>	<CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>	<CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>	<CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>	<1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>	<CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>	<1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>	<CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>	<CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>	<CADiSq7cQSKgSi5ooa7NtWUOw4pDVhLqPye1oB6-5nDc3y7uejg@mail.gmail.com>	<1405828738.93713.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7ejpH6P-uSVcEhx4Th4aK8=iWkkPASjKtsa4uxeRNw2pA@mail.gmail.com> 
Message-ID: <1405903292.28722.YahooMailNeo@web181006.mail.ne1.yahoo.com>

On Saturday, July 19, 2014 10:00 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> That's one of the
> things the PEP process is for - to explain such use cases to folks
> that haven't personally encountered them, and then explain why the
> proposed solution addresses the use case in a way that makes sense for
> the domains where the use case arises.

OK, I wrote up a draft PEP, and attached it to the bug (if that's not a good thing to do, apologies); you can find it at http://bugs.python.org/file36008/pep-newline.txt

It's probably a lot more detailed than necessary in many areas, but I figured it was better to include too much than to leave things ambiguous; after I know which parts are not contentious, I can strip it down in the next revision.

Meanwhile, while writing it, and re-reading Guido's replies in this thread, I decided to come back to the alternative idea of exposing text files' buffers just like binary files' buffers. If done properly, that would make it much easier (still not trivial, but much easier) for users to just implement the readrecord functionality on their own, or for someone to package it up on PyPI. And I don't think the idea is as radical as it sounded at first, so I don't want it to be dismissed out of hand. So, also see http://bugs.python.org/file36009/pep-peek.txt

Finally, writing this up made me recognize a couple of minor problems with the patch I'd been writing, and I don't think I have time to clean it up and write relevant tests now, so I might not be able to upload a useful patch until next weekend. Hopefully people can still discuss the PEP without a patch to play with.

From steve at pearwood.info  Mon Jul 21 03:41:43 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 21 Jul 2014 11:41:43 +1000
Subject: [Python-ideas] Changing `Sequence.__contains__`
In-Reply-To: <a9afc6f3-8e53-44b9-978e-2e819c437896@googlegroups.com>
References: <a9afc6f3-8e53-44b9-978e-2e819c437896@googlegroups.com>
Message-ID: <20140721014142.GL9112@ando>

On Sun, Jul 20, 2014 at 03:06:33PM -0700, Ram Rachum wrote:

> Why does the default `Sequence.__contains__` iterate through the items 
> rather than use `.index`, which may sometimes be more efficient?

Because having an index() method is not a requirement to be a sequence. 
It is optional. The implementation for Sequence.__contains__ which makes 
the least assumptions about the class is to iterate over the items.


> I suggest an implementation like this: 
> 
>     def __contains__(self, i):
>         try: self.index(i)
>         except ValueError: return False
>         else: return True
>         
> What do you think? 

That now means that sequence types will have to define an index method 
in order to be a sequence. Not only that, but the index method has to 
follow a standard API, which not all sequence types may do.

This would be marginally better:


    def __contains__(self, obj):
        try: 
            index = type(self).index
        except AttributeError:
            for o in self:
                if o is obj or o == obj:
                    return True
            return False
        else:
            try:
                index(obj)
            except ValueError:
                return False
            else:
                return True


but it has at least two problems I can see:

- it's not backwards compatible with sequence types which may already 
define an index attribute which does something different, e.g.:

    class Book:
        def index(self):
            # return the index of the book
        def __getitem__(self, n):
            # return page n

- for a default implementation, it's too complicated.

If your sequence class has an efficient index method (or an efficient 
find method, or __getitem__ method, or any other efficient way of 
testing whether something exists in the sequence quickly) it's not much 
more work to define a custom __contains__ to take advantage of that. 
There's no need for the default Sequence fallback to try to guess what 
time-saving methods you might provide.

For a historical view, you should be aware that until recently, tuples 
had no index method:

[steve at ando ~]$ python2.5
Python 2.5.4 (r254:67916, Nov 25 2009, 18:45:43)
[GCC 4.1.2 20070626 (Red Hat 4.1.2-14)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
py> (1, 2).index
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'index'


There's no reason to expect that all sequences will have an index 
method, and certainly no reason to demand that they do.



-- 
Steven

From abarnert at yahoo.com  Mon Jul 21 03:52:58 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 20 Jul 2014 18:52:58 -0700
Subject: [Python-ideas] Changing `Sequence.__contains__`
In-Reply-To: <20140721014142.GL9112@ando>
References: <a9afc6f3-8e53-44b9-978e-2e819c437896@googlegroups.com>
 <20140721014142.GL9112@ando>
Message-ID: <1405907578.10294.YahooMailNeo@web181006.mail.ne1.yahoo.com>

On Sunday, July 20, 2014 6:42 PM, Steven D'Aprano <steve at pearwood.info> wrote:

> On Sun, Jul 20, 2014 at 03:06:33PM -0700, Ram Rachum wrote:
> 
>>  Why does the default `Sequence.__contains__` iterate through the items 
>>  rather than use `.index`, which may sometimes be more efficient?
> 
> Because having an index() method is not a requirement to be a sequence. 
> It is optional.

Sequence.__contains__ certainly can assume that your class will have an index method, because it provides one if you don't. See https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes (and you can look all the way back to 2.6 and 3.0 to verify that it's always been there).

The default implementation looks like this:

    for i, v in enumerate(self):
        if v == value:
            return i
    raise ValueError


> but it has at least two problems I can see:

> 
> - it's not backwards compatible with sequence types which may already 
> define an index attribute which does something different, e.g.:
> 
>     class Book:
>         def index(self):
>             # return the index of the book

This isn't a Sequence. You didn't inherit from collections.abc.Sequence, or even register with it. So, Sequence.__contains__ can't get called on your class in the first place.

If you _do_ make it a Sequence, then you're violating the protocol you're claiming to support, and it's your own fault if that doesn't work. You can also write a __getitem__ that requires four arguments and call yourself a Sequence, but you're going to get exceptions all over the place trying to use it.

> For a historical view, you should be aware that until recently, tuples
> had no index method:


That was true up until the collections ABCs were added in 2.6 and 3.0. Prior to that, yes, the "sequence protocol" was a vague thing, and you couldn't be sure that something had an index method just because it looked like a sequence; but, by the same token, prior to that, there was no Sequence ABC mixin, so the problem wasn't relevant in the first place.

From breamoreboy at yahoo.co.uk  Mon Jul 21 04:06:59 2014
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Mon, 21 Jul 2014 03:06:59 +0100
Subject: [Python-ideas] Changing `Sequence.__contains__`
In-Reply-To: <a9afc6f3-8e53-44b9-978e-2e819c437896@googlegroups.com>
References: <a9afc6f3-8e53-44b9-978e-2e819c437896@googlegroups.com>
Message-ID: <lqhsk4$90p$1@ger.gmane.org>

On 20/07/2014 23:06, Ram Rachum wrote:
> Why does the default `Sequence.__contains__` iterate through the items
> rather than use `.index`, which may sometimes be more efficient?
>
> I suggest an implementation like this:
>
>      def __contains__(self, i):
>          try: self.index(i)
>          except ValueError: return False
>          else: return True
> What do you think?
>

I don't see how that can be more efficient than the naive

def __contains__(self, i):
     for elem in self:
         if elem == i:
             return True
     return False

What am I missing?

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence

---
This email is free from viruses and malware because avast! Antivirus protection is active.
http://www.avast.com



From rosuav at gmail.com  Mon Jul 21 04:09:27 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 21 Jul 2014 12:09:27 +1000
Subject: [Python-ideas] Changing `Sequence.__contains__`
In-Reply-To: <lqhsk4$90p$1@ger.gmane.org>
References: <a9afc6f3-8e53-44b9-978e-2e819c437896@googlegroups.com>
 <lqhsk4$90p$1@ger.gmane.org>
Message-ID: <CAPTjJmrC5nGH6kvtyzL8EaeNW++LQRF17kdT5GBcqTfrwDAaJw@mail.gmail.com>

On Mon, Jul 21, 2014 at 12:06 PM, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:
> On 20/07/2014 23:06, Ram Rachum wrote:
>>
>> Why does the default `Sequence.__contains__` iterate through the items
>> rather than use `.index`, which may sometimes be more efficient?
>>
>> I suggest an implementation like this:
>>
>>      def __contains__(self, i):
>>          try: self.index(i)
>>          except ValueError: return False
>>          else: return True
>> What do you think?
>>
>
> I don't see how that can be more efficient than the naive
>
> def __contains__(self, i):
>     for elem in self:
>         if elem == i:
>             return True
>     return False
>
> What am I missing?

If your sequence provides a more efficient index(), then __contains__
can take advantage of it. If not, it's a bit more indirection and the
same result.

ChrisA

From abarnert at yahoo.com  Mon Jul 21 04:18:44 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 20 Jul 2014 19:18:44 -0700
Subject: [Python-ideas] Changing `Sequence.__contains__`
In-Reply-To: <lqhsk4$90p$1@ger.gmane.org>
References: <a9afc6f3-8e53-44b9-978e-2e819c437896@googlegroups.com>
 <lqhsk4$90p$1@ger.gmane.org>
Message-ID: <1405909124.91690.YahooMailNeo@web181001.mail.ne1.yahoo.com>

On Sunday, July 20, 2014 7:06 PM, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:

> On 20/07/2014 23:06, Ram Rachum wrote:
>>  Why does the default `Sequence.__contains__` iterate through the items
>>  rather than use `.index`, which may sometimes be more efficient?
>> 
>>  I suggest an implementation like this:
>> 
>>       def __contains__(self, i):
>>           try: self.index(i)
>>           except ValueError: return False
>>           else: return True
>>  What do you think?
>> 
> 
> I don't see how that can be more efficient than the naive
> 
> def __contains__(self, i):
>      for elem in self:
>          if elem == i:
>              return True
>      return False
> 
> What am I missing?


Consider a blist.sortedlist (http://stutzbachenterprises.com/blist/sortedlist.html), or any other such data structure built on a tree, skip list, etc.

The index method is O(log N), so Ram's __contains__ is also O(log N). But naively iterating is obviously O(N). (In fact, it could be worse: if you don't implement a custom __iter__, and your indexing is O(log N), then the naive __contains__ will be O(N log N).)

Needless to say, blist.sortedlist implements a custom O(log N) __contains__, and so does (hopefully) every other such library on PyPI. But Ram's proposal would mean they no longer have to do so; they'll get O(log N) __contains__ for free just by implementing index.

Of course that only removes one method. For example, they still have to implement a custom count method or they'll get O(N) performance from the default version. If you look at the code for any of these types, __contains__ is a tiny percentage of the implementation. So, it's not a huge win. But it's a small one.
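
To make that concrete, here is a toy sorted sequence (my own example,
not blist itself) whose bisect-based index is O(log N). The
__contains__ at the bottom is exactly what Ram's proposal would let the
Sequence mixin provide for free:

```python
import bisect
from collections.abc import Sequence

class SortedSeq(Sequence):
    """Toy sorted sequence with an O(log N) index via bisect."""
    def __init__(self, items):
        self._items = sorted(items)
    def __len__(self):
        return len(self._items)
    def __getitem__(self, i):
        return self._items[i]
    def index(self, value):
        # O(log N) search instead of the mixin's linear scan
        i = bisect.bisect_left(self._items, value)
        if i != len(self._items) and self._items[i] == value:
            return i
        raise ValueError(value)
    # Under the proposal, Sequence would supply this automatically:
    def __contains__(self, value):
        try:
            self.index(value)
            return True
        except ValueError:
            return False

s = SortedSeq([5, 1, 3])
assert 3 in s and 4 not in s
```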

From breamoreboy at yahoo.co.uk  Mon Jul 21 04:36:52 2014
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Mon, 21 Jul 2014 03:36:52 +0100
Subject: [Python-ideas] Changing `Sequence.__contains__`
In-Reply-To: <CAPTjJmrC5nGH6kvtyzL8EaeNW++LQRF17kdT5GBcqTfrwDAaJw@mail.gmail.com>
References: <a9afc6f3-8e53-44b9-978e-2e819c437896@googlegroups.com>
 <lqhsk4$90p$1@ger.gmane.org>
 <CAPTjJmrC5nGH6kvtyzL8EaeNW++LQRF17kdT5GBcqTfrwDAaJw@mail.gmail.com>
Message-ID: <lqhuc5$phh$1@ger.gmane.org>

On 21/07/2014 03:09, Chris Angelico wrote:
> On Mon, Jul 21, 2014 at 12:06 PM, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:
>> On 20/07/2014 23:06, Ram Rachum wrote:
>>>
>>> Why does the default `Sequence.__contains__` iterate through the items
>>> rather than use `.index`, which may sometimes be more efficient?
>>>
>>> I suggest an implementation like this:
>>>
>>>       def __contains__(self, i):
>>>           try: self.index(i)
>>>           except ValueError: return False
>>>           else: return True
>>> What do you think?
>>>
>>
>> I don't see how that can be more efficient than the naive
>>
>> def __contains__(self, i):
>>      for elem in self:
>>          if elem == i:
>>              return True
>>      return False
>>
>> What am I missing?
>
> If your sequence provides a more efficient index(), then __contains__
> can take advantage of it. If not, it's a bit more indirection and the
> same result.
>
> ChrisA
>

The question was about the default sequence.__contains__, not mine or 
any other sequence which may or may not provide a more efficient index().

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence




From breamoreboy at yahoo.co.uk  Mon Jul 21 04:40:49 2014
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Mon, 21 Jul 2014 03:40:49 +0100
Subject: [Python-ideas] Changing `Sequence.__contains__`
In-Reply-To: <1405909124.91690.YahooMailNeo@web181001.mail.ne1.yahoo.com>
References: <a9afc6f3-8e53-44b9-978e-2e819c437896@googlegroups.com>
 <lqhsk4$90p$1@ger.gmane.org>
 <1405909124.91690.YahooMailNeo@web181001.mail.ne1.yahoo.com>
Message-ID: <lqhuji$sjd$1@ger.gmane.org>

On 21/07/2014 03:18, Andrew Barnert wrote:
> On Sunday, July 20, 2014 7:06 PM, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:
>
>>> On 20/07/2014 23:06, Ram Rachum wrote:
>>>   Why does the default `Sequence.__contains__` iterate through the items
>>>   rather than use `.index`, which may sometimes be more efficient?
>>>
>>>   I suggest an implementation like this:
>>>
>>>        def __contains__(self, i):
>>>            try: self.index(i)
>>>            except ValueError: return False
>>>            else: return True
>>>   What do you think?
>>>
>>
>> I don't see how that can be more efficient than the naive
>>
>> def __contains__(self, i):
>>       for elem in self:
>>           if elem == i:
>>               return True
>>       return False
>>
>> What am I missing?
>
>
> Consider a blist.sortedlist (http://stutzbachenterprises.com/blist/sortedlist.html), or any other such data structure built on a tree, skip list, etc.
>
> The index method is O(log N), so Ram's __contains__ is also O(log N). But naively iterating is obviously O(N). (In fact, it could be worse: if you don't implement a custom __iter__, and your indexing is O(log N), then the naive __contains__ will be O(N log N).)
>
> Needless to say, blist.sortedlist implements a custom O(log N) __contains__, and so does (hopefully) every other such library on PyPI. But Ram's proposal would mean they no longer have to do so; they'll get O(log N) __contains__ for free just by implementing index.
>
> Of course that only removes one method. For example, they still have to implement a custom count method or they'll get O(N) performance from the default version. If you look at the code for any of these types, __contains__ is a tiny percentage of the implementation. So, it's not a huge win. But it's a small one.
>

What has blist.sortedlist, which IIRC is one of the data structures that 
has been rejected as forming part of the standard library, got to do 
with the default sequence.__contains__ ?

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence




From abarnert at yahoo.com  Mon Jul 21 04:59:26 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sun, 20 Jul 2014 19:59:26 -0700
Subject: [Python-ideas] Changing `Sequence.__contains__`
In-Reply-To: <lqhuji$sjd$1@ger.gmane.org>
References: <a9afc6f3-8e53-44b9-978e-2e819c437896@googlegroups.com>
 <lqhsk4$90p$1@ger.gmane.org>
 <1405909124.91690.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <lqhuji$sjd$1@ger.gmane.org>
Message-ID: <1405911566.63080.YahooMailNeo@web181006.mail.ne1.yahoo.com>

On Sunday, July 20, 2014 7:40 PM, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:

> On 21/07/2014 03:18, Andrew Barnert wrote:
>> On Sunday, July 20, 2014 7:06 PM, Mark Lawrence
>> <breamoreboy at yahoo.co.uk> wrote:
>>
>>>> On 20/07/2014 23:06, Ram Rachum wrote:
>>>>   Why does the default `Sequence.__contains__` iterate through the items
>>>>   rather than use `.index`, which may sometimes be more efficient?
>>>>
>>>>   I suggest an implementation like this:
>>>>
>>>>        def __contains__(self, i):
>>>>            try: self.index(i)
>>>>            except ValueError: return False
>>>>            else: return True
>>>>   What do you think?
>>>>
>>>
>>>  I don't see how that can be more efficient than the naive
>>>
>>>  def __contains__(self, i):
>>>       for elem in self:
>>>           if elem == i:
>>>               return True
>>>       return False
>>>
>>>  What am I missing?
>> 
>> 
>>  Consider a blist.sortedlist
>>  (http://stutzbachenterprises.com/blist/sortedlist.html), or any other
>>  such data structure built on a tree, skip list, etc.
>>
>>  The index method is O(log N), so Ram's __contains__ is also O(log N).
>>  But naively iterating is obviously O(N). (In fact, it could be worse: if
>>  you don't implement a custom __iter__, and your indexing is O(log N),
>>  then the naive __contains__ will be O(N log N).)
>>
>>  Needless to say, blist.sortedlist implements a custom O(log N)
>>  __contains__, and so does (hopefully) every other such library on PyPI.
>>  But Ram's proposal would mean they no longer have to do so; they'll get
>>  O(log N) __contains__ for free just by implementing index.
>>
>>  Of course that only removes one method. For example, they still have to
>>  implement a custom count method or they'll get O(N) performance from
>>  the default version. If you look at the code for any of these types,
>>  __contains__ is a tiny percentage of the implementation. So, it's not a
>>  huge win. But it's a small one.
>>
> 
> What has blist.sortedlist, which IIRC is one of the data structures that 
> has been rejected as forming part of the standard library, got to do 
> with the default sequence.__contains__ ?

I think you're missing the whole point here.

Sequence is an ABC, an Abstract Base Class, that's used (either by inheritance or registration) by a wide variety of sequence classes: built-in, stdlib, or third-party.


Like most of the other ABCs in the Python stdlib, it's also usable as a mixin, providing default implementations for methods that you don't want to provide in terms of those that you do. Among the mixin methods it provides is __contains__, as documented at https://docs.python.org/dev/library/collections.abc.html#collections-abstract-base-classes and implemented at http://hg.python.org/cpython/file/default/Lib/_collections_abc.py#l629

I suspect the problem is that you parsed "the default Sequence.__contains__" wrong; Ram was referring to "the default implementation of __contains__ provided as Sequence.__contains__", but you thought he was referring to "the implementation of __contains__ in the default sequence", and whatever "the default sequence" means, it obviously can't be a class from a third-party module, right?
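For concreteness, here's a runnable sketch of what Ram's proposal amounts to. The mixin name is hypothetical (this is not the stdlib's implementation): a Sequence whose __contains__ delegates to index(), so any subclass that overrides index() with something faster than a linear scan gets a faster `in` for free.

```python
from collections.abc import Sequence

class IndexBasedContains(Sequence):
    """Hypothetical mixin: __contains__ defined in terms of index()."""
    def __contains__(self, value):
        try:
            self.index(value)
        except ValueError:
            return False
        return True

class MySeq(IndexBasedContains):
    # A Sequence only needs __getitem__ and __len__; index, count,
    # __iter__, etc. all come from the ABC's mixin methods.
    def __init__(self, data):
        self._data = list(data)
    def __getitem__(self, i):
        return self._data[i]
    def __len__(self):
        return len(self._data)

s = MySeq([10, 20, 30])
print(20 in s, 99 in s)  # True False
```

A tree-backed class would override index() with an O(log N) version and, under the proposal, would not need to override __contains__ at all.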

From p.f.moore at gmail.com  Mon Jul 21 09:04:32 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 21 Jul 2014 08:04:32 +0100
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <1405903292.28722.YahooMailNeo@web181006.mail.ne1.yahoo.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
 <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
 <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
 <1405828738.93713.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7ejpH6P-uSVcEhx4Th4aK8=iWkkPASjKtsa4uxeRNw2pA@mail.gmail.com>
 <1405903292.28722.YahooMailNeo@web181006.mail.ne1.yahoo.com>
Message-ID: <CACac1F9ROd6wz_D197UP_Pz-n8zkurjhCsGLCRTOpb0jpL=1oA@mail.gmail.com>

On 21 July 2014 01:41, Andrew Barnert <abarnert at yahoo.com.dmarc.invalid> wrote:
> OK, I wrote up a draft PEP, and attached it to the bug (if that's not a good thing to do, apologies); you can find it at http://bugs.python.org/file36008/pep-newline.txt

As a suggestion, how about adding an example of a simple nul-separated
filename filter - the sort of thing that could go in a find -print0 |
xxx | xargs -0 pipeline? If I understand it, that's one of the key
motivating examples for this change, so seeing how it's done would be
a great help.

Here's the sort of thing I mean, written for newline-separated files:

import sys

def process(filename):
    """Trivial example"""
    return filename.lower()

if __name__ == '__main__':

    for filename in sys.stdin:
        filename = process(filename)
        print(filename)

This is also an example of why I'm struggling to understand how an
open() parameter "solves all the cases". There's no explicit open()
call here, so how do you specify the record separator? Seeing how you
propose this would work would be really helpful to me.

Paul

From breamoreboy at yahoo.co.uk  Mon Jul 21 19:26:47 2014
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Mon, 21 Jul 2014 18:26:47 +0100
Subject: [Python-ideas] Changing `Sequence.__contains__`
In-Reply-To: <1405911566.63080.YahooMailNeo@web181006.mail.ne1.yahoo.com>
References: <a9afc6f3-8e53-44b9-978e-2e819c437896@googlegroups.com>
 <lqhsk4$90p$1@ger.gmane.org>
 <1405909124.91690.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <lqhuji$sjd$1@ger.gmane.org>
 <1405911566.63080.YahooMailNeo@web181006.mail.ne1.yahoo.com>
Message-ID: <lqjigh$7mc$1@ger.gmane.org>

On 21/07/2014 03:59, Andrew Barnert wrote:
> On Sunday, July 20, 2014 7:40 PM, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:
>
>>> On 21/07/2014 03:18, Andrew Barnert wrote:
>>>   On Sunday, July 20, 2014 7:06 PM, Mark Lawrence
>> <breamoreboy at yahoo.co.uk> wrote:
>>>
>>>>>   On 20/07/2014 23:06, Ram Rachum wrote:
>>>>>     Why does the default `Sequence.__contains__` iterate through the
>> items
>>>>>     rather than use `.index`, which may sometimes be more efficient?
>>>>>
>>>>>     I suggest an implementation like this:
>>>>>
>>>>>          def __contains__(self, i):
>>>>>              try: self.index(i)
>>>>>              except ValueError: return False
>>>>>              else: return True
>>>>>     What do you think?
>>>>>
>>>>
>>>>   I don't see how that can be more efficient than the naive
>>>>
>>>>   def __contains__(self, i):
>>>>         for elem in self:
>>>>             if elem == i:
>>>>                 return True
>>>>         return False
>>>>
>>>>   What am I missing?
>>>
>>>
>>>   Consider a blist.sortedlist
>> (http://stutzbachenterprises.com/blist/sortedlist.html), or any other such data
>> structure built on a tree, skip list, etc.
>>>
>>>   The index method is O(log N), so Ram's __contains__ is also O(log N).
>> But naively iterating is obviously O(N). (In fact, it could be worse:
>> if you don't implement a custom __iter__, and your indexing is O(log N), then the
>> naive __contains__ will be O(N log N).)
>>>
>>>   Needless to say, blist.sortedlist implements a custom O(log N)
>> __contains__, and so does (hopefully) every other such library on PyPI. But
>> Ram's proposal would mean they no longer have to do so; they'll get
>> O(log N) __contains__ for free just by implementing index.
>>>
>>>   Of course that only removes one method. For example, they still have to
>> implement a custom count method or they'll get O(N) performance from the
>> default version. If you look at the code for any of these types, __contains__ is
>> a tiny percentage of the implementation. So, it's not a huge win. But
>> it's a small one.
>>>
>>
>> What has blist.sortedlist, which IIRC is one of the data structures that
>> has been rejected as forming part of the standard library, got to do
>> with the default sequence.__contains__ ?
>
> I think you're missing the whole point here.
>
> Sequence is an ABC, an Abstract Base Class, that's used (either by inheritance or registration) by a wide variety of sequence classes: built-in, stdlib, or third-party.
>
>
> Like most of the other ABCs in the Python stdlib, it's also usable as a mixin, providing default implementations for methods that you don't want to provide in terms of those that you do. Among the mixin methods it provides is __contains__, as documented at https://docs.python.org/dev/library/collections.abc.html#collections-abstract-base-classes and implemented at http://hg.python.org/cpython/file/default/Lib/_collections_abc.py#l629
>
> I suspect the problem is that you parsed "the default Sequence.__contains__" wrong; Ram was referring to "the default implementation of __contains__ provided as Sequence.__contains__", but you thought he was referring to "the implementation of __contains__ in the default sequence", and whatever "the default sequence" means, it obviously can't be a class from a third-party module, right?
>

Thanks for the explanation and yes I did parse it incorrectly. 
Strangely everything seems much clearer at 6PM rather than 3AM :)

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence




From greg.ewing at canterbury.ac.nz  Mon Jul 21 23:12:41 2014
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 22 Jul 2014 09:12:41 +1200
Subject: [Python-ideas] Struct format with multiple endianness markers
Message-ID: <53CD8249.20302@canterbury.ac.nz>

I'd like to propose a small enhancement to the
struct module: Allow the endianness characters to
occur more than once in the format string,
rather than just as the first character.

My use case is reading ESRI shapefile headers, which
mix big and little endian data. This means I can't
use a single struct.unpack call to read what is
logically a single structure, but have to split it
up and use multiple calls.

If I could switch endianness part way through
the format, I could unpack the whole structure
with a single call.
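Today that means two unpack calls over the same buffer with a manually tracked offset. A minimal illustration (the field values are made up; only the mixed-endian layout matters):

```python
import struct

# A shapefile-style header: seven big-endian ints, then two
# little-endian ints and eight little-endian doubles.
header = (struct.pack('>7i', 9994, 0, 0, 0, 0, 0, 50)
          + struct.pack('<2i8d', 1000, 5, *([0.0] * 8)))

# Current workaround: two calls, with the offset computed by hand.
file_code, *_, file_length = struct.unpack_from('>7i', header, 0)
version, shape_type, *bbox = struct.unpack_from('<2i8d', header, 28)
print(file_code, file_length, version, shape_type)  # 9994 50 1000 5

# Under the proposal, one call would do (hypothetical syntax):
#     struct.unpack('>7i<2i8d', header)
```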

-- 
Greg

From guido at python.org  Tue Jul 22 02:54:55 2014
From: guido at python.org (Guido van Rossum)
Date: Mon, 21 Jul 2014 17:54:55 -0700
Subject: [Python-ideas] Struct format with multiple endianness markers
In-Reply-To: <53CD8249.20302@canterbury.ac.nz>
References: <53CD8249.20302@canterbury.ac.nz>
Message-ID: <CAP7+vJ+5CgdRP_ZPy=Hea=Fx38-xwkjRQ56fuREY=xcS2SmRUQ@mail.gmail.com>

Simple and elegant. Can you submit a patch? One suggestion: disallow
endianness marker if there isn't one at the start (i.e. default).
On Jul 21, 2014 5:44 PM, "Greg Ewing" <greg.ewing at canterbury.ac.nz> wrote:

> I'd like to propose a small enhancement to the
> struct module: Allow the endianness characters to
> occur more than once in the format string,
> rather than just as the first character.
>
> My use case is reading ESRI shapefile headers, which
> mix big and little endian data. This means I can't
> use a single struct.unpack call to read what is
> logically a single structure, but have to split it
> up and use multiple calls.
>
> If I could switch endianness part way through
> the format, I could unpack the whole structure
> with a single call.
>
> --
> Greg
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140721/00388217/attachment.html>

From 4kir4.1i at gmail.com  Tue Jul 22 18:05:42 2014
From: 4kir4.1i at gmail.com (Akira Li)
Date: Tue, 22 Jul 2014 20:05:42 +0400
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
 <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
 <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
 <1405828738.93713.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7ejpH6P-uSVcEhx4Th4aK8=iWkkPASjKtsa4uxeRNw2pA@mail.gmail.com>
 <1405903292.28722.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CACac1F9ROd6wz_D197UP_Pz-n8zkurjhCsGLCRTOpb0jpL=1oA@mail.gmail.com>
Message-ID: <87bnshnzu1.fsf@gmail.com>

Paul Moore <p.f.moore at gmail.com> writes:

> On 21 July 2014 01:41, Andrew Barnert
> <abarnert at yahoo.com.dmarc.invalid> wrote:
>> OK, I wrote up a draft PEP, and attached it to the bug (if that's
>> not a good thing to do, apologies); you can find it at
>> http://bugs.python.org/file36008/pep-newline.txt
>
> As a suggestion, how about adding an example of a simple nul-separated
> filename filter - the sort of thing that could go in a find -print0 |
> xxx | xargs -0 pipeline? If I understand it, that's one of the key
> motivating examples for this change, so seeing how it's done would be
> a great help.
>
> Here's the sort of thing I mean, written for newline-separated files:
>
> import sys
>
> def process(filename):
>     """Trivial example"""
>     return filename.lower()
>
> if __name__ == '__main__':
>
>     for filename in sys.stdin:
>         filename = process(filename)
>         print(filename)
>
> This is also an example of why I'm struggling to understand how an
> open() parameter "solves all the cases". There's no explicit open()
> call here, so how do you specify the record separator? Seeing how you
> propose this would work would be really helpful to me.
>

The `find -print0 | ./tr-filename -0 | xargs -0` example implies that you
can replace the `sys.std*` streams without worrying about preserving
the `sys.__std*__` streams:

  #!/usr/bin/env python
  import io
  import re
  import sys
  from pathlib import Path

  def transform_filename(filename: str) -> str: # example
      """Normalize whitespace in basename."""
      path = Path(filename)
      new_path = path.with_name(re.sub(r'\s+', ' ', path.name))
      path.replace(new_path) # rename on disk if necessary
      return str(new_path)

  def SystemTextStream(bytes_stream, **kwargs):
      encoding = sys.getfilesystemencoding()
      return io.TextIOWrapper(bytes_stream,
          encoding=encoding,
          errors='surrogateescape' if encoding != 'mbcs' else 'strict',
          **kwargs)

  nl = '\0' if '-0' in sys.argv else None
  sys.stdout = SystemTextStream(sys.stdout.detach(), newline=nl)
  for line in SystemTextStream(sys.stdin.detach(), newline=nl):
      print(transform_filename(line.rstrip(nl)), end=nl)

io.TextIOWrapper() plays the role of open() in this case. The code
assumes that `newline` parameter accepts '\0'.

The example function handles Unicode whitespace to demonstrate why
opaque bytes-based cookies can't be used to represent filenames in this
case even on POSIX, though which characters are recognized depends on
sys.getfilesystemencoding().
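As a small aside on the errors='surrogateescape' choice above (a minimal illustration, not from the thread): it lets bytes that are invalid in the chosen encoding round-trip through str unchanged instead of raising.

```python
# Why errors='surrogateescape': undecodable bytes survive a
# bytes -> str -> bytes round trip instead of raising UnicodeDecodeError.
raw = b'caf\xe9'  # a Latin-1 byte arriving where UTF-8 is expected
name = raw.decode('utf-8', errors='surrogateescape')
print(ascii(name))  # 'caf\udce9'
assert name.encode('utf-8', errors='surrogateescape') == raw
```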

Note:

- `end=nl` is necessary because `print()` prints '\n' by default -- it
  does not use `file.newline`
- the `-0` option is required in the current implementation if filenames may
  have trailing whitespace. This could be improved
- SystemTextStream() handles filenames that are undecodable in the current
  locale, i.e., non-ASCII names are allowed even in the C locale (LC_CTYPE=C)
- undecodable filenames are not supported on Windows. It is not clear
  how to pass an undecodable filename via a pipe on Windows -- perhaps
  `GetShortPathNameW -> fsencode -> pipe` might work in some cases. It
  assumes that the short path exists and it is always encodable using
  mbcs. If we can control all parts of the pipeline *and* Windows API
  uses proper utf-16 (not ucs-2) then utf-8 can be used to pass
  filenames via a pipe otherwise ReadConsoleW/WriteConsoleW could be
  tried e.g., https://github.com/Drekin/win-unicode-console


--
Akira


From p.f.moore at gmail.com  Tue Jul 22 19:35:58 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 22 Jul 2014 18:35:58 +0100
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <87bnshnzu1.fsf@gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
 <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
 <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
 <1405828738.93713.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7ejpH6P-uSVcEhx4Th4aK8=iWkkPASjKtsa4uxeRNw2pA@mail.gmail.com>
 <1405903292.28722.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CACac1F9ROd6wz_D197UP_Pz-n8zkurjhCsGLCRTOpb0jpL=1oA@mail.gmail.com>
 <87bnshnzu1.fsf@gmail.com>
Message-ID: <CACac1F-LFE9u3if=NVKf3npsRrcqmtAUhcVShNFLw3UqopuiEg@mail.gmail.com>

On 22 July 2014 17:05, Akira Li <4kir4.1i at gmail.com> wrote:
> The example function handles Unicode whitespace to demonstrate why
> opaque bytes-based cookies can't be used to represent filenames in this
> case even on POSIX, though which characters are recognized depends on
> sys.getfilesystemencoding().

Thanks. That's how you'd do it now.

A question for the OP: how would the proposed change improve this code?
Paul

From 4kir4.1i at gmail.com  Wed Jul 23 01:48:06 2014
From: 4kir4.1i at gmail.com (Akira Li)
Date: Wed, 23 Jul 2014 03:48:06 +0400
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
 <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
 <1405828738.93713.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7ejpH6P-uSVcEhx4Th4aK8=iWkkPASjKtsa4uxeRNw2pA@mail.gmail.com>
 <1405903292.28722.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CACac1F9ROd6wz_D197UP_Pz-n8zkurjhCsGLCRTOpb0jpL=1oA@mail.gmail.com>
 <87bnshnzu1.fsf@gmail.com>
 <CACac1F-LFE9u3if=NVKf3npsRrcqmtAUhcVShNFLw3UqopuiEg@mail.gmail.com>
Message-ID: <87wqb5lzux.fsf@gmail.com>

Paul Moore <p.f.moore at gmail.com> writes:

> On 22 July 2014 17:05, Akira Li
> <4kir4.1i at gmail.com> wrote:
>> The example function handles Unicode whitespace to demonstrate why
>> opaque bytes-based cookies can't be used to represent filenames in this
>> case even on POSIX, though which characters are recognized depends on
>> sys.getfilesystemencoding().
>
> Thanks. That's how you'd do it now.

You've cut too much e.g. I wrote in [1]:

>> io.TextIOWrapper() plays the role of open() in this case. The code
>> assumes that `newline` parameter accepts '\0'.

[1] https://mail.python.org/pipermail/python-ideas/2014-July/028372.html

> A question for the OP: how would the proposed change improve this code?
> Paul

I'm not sure who the OP is in this context, but I can answer: the proposed
change would allow TextIOWrapper(..., newline='\0'), and the code in [1]
doesn't support the `-0` command-line parameter without it.


--
Akira


From abarnert at yahoo.com  Wed Jul 23 06:24:12 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 22 Jul 2014 21:24:12 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <CACac1F9ROd6wz_D197UP_Pz-n8zkurjhCsGLCRTOpb0jpL=1oA@mail.gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org> <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
 <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
 <1405828738.93713.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7ejpH6P-uSVcEhx4Th4aK8=iWkkPASjKtsa4uxeRNw2pA@mail.gmail.com>
 <1405903292.28722.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CACac1F9ROd6wz_D197UP_Pz-n8zkurjhCsGLCRTOpb0jpL=1oA@mail.gmail.com>
Message-ID: <01067774-6B85-436D-B240-83E14CBDA315@yahoo.com>

On Jul 21, 2014, at 0:04, Paul Moore <p.f.moore at gmail.com> wrote:

> On 21 July 2014 01:41, Andrew Barnert <abarnert at yahoo.com.dmarc.invalid> wrote:
>> OK, I wrote up a draft PEP, and attached it to the bug (if that's not a good thing to do, apologies); you can find it at http://bugs.python.org/file36008/pep-newline.txt
> 
> As a suggestion, how about adding an example of a simple nul-separated
> filename filter - the sort of thing that could go in a find -print0 |
> xxx | xargs -0 pipeline? If I understand it, that's one of the key
> motivating examples for this change, so seeing how it's done would be
> a great help.
> 
> Here's the sort of thing I mean, written for newline-separated files:
> 
> import sys
> 
> def process(filename):
>    """Trivial example"""
>    return filename.lower()
> 
> if __name__ == '__main__':
> 
>    for filename in sys.stdin:
>        filename = process(filename)
>        print(filename)

for filename in io.TextIOWrapper(sys.stdin.buffer, encoding=sys.stdin.encoding, errors=sys.stdin.errors, newline='\0'):
    filename = process(filename.rstrip('\0'))
    print(filename)

I assume you wanted an rstrip('\n') in the original, so I did the equivalent here.

If you want to pipe the result to another -0 tool, you also need to add end='\0' to the print, of course.

If we had Nick Coghlan's separate idea of adding rewrap methods to the stream classes (not part of this proposal, but I would be happy to have it), it would be even simpler:

for filename in sys.stdin.rewrap(newline='\0'):
    filename = process(filename.rstrip('\0'))
    print(filename)

Anyway, this isn't perfect if, e.g., you might have illegal-as-UTF-8 Latin-1 filenames hiding in your UTF-8 filesystem, but neither is your code; in fact, this does exactly the same thing, except that it takes \0 terminators (so it can handle filenames with embedded newlines, or pipelines that use -print0 just because they can't be sure which tools in the chain can handle spaces).

It's obviously a little more complicated than your code, but that's to be expected; it's a lot simpler than anything we can write today. (And it runs at the same speed as your code instead of 2x slower or worse.)

> This is also an example of why I'm struggling to understand how an
> open() parameter "solves all the cases". There's no explicit open()
> call here, so how do you specify the record separator? Seeing how you
> propose this would work would be really helpful to me.

The open function is just a shortcut to constructing a stack of io classes; you can always construct them manually. It would be nice if some cases of that were made a little easier (again, see Nick's proposal above), but it's easy enough to live with.
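To illustrate that point (a hedged sketch, not CPython's exact open() internals): the equivalent stack can be assembled by hand, and that is where a nonstandard constructor parameter would be applied.

```python
import io
import os
import tempfile

# Build by hand roughly the stack that open(path, 'r') assembles:
# raw FileIO -> BufferedReader -> TextIOWrapper.
fd, path = tempfile.mkstemp()
os.write(fd, b'spam eggs')
os.close(fd)

raw = io.FileIO(path, 'rb')
buffered = io.BufferedReader(raw)
text = io.TextIOWrapper(buffered, encoding='utf-8')
content = text.read()
print(content)   # spam eggs
text.close()     # closing the top wrapper closes the whole stack
os.remove(path)
```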

From abarnert at yahoo.com  Wed Jul 23 06:40:54 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 22 Jul 2014 21:40:54 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <87bnshnzu1.fsf@gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org> <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
 <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
 <1405828738.93713.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7ejpH6P-uSVcEhx4Th4aK8=iWkkPASjKtsa4uxeRNw2pA@mail.gmail.com>
 <1405903292.28722.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CACac1F9ROd6wz_D197UP_Pz-n8zkurjhCsGLCRTOpb0jpL=1oA@mail.gmail.com>
 <87bnshnzu1.fsf@gmail.com>
Message-ID: <E1FD0B3B-D82D-4805-98EC-4C595CE9D534@yahoo.com>

On Jul 22, 2014, at 9:05, Akira Li <4kir4.1i at gmail.com> wrote:

> Paul Moore <p.f.moore at gmail.com> writes:
> 
>> On 21 July 2014 01:41, Andrew Barnert
>> <abarnert at yahoo.com.dmarc.invalid> wrote:
>>> OK, I wrote up a draft PEP, and attached it to the bug (if that's
>>> not a good thing to do, apologies); you can find it at
>>> http://bugs.python.org/file36008/pep-newline.txt
>> 
>> As a suggestion, how about adding an example of a simple nul-separated
>> filename filter - the sort of thing that could go in a find -print0 |
>> xxx | xargs -0 pipeline? If I understand it, that's one of the key
>> motivating examples for this change, so seeing how it's done would be
>> a great help.
>> 
>> Here's the sort of thing I mean, written for newline-separated files:
>> 
>> import sys
>> 
>> def process(filename):
>>    """Trivial example"""
>>    return filename.lower()
>> 
>> if __name__ == '__main__':
>> 
>>    for filename in sys.stdin:
>>        filename = process(filename)
>>        print(filename)
>> 
>> This is also an example of why I'm struggling to understand how an
>> open() parameter "solves all the cases". There's no explicit open()
>> call here, so how do you specify the record separator? Seeing how you
>> propose this would work would be really helpful to me.
> 
> The `find -print0 | ./tr-filename -0 | xargs -0` example implies that you
> can replace the `sys.std*` streams without worrying about preserving
> the `sys.__std*__` streams:
> 
>  #!/usr/bin/env python
>  import io
>  import re
>  import sys
>  from pathlib import Path
> 
>  def transform_filename(filename: str) -> str: # example
>      """Normalize whitespace in basename."""
>      path = Path(filename)
>      new_path = path.with_name(re.sub(r'\s+', ' ', path.name))
>      path.replace(new_path) # rename on disk if necessary
>      return str(new_path)
> 
>  def SystemTextStream(bytes_stream, **kwargs):
>      encoding = sys.getfilesystemencoding()
>      return io.TextIOWrapper(bytes_stream,
>          encoding=encoding,
>          errors='surrogateescape' if encoding != 'mbcs' else 'strict',
>          **kwargs)
> 
>  nl = '\0' if '-0' in sys.argv else None
>  sys.stdout = SystemTextStream(sys.stdout.detach(), newline=nl)
>  for line in SystemTextStream(sys.stdin.detach(), newline=nl):
>      print(transform_filename(line.rstrip(nl)), end=nl)

Nice, a much more complete example than mine. I just tried to handle the same edge cases as the original code he asked about, but you handle everything.

> io.TextIOWrapper() plays the role of open() in this case. The code
> assumes that `newline` parameter accepts '\0'.
> 
> The example function handles Unicode whitespace to demonstrate why
> opaque bytes-based cookies can't be used to represent filenames in this
> case even on POSIX, though which characters are recognized depends on
> sys.getfilesystemencoding().
> 
> Note:
> 
> - `end=nl` is necessary because `print()` prints '\n' by default -- it
>  does not use `file.newline`

Actually, yes it does. Or, rather, print pastes on a '\n', but sys.stdout.write translates any '\n' characters to sys.stdout.writenl (a private variable that's initialized from the newline argument at construction time if it's anything other than None or '').

But of course that's the newline argument to sys.stdout, and you only changed sys.stdin, so you do need end=nl anyway. (And you wouldn't want output translation here anyway, because that could also translate '\n' characters in the middle of a line, re-creating the same problem we're trying to avoid...)

But it uses sys.stdout.newline, not sys.stdin.newline.
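That write-side translation is easy to demonstrate (using '\r\n', since today's TextIOWrapper rejects '\0'; accepting it is exactly what this thread proposes):

```python
import io

# TextIOWrapper translates '\n' on output when newline is anything
# other than None or ''. ('\0' is rejected today, hence '\r\n' here.)
buf = io.BytesIO()
out = io.TextIOWrapper(buf, encoding='ascii', newline='\r\n')
print('hello', file=out)  # print appends '\n'; the wrapper rewrites it
out.flush()
print(buf.getvalue())  # b'hello\r\n'
```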

> - the `-0` option is required in the current implementation if filenames may
>  have trailing whitespace. This could be improved
> - SystemTextStream() handles filenames that are undecodable in the current
>  locale, i.e., non-ASCII names are allowed even in the C locale (LC_CTYPE=C)
> - undecodable filenames are not supported on Windows. It is not clear
>  how to pass an undecodable filename via a pipe on Windows -- perhaps
>  `GetShortPathNameW -> fsencode -> pipe` might work in some cases. It
>  assumes that the short path exists and it is always encodable using
>  mbcs. If we can control all parts of the pipeline *and* Windows API
>  uses proper utf-16 (not ucs-2) then utf-8 can be used to pass
>  filenames via a pipe otherwise ReadConsoleW/WriteConsoleW could be
>  tried e.g., https://github.com/Drekin/win-unicode-console

First, don't both the Win32 APIs and the POSIX-ish layer in msvcrt on top of it guarantee that you can never get such unencodable filenames (sometimes by just pretending the file doesn't exist, but if possible by having the filesystem map it to something valid, unique, and persistent for this session, usually the short name)?

Second, trying to solve this implies that you have some other native (as opposed to Cygwin) tool that passes or accepts such filenames over simple pipes (as opposed to PowerShell typed ones). Are there any? What does, say, mingw's find do with invalid filenames if it finds them?

On Unix, of course, it's a real problem.

From p.f.moore at gmail.com  Wed Jul 23 10:11:23 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 23 Jul 2014 09:11:23 +0100
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <87wqb5lzux.fsf@gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
 <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
 <1405828738.93713.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7ejpH6P-uSVcEhx4Th4aK8=iWkkPASjKtsa4uxeRNw2pA@mail.gmail.com>
 <1405903292.28722.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CACac1F9ROd6wz_D197UP_Pz-n8zkurjhCsGLCRTOpb0jpL=1oA@mail.gmail.com>
 <87bnshnzu1.fsf@gmail.com>
 <CACac1F-LFE9u3if=NVKf3npsRrcqmtAUhcVShNFLw3UqopuiEg@mail.gmail.com>
 <87wqb5lzux.fsf@gmail.com>
Message-ID: <CACac1F-+tBydOOnZb4-5U3pE14Ckzc_DTAPbx+34-mWNOG9XkQ@mail.gmail.com>

On 23 July 2014 00:48, Akira Li <4kir4.1i at gmail.com> wrote:
> I'm not sure who is OP in this context but I can answer: the proposed
> change might allow TextIOWrapper(.., newline='\0') and the code in [1]
> doesn't support `-0` command-line parameter without it.

I see. My apologies, I read that part but didn't spot what you meant.
Thanks for clarifying.

From p.f.moore at gmail.com  Wed Jul 23 10:14:31 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 23 Jul 2014 09:14:31 +0100
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <01067774-6B85-436D-B240-83E14CBDA315@yahoo.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CAP7+vJ+fGeAaMgKN8hsPnzrMV=S2NLrh049D=_UfcBjPW5a=1A@mail.gmail.com>
 <CAN3CYHzrnEqcpgK9nZ7PrmovP5kp-vuR6p9hVdn_oPJDPSXx4g@mail.gmail.com>
 <1405635685.60281.YahooMailNeo@web181004.mail.ne1.yahoo.com>
 <1405641840.13158.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
 <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
 <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
 <1405828738.93713.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7ejpH6P-uSVcEhx4Th4aK8=iWkkPASjKtsa4uxeRNw2pA@mail.gmail.com>
 <1405903292.28722.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CACac1F9ROd6wz_D197UP_Pz-n8zkurjhCsGLCRTOpb0jpL=1oA@mail.gmail.com>
 <01067774-6B85-436D-B240-83E14CBDA315@yahoo.com>
Message-ID: <CACac1F_J9TASjNc_oMuNpvYXy_7b84Fmq83oLqSRcHYuF2TckQ@mail.gmail.com>

On 23 July 2014 05:24, Andrew Barnert <abarnert at yahoo.com> wrote:
>> This is also an example of why I'm struggling to understand how an
>> open() parameter "solves all the cases". There's no explicit open()
>> call here, so how do you specify the record separator? Seeing how you
>> propose this would work would be really helpful to me.
>
> The open function is just a shortcut to constructing a stack of io classes;

Ah, yes, I get what you're saying now. I was reading your proposal too
literally as being about "open", and forgetting you can use the
underlying classes to rewrap existing streams.

Thanks for your patience.
Paul

From 4kir4.1i at gmail.com  Wed Jul 23 14:13:06 2014
From: 4kir4.1i at gmail.com (Akira Li)
Date: Wed, 23 Jul 2014 16:13:06 +0400
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <E1FD0B3B-D82D-4805-98EC-4C595CE9D534@yahoo.com> (Andrew
 Barnert's message of "Tue, 22 Jul 2014 21:40:54 -0700")
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
 <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
 <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
 <1405828738.93713.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7ejpH6P-uSVcEhx4Th4aK8=iWkkPASjKtsa4uxeRNw2pA@mail.gmail.com>
 <1405903292.28722.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CACac1F9ROd6wz_D197UP_Pz-n8zkurjhCsGLCRTOpb0jpL=1oA@mail.gmail.com>
 <87bnshnzu1.fsf@gmail.com>
 <E1FD0B3B-D82D-4805-98EC-4C595CE9D534@yahoo.com>
Message-ID: <87lhrkmeos.fsf@gmail.com>

Andrew Barnert <abarnert at yahoo.com> writes:

> On Jul 22, 2014, at 9:05, Akira Li <4kir4.1i at gmail.com> wrote:
>
>> Paul Moore <p.f.moore at gmail.com> writes:
>>
>>> On 21 July 2014 01:41, Andrew Barnert
>>> <abarnert at yahoo.com.dmarc.invalid> wrote:
>>>> OK, I wrote up a draft PEP, and attached it to the bug (if that's
>>>> not a good thing to do, apologies); you can find it at
>>>> http://bugs.python.org/file36008/pep-newline.txt
>>>
>>> As a suggestion, how about adding an example of a simple nul-separated
>>> filename filter - the sort of thing that could go in a find -print0 |
>>> xxx | xargs -0 pipeline? If I understand it, that's one of the key
>>> motivating examples for this change, so seeing how it's done would be
>>> a great help.
>>>
>>> Here's the sort of thing I mean, written for newline-separated files:
>>>
>>> import sys
>>>
>>> def process(filename):
>>>    """Trivial example"""
>>>    return filename.lower()
>>>
>>> if __name__ == '__main__':
>>>
>>>    for filename in sys.stdin:
>>>        filename = process(filename)
>>>        print(filename)
>>>
>>> This is also an example of why I'm struggling to understand how an
>>> open() parameter "solves all the cases". There's no explicit open()
>>> call here, so how do you specify the record separator? Seeing how you
>>> propose this would work would be really helpful to me.
>>
>> `find -print0 | ./tr-filename -0 | xargs -0` example implies that you
>> can replace `sys.std*` streams without worrying about preserving
>> `sys.__std*__` streams:
>>
>>  #!/usr/bin/env python
>>  import io
>>  import re
>>  import sys
>>  from pathlib import Path
>>
>>  def transform_filename(filename: str) -> str: # example
>>      """Normalize whitespace in basename."""
>>      path = Path(filename)
>>      new_path = path.with_name(re.sub(r'\s+', ' ', path.name))
>>      path.replace(new_path) # rename on disk if necessary
>>      return str(new_path)
>>
>>  def SystemTextStream(bytes_stream, **kwargs):
>>      encoding = sys.getfilesystemencoding()
>>      return io.TextIOWrapper(bytes_stream,
>>          encoding=encoding,
>>          errors='surrogateescape' if encoding != 'mbcs' else 'strict',
>>          **kwargs)
>>
>>  nl = '\0' if '-0' in sys.argv else None
>>  sys.stdout = SystemTextStream(sys.stdout.detach(), newline=nl)
>>  for line in SystemTextStream(sys.stdin.detach(), newline=nl):
>>      print(transform_filename(line.rstrip(nl)), end=nl)
>
> Nice, much more complete example than mine. I just tried to handle as
> many edge cases as the original he asked about, but you handle
> everything.
>
>> io.TextIOWrapper() plays the role of open() in this case. The code
>> assumes that `newline` parameter accepts '\0'.
>>
>> The example function handles Unicode whitespace to demonstrate why
>> opaque bytes-based cookies can't be used to represent filenames in this
>> case even on POSIX, though which characters are recognized depends on
>> sys.getfilesystemencoding().
>>
>> Note:
>>
>> - `end=nl` is necessary because `print()` prints '\n' by default -- it
>>  does not use `file.newline`
>
> Actually, yes it does. Or, rather, print pastes on a '\n', but
> sys.stdout.write translates any '\n' characters to sys.stdout.writenl
> (a private variable that's initialized from the newline argument at
> construction time if it's anything other than None or '').

You are right. I stopped reading the source for the print() function at
the `PyFile_WriteString("\n", file);` line, assuming that "\n" is not
translated if newline="\0". But if "\0" fell into "the other legal
values" category (like "\r"), the current behaviour would be to
translate "\n" [1]:

  When writing output to the stream, if newline is None, any '\n'
  characters written are translated to the system default line
  separator, os.linesep. If newline is '' or '\n', no translation takes
  place. If newline is any of the other legal values, any '\n'
  characters written are translated to the given string.

[1] https://docs.python.org/3/library/io.html#io.TextIOWrapper

Example:

  $ ./python -c 'import sys, io;
  sys.stdout=io.TextIOWrapper(sys.stdout.detach(), newline="\r\n");
  sys.stdout.write("\n\r\r\n")'| xxd
  0000000: 0d0a 0d0d 0d0a                           ......

"\n" is translated to b"\r\n" here and "\r" is left untouched (b"\r").

In order for the newline="\0" case to work, it should instead behave
like the newline='' or newline='\n' cases, i.e., no translation should
take place, to avoid corrupting embedded "\n\r" characters. My original
code works as is in this case, i.e., *end=nl is still necessary*.

> But of course that's the newline argument to sys.stdout, and you only
> changed sys.stdin, so you do need end=nl anyway. (And you wouldn't
> want output translation here anyway, because that could also translate
> \n' characters in the middle of a line, re-creating the same problem
> we're trying to avoid...)
>
> But it uses sys.stdout.newline, not sys.stdin.newline.

The code affects *both* sys.stdout/sys.stdin. Look [2]:

>>  sys.stdout = SystemTextStream(sys.stdout.detach(), newline=nl)
>>  for line in SystemTextStream(sys.stdin.detach(), newline=nl):
>>      print(transform_filename(line.rstrip(nl)), end=nl)

[2] https://mail.python.org/pipermail/python-ideas/2014-July/028372.html

>> - SystemTextStream() handles undecodable in the current locale filenames
>>  i.e., non-ascii names are allowed even in C locale (LC_CTYPE=C)
>> - undecodable filenames are not supported on Windows. It is not clear
>>  how to pass an undecodable filename via a pipe on Windows -- perhaps
>>  `GetShortPathNameW -> fsencode -> pipe` might work in some cases. It
>>  assumes that the short path exists and it is always encodable using
>>  mbcs. If we can control all parts of the pipeline *and* Windows API
>>  uses proper utf-16 (not ucs-2) then utf-8 can be used to pass
>>  filenames via a pipe otherwise ReadConsoleW/WriteConsoleW could be
>>  tried e.g., https://github.com/Drekin/win-unicode-console
>
> First, don't both the Win32 APIs and the POSIX-ish layer in msvcrt on
> top of it guarantee that you can never get such unencodable filenames
> (sometimes by just pretending the file doesn't exist, but if possible
> by having the filesystem map it to something valid, unique, and
> persistent for this session, usually the short name)?
> Second, trying to solve this implies that you have some other native
> (as opposed to Cygwin) tool that passes or accepts such filenames over
> simple pipes (as opposed to PowerShell typed ones). Are there any?
> What does, say, mingw's find do with invalid filenames if it finds
> them?

In short: I don't know :)

To be clear, I'm talking about native Windows applications (not
find/xargs on Cygwin). The goal is to robustly process *arbitrary*
filenames on Windows via a pipe (SystemTextStream()) or the network
(a bytes interface).

I know that the (A)nsi API (and therefore the "POSIX-ish layer" that
uses narrow strings, such as main(), fopen(), fstream) is broken, e.g.,
for Thai filenames on a Greek computer [3]. The Unicode (W) API should
enforce utf-16 in principle since Windows 2000 [4]. But I expect ucs-2
shows its ugly head in many places due to bad programming practices
(based on the common wrong assumption that Unicode == UTF-16 == UCS-2)
and/or bugs that were not fixed due to MS' backwards compatibility
policies in the past [5].

[3]
http://blog.gatunka.com/2014/04/25/character-encodings-for-modern-programmers/
[4] http://en.wikipedia.org/wiki/UTF-16#Use_in_major_operating_systems_and_environments
[5] http://blogs.msdn.com/b/oldnewthing/archive/2003/10/15/55296.aspx


--
Akira

From abarnert at yahoo.com  Wed Jul 23 17:49:19 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 23 Jul 2014 08:49:19 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <87oawgmfxp.fsf@gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org> <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
 <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
 <1405828738.93713.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7ejpH6P-uSVcEhx4Th4aK8=iWkkPASjKtsa4uxeRNw2pA@mail.gmail.com>
 <1405903292.28722.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CACac1F9ROd6wz_D197UP_Pz-n8zkurjhCsGLCRTOpb0jpL=1oA@mail.gmail.com>
 <87bnshnzu1.fsf@gmail.com> <E1FD0B3B-D82D-4805-98EC-4C595CE9D534@yahoo.com>
 <87oawgmfxp.fsf@gmail.com>
Message-ID: <3E31BD23-A903-4B48-82E5-6DDA4AA2E15C@yahoo.com>

On Jul 23, 2014, at 5:13, Akira Li <4kir4.1i at gmail.com> wrote:

> Andrew Barnert <abarnert at yahoo.com> writes:
> 
>> On Jul 22, 2014, at 9:05, Akira Li <4kir4.1i at gmail.com> wrote:
>> 
>>> Paul Moore <p.f.moore at gmail.com> writes:
>>> 
>>>> On 21 July 2014 01:41, Andrew Barnert
>>>> <abarnert at yahoo.com.dmarc.invalid> wrote:
>>>>> OK, I wrote up a draft PEP, and attached it to the bug (if that's
>>>>> not a good thing to do, apologies); you can find it at
>>>>> http://bugs.python.org/file36008/pep-newline.txt
>>>> 
>>>> As a suggestion, how about adding an example of a simple nul-separated
>>>> filename filter - the sort of thing that could go in a find -print0 |
>>>> xxx | xargs -0 pipeline? If I understand it, that's one of the key
>>>> motivating examples for this change, so seeing how it's done would be
>>>> a great help.
>>>> 
>>>> Here's the sort of thing I mean, written for newline-separated files:
>>>> 
>>>> import sys
>>>> 
>>>> def process(filename):
>>>>  """Trivial example"""
>>>>  return filename.lower()
>>>> 
>>>> if __name__ == '__main__':
>>>> 
>>>>  for filename in sys.stdin:
>>>>      filename = process(filename)
>>>>      print(filename)
>>>> 
>>>> This is also an example of why I'm struggling to understand how an
>>>> open() parameter "solves all the cases". There's no explicit open()
>>>> call here, so how do you specify the record separator? Seeing how you
>>>> propose this would work would be really helpful to me.
>>> 
>>> `find -print0 | ./tr-filename -0 | xargs -0` example implies that you
>>> can replace `sys.std*` streams without worrying about preserving
>>> `sys.__std*__` streams:
>>> 
>>> #!/usr/bin/env python
>>> import io
>>> import re
>>> import sys
>>> from pathlib import Path
>>> 
>>> def transform_filename(filename: str) -> str: # example
>>>    """Normalize whitespace in basename."""
>>>    path = Path(filename)
>>>    new_path = path.with_name(re.sub(r'\s+', ' ', path.name))
>>>    path.replace(new_path) # rename on disk if necessary
>>>    return str(new_path)
>>> 
>>> def SystemTextStream(bytes_stream, **kwargs):
>>>    encoding = sys.getfilesystemencoding()
>>>    return io.TextIOWrapper(bytes_stream,
>>>        encoding=encoding,
>>>        errors='surrogateescape' if encoding != 'mbcs' else 'strict',
>>>        **kwargs)
>>> 
>>> nl = '\0' if '-0' in sys.argv else None
>>> sys.stdout = SystemTextStream(sys.stdout.detach(), newline=nl)
>>> for line in SystemTextStream(sys.stdin.detach(), newline=nl):
>>>    print(transform_filename(line.rstrip(nl)), end=nl)
>> 
>> Nice, much more complete example than mine. I just tried to handle as
>> many edge cases as the original he asked about, but you handle
>> everything.
>> 
>>> io.TextIOWrapper() plays the role of open() in this case. The code
>>> assumes that `newline` parameter accepts '\0'.
>>> 
>>> The example function handles Unicode whitespace to demonstrate why
>>> opaque bytes-based cookies can't be used to represent filenames in this
>>> case even on POSIX, though which characters are recognized depends on
>>> sys.getfilesystemencoding().
>>> 
>>> Note:
>>> 
>>> - `end=nl` is necessary because `print()` prints '\n' by default -- it
>>> does not use `file.newline`
>> 
>> Actually, yes it does. Or, rather, print pastes on a '\n', but
>> sys.stdout.write translates any '\n' characters to sys.stdout.writenl
>> (a private variable that's initialized from the newline argument at
>> construction time if it's anything other than None or '').
> 
> You are right. I've stopped reading the source for print() function at
> `PyFile_WriteString("\n", file);` line assuming that "\n" is not
> translated if newline="\0". But the current behaviour if "\0" were in
> "the other legal values" category (like "\r") would be to translate "\n"
> [1]: 
> 
> When writing output to the stream, if newline is None, any '\n'
> characters written are translated to the system default line
> separator, os.linesep. If newline is '' or '\n', no translation takes
> place. If newline is any of the other legal values, any '\n'
> characters written are translated to the given string.
> 
> [1] https://docs.python.org/3/library/io.html#io.TextIOWrapper
> 
> Example:
> 
> $ ./python -c 'import sys, io; 
> sys.stdout=io.TextIOWrapper(sys.stdout.detach(), newline="\r\n"); 
> sys.stdout.write("\n\r\r\n")'| xxd
> 0000000: 0d0a 0d0d 0d0a                           ......
> 
> "\n" is translated to b"\r\n" here and "\r" is left untouched (b"\r").
> 
> In order to newline="\0" case to work, it should behave similar to
> newline='' or newline='\n' case instead i.e., no translation should take
> place, to avoid corrupting embed "\n\r" characters.

The draft PEP discusses this. I think it would be more consistent to translate for \0, just like \r and \r\n.

For your script, there is no reason to pass newline=nl to the stdout replacement. The only effect that has on output is \n replacement, which you don't want. And if we removed that effect from the proposal, it would have no effect at all on output, so why pass it?

Do you have a use case where you need to pass a non-standard newline to a text file/stream, but don't want newline replacement? Or is it just a matter of avoiding confusion if people accidentally pass it for stdout when they didn't want it?

> My original code
> works as is in this case i.e., *end=nl is still necessary*.

>> But of course that's the newline argument to sys.stdout, and you only
>> changed sys.stdin, so you do need end=nl anyway. (And you wouldn't
>> want output translation here anyway, because that could also translate
>> \n' characters in the middle of a line, re-creating the same problem
>> we're trying to avoid...)
>> 
>> But it uses sys.stdout.newline, not sys.stdin.newline.
> 
> The code affects *both* sys.stdout/sys.stdin. Look [2]:

I didn't notice that you passed it for stdout as well--as I explained above, you don't need it, and shouldn't do it.

As a side note, I think it might have been a better design to have separate arguments for input newline, output newline, and universal newlines mode, instead of cramming them all into one argument; for some simple cases the current design makes things a little less verbose, but it gets in the way for more complex cases, even today with \r or \r\n. However, I don't think that needs to be changed as part of this proposal.

It also might be nice to have a full set of PYTHONIOFOO env variables rather than just PYTHONIOENCODING, but again, I don't think that needs to be part of this proposal. And likewise for Nick Coghlan's rewrap method proposal on TextIOWrapper and maybe BufferedFoo.

>>> sys.stdout = SystemTextStream(sys.stdout.detach(), newline=nl)
>>> for line in SystemTextStream(sys.stdin.detach(), newline=nl):
>>>    print(transform_filename(line.rstrip(nl)), end=nl)
> 
> [2] https://mail.python.org/pipermail/python-ideas/2014-July/028372.html
> 
>>> - SystemTextStream() handles undecodable in the current locale filenames
>>> i.e., non-ascii names are allowed even in C locale (LC_CTYPE=C)
>>> - undecodable filenames are not supported on Windows. It is not clear
>>> how to pass an undecodable filename via a pipe on Windows -- perhaps
>>> `GetShortPathNameW -> fsencode -> pipe` might work in some cases. It
>>> assumes that the short path exists and it is always encodable using
>>> mbcs. If we can control all parts of the pipeline *and* Windows API
>>> uses proper utf-16 (not ucs-2) then utf-8 can be used to pass
>>> filenames via a pipe otherwise ReadConsoleW/WriteConsoleW could be
>>> tried e.g., https://github.com/Drekin/win-unicode-console
>> 
>> First, don't both the Win32 APIs and the POSIX-ish layer in msvcrt on
>> top of it guarantee that you can never get such unencodable filenames
>> (sometimes by just pretending the file doesn't exist, but if possible
>> by having the filesystem map it to something valid, unique, and
>> persistent for this session, usually the short name)?
>> Second, trying to solve this implies that you have some other native
>> (as opposed to Cygwin) tool that passes or accepts such filenames over
>> simple pipes (as opposed to PowerShell typed ones). Are there any?
>> What does, say, mingw's find do with invalid filenames if it finds
>> them?
> 
> In short: I don't know :)
> 
> To be clear, I'm talking about native Windows applications (not
> find/xargs on Cygwin). The goal is to process robustly *arbitrary*
> filenames on Windows via a pipe (SystemTextStream()) or network (bytes
> interface).

Yes, I assumed that, I just wanted to make that clear.

My point is that if there isn't already an ecosystem of tools that do so on Windows, or a recommended answer from Microsoft, we don't need to fit into existing practices here. (Actually, there _is_ a recommended answer from Microsoft, but it's "don't send encoded filenames over a binary stream, send them as an array of UTF-16 strings over PowerShell cmdlet typed pipes"--and, more generally, "don't use any ANSI interfaces except for backward compatibility reasons".) 

At any rate, if the filenames-over-pipes encoding problem exists on Windows, and if it's solvable, it's still outside the scope of this proposal, unless you think the documentation needs a completely worked example that shows how to interact with some Windows tool, alongside one for interacting with find -print0 on Unix. (And I don't think it does. If we want a Windows example, resource compiler string input files, which are \0-terminated UTF-16, probably serve better.)

> I know that (A)nsi API (and therefore "POSIX-ish layer" that uses narrow
> strings such main(), fopen(), fstream is broken e.g., Thai filenames on
> Greek computer [3].

Yes, and broken in a way that people cannot easily work around except by using the UTF-16 interfaces. That's been Microsoft's recommended answer to the problem since NT 3.5, Win 95, and MSVCRT 3: if you want to handle all filenames, use _wmain, _wfopen, etc.--or, better, use CreateFileW instead of fopen. They never really addressed the issue of passing filenames between command-line tools at all, until PowerShell, where you pass them as a list of UTF-16 strings rather than a stream of newline-separated encoded bytes. (As a side note, I have no idea how well Python works for writing PowerShell cmdlets, but I don't think that's relevant to the current proposal.)

> Unicode (W) API should enforce utf-16 in principle
> since Windows 2000 [4]. But I expect ucs-2 shows its ugly head in many
> places due to bad programming practices (based on the common wrong
> assumption that Unicode == UTF-16 == UCS-2) and/or bugs that are not
> fixed due to MS' backwards compatibility policies in the past [5].

Yes, I've run into such bugs in the past. It's even more fun when you're dealing with unterminated string with separate length interfaces. Fortunately, as far as I know, no such bugs affect reading and writing binary files, pipes, and sockets, so they don't affect us here.

From 4kir4.1i at gmail.com  Thu Jul 24 11:07:59 2014
From: 4kir4.1i at gmail.com (Akira Li)
Date: Thu, 24 Jul 2014 13:07:59 +0400
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <3E31BD23-A903-4B48-82E5-6DDA4AA2E15C@yahoo.com> (Andrew
 Barnert's message of "Wed, 23 Jul 2014 08:49:19 -0700")
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
 <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
 <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
 <1405828738.93713.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7ejpH6P-uSVcEhx4Th4aK8=iWkkPASjKtsa4uxeRNw2pA@mail.gmail.com>
 <1405903292.28722.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CACac1F9ROd6wz_D197UP_Pz-n8zkurjhCsGLCRTOpb0jpL=1oA@mail.gmail.com>
 <87bnshnzu1.fsf@gmail.com>
 <E1FD0B3B-D82D-4805-98EC-4C595CE9D534@yahoo.com>
 <87oawgmfxp.fsf@gmail.com>
 <3E31BD23-A903-4B48-82E5-6DDA4AA2E15C@yahoo.com>
Message-ID: <87egxbm8eo.fsf@gmail.com>

Andrew Barnert <abarnert at yahoo.com> writes:

> On Jul 23, 2014, at 5:13, Akira Li <4kir4.1i at gmail.com> wrote:
>> Andrew Barnert <abarnert at yahoo.com> writes:
>>> On Jul 22, 2014, at 9:05, Akira Li <4kir4.1i at gmail.com> wrote:
>>>> Paul Moore <p.f.moore at gmail.com> writes:
>>>>> On 21 July 2014 01:41, Andrew Barnert
>>>>> <abarnert at yahoo.com.dmarc.invalid> wrote:
>>>>>> OK, I wrote up a draft PEP, and attached it to the bug (if that's
>>>>>> not a good thing to do, apologies); you can find it at
>>>>>> http://bugs.python.org/file36008/pep-newline.txt
>>>>>
>>>>> As a suggestion, how about adding an example of a simple nul-separated
>>>>> filename filter - the sort of thing that could go in a find -print0 |
>>>>> xxx | xargs -0 pipeline? If I understand it, that's one of the key
>>>>> motivating examples for this change, so seeing how it's done would be
>>>>> a great help.
>>>>
>>>> `find -print0 | ./tr-filename -0 | xargs -0` example implies that you
>>>> can replace `sys.std*` streams without worrying about preserving
>>>> `sys.__std*__` streams:
>>>>
>>>> #!/usr/bin/env python
>>>> import io
>>>> import re
>>>> import sys
>>>> from pathlib import Path
>>>>
>>>> def transform_filename(filename: str) -> str: # example
>>>>    """Normalize whitespace in basename."""
>>>>    path = Path(filename)
>>>>    new_path = path.with_name(re.sub(r'\s+', ' ', path.name))
>>>>    path.replace(new_path) # rename on disk if necessary
>>>>    return str(new_path)
>>>>
>>>> def SystemTextStream(bytes_stream, **kwargs):
>>>>    encoding = sys.getfilesystemencoding()
>>>>    return io.TextIOWrapper(bytes_stream,
>>>>        encoding=encoding,
>>>>        errors='surrogateescape' if encoding != 'mbcs' else 'strict',
>>>>        **kwargs)
>>>>
>>>> nl = '\0' if '-0' in sys.argv else None
>>>> sys.stdout = SystemTextStream(sys.stdout.detach(), newline=nl)
>>>> for line in SystemTextStream(sys.stdin.detach(), newline=nl):
>>>>    print(transform_filename(line.rstrip(nl)), end=nl)
>>>
>>> Nice, much more complete example than mine. I just tried to handle as
>>> many edge cases as the original he asked about, but you handle
>>> everything.
>>>>
>>>> io.TextIOWrapper() plays the role of open() in this case. The code
>>>> assumes that `newline` parameter accepts '\0'.
>>>>
>>>> The example function handles Unicode whitespace to demonstrate why
>>>> opaque bytes-based cookies can't be used to represent filenames in this
>>>> case even on POSIX, though which characters are recognized depends on
>>>> sys.getfilesystemencoding().
>>>>
>>>> Note:
>>>>
>>>> - `end=nl` is necessary because `print()` prints '\n' by default -- it
>>>> does not use `file.newline`
>>>
>>> Actually, yes it does. Or, rather, print pastes on a '\n', but
>>> sys.stdout.write translates any '\n' characters to sys.stdout.writenl
>>> (a private variable that's initialized from the newline argument at
>>> construction time if it's anything other than None or '').
>>
>> You are right. I've stopped reading the source for print() function at
>> `PyFile_WriteString("\n", file);` line assuming that "\n" is not
>> translated if newline="\0". But the current behaviour if "\0" were in
>> "the other legal values" category (like "\r") would be to translate "\n"
>> [1]:
>>
>> When writing output to the stream, if newline is None, any '\n'
>> characters written are translated to the system default line
>> separator, os.linesep. If newline is '' or '\n', no translation takes
>> place. If newline is any of the other legal values, any '\n'
>> characters written are translated to the given string.
>>
>> [1] https://docs.python.org/3/library/io.html#io.TextIOWrapper
>>
>> Example:
>>
>> $ ./python -c 'import sys, io;
>> sys.stdout=io.TextIOWrapper(sys.stdout.detach(), newline="\r\n");
>> sys.stdout.write("\n\r\r\n")'| xxd
>> 0000000: 0d0a 0d0d 0d0a                           ......
>>
>> "\n" is translated to b"\r\n" here and "\r" is left untouched (b"\r").
>>
>> In order to newline="\0" case to work, it should behave similar to
>> newline='' or newline='\n' case instead i.e., no translation should take
>> place, to avoid corrupting embed "\n\r" characters.
>
> The draft PEP discusses this. I think it would be more consistent to
> translate for \0, just like \r and \r\n.

I read the [draft]. No translation is the better choice here. Otherwise
(at the very least) it breaks the `find -print0` use case.

[draft] http://bugs.python.org/file36008/pep-newline.txt

Simple things should be simple (i.e., no translation unless a special case applies):

- binary file -- a stream of bytes: no structure, no translation on
  read/write
- text file -- a stream of Unicode codepoints
- file with fixed-length chunks:

    for chunk in iter(partial(file.read, chunksize), EOF):
        pass  # EOF sentinel is b'' for binary files, '' for text files

- file with variable-length records (aka lines) which end with a
  separator or EOF: no translation, no escaping (no embedded separators):

    for line in file:
        pass

  or

    line = file.readline() # next(file)

newline in {None, '', '\r', '\r\n'} is a (very important) special case
that represents the complicated legacy behavior for text files.

newline='\0' (like '\n') should be a *much simpler* case: no
translation on read/write, no escaping (no embedded '\0'; each '\0' in
the stream is a separator).

newline='\0' is simple to explain: readline/next return everything until
the next '\0' (including it) or EOF. It is simple to implement - no
translation is required.

readline(keep_end=True) keyword-only parameter and/or chomp()-like
method could be added to simplify removing a trailing newline.
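Pending such a change, the proposed "no translation, terminator kept"
read semantics can be approximated today on top of the binary layer;
`iter_records` below is a hypothetical helper written for this thread,
not an existing API:

```python
import io

def iter_records(binary_stream, sep=b'\0', chunk_size=8192):
    """Yield records terminated by sep (terminator kept), mimicking the
    proposed readline() semantics for newline='\\0'."""
    buf = b''
    while True:
        chunk = binary_stream.read(chunk_size)
        if not chunk:  # EOF: emit any unterminated final record
            if buf:
                yield buf
            return
        buf += chunk
        while True:
            i = buf.find(sep)
            if i < 0:
                break
            yield buf[:i + len(sep)]  # keep the separator, like readline()
            buf = buf[i + len(sep):]

stream = io.BytesIO(b'a\0b\0c')
assert list(iter_records(stream)) == [b'a\0', b'b\0', b'c']
```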

newline in {"\N{NEL}", "\n\n", "\r\r", "\n\r"} behaves like newline="\n",
i.e., no translation. New *docs for writing text files*:

  When writing output to the stream:

  - if newline is None, any '\n' characters written are translated to
    the system default line separator, os.linesep
  - if newline is '\r' or '\r\n', any '\n' characters written are
    translated to the given string
  - no translation takes place for any other newline value.

The docs for binary files are simpler:

   No translation takes place for any newline value. The line terminator
   is the newline parameter (default is b'\n').

The new *docs for reading text files*:

  When reading input from the stream:

  - if newline is None, universal newlines mode is enabled: lines in the
    input can end in '\n', '\r', or '\r\n', and these are translated
    into '\n' before being returned to the caller
  - if newline is '', universal newlines mode is enabled, but line
    endings are returned to the caller untranslated
  - if newline is any other value, input lines are only terminated by
    the given string, and the line ending is returned to the caller
    untranslated.

The new behavior, while more powerful, is no more complex than the old one:
https://docs.python.org/3.4/library/io.html#io.TextIOWrapper

Backwards compatibility is preserved except that newline parameter
accepts more values.

> For the your script, there is no reason to pass newline=nl to the
> stdout replacement. The only effect that has on output is \n
> replacement, which you don't want. And if we removed that effect from
> the proposal, it would have no effect at all on output, so why pass
> it?

Keep in mind, I expect that newline='\0' does *not* translate '\n' to
'\0'. If you remove newline=nl then embedded '\n' characters might be
corrupted, i.e., it breaks the `find -print0` use case. Both newline=nl
for stdout and end=nl are required here. Though (optionally) it would be
nice to change `print()` so that it would use `end=file.newline or '\n'`
by default instead.

There is also line_buffering parameter. From the docs:

  If line_buffering is True, flush() is implied when a call to write
  contains a newline character.

i.e., you might also need newline=nl to flush() the stream in time.

For example, the absence of a flush() call on newline may lead to a
deadlock if the subprocess module is used to implement pexpect-like
behavior. There are corresponding Python issues:

- text mode http://bugs.python.org/issue21332 : add line_buffering=True
  if bufsize=1, to avoid a deadlock (regression from Python 2 behavior)

- binary mode http://bugs.python.org/issue21471 : implement
  line_buffering=True behavior for binary files when bufsize=1

> Do you have a use case where you need to pass a non-standard newline
> to a text file/stream, but don't want newline replacement?

The `find -print0` use case that my code above implements.

> Or is it just a matter of avoiding confusion if people accidentally
> pass it for stdout when they didn't want it?

See the explanation above that starts with "Simple things should be simple."

>> My original code
>> works as is in this case i.e., *end=nl is still necessary*.
>
>>> But of course that's the newline argument to sys.stdout, and you only
>>> changed sys.stdin, so you do need end=nl anyway. (And you wouldn't
>>> want output translation here anyway, because that could also translate
>>> \n' characters in the middle of a line, re-creating the same problem
>>> we're trying to avoid...)
>>>
>>> But it uses sys.stdout.newline, not sys.stdin.newline.
>>
>> The code affects *both* sys.stdout/sys.stdin. Look [2]:
>
> I didn't notice that you passed it for stdout as well--as I explained
> above, you don't need it, and shouldn't do it.

Both newline=nl and end=nl are needed because I assume that there is no
newline translation in newline='\0' case. See the explanation
above. Here's the same code for context:

  sys.stdout = SystemTextStream(sys.stdout.detach(), newline=nl)
  for line in SystemTextStream(sys.stdin.detach(), newline=nl):
      print(transform_filename(line.rstrip(nl)), end=nl)
 
[2] https://mail.python.org/pipermail/python-ideas/2014-July/028372.html
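Lacking SystemTextStream, the same filter can be written in today's
Python by splitting the raw bytes on NUL directly (str.upper stands in
for a hypothetical transform_filename; 'surrogateescape' keeps
undecodable filename bytes intact):

```python
def process(data, transform):
    # Split NUL-terminated names, transform each, re-join with NUL.
    # Embedded '\n' in names survives untouched, which is the whole
    # point of the -print0 convention.
    names = [n for n in data.split(b'\0') if n]
    out = []
    for raw in names:
        name = raw.decode('utf-8', 'surrogateescape')
        out.append(transform(name).encode('utf-8', 'surrogateescape'))
    return b'\0'.join(out) + (b'\0' if out else b'')

print(process(b'with\nnewline\0plain.txt\0', str.upper))
# -> b'WITH\nNEWLINE\x00PLAIN.TXT\x00'
```

In a real pipeline the bytes would come from sys.stdin.buffer and go to
sys.stdout.buffer, bypassing the text layer entirely.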

> As a side note, I think it might have been a better design to have
> separate arguments for input newline, output newline, and universal
> newlines mode, instead of cramming them all into one argument; for
> some simple cases the current design makes things a little less
> verbose, but it gets in the way for more complex cases, even today
> with \r or \r\n. However, I don't think that needs to be changed as
> part of this proposal.

Usually different objects are used for input and output, i.e., a single
newline parameter allows input newlines to be different from output
newlines.

The newline behavior for reading and writing is different but it is
closely related. Having two parameters wouldn't make the documentation
simpler.

Separate parameters might be useful if the same file object is used for
reading and writing *and* input/output newlines are different from each
other. But I don't think it is worth it to complicate the common case
(separate objects).


--
Akira

From wolfgang.maier at biologie.uni-freiburg.de  Thu Jul 24 15:45:53 2014
From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier)
Date: Thu, 24 Jul 2014 15:45:53 +0200
Subject: [Python-ideas] os.path.argparse - optional startdir argument
Message-ID: <lqr2mh$s27$1@ger.gmane.org>

Dear all,

currently, os.path.abspath(path) is, essentially, equivalent to

os.path.normpath(os.path.join(os.getcwd(),path)).

However, I'd find it useful, occasionally, to be able to specify a 
starting directory other than the current working directory.

One such situation is when reading a config file of an application: if 
you encounter a relative link in such a file, you'll typically want to 
transform it into an absolute path using that application's working 
directory as opposed to your own one.

My suggestion would be to add an optional startdir argument to abspath, 
which, when provided would be used instead of os.getcwd(). If startdir 
itself is not an absolute path either, it would be turned into one 
through recursion.

Currently, you have to write:

os.path.normpath(os.path.join(startdir, path))
or even
os.path.normpath(os.path.join(os.path.abspath(startdir), path))

instead of the proposed:

os.path.abspath(path, startdir)
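A minimal sketch of the proposed signature in terms of the existing
os.path functions (the name is illustrative; an actual patch would live
in posixpath and ntpath):

```python
import os
import os.path

def abspath(path, startdir=None):
    # Like os.path.abspath, but resolve relative paths against
    # startdir instead of os.getcwd(); a relative startdir is
    # itself made absolute first (the "recursion" mentioned above).
    if startdir is None:
        startdir = os.getcwd()
    else:
        startdir = os.path.abspath(startdir)
    return os.path.normpath(os.path.join(startdir, path))

print(abspath('app.conf', '/etc/myapp'))  # '/etc/myapp/app.conf' on POSIX
```

With startdir omitted it behaves exactly like today's abspath, so the
change would be backwards compatible.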

Before posting I checked the bug tracker and found that this idea has 
been brought up years ago (http://bugs.python.org/issue9882), but not 
pursued further.
The patch suggested there is a bit of an oversimplification, but I have 
my own one, which I could provide if someone's interested.
For issue9882 it was suggested to bring it up on python-ideas, but to 
the best of my knowledge that was never done, so I'm doing it now.

Thoughts ?

Wolfgang


From wolfgang.maier at biologie.uni-freiburg.de  Thu Jul 24 16:43:59 2014
From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier)
Date: Thu, 24 Jul 2014 16:43:59 +0200
Subject: [Python-ideas] os.path.abspath - what was I thinking (was Re:
 os.path.argparse - optional startdir argument)
In-Reply-To: <lqr2mh$s27$1@ger.gmane.org>
References: <lqr2mh$s27$1@ger.gmane.org>
Message-ID: <lqr63f$77c$1@ger.gmane.org>

Just realized my typo: I meant os.path.abspath in the title - don't know 
what I was thinking about when I typed that


From apalala at gmail.com  Thu Jul 24 17:21:15 2014
From: apalala at gmail.com (=?UTF-8?Q?Juancarlo_A=C3=B1ez?=)
Date: Thu, 24 Jul 2014 10:51:15 -0430
Subject: [Python-ideas] os.path.argparse - optional startdir argument
In-Reply-To: <lqr2mh$s27$1@ger.gmane.org>
References: <lqr2mh$s27$1@ger.gmane.org>
Message-ID: <CAN1YFWsbzA9aSjUHD6PWOU0R_34Rjspr2jyHf0WYu7nxuuT9Dg@mail.gmail.com>

On Thu, Jul 24, 2014 at 9:15 AM, Wolfgang Maier <
wolfgang.maier at biologie.uni-freiburg.de> wrote:

> os.path.normpath(os.path.join(os.getcwd(),path)).
>
> However, I'd find it useful, occasionally, to be able to specify a
> starting directory other than the current working directory.
>

os.path.normpath(os.path.join(config_dir, path))

Better yet, use the pathlib module.

Cheers,




-- 
Juancarlo *Añez*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140724/ce86d208/attachment.html>

From wolfgang.maier at biologie.uni-freiburg.de  Thu Jul 24 17:30:58 2014
From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier)
Date: Thu, 24 Jul 2014 17:30:58 +0200
Subject: [Python-ideas] os.path.abspath - optional startdir argument
In-Reply-To: <CAN1YFWsbzA9aSjUHD6PWOU0R_34Rjspr2jyHf0WYu7nxuuT9Dg@mail.gmail.com>
References: <lqr2mh$s27$1@ger.gmane.org>
 <CAN1YFWsbzA9aSjUHD6PWOU0R_34Rjspr2jyHf0WYu7nxuuT9Dg@mail.gmail.com>
Message-ID: <lqr8ri$s27$1@ger.gmane.org>

On 24.07.2014 17:21, Juancarlo Añez wrote:
>
> On Thu, Jul 24, 2014 at 9:15 AM, Wolfgang Maier
> <wolfgang.maier at biologie.uni-freiburg.de
> <mailto:wolfgang.maier at biologie.uni-freiburg.de>>
> wrote:
>
>     os.path.normpath(os.path.join(os.getcwd(),path)).
>
>     However, I'd find it useful, occasionally, to be able to specify a
>     starting directory other than the current working directory.
>
>
> os.path.normpath(os.path.join(config_dir, path))
>

As I said, I'm aware of this, but it's ugly and even uglier if you have 
to turn config_dir into an absolute path itself.

> Better yet, use the pathlib module.
>

As it stands, the pathlib module is only provisional plus, IMO, kind of 
overkill for a simple task like that.

> Cheers,
> Juancarlo *Añez*



From techtonik at gmail.com  Thu Jul 24 18:51:08 2014
From: techtonik at gmail.com (anatoly techtonik)
Date: Thu, 24 Jul 2014 19:51:08 +0300
Subject: [Python-ideas] os.path.cansymlink(path)
Message-ID: <CAPkN8xK5h4rEqNLKRTLaRW7WzdFfqTDJp8-T1Msey7dvQFvZjw@mail.gmail.com>

This is live code from the current virtualenv.py:

    if hasattr(os, 'symlink'):
        logger.info('Symlinking Python bootstrap modules')

This code is wrong, because OS support for
symlinks doesn't guarantee that the mounted
filesystem can create them, resulting in OSError
at runtime. So the proper check would be whether
the specific path supports symlinking.

The idea is:

    os.path.cansymlink(path)  - Return True if the filesystem
        of the specified path supports symlinks.

Yes/No/Opinions?
-- 
anatoly t.

From apalala at gmail.com  Thu Jul 24 18:53:00 2014
From: apalala at gmail.com (=?UTF-8?Q?Juancarlo_A=C3=B1ez?=)
Date: Thu, 24 Jul 2014 12:23:00 -0430
Subject: [Python-ideas] os.path.abspath - optional startdir argument
In-Reply-To: <lqr8ri$s27$1@ger.gmane.org>
References: <lqr2mh$s27$1@ger.gmane.org>
 <CAN1YFWsbzA9aSjUHD6PWOU0R_34Rjspr2jyHf0WYu7nxuuT9Dg@mail.gmail.com>
 <lqr8ri$s27$1@ger.gmane.org>
Message-ID: <CAN1YFWva1=u3eDbyOodwWnzGo=q_kHE9_shubPR142dKpQAX6A@mail.gmail.com>

On Thu, Jul 24, 2014 at 11:00 AM, Wolfgang Maier <
wolfgang.maier at biologie.uni-freiburg.de> wrote:

> As it stands, the pathlib module is only provisional plus, IMO, kind of
> overkill for a simple task like that.


https://docs.python.org/3/library/pathlib.html

The pathlib module is *"New in version 3.4"*. There's an implementation
for previous versions of Python on PyPI.

https://pypi.python.org/pypi/pathlib

The pathlib module is not overkill, as it provides the same functionality
as os.path, but in a more OO and syntactically simpler form:

(configpath / filepath).resolve()

Cheers,

-- 
Juancarlo *Añez*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140724/85c0cb0e/attachment.html>

From geoffspear at gmail.com  Thu Jul 24 19:33:28 2014
From: geoffspear at gmail.com (Geoffrey Spear)
Date: Thu, 24 Jul 2014 13:33:28 -0400
Subject: [Python-ideas] os.path.cansymlink(path)
In-Reply-To: <CAPkN8xK5h4rEqNLKRTLaRW7WzdFfqTDJp8-T1Msey7dvQFvZjw@mail.gmail.com>
References: <CAPkN8xK5h4rEqNLKRTLaRW7WzdFfqTDJp8-T1Msey7dvQFvZjw@mail.gmail.com>
Message-ID: <CAGifb9EQT5Es+yYxTmZn77XSG0Jq9oj5mJzYjzjWbh6DhGeVjw@mail.gmail.com>

On Thu, Jul 24, 2014 at 12:51 PM, anatoly techtonik <techtonik at gmail.com> wrote:
> This is a live code from current virtualenv.py:
>
>     if hasattr(os, 'symlink'):
>         logger.info('Symlinking Python bootstrap modules')
>
> This code is wrong, because OS support for
> symlinks doesn't guarantee that mounted filesystem
> can do this, resulting in OSError at runtime. So, the
> proper check would be to check if specific path
> supports symlinking.
>
> The idea is:
>
>     os.path.cansymlink(path)  - Return True if filesystem
>         of specified path can be symlinked.
>
> Yes/No/Opinions?

Surely the third-party module you found that wrong code in has its
own communication channels?

From steve at pearwood.info  Thu Jul 24 19:43:41 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 25 Jul 2014 03:43:41 +1000
Subject: [Python-ideas] os.path.cansymlink(path)
In-Reply-To: <CAPkN8xK5h4rEqNLKRTLaRW7WzdFfqTDJp8-T1Msey7dvQFvZjw@mail.gmail.com>
References: <CAPkN8xK5h4rEqNLKRTLaRW7WzdFfqTDJp8-T1Msey7dvQFvZjw@mail.gmail.com>
Message-ID: <20140724174341.GU9112@ando>

On Thu, Jul 24, 2014 at 07:51:08PM +0300, anatoly techtonik wrote:
> This is a live code from current virtualenv.py:
> 
>     if hasattr(os, 'symlink'):
>         logger.info('Symlinking Python bootstrap modules')
> 
> This code is wrong, because OS support for
> symlinks doesn't guarantee that mounted filesystem
> can do this, resulting in OSError at runtime. So, the
> proper check would be to check if specific path
> supports symlinking.
> 
> The idea is:
> 
>     os.path.cansymlink(path)  - Return True if filesystem
>         of specified path can be symlinked.
> 
> Yes/No/Opinions?

No. Even if the file system supports symlinks, that doesn't mean you
can create one. You may not have the privileges to create the symlink,
or some other runtime error may occur.

Like most other file system operations, you should guard them with a 
try...except, not "Look Before You Leap".
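In code, the EAFP version of the virtualenv check might look like this
(a sketch; the fallback message stands in for virtualenv's real
handling):

```python
import os
import tempfile

def try_symlink(src, dst):
    # Attempt the symlink and report failure, instead of probing the
    # OS or the filesystem for support beforehand.
    try:
        os.symlink(src, dst)
    except (OSError, NotImplementedError, AttributeError) as exc:
        return False, exc
    return True, None

with tempfile.TemporaryDirectory() as d:
    ok, err = try_symlink(os.path.join(d, 'target'),
                          os.path.join(d, 'link'))
    print('symlinked' if ok else 'fell back to copying: %r' % err)
```

This handles every failure mode at once: missing OS support, a
filesystem that rejects the operation, and insufficient privileges.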


-- 
Steven

From steve at pearwood.info  Thu Jul 24 19:45:31 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 25 Jul 2014 03:45:31 +1000
Subject: [Python-ideas] os.path.cansymlink(path)
In-Reply-To: <CAGifb9EQT5Es+yYxTmZn77XSG0Jq9oj5mJzYjzjWbh6DhGeVjw@mail.gmail.com>
References: <CAPkN8xK5h4rEqNLKRTLaRW7WzdFfqTDJp8-T1Msey7dvQFvZjw@mail.gmail.com>
 <CAGifb9EQT5Es+yYxTmZn77XSG0Jq9oj5mJzYjzjWbh6DhGeVjw@mail.gmail.com>
Message-ID: <20140724174531.GV9112@ando>

On Thu, Jul 24, 2014 at 01:33:28PM -0400, Geoffrey Spear wrote:
> On Thu, Jul 24, 2014 at 12:51 PM, anatoly techtonik <techtonik at gmail.com> wrote:
> > This is a live code from current virtualenv.py:
[...]
> Surely the third-party module you found that wrong code in has their
> own communication channels?

Anatoly is not asking for a fix for the (possibly) buggy code in 
virtualenv, but suggesting an enhancement for the Python standard 
library. That makes this the right place to ask the question.


-- 
Steven

From dw+python-ideas at hmmz.org  Thu Jul 24 19:53:16 2014
From: dw+python-ideas at hmmz.org (dw+python-ideas at hmmz.org)
Date: Thu, 24 Jul 2014 17:53:16 +0000
Subject: [Python-ideas] os.path.cansymlink(path)
In-Reply-To: <CAGifb9EQT5Es+yYxTmZn77XSG0Jq9oj5mJzYjzjWbh6DhGeVjw@mail.gmail.com>
References: <CAPkN8xK5h4rEqNLKRTLaRW7WzdFfqTDJp8-T1Msey7dvQFvZjw@mail.gmail.com>
 <CAGifb9EQT5Es+yYxTmZn77XSG0Jq9oj5mJzYjzjWbh6DhGeVjw@mail.gmail.com>
Message-ID: <20140724175316.GA14260@k2>

On Thu, Jul 24, 2014 at 01:33:28PM -0400, Geoffrey Spear wrote:

> > This code is wrong, because OS support for symlinks doesn't
> > guarantee that mounted filesystem can do this, resulting in OSError
> > at runtime. So, the proper check would be to check if specific path
> > supports symlinking.
> >
> > The idea is:
> >
> >     os.path.cansymlink(path)  - Return True if filesystem
> >         of specified path can be symlinked.
> >
> > Yes/No/Opinions?

-1, since there is no sane way to guarantee a FS operation will succeed
without trying it in most cases. Even if a filesystem (driver) supports
the operation, the filesystem (data) might be exhausted, e.g. inode
count, max directory entries, ... And if not that, then e.g. in the case
of NFS or CIFS, while the protocol might support the operation, there is
no mechanism for a particular server implementation to communicate that
it does not support it.

Even if none of this were true, it also introduces a race between a
program testing the state of the filesystem, and that state changing,
e.g. due to USB disconnect, or a lazy unmount succeeding, or..


David

> 
> Surely the third-party module you found that wrong code in has their
> own communication channels?
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From storchaka at gmail.com  Thu Jul 24 21:24:40 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 24 Jul 2014 22:24:40 +0300
Subject: [Python-ideas] os.path.abspath - optional startdir argument
In-Reply-To: <lqr2mh$s27$1@ger.gmane.org>
References: <lqr2mh$s27$1@ger.gmane.org>
Message-ID: <lqrmgd$6kr$1@ger.gmane.org>

24.07.14 16:45, Wolfgang Maier wrote:
> currently, os.path.abspath(somepath) is, essentially, equivalent to
>
> os.path.normpath(os.path.join(os.getcwd(),path)).

Actually, currently posixpath.abspath() is more complicated and
ntpath.abspath() has a totally different implementation.

> Currently, you have to write:
>
> os.path.normpath(os.path.join(startdir, path))
> or even
> os.path.normpath(os.path.join(os.path.abspath(startdir), path))

Yes, it is the natural and straightforward way. You can define your own
function if you need this often.

> Before posting I checked the bug tracker and found that this idea has
> been brought up years ago (http://bugs.python.org/issue9882), but not
> pursued further.
> The patch suggested there is a bit of an oversimplification, but I have
> my own one, which I could provide if someone's interested.
> For issue9882 it was suggested to bring it up on python-ideas, but to
> the best of my knowledge that was never done, so I'm doing it now.
>
> Thoughts ?

This will add to abspath() a feature which is unrelated to the purpose 
of abspath(). This will complicate the API without significant benefit.

I'm -1.


From wolfgang.maier at biologie.uni-freiburg.de  Thu Jul 24 23:32:01 2014
From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier)
Date: Thu, 24 Jul 2014 23:32:01 +0200
Subject: [Python-ideas] os.path.abspath - optional startdir argument
In-Reply-To: <lqrmgd$6kr$1@ger.gmane.org>
References: <lqr2mh$s27$1@ger.gmane.org> <lqrmgd$6kr$1@ger.gmane.org>
Message-ID: <lqru0h$gs3$1@ger.gmane.org>

On 24.07.2014 21:24, Serhiy Storchaka wrote:
> 24.07.14 16:45, Wolfgang Maier wrote:
>> currently, os.path.abspath(somepath) is, essentially, equivalent to
>>
>> os.path.normpath(os.path.join(os.getcwd(),path)).
>
> Actually currently posixpath.abspath() is more complicated and
> ntpath.abspath() has totally different implementation.

I know, that's why I wrote "essentially" and "equivalent" instead of "is 
implemented as". It's still easy to patch both the posixpath and the 
ntpath version though.

>
>> Currently, you have to write:
>>
>> os.path.normpath(os.path.join(startdir, path))
>> or even
>> os.path.normpath(os.path.join(os.path.abspath(startdir), path))
>
> Yes, it is natural and straightforward way. You can define your own
> function if you need this often.
>

I'm not saying this is a must-have in Python. I don't have a problem
with sticking to normpath; I just thought it's a tiny change giving some
benefit in readability.

>> Before posting I checked the bug tracker and found that this idea has
>> been brought up years ago (http://bugs.python.org/issue9882), but not
>> pursued further.
>> The patch suggested there is a bit of an oversimplification, but I have
>> my own one, which I could provide if someone's interested.
>> For issue9882 it was suggested to bring it up on python-ideas, but to
>> the best of my knowledge that was never done, so I'm doing it now.
>>
>> Thoughts ?
>
> This will add to abspath() a feature which is unrelated to the purpose
> of abspath(). This will complicate the API without significant benefit.
>

It would not complicate the API all that much. If you don't want to use
the argument, just ignore it; it would be optional. As pointed out in
the bug tracker issue, it is also not without precedent:
os.path.relpath has a start argument already.



From tjreedy at udel.edu  Thu Jul 24 23:49:51 2014
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 24 Jul 2014 17:49:51 -0400
Subject: [Python-ideas] os.path.abspath - optional startdir argument
In-Reply-To: <lqr8ri$s27$1@ger.gmane.org>
References: <lqr2mh$s27$1@ger.gmane.org>
 <CAN1YFWsbzA9aSjUHD6PWOU0R_34Rjspr2jyHf0WYu7nxuuT9Dg@mail.gmail.com>
 <lqr8ri$s27$1@ger.gmane.org>
Message-ID: <lqrv27$ua2$1@ger.gmane.org>

On 7/24/2014 11:30 AM, Wolfgang Maier wrote:
> On 24.07.2014 17:21, Juancarlo Añez wrote:

>> Better yet, use the pathlib module.

Thanks for the reminder. I took a better look at it.

> As it stands, the pathlib module is only provisional plus,

'Provisional' means that there *could* be a few api changes that would 
break code. The module is not going away.

> IMO, kind of overkill for a simple task like that.

Overkill?

import pathlib as path
import os.path as path

are equally easy

The 'simple task' combines joining, normalizing, and 'absoluting'. 
pathlib.Path joins, Path.resolve normalizes and 'absolutes'. Together 
they combine the functions of os.path.join, os.path.abspath and 
os.path.normpath, with a nicer syntax, and with OS awareness.

 >>> path.Path('../../../Python27/lib', 'ast.py').resolve()
WindowsPath('C:/Programs/Python27/Lib/ast.py')

If one starts with a Path object, as would be typical, one can use '/' 
to join, as JuanCarlo mentioned.

 >>> base = path.Path('.')
 >>> (base / '../../../Python27/lib' / 'ast.py').resolve()
WindowsPath('C:/Programs/Python27/Lib/ast.py')

-- 
Terry Jan Reedy



From apalala at gmail.com  Fri Jul 25 00:29:44 2014
From: apalala at gmail.com (=?UTF-8?Q?Juancarlo_A=C3=B1ez?=)
Date: Thu, 24 Jul 2014 17:59:44 -0430
Subject: [Python-ideas] os.path.abspath - optional startdir argument
In-Reply-To: <lqrv27$ua2$1@ger.gmane.org>
References: <lqr2mh$s27$1@ger.gmane.org>
 <CAN1YFWsbzA9aSjUHD6PWOU0R_34Rjspr2jyHf0WYu7nxuuT9Dg@mail.gmail.com>
 <lqr8ri$s27$1@ger.gmane.org> <lqrv27$ua2$1@ger.gmane.org>
Message-ID: <CAN1YFWsJCdLewHNY03beyPWUbJRamopox6ZN81OeY9-QAmRtLA@mail.gmail.com>

On a related topic...

What's missing in Python 3.4 is that most modules with functions or
methods that take file names or file paths as parameters are not
pathlib-aware, so a mandatory str(mypathlibpath) is required.

For example, you cannot do:

f = open(Path(__file__) / 'app.conf')

It will fail.

But pathlib as part of the standard lib is new, so it's OK.

It will take time to know where in the module dependency hierarchy it
should belong.

Cheers,



On Thu, Jul 24, 2014 at 5:19 PM, Terry Reedy <tjreedy at udel.edu> wrote:

> On 7/24/2014 11:30 AM, Wolfgang Maier wrote:
>
>> On 24.07.2014 17:21, Juancarlo Añez wrote:
>>
>
>  Better yet, use the pathlib module.
>>>
>>
> Thank for the reminder. I took a better look at it.
>
>
>  As it stands, the pathlib module is only provisional plus,
>>
>
> 'Provisional' means that there *could* be a few api changes that would
> break code. The module is not going away.
>
>
>  IMO, kind of overkill for a simple task like that.
>>
>
> Overkill?
>
> import pathlib as path
> import os.path as path
>
> are equally easy
>
> The 'simple task' combines joining, normalizing, and 'absoluting'.
> pathlib.Path joins, Path.resolve normalizes and 'absolutes'. Together they
> combine the functions of os.path.join, os.path.abspath and
> os.path.normpath, with a nicer syntax, and with OS awareness.
>
> >>> path.Path('../../../Python27/lib', 'ast.py').resolve()
> WindowsPath('C:/Programs/Python27/Lib/ast.py')
>
> If one starts with a Path object, as would be typical, one can use '/' to
> join, as JuanCarlo mentioned.
>
> >>> base = path.Path('.')
> >>> (base / '../../../Python27/lib' / 'ast.py').resolve()
> WindowsPath('C:/Programs/Python27/Lib/ast.py')
>
> --
> Terry Jan Reedy
>
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



-- 
Juancarlo *Añez*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140724/24fd9a11/attachment-0001.html>

From rymg19 at gmail.com  Fri Jul 25 03:54:00 2014
From: rymg19 at gmail.com (Ryan)
Date: Thu, 24 Jul 2014 20:54:00 -0500
Subject: [Python-ideas] os.path.abspath - optional startdir argument
In-Reply-To: <CAN1YFWsJCdLewHNY03beyPWUbJRamopox6ZN81OeY9-QAmRtLA@mail.gmail.com>
References: <lqr2mh$s27$1@ger.gmane.org>
 <CAN1YFWsbzA9aSjUHD6PWOU0R_34Rjspr2jyHf0WYu7nxuuT9Dg@mail.gmail.com>
 <lqr8ri$s27$1@ger.gmane.org> <lqrv27$ua2$1@ger.gmane.org>
 <CAN1YFWsJCdLewHNY03beyPWUbJRamopox6ZN81OeY9-QAmRtLA@mail.gmail.com>
Message-ID: <dc8c7d70-c696-446f-a715-9536f9e38890@email.android.com>

Instead, though, you'd do:

f = (Path(__file__) / 'app.conf').open()

https://docs.python.org/3/library/pathlib.html#pathlib.Path.open 


"Juancarlo Añez" <apalala at gmail.com> wrote:
>On a related topic...
>
>What's missing in Python 3.4, is that most modules with functions or
>methods that take file names or file paths as parameters are not
>pathlib-aware, so a mandatory str(mypahtlibpath) is required.
>
>For example, you cannot do:
>
>f = open(Path(_file__) / 'app.conf')
>
>It will fail.
>
>But pathlib as part of the standard lib is new, so it's OK.
>
>It will take time how know where in the module dependency hierarchy it
>should belong.
>
>Cheers,
>
>
>
>On Thu, Jul 24, 2014 at 5:19 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>
>> On 7/24/2014 11:30 AM, Wolfgang Maier wrote:
>>
>>> On 24.07.2014 17:21, Juancarlo Añez wrote:
>>>
>>
>>  Better yet, use the pathlib module.
>>>>
>>>
>> Thank for the reminder. I took a better look at it.
>>
>>
>>  As it stands, the pathlib module is only provisional plus,
>>>
>>
>> 'Provisional' means that there *could* be a few api changes that
>would
>> break code. The module is not going away.
>>
>>
>>  IMO, kind of overkill for a simple task like that.
>>>
>>
>> Overkill?
>>
>> import pathlib as path
>> import os.path as path
>>
>> are equally easy
>>
>> The 'simple task' combines joining, normalizing, and 'absoluting'.
>> pathlib.Path joins, Path.resolve normalizes and 'absolutes'. Together
>they
>> combine the functions of os.path.join, os.path.abspath and
>> os.path.normpath, with a nicer syntax, and with OS awareness.
>>
>> >>> path.Path('../../../Python27/lib', 'ast.py').resolve()
>> WindowsPath('C:/Programs/Python27/Lib/ast.py')
>>
>> If one starts with a Path object, as would be typical, one can use
>'/' to
>> join, as JuanCarlo mentioned.
>>
>> >>> base = path.Path('.')
>> >>> (base / '../../../Python27/lib' / 'ast.py').resolve()
>> WindowsPath('C:/Programs/Python27/Lib/ast.py')
>>
>> --
>> Terry Jan Reedy
>>
>>
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>
>
>-- 
>Juancarlo *Añez*
>
>
>------------------------------------------------------------------------
>
>_______________________________________________
>Python-ideas mailing list
>Python-ideas at python.org
>https://mail.python.org/mailman/listinfo/python-ideas
>Code of Conduct: http://python.org/psf/codeofconduct/

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140724/ec7c06f2/attachment.html>

From apalala at gmail.com  Fri Jul 25 04:43:13 2014
From: apalala at gmail.com (=?UTF-8?Q?Juancarlo_A=C3=B1ez?=)
Date: Thu, 24 Jul 2014 22:13:13 -0430
Subject: [Python-ideas] os.path.abspath - optional startdir argument
In-Reply-To: <dc8c7d70-c696-446f-a715-9536f9e38890@email.android.com>
References: <lqr2mh$s27$1@ger.gmane.org>
 <CAN1YFWsbzA9aSjUHD6PWOU0R_34Rjspr2jyHf0WYu7nxuuT9Dg@mail.gmail.com>
 <lqr8ri$s27$1@ger.gmane.org> <lqrv27$ua2$1@ger.gmane.org>
 <CAN1YFWsJCdLewHNY03beyPWUbJRamopox6ZN81OeY9-QAmRtLA@mail.gmail.com>
 <dc8c7d70-c696-446f-a715-9536f9e38890@email.android.com>
Message-ID: <CAN1YFWt9BAjz+1HLLyXCDxUSiAq+7Aq=kP46k1YwZfFzEJqPeA@mail.gmail.com>

On Thu, Jul 24, 2014 at 9:24 PM, Ryan <rymg19 at gmail.com> wrote:

> Instead, though, you'd do:
>
> f = (Path(_file__) / 'app.conf').open()
>

Indeed, that solves the "right place in the module dependency hierarchy"
thing, and it even has an "encoding=" kwarg!

I hadn't paid attention to it. Sorry.

Problem solved!

Thanks!


-- 
Juancarlo *Añez*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140724/1aaf2b4a/attachment.html>

From wolfgang.maier at biologie.uni-freiburg.de  Fri Jul 25 09:40:32 2014
From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier)
Date: Fri, 25 Jul 2014 09:40:32 +0200
Subject: [Python-ideas] os.path.abspath - optional startdir argument
In-Reply-To: <lqrv27$ua2$1@ger.gmane.org>
References: <lqr2mh$s27$1@ger.gmane.org>
 <CAN1YFWsbzA9aSjUHD6PWOU0R_34Rjspr2jyHf0WYu7nxuuT9Dg@mail.gmail.com>
 <lqr8ri$s27$1@ger.gmane.org> <lqrv27$ua2$1@ger.gmane.org>
Message-ID: <53D209F0.8040302@biologie.uni-freiburg.de>

On 24.07.2014 23:49, Terry Reedy wrote:
> On 7/24/2014 11:30 AM, Wolfgang Maier wrote:
>> On 24.07.2014 17:21, Juancarlo Añez wrote:
>
>>> Better yet, use the pathlib module.
>
> Thank for the reminder. I took a better look at it.
>
>> As it stands, the pathlib module is only provisional plus,
>
> 'Provisional' means that there *could* be a few api changes that would
> break code. The module is not going away.
>

The 3.4 docs explicitly mention the possibility:

<quote>

Note:

This module has been included in the standard library on a provisional 
basis. Backwards incompatible changes (up to and including removal of 
the package) may occur if deemed necessary by the core developers.

</quote>

>> IMO, kind of overkill for a simple task like that.
>
> Overkill?
>
> import pathlib as path
> import os.path as path
>
> are equally easy
>
> The 'simple task' combines joining, normalizing, and 'absoluting'.
> pathlib.Path joins, Path.resolve normalizes and 'absolutes'. Together
> they combine the functions of os.path.join, os.path.abspath and
> os.path.normpath, with a nicer syntax, and with OS awareness.
>

Yes, the syntax is nicer *now*, but with my proposed change to 
os.path.abspath things would look quite similar:

pathlib version now:
>  >>> path.Path('../../../Python27/lib', 'ast.py').resolve()

os.path as proposed:
os.path.abspath('ast.py', '../../../Python27/lib')

So I would see this as an argument for the proposal rather than against it.

Even if the pathlib module will stay, I am not sure whether that should 
exclude enhancements in overlapping parts of os.path.

Anyway, that whole thing is not that important to me, so if nobody finds 
it useful, then let's stick to the status quo.


From ncoghlan at gmail.com  Fri Jul 25 10:18:08 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 25 Jul 2014 18:18:08 +1000
Subject: [Python-ideas] os.path.abspath - optional startdir argument
In-Reply-To: <CAN1YFWsJCdLewHNY03beyPWUbJRamopox6ZN81OeY9-QAmRtLA@mail.gmail.com>
References: <lqr2mh$s27$1@ger.gmane.org>
 <CAN1YFWsbzA9aSjUHD6PWOU0R_34Rjspr2jyHf0WYu7nxuuT9Dg@mail.gmail.com>
 <lqr8ri$s27$1@ger.gmane.org> <lqrv27$ua2$1@ger.gmane.org>
 <CAN1YFWsJCdLewHNY03beyPWUbJRamopox6ZN81OeY9-QAmRtLA@mail.gmail.com>
Message-ID: <CADiSq7fOE4qD1fhvsFVcTejoVUqE8O1LnrexKQjQ4FScAnk9WA@mail.gmail.com>

On 25 Jul 2014 08:33, "Juancarlo A?ez" <apalala at gmail.com> wrote:
>
> On a related topic...
>
> What's missing in Python 3.4, is that most modules with functions or
methods that take file names or file paths as parameters are not
pathlib-aware, so a mandatory str(mypahtlibpath) is required.
>
> For example, you cannot do:
>
> f = open(Path(_file__) / 'app.conf')
>
> It will fail.

Just like ipaddress, this is a deliberate design choice that avoids
coupling low level APIs to a high level convenience library.

Cheers,
Nick.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140725/41429be8/attachment.html>

From tjreedy at udel.edu  Fri Jul 25 10:26:15 2014
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 25 Jul 2014 04:26:15 -0400
Subject: [Python-ideas] os.path.abspath - optional startdir argument
In-Reply-To: <53D209F0.8040302@biologie.uni-freiburg.de>
References: <lqr2mh$s27$1@ger.gmane.org>
 <CAN1YFWsbzA9aSjUHD6PWOU0R_34Rjspr2jyHf0WYu7nxuuT9Dg@mail.gmail.com>
 <lqr8ri$s27$1@ger.gmane.org> <lqrv27$ua2$1@ger.gmane.org>
 <53D209F0.8040302@biologie.uni-freiburg.de>
Message-ID: <lqt4bf$t3i$1@ger.gmane.org>

On 7/25/2014 3:40 AM, Wolfgang Maier wrote:

> Yes, the syntax is nicer *now*, but with my proposed change to
> os.path.abspath things would look quite similar:
>
> pathlib version now:
>>  >>> path.Path('../../../Python27/lib', 'ast.py').resolve()
>
> os.path as proposed:
> os.path.abspath('ast.py', '../../../Python27/lib')
>
> So I would see this as an argument for the proposal rather than against it.
>
> Even if the pathlib module will stay, I am not sure whether that should
> exclude enhancements in overlapping parts of os.path.

I understand your reasoning. But it leaves out the following: when a 
feature is added, any use of that feature makes code incompatible with 
previous versions, so we generally like new features to add more than 
this one would.

If you look hard enough, I am sure that you can find an addition that by 
this criterion should not have been added. If you do, I will probably 
agree that it should not have been.

> Anyway, that whole thing is not that important to me, so if nobody finds
> it useful, then let's stick to the status quo.

It is a matter of being useful enough to justify the cost.

-- 
Terry Jan Reedy


From me+python at ixokai.io  Fri Jul 25 09:54:59 2014
From: me+python at ixokai.io (Stephen Hansen)
Date: Fri, 25 Jul 2014 00:54:59 -0700
Subject: [Python-ideas] os.path.abspath - optional startdir argument
In-Reply-To: <53D209F0.8040302@biologie.uni-freiburg.de>
References: <lqr2mh$s27$1@ger.gmane.org>
 <CAN1YFWsbzA9aSjUHD6PWOU0R_34Rjspr2jyHf0WYu7nxuuT9Dg@mail.gmail.com>
 <lqr8ri$s27$1@ger.gmane.org> <lqrv27$ua2$1@ger.gmane.org>
 <53D209F0.8040302@biologie.uni-freiburg.de>
Message-ID: <CAM1gar4Y+5qc=tRzhejyTZk_GKYTYP7N7ww+NKXMnf7+Mbvgew@mail.gmail.com>

Warning: Lurker...

On Fri, Jul 25, 2014 at 12:40 AM, Wolfgang Maier <
wolfgang.maier at biologie.uni-freiburg.de> wrote:

>
> Yes, the syntax is nicer *now*, but with my proposed change to
> os.path.abspath things would look quite similar:
>
> pathlib version now:
>
>   >>> path.Path('../../../Python27/lib', 'ast.py').resolve()
>>
>
> os.path as proposed:
> os.path.abspath('ast.py', '../../../Python27/lib')
>
> So I would see this as an argument for the proposal rather than against it.
>

Am I the only one who sees this as completely crazy-talk and an argument
against? The idea that os.path.xxx(y,z) could be interpreted as z+y then
resolved is a completely horrible API. The pathlib version keeps the parts
of the path in order, and then resolves them, and where things are, well,
they're clear. The proposed os.path modification reads, to me, as nonsense.
Half of me wants to say it is asking to find the absolute path of ast.py
and find this additional component in relation to that absolute path, the
other half of me just shuts down. "os.path.abspath('ast.py',
'../../../Python27/lib')" speaks in no way to me of absoluteness. There
are two relative paths in its arguments and no sensible way of
interpreting that comes forth, to me.

It may make sense if you were adding a keyword-only argument, maybe
(maaaybe), but as an example of how the two are similar it is IMHO a stark
sign of just how dissimilar they are and, in fact, of why the proposal is bad.

The pathlib version conveys a fairly clear idea of where the files it's
talking about are located. The proposal is just weird.

/relurk.

From g.rodola at gmail.com  Fri Jul 25 11:54:07 2014
From: g.rodola at gmail.com (Giampaolo Rodola')
Date: Fri, 25 Jul 2014 11:54:07 +0200
Subject: [Python-ideas] os.path.cansymlink(path)
In-Reply-To: <20140724175316.GA14260@k2>
References: <CAPkN8xK5h4rEqNLKRTLaRW7WzdFfqTDJp8-T1Msey7dvQFvZjw@mail.gmail.com>
 <CAGifb9EQT5Es+yYxTmZn77XSG0Jq9oj5mJzYjzjWbh6DhGeVjw@mail.gmail.com>
 <20140724175316.GA14260@k2>
Message-ID: <CAFYqXL9WDtrsVU=NAnjq8FbA0rqVWxeaKj+VoBSq=yXCb_g0eg@mail.gmail.com>

-1 for me as well given the reasons mentioned above.
Il 24/lug/2014 20:01 <dw+python-ideas at hmmz.org> ha scritto:

> On Thu, Jul 24, 2014 at 01:33:28PM -0400, Geoffrey Spear wrote:
>
> > > This code is wrong, because OS support for symlinks doesn't
> > > guarantee that mounted filesystem can do this, resulting in OSError
> > > at runtime. So, the proper check would be to check if specific path
> > > supports symlinking.
> > >
> > > The idea is:
> > >
> > >     os.path.cansymlink(path)  - Return True if filesystem
> > >         of specified path can be symlinked.
> > >
> > > Yes/No/Opinions?
>
> -1, since there is no sane way to guarantee a FS operation will succeed
> without trying it in most cases. Even if a filesystem (driver) supports
> the operation, the filesystem (data) might be exhausted, e.g. inode
> count, max directory entries, ... And if not that, then e.g. in the case
> of NFS or CIFS, while the protocol might support the operation, there is
> no mechanism for a particular server implementation to communicate that
> it does not support it.
>
> Even if none of this were true, it also introduces a race between a
> program testing the state of the filesystem, and that state changing,
> e.g. due to USB disconnect, or a lazy unmount succeeding, or..
>
>
> David
>
> >
> > Surely the third-party module you found that wrong code in has their
> > own communication channels?
> > _______________________________________________
> > Python-ideas mailing list
> > Python-ideas at python.org
> > https://mail.python.org/mailman/listinfo/python-ideas
> > Code of Conduct: http://python.org/psf/codeofconduct/
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

From techtonik at gmail.com  Fri Jul 25 12:08:23 2014
From: techtonik at gmail.com (anatoly techtonik)
Date: Fri, 25 Jul 2014 13:08:23 +0300
Subject: [Python-ideas] os.path.cansymlink(path)
In-Reply-To: <CAGifb9EQT5Es+yYxTmZn77XSG0Jq9oj5mJzYjzjWbh6DhGeVjw@mail.gmail.com>
References: <CAPkN8xK5h4rEqNLKRTLaRW7WzdFfqTDJp8-T1Msey7dvQFvZjw@mail.gmail.com>
 <CAGifb9EQT5Es+yYxTmZn77XSG0Jq9oj5mJzYjzjWbh6DhGeVjw@mail.gmail.com>
Message-ID: <CAPkN8x+gouoa4b9js77BfmRf1xiRHhCgYjNCDBamMOTHjHZgTA@mail.gmail.com>

On Thu, Jul 24, 2014 at 8:33 PM, Geoffrey Spear <geoffspear at gmail.com> wrote:
> On Thu, Jul 24, 2014 at 12:51 PM, anatoly techtonik <techtonik at gmail.com> wrote:
>> This is a live code from current virtualenv.py:
>>
>>     if hasattr(os, 'symlink'):
>>         logger.info('Symlinking Python bootstrap modules')
>>
>> This code is wrong, because OS support for
>> symlinks doesn't guarantee that mounted filesystem
>> can do this, resulting in OSError at runtime. So, the
>> proper check would be to check if specific path
>> supports symlinking.
>>
>> The idea is:
>>
>>     os.path.cansymlink(path)  - Return True if filesystem
>>         of specified path can be symlinked.
>>
>> Yes/No/Opinions?
>
> Surely the third-party module you found that wrong code in has their
> own communication channels?

I can't see how this comment contributes to the idea. Care to
explain?
-- 
anatoly t.

From storchaka at gmail.com  Fri Jul 25 12:12:07 2014
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 25 Jul 2014 13:12:07 +0300
Subject: [Python-ideas] os.path.abspath - optional startdir argument
In-Reply-To: <lqru0h$gs3$1@ger.gmane.org>
References: <lqr2mh$s27$1@ger.gmane.org> <lqrmgd$6kr$1@ger.gmane.org>
 <lqru0h$gs3$1@ger.gmane.org>
Message-ID: <lqtagd$ba8$1@ger.gmane.org>

25.07.14 00:32, Wolfgang Maier wrote:
> On 24.07.2014 21:24, Serhiy Storchaka wrote:
>> 24.07.14 16:45, Wolfgang Maier wrote:
> I'm not saying, this is a must-have in Python. I don't have a problem
> with sticking to normpath, just thought it's a tiny change giving some
> benefit in readability.

To me, the explicit, well-known join() and normpath() are more readable 
than an unexpected second argument to abspath().

>> This will add to abspath() a feature which is unrelated to the purpose
>> of abspath(). This will complicate the API without significant benefit.
> It would not complicate the API all that much. If you don't want to use
> the argument, just ignore it, it would be optional. As pointed out in
> the bug tracker issue, it is also not without precedence,
> os.path.relpath has a start argument already.

The inverse of two-argument relpath() is join(), not abspath(). 
Two-argument relpath() is the only way to compute the relative path 
between two paths. That is essential functionality; there is no 
redundancy. But two-argument abspath() would be redundant.
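
A quick sketch of that inverse relationship, using made-up POSIX 
directory names:

```python
import os.path

base = '/usr/lib'
target = '/usr/share/doc'

# Two-argument relpath() computes the relative path between two paths...
rel = os.path.relpath(target, base)

# ...and join() (plus normpath() to clean up the '..' components) is its
# inverse, recovering the original target.
roundtrip = os.path.normpath(os.path.join(base, rel))
```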

Not every one-line function should be added to the stdlib. And I found 
only 10 usages of the normpath(join()) combination in the Python source 
tree (including 4 in tests and 4 in the PC build script, therefore only 
2 in the stdlib itself) against 205 usages of abspath().


From techtonik at gmail.com  Fri Jul 25 12:17:13 2014
From: techtonik at gmail.com (anatoly techtonik)
Date: Fri, 25 Jul 2014 13:17:13 +0300
Subject: [Python-ideas] os.path.cansymlink(path)
In-Reply-To: <20140724175316.GA14260@k2>
References: <CAPkN8xK5h4rEqNLKRTLaRW7WzdFfqTDJp8-T1Msey7dvQFvZjw@mail.gmail.com>
 <CAGifb9EQT5Es+yYxTmZn77XSG0Jq9oj5mJzYjzjWbh6DhGeVjw@mail.gmail.com>
 <20140724175316.GA14260@k2>
Message-ID: <CAPkN8xLdiOXR02hAuT5RMhptjVkiSyF-iQj+6WFH=fBuJW2O1Q@mail.gmail.com>

On Thu, Jul 24, 2014 at 8:53 PM,  <dw+python-ideas at hmmz.org> wrote:
> On Thu, Jul 24, 2014 at 01:33:28PM -0400, Geoffrey Spear wrote:
>
>> > This code is wrong, because OS support for symlinks doesn't
>> > guarantee that mounted filesystem can do this, resulting in OSError
>> > at runtime. So, the proper check would be to check if specific path
>> > supports symlinking.
>> >
>> > The idea is:
>> >
>> >     os.path.cansymlink(path)  - Return True if filesystem
>> >         of specified path can be symlinked.
>> >
>> > Yes/No/Opinions?
>
> -1, since there is no sane way to guarantee a FS operation will succeed
> without trying it in most cases.
>
>  Even if a filesystem (driver) supports
> the operation, the filesystem (data) might be exhausted, e.g. inode
> count, max directory entries, ... And if not that, then e.g. in the case
> of NFS or CIFS, while the protocol might support the operation, there is
> no mechanism for a particular server implementation to communicate that
> it does not support it.

You do realize that high-level program logic changes depending on whether
the FS supports symlinks or not. It is not "an exceptional" case as you've
presented it.

This is not a replacement for os.symlink(), but a doc link to this function
would help people avoid this trap and runtime errors in the future.
-- 
anatoly t.

From phd at phdru.name  Fri Jul 25 12:23:40 2014
From: phd at phdru.name (Oleg Broytman)
Date: Fri, 25 Jul 2014 12:23:40 +0200
Subject: [Python-ideas] os.path.cansymlink(path)
In-Reply-To: <CAPkN8xK5h4rEqNLKRTLaRW7WzdFfqTDJp8-T1Msey7dvQFvZjw@mail.gmail.com>
References: <CAPkN8xK5h4rEqNLKRTLaRW7WzdFfqTDJp8-T1Msey7dvQFvZjw@mail.gmail.com>
Message-ID: <20140725102340.GA4015@phdru.name>

Hi!

On Thu, Jul 24, 2014 at 07:51:08PM +0300, anatoly techtonik <techtonik at gmail.com> wrote:
> This is a live code from current virtualenv.py:
> 
>     if hasattr(os, 'symlink'):
>         logger.info('Symlinking Python bootstrap modules')
> 
> This code is wrong, because OS support for
> symlinks doesn't guarantee that mounted filesystem
> can do this, resulting in OSError at runtime. So, the
> proper check would be to check if specific path
> supports symlinking.
> 
> The idea is:
> 
>     os.path.cansymlink(path)  - Return True if filesystem
>         of specified path can be symlinked.
> 
> Yes/No/Opinions?

   Such a function (if it were a function) should return one of three
answers, not two. Something like:

None  - I don't know if the OS/fs supports symlinks because another
        OSError occurred during the test (perhaps not enough rights to
        write to the path);
False - the path clearly doesn't support symlinks;
True  - the path positively supports symlinks.

   Implement the function in a module and publish the module at PyPI.
Warn users (in accompanying docs) that even if a path supports (or
doesn't support) symlinks this says nothing about any subpath of the
path because a subpath can be a mount of a different fs.
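
A minimal sketch of such a tristate probe, under the caveats above 
(can_symlink is a hypothetical name, and the error classification is 
deliberately simplified):

```python
import errno
import os
import uuid

def can_symlink(path):
    # Tristate probe as described: True / False / None.
    if not hasattr(os, 'symlink'):
        return False
    probe = os.path.join(path, '.symlink_probe_' + uuid.uuid4().hex)
    try:
        os.symlink('probe-target', probe)
    except OSError as e:
        # EPERM is what e.g. a FAT mount reports on Linux; any other
        # OSError (no write access, read-only fs, ...) means "can't tell".
        return False if e.errno == errno.EPERM else None
    os.unlink(probe)
    return True
```

Note the probe actually creates (and removes) a link, which is exactly 
the try-it-and-see approach the EAFP argument recommends, just wrapped 
in a predicate.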

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.

From rosuav at gmail.com  Fri Jul 25 12:53:02 2014
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 25 Jul 2014 20:53:02 +1000
Subject: [Python-ideas] os.path.cansymlink(path)
In-Reply-To: <CAPkN8xLdiOXR02hAuT5RMhptjVkiSyF-iQj+6WFH=fBuJW2O1Q@mail.gmail.com>
References: <CAPkN8xK5h4rEqNLKRTLaRW7WzdFfqTDJp8-T1Msey7dvQFvZjw@mail.gmail.com>
 <CAGifb9EQT5Es+yYxTmZn77XSG0Jq9oj5mJzYjzjWbh6DhGeVjw@mail.gmail.com>
 <20140724175316.GA14260@k2>
 <CAPkN8xLdiOXR02hAuT5RMhptjVkiSyF-iQj+6WFH=fBuJW2O1Q@mail.gmail.com>
Message-ID: <CAPTjJmo7Ltyk_szz2ebEGoOR+wdbpw0S3qwFFBZ7+gU2zXGeGQ@mail.gmail.com>

On Fri, Jul 25, 2014 at 8:17 PM, anatoly techtonik <techtonik at gmail.com> wrote:
> You do realize that high level program logic changes depending on the fact
> that FS supports symlinks or not. It is not "an exceptional" case as you've
> presented it.

There are plenty of other non-exceptional cases that are signalled
with exceptions. It's part of the EAFP model and its reliability. If
high level logic changes, it surely can be like this:

try:
    os.symlink(whatever)
except OSError:
    alternate_logic()

How would asking the path if it's symlinkable (by the way, do you ask
the source or destination?) improve that?

ChrisA

From wolfgang.maier at biologie.uni-freiburg.de  Fri Jul 25 13:31:27 2014
From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier)
Date: Fri, 25 Jul 2014 13:31:27 +0200
Subject: [Python-ideas] os.path.abspath - optional startdir argument
In-Reply-To: <lqtagd$ba8$1@ger.gmane.org>
References: <lqr2mh$s27$1@ger.gmane.org> <lqrmgd$6kr$1@ger.gmane.org>
 <lqru0h$gs3$1@ger.gmane.org> <lqtagd$ba8$1@ger.gmane.org>
Message-ID: <53D2400F.2080709@biologie.uni-freiburg.de>

On 25.07.2014 12:12, Serhiy Storchaka wrote:
> 25.07.14 00:32, Wolfgang Maier wrote:
>> On 24.07.2014 21:24, Serhiy Storchaka wrote:
>>> 24.07.14 16:45, Wolfgang Maier wrote:
>> I'm not saying, this is a must-have in Python. I don't have a problem
>> with sticking to normpath, just thought it's a tiny change giving some
>> benefit in readability.
>
> To me, the explicit, well-known join() and normpath() are more readable
> than an unexpected second argument to abspath().
>

Ok, I just seem to think differently from all of you. Whenever I need 
this functionality (and, just like in the stdlib, it's less often than 
regular abspath), I think: oh, this must be addressable with abspath; 
then after a moment I realize there is no start option like in relpath. 
Then I consult the docs, where I find this for abspath:

"Return a normalized absolutized version of the pathname path. On most 
platforms, this is equivalent to calling the function normpath() as 
follows: normpath(join(os.getcwd(), path))."

From which the solution is apparent.
Never have I thought first, ah, that's a job for normpath. Maybe that's 
because I can't remember a single case where I used normpath for 
anything else in my code, so I'm kind of thinking about normpath as a 
low-level function needed a lot in os.path, but typically not needed 
much outside of it because there are higher-level functions like abspath 
that do the normalization in the background.

It's interesting to learn that I seem to be quite alone with this view, 
but that's ok, I'm sure it will help me remember normpath next time :)

> Not every one-line function should be added to the stdlib. And I found
> only 10 usages of the normpath(join()) combination in the Python source
> tree (including 4 in tests and 4 in the PC build script, therefore only
> 2 in the stdlib itself) against 205 usages of abspath().
>



From antoine at python.org  Fri Jul 25 15:41:50 2014
From: antoine at python.org (Antoine Pitrou)
Date: Fri, 25 Jul 2014 09:41:50 -0400
Subject: [Python-ideas] os.path.abspath - optional startdir argument
In-Reply-To: <CADiSq7fOE4qD1fhvsFVcTejoVUqE8O1LnrexKQjQ4FScAnk9WA@mail.gmail.com>
References: <lqr2mh$s27$1@ger.gmane.org>
 <CAN1YFWsbzA9aSjUHD6PWOU0R_34Rjspr2jyHf0WYu7nxuuT9Dg@mail.gmail.com>
 <lqr8ri$s27$1@ger.gmane.org> <lqrv27$ua2$1@ger.gmane.org>
 <CAN1YFWsJCdLewHNY03beyPWUbJRamopox6ZN81OeY9-QAmRtLA@mail.gmail.com>
 <CADiSq7fOE4qD1fhvsFVcTejoVUqE8O1LnrexKQjQ4FScAnk9WA@mail.gmail.com>
Message-ID: <53D25E9E.6060209@python.org>

On 25/07/2014 04:18, Nick Coghlan wrote:
>  > For example, you cannot do:
>  >
>  > f = open(Path(__file__) / 'app.conf')
>  >
>  > It will fail.
>
> Just like ipaddress, this is a deliberate design choice that avoids
> coupling low level APIs to a high level convenience library.

Note the gap could be crossed without coupling by introducing a __path__ 
protocol (or something similar for IP addresses).

Regards

Antoine.



From ncoghlan at gmail.com  Fri Jul 25 16:01:50 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 26 Jul 2014 00:01:50 +1000
Subject: [Python-ideas] os.path.abspath - optional startdir argument
In-Reply-To: <53D25E9E.6060209@python.org>
References: <lqr2mh$s27$1@ger.gmane.org>
 <CAN1YFWsbzA9aSjUHD6PWOU0R_34Rjspr2jyHf0WYu7nxuuT9Dg@mail.gmail.com>
 <lqr8ri$s27$1@ger.gmane.org> <lqrv27$ua2$1@ger.gmane.org>
 <CAN1YFWsJCdLewHNY03beyPWUbJRamopox6ZN81OeY9-QAmRtLA@mail.gmail.com>
 <CADiSq7fOE4qD1fhvsFVcTejoVUqE8O1LnrexKQjQ4FScAnk9WA@mail.gmail.com>
 <53D25E9E.6060209@python.org>
Message-ID: <CADiSq7eusntZ97+QHNtC-MA4zZiL3CHoubVbKQjoU9h5e6psrQ@mail.gmail.com>

On 25 July 2014 23:41, Antoine Pitrou <antoine at python.org> wrote:
> On 25/07/2014 04:18, Nick Coghlan wrote:
>
>>  > For example, you cannot do:
>>  >
>>  > f = open(Path(__file__) / 'app.conf')
>>  >
>>  > It will fail.
>>
>> Just like ipaddress, this is a deliberate design choice that avoids
>> coupling low level APIs to a high level convenience library.
>
>
> Note the gap could be crossed without coupling by introducing a __path__
> protocol (or something similar for IP addresses).

My main concern with that approach is the sheer number of places we'd
need to touch. I'm not implacably opposed to the idea, I just strongly
suspect it wouldn't be worth the hassle to save the str() calls, as:

- explicit str() calls would still be needed for anyone still
supporting older versions of Python
- explicit str() calls would still be needed when dealing with third
party libraries that don't support the new protocol yet
- we wouldn't get to simplify any of the low level APIs, since they'd
still need to support str objects - the new protocol would be strictly
additive.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From abarnert at yahoo.com  Fri Jul 25 20:29:11 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 25 Jul 2014 11:29:11 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <87egxbm8eo.fsf@gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>	<lqb1q9$qts$1@ger.gmane.org>	<EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>	<CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>	<CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>	<CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>	<1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>	<CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>	<1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>	<CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>	<CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>	<1405828738.93713.YahooMailNeo@web181005.mail.ne1.yahoo.com>	<CADiSq7ejpH6P-uSVcEhx4Th4aK8=iWkkPASjKtsa4uxeRNw2pA@mail.gmail.com>	<1405903292.28722.YahooMailNeo@web181006.mail.ne1.yahoo.com>	<CACac1F9ROd6wz_D197UP_Pz-n8zkurjhCsGLCRTOpb0jpL=1oA@mail.gmail.com>	<87bnshnzu1.fsf@gmail.com>	<E1FD0B3B-D82D-4805-98EC-4C595CE9D534@yahoo.com>	<87oawgmfxp.fsf@gmail.com>	<3E31BD23-A903-4B48-82E5-6DDA4AA2E15C@yahoo.com>
 <87egxbm8eo.fsf@gmail.com>
Message-ID: <1406312951.60505.YahooMailNeo@web181001.mail.ne1.yahoo.com>

On Thursday, July 24, 2014 2:08 AM, Akira Li <4kir4.1i at gmail.com> wrote:

> > Andrew Barnert <abarnert at yahoo.com> writes:
> 
>>  On Jul 23, 2014, at 5:13, Akira Li <4kir4.1i at gmail.com> wrote:
>>>  In order for the newline="\0" case to work, it should behave
>>>  similar to the newline='' or newline='\n' case instead, i.e., no
>>>  translation should take place, to avoid corrupting embedded "\n\r"
>>>  characters.
>> 
>>  The draft PEP discusses this. I think it would be more consistent to
>>  translate for \0, just like \r and \r\n.
> 
> I read the [draft]. No translation is a better choice here. Otherwise
> (at the very least) it breaks the `find -print0` use case.

No it doesn't. The only reason it breaks your code is that you add newline='\0' to your stdout wrapper as well as your stdin wrapper. If you just passed '', it would not do anything. And this is exactly parallel with the existing case with, e.g., trying to pass through a classic-Mac file full of '\r'-delimited strings that might contain embedded '\n' characters that you don't want to translate.
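
For context, the `find -print0` use case can already be handled today 
without any newline parameter at all, by splitting the binary stream 
manually (a rough sketch; the helper name is made up):

```python
def iter_nul_records(stream, bufsize=8192):
    # Yield NUL-terminated records from a binary stream -- the current
    # workaround while newline='\0' is still only a proposal.
    buf = b''
    while True:
        chunk = stream.read(bufsize)
        if not chunk:
            if buf:
                yield buf
            return
        buf += chunk
        *records, buf = buf.split(b'\0')
        yield from records
```

Reading in binary mode like this sidesteps the translation question 
entirely, since binary files never translate anything.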

As I've said before, I don't really like the design for '\r' and '\r\n', or the fact that three separate notions (universal-newlines flag, line ending for readline, and output translation for write) are all conflated into one idea and crammed into one parameter, but I think it's probably too late and too radical to change that.

(It's less of an issue for binary files, because binary files can't take a newline parameter at all today, and because "no output translation" has been part of the definition of what "binary file" means all the way back to Python 1.x.)


> Backwards compatibility is preserved except that newline parameter
> accepts more values.

The same is true with the draft proposal. You've basically copied the exact same thing, except for what happens on output for newlines other than None, '', '\n', '\r', and '\r\n' in text files. Since that case cannot arise today, there are no backward compatibility issues. Your version is only a small change to the documentation and a small change to the code, but my version is an even smaller change to the documentation and no change to the code, so you can't argue this from a conservative point of view.

> 
>>  For the your script, there is no reason to pass newline=nl to the
>>  stdout replacement. The only effect that has on output is \n
>>  replacement, which you don't want. And if we removed that effect from
>>  the proposal, it would have no effect at all on output, so why pass
>>  it?
> 
> Keep in mind, I expect that newline='\0' does *not* translate '\n' to
> '\0'. If you remove newline=nl then embedded \n might be corrupted,

No, it's only corrupted if you _pass_ newline=nl. If you instead passed, e.g., newline='', nothing could possibly be corrupted.

> i.e., it breaks the `find -print0` use-case. Both newline=nl for stdout
> and end=nl
> are required here. Though (optionally) it would be nice to change
> `print()` so that it would use `end=file.newline or '\n'` by default
> instead.

That might be a nice change; I'll mention it in the next draft. But I think it's better to keep the changes as small and conservative as possible, so unless there's an upswell of support for it, I think anything that isn't actually necessary to solving the problem should be left out.

> There is also the line_buffering parameter. From the docs:
> 
>   If line_buffering is True, flush() is implied when a call to write
>   contains a newline character.

The way this is actually defined seems broken to me; IIRC (I'll check the code later) it flushes on any '\r', and on any translated '\n'. So, it's doing the wrong thing with '\r' in most modes, and with '\n' in '' mode on non-Unix systems. So my thought was, just leave it broken.

But now that I think about it, the existing code can only flush excessively, never insufficiently, and that's probably a property worth preserving. So maybe there _is_ a reason to pass newline for output without translation after all. In other words, the parameter may actually conflate _four_ things, not just three...

I'll need to think this through (and reread the code) this weekend; thanks for bringing it up.

>>  Do you have a use case where you need to pass a non-standard newline
>>  to a text file/stream, but don't want newline replacement?
> 
> `find -print0` use case that my code implements above.
> 
>>  Or is it just a matter of avoiding confusion if people accidentally
>>  pass it for stdout when they didn't want it?
> 
> See the explanation above that starts with "Simple things should be 
> simple."

I still don't understand your point here, and just repeating it isn't helping. You're making simple things _less_ simple than they are in the draft, requiring slightly more change to the documentation and to the code and slightly more for people to understand just to allow them to pass an unnecessary parameter. That doesn't sound like an argument from simplicity to me.

But line_buffering definitely might be a good argument, in which case it doesn't matter how good this one is.


From abarnert at yahoo.com  Fri Jul 25 22:46:29 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 25 Jul 2014 13:46:29 -0700
Subject: [Python-ideas] Expose __abstractmethods__/__isabstractmethod__ in
	abc
Message-ID: <1406321189.64180.YahooMailNeo@web181003.mail.ne1.yahoo.com>

An ABC built with abc.ABCMeta has a member __abstractmethods__, which is an iterable of all of the abstract methods defined in that ABC that need to be overridden. A method decorated with @abstractmethod gets a member __isabstractmethod__=True, which is how the ABC (or, rather, the interpreter) checks whether each of its abstract methods has been overridden.


However, they're part of a private protocol used by the CPython implementation of the abc module, which means any third-party code that uses them isn't portable or future-proof. Which is a shame, because there are all kinds of things you can build easily on top of abc with them, but would have to duplicate most of the module (and the special interpreter support for it) without them.

The simplest change is just to document these two members as part of the module interface.


Alternatively, there could be functions abc.isabstractmethod(method) and abc.abstractmethods(cls), which would allow for implementations that didn't use the same protocol internally but supported the same interface.
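
A sketch of what those two helpers could look like if layered over the 
current private protocol (the function names follow the proposal above; 
they are not actual stdlib API):

```python
from abc import ABC, abstractmethod

def isabstractmethod(method):
    # Proposed abc.isabstractmethod(): reads the currently-private marker.
    return getattr(method, '__isabstractmethod__', False)

def abstractmethods(cls):
    # Proposed abc.abstractmethods(): empty for a fully concrete class.
    return frozenset(getattr(cls, '__abstractmethods__', ()))

class Shape(ABC):
    @abstractmethod
    def area(self):
        ...

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side
```

This is exactly the kind of thing that lets you test that Square fully 
implements Shape without having to instantiate it.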

Examples where this could be useful:


* Write a runtime check-and-register function.

* Explicitly test that a set of classes have implemented their ABC(s) without having to know how to correctly instantiate them.

* Write a generic @autoabc decorator or similar for creating ABCs that are automatically virtual base classes of any type with the right methods, instead of doing it manually in each class (as collections.abc does today), like Go interfaces, C++ auto concepts, traditional ObjC checked informal protocols, etc.

* Build a signature-checking (rather than just name-checking) ABC (like https://github.com/apieum/ducktype).

* Build a simplified version of PyProtocols-like adapters on top of abc instead of PyProtocols.

Some of these might belong in the stdlib (in fact, it looks like http://bugs.python.org/issue9731 basically covers the first two), in which case they don't need to be implementable from outside, but that's certainly not true for all of them. (Without a precise algorithm for "compatible signature" or a standardized notion of adaptation, stdlib inclusion isn't even sensible for the last two, much less a good idea.) So, outside libraries should be able to implement them.


From ncoghlan at gmail.com  Sat Jul 26 01:28:23 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 26 Jul 2014 09:28:23 +1000
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <1406312951.60505.YahooMailNeo@web181001.mail.ne1.yahoo.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <lqb1q9$qts$1@ger.gmane.org>
 <EDBFBEB6-3416-41CC-885D-B2450A956A9A@yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
 <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
 <1405828738.93713.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7ejpH6P-uSVcEhx4Th4aK8=iWkkPASjKtsa4uxeRNw2pA@mail.gmail.com>
 <1405903292.28722.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CACac1F9ROd6wz_D197UP_Pz-n8zkurjhCsGLCRTOpb0jpL=1oA@mail.gmail.com>
 <87bnshnzu1.fsf@gmail.com> <87oawgmfxp.fsf@gmail.com>
 <3E31BD23-A903-4B48-82E5-6DDA4AA2E15C@yahoo.com>
 <87egxbm8eo.fsf@gmail.com>
 <1406312951.60505.YahooMailNeo@web181001.mail.ne1.yahoo.com>
Message-ID: <CADiSq7dACQUf_+UV1=qt078sraXBiGMnu=isOR=hn+8jigCc2A@mail.gmail.com>

On 26 Jul 2014 04:33, "Andrew Barnert" <abarnert at yahoo.com.dmarc.invalid>
wrote:
> As I've said before, I don't really like the design for '\r' and '\r\n',
or the fact that three separate notions (universal-newlines flag, line
ending for readline, and output translation for write) are all conflated
into one idea and crammed into one parameter, but I think it's probably too
late and too radical to change that.

It's potentially still worth spelling out that idea as a Rejected
Alternative in the PEP. A draft design that separates them may help clarify
the concepts being conflated more effectively than simply describing them,
even if your own pragmatic assessment is "too much pain for not enough
gain".

Cheers,
Nick.

From ncoghlan at gmail.com  Sat Jul 26 01:34:28 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 26 Jul 2014 09:34:28 +1000
Subject: [Python-ideas] Expose __abstractmethods__/__isabstractmethod__
	in abc
In-Reply-To: <1406321189.64180.YahooMailNeo@web181003.mail.ne1.yahoo.com>
References: <1406321189.64180.YahooMailNeo@web181003.mail.ne1.yahoo.com>
Message-ID: <CADiSq7eGD_RdY=0dKs3D-RyAzt=HWr74vc-P2pzeiGp5DuWNmQ@mail.gmail.com>

The additional module level functions sound like a good idea to me. I see
it as similar to the functools.singledispatch driven addition to expose a
way to obtain a cache validity token for the virtual object graph.

I thought "__isabstractmethod__" was already documented, though, since we
rely on it to control the pass-through behaviour of property and other
decorators like classmethod and staticmethod. If it isn't, that's really a
bug rather than an RFE.

Cheers,
Nick.

From 4kir4.1i at gmail.com  Sat Jul 26 04:13:24 2014
From: 4kir4.1i at gmail.com (Akira Li)
Date: Sat, 26 Jul 2014 06:13:24 +0400
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
 <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
 <1405828738.93713.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <1405903292.28722.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CACac1F9ROd6wz_D197UP_Pz-n8zkurjhCsGLCRTOpb0jpL=1oA@mail.gmail.com>
 <87bnshnzu1.fsf@gmail.com> <87oawgmfxp.fsf@gmail.com>
 <3E31BD23-A903-4B48-82E5-6DDA4AA2E15C@yahoo.com>
 <87egxbm8eo.fsf@gmail.com>
 <1406312951.60505.YahooMailNeo@web181001.mail.ne1.yahoo.com>
Message-ID: <87vbqklvej.fsf@gmail.com>


I've added a patch that demonstrates "no translation" for alternative
newlines behavior http://bugs.python.org/issue1152248#msg224016

Andrew Barnert
<abarnert at yahoo.com.dmarc.invalid> writes:

> On Thursday, July 24, 2014 2:08 AM, Akira Li
> <4kir4.1i at gmail.com> wrote:
>
>> > Andrew Barnert <abarnert at yahoo.com> writes:
>> 
>>>  On Jul 23, 2014, at 5:13, Akira Li
>>> <4kir4.1i at gmail.com> wrote:
>>>>  In order to newline="\0" case to work, it should behave
>
>>>> similar to
>>>>  newline='' or newline='\n' case instead i.e., no 
>>>> translation should take
>>>>  place, to avoid corrupting embedded "\n\r" characters.
>>> 
>>>  The draft PEP discusses this. I think it would be more consistent to
>>>  translate for \0, just like \r and \r\n.
>> 
>> I read the [draft]. No translation is a better choice here. Otherwise
>> (at the very least) it breaks `find -print0` use case.
>
> No it doesn't. The only reason it breaks your code is that you add
> newline='\0' to your stdout wrapper as well as your stdin wrapper. If
> you just passed '', it would not do anything. And this is exactly
> parallel with the existing case with, e.g., trying to pass through a
> classic-Mac file full of '\r'-delimited strings that might contain
> embedded '\n' characters that you don't want to translate.

I won't repeat it several times but as you've already found out newline='\0'
for stdout (at the very least) can be useful for line_buffering=True
behavior.

...
>> There is also line_buffering parameter. From the docs:
>> 
>>   If line_buffering is True, flush() is implied when a call to write
>>   contains a newline character.
>
> The way this is actually defined seems broken to me; IIRC (I'll check
> the code later) it flushes on any '\r', and on any translated
> '\n'. So, it's doing the wrong thing with '\r' in most modes, and with
> '\n' in '' mode on non-Unix systems. So my thought was, just leave it
> broken.

Yes. I've found at least one issue http://bugs.python.org/issue22069

> But now that I think about it, the existing code can only flush
> excessively, never insufficiently, and that's probably a property
> worth preserving. So maybe there _is_ a reason to pass newline for
> output without translation after all. In other words, the parameter
> may actually conflate _four_ things, not just three...
>
> I'll need to think this through (and reread the code) this weekend;
> thanks for bringing it up.


--
Akira


From abarnert at yahoo.com  Sat Jul 26 04:22:26 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 25 Jul 2014 19:22:26 -0700
Subject: [Python-ideas] Expose __abstractmethods__/__isabstractmethod__
	in abc
In-Reply-To: <CADiSq7eGD_RdY=0dKs3D-RyAzt=HWr74vc-P2pzeiGp5DuWNmQ@mail.gmail.com>
References: <1406321189.64180.YahooMailNeo@web181003.mail.ne1.yahoo.com>
 <CADiSq7eGD_RdY=0dKs3D-RyAzt=HWr74vc-P2pzeiGp5DuWNmQ@mail.gmail.com>
Message-ID: <D8A58E67-1B06-461D-91DA-E5DB1BDF6106@yahoo.com>

On Jul 25, 2014, at 16:34, Nick Coghlan <ncoghlan at gmail.com> wrote:
> The additional module level functions sound like a good idea to me. I see it as similar to the functools.singledispatch driven addition to expose a way to obtain a cache validity token for the virtual object graph.
> 
Another reason the function seems better than the attribute.

The only advantage to the attribute is that we could document that it already existed in 3.4 and earlier, instead of just documenting a new function. And if that was desirable we could always add that as a note to the documentation of the function.
> I thought "__isabstractmethod__" was already documented though, since we rely on it to control the pass through behaviour of property and other decorators like classmethod and staticmethod. If it isn't, that's really a bug rather than an RFE.
> 
You're right; I was looking for it in the wrong place. It doesn't document that abstract methods created by @abstractmethod have that attribute, but it does document that if you want to create an abstract method manually you have to set it, and shows how @property both uses and exposes the attribute, which is more than enough. So, never mind that part.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140725/b700b1b0/attachment.html>

From 4kir4.1i at gmail.com  Sat Jul 26 04:24:16 2014
From: 4kir4.1i at gmail.com (Akira Li)
Date: Sat, 26 Jul 2014 06:24:16 +0400
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
 <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
 <1405828738.93713.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7ejpH6P-uSVcEhx4Th4aK8=iWkkPASjKtsa4uxeRNw2pA@mail.gmail.com>
 <1405903292.28722.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CACac1F9ROd6wz_D197UP_Pz-n8zkurjhCsGLCRTOpb0jpL=1oA@mail.gmail.com>
 <87bnshnzu1.fsf@gmail.com> <87oawgmfxp.fsf@gmail.com>
 <3E31BD23-A903-4B48-82E5-6DDA4AA2E15C@yahoo.com>
 <87egxbm8eo.fsf@gmail.com>
 <1406312951.60505.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7dACQUf_+UV1=qt078sraXBiGMnu=isOR=hn+8jigCc2A@mail.gmail.com>
Message-ID: <87tx64luwf.fsf@gmail.com>

Nick Coghlan <ncoghlan at gmail.com> writes:

> On 26 Jul 2014 04:33, "Andrew Barnert"
> <abarnert at yahoo.com.dmarc.invalid>
> wrote:
>> As I've said before, I don't really like the design for '\r' and '\r\n',
> or the fact that three separate notions (universal-newlines flag, line
> ending for readline, and output translation for write) are all conflated
> into one idea and crammed into one parameter, but I think it's probably too
> late and too radical to change that.
>
> It's potentially still worth spelling out that idea as a Rejected
> Alternative in the PEP. A draft design that separates them may help clarify
> the concepts being conflated more effectively than simply describing them,
> even if your own pragmatic assessment is "too much pain for not enough
> gain".
>

It can't be in the rejected ideas because it is the current behavior for
io.TextIOWrapper(newline=..) and it will never change (in Python 3) due
to backward compatibility.

As I understand Andrew doesn't like that *newline* parameter does too
much:

- *newline* parameter turns on/off universal newline mode
- it may specify the line separator e.g., newline='\r'
- it specifies whether newline translation happens e.g., newline=''
  turns it off
- together with *line_buffering*, it may enable flush() if newline is
  written
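
For concreteness, the first three behaviors in that list can be observed
directly with io.TextIOWrapper; a quick sketch, not part of the proposal
itself:

```python
import io

data = b"a\r\nb\rc\n"

# newline=None: universal newlines; \r, \n and \r\n all read back as "\n"
f = io.TextIOWrapper(io.BytesIO(data), encoding="ascii")
assert f.read() == "a\nb\nc\n"

# newline='': universal line splitting, but endings are returned untranslated
f = io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline="")
assert f.read() == "a\r\nb\rc\n"

# newline='\r': readline() splits only on '\r', again untranslated on input
f = io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline="\r")
assert f.readline() == "a\r"

# on output, newline='\r' translates every written '\n' to '\r'
buf = io.BytesIO()
w = io.TextIOWrapper(buf, encoding="ascii", newline="\r")
w.write("x\ny\n")
w.flush()
assert buf.getvalue() == b"x\ry\r"
```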


It is unrelated to my proposal [1] that shouldn't change the old
behavior if newline in {None, '', '\n', '\r', '\r\n'}.

[1] http://bugs.python.org/issue1152248#msg224016


--
Akira


From abarnert at yahoo.com  Sat Jul 26 06:03:30 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 25 Jul 2014 21:03:30 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <87tx64luwf.fsf@gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
 <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
 <1405828738.93713.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7ejpH6P-uSVcEhx4Th4aK8=iWkkPASjKtsa4uxeRNw2pA@mail.gmail.com>
 <1405903292.28722.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CACac1F9ROd6wz_D197UP_Pz-n8zkurjhCsGLCRTOpb0jpL=1oA@mail.gmail.com>
 <87bnshnzu1.fsf@gmail.com> <87oawgmfxp.fsf@gmail.com>
 <3E31BD23-A903-4B48-82E5-6DDA4AA2E15C@yahoo.com> <87egxbm8eo.fsf@gmail.com>
 <1406312951.60505.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7dACQUf_+UV1=qt078sraXBiGMnu=isOR=hn+8jigCc2A@mail.gmail.com>
 <87tx64luwf.fsf@gmail.com>
Message-ID: <3D379B63-8016-4130-87F1-7242E11CBF59@yahoo.com>

On Jul 25, 2014, at 19:24, Akira Li <4kir4.1i at gmail.com> wrote:

> Nick Coghlan <ncoghlan at gmail.com> writes:
> 
>> On 26 Jul 2014 04:33, "Andrew Barnert"
>> <abarnert at yahoo.com.dmarc.invalid>
>> wrote:
>>> As I've said before, I don't really like the design for '\r' and '\r\n',
>> or the fact that three separate notions (universal-newlines flag, line
>> ending for readline, and output translation for write) are all conflated
>> into one idea and crammed into one parameter, but I think it's probably too
>> late and too radical to change that.
>> 
>> It's potentially still worth spelling out that idea as a Rejected
>> Alternative in the PEP. A draft design that separates them may help clarify
>> the concepts being conflated more effectively than simply describing them,
>> even if your own pragmatic assessment is "too much pain for not enough
>> gain".
> 
> It can't be in the rejected ideas because it is the current behavior for
> io.TextIOWrapper(newline=..) and it will never change (in Python 3) due
> to backward compatibility.

That's exactly why changing it would be a "rejected idea". It certainly doesn't hurt to document the fact that we thought about it and decided not to change it for backward compatibility reasons.

> As I understand Andrew doesn't like that *newline* parameter does too
> much:
> 
> - *newline* parameter turns on/off universal newline mode
> - it may specify the line separator e.g., newline='\r'
> - it specifies whether newline translation happens e.g., newline=''
>  turns it off
> - together with *line_buffering*, it may enable flush() if newline is
>  written

Exactly. And the fourth one only indirectly; "newline" flushing doesn't exactly mean _either_ of "\n" or the newline argument. And the related-but-definitely-not-the-same newlines attribute makes it even more confusing. (I've found bug reports with both Guido and Nick confused into thinking that newline was available as an attribute after construction; what hope do the rest of us have?)

But the reality is, it rarely affects real-life programs, so it's definitely not worth breaking compatibility over. And it's still a whole lot cleaner than the 2.x design despite having a lot more details to deal with.

From abarnert at yahoo.com  Sat Jul 26 06:09:41 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 25 Jul 2014 21:09:41 -0700
Subject: [Python-ideas] Iterating non-newline-separated files should be
	easier
In-Reply-To: <87vbqklvej.fsf@gmail.com>
References: <1405626785.14773.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CADiSq7ffuJrXrYP3SgAWpLSyxh1xv1sOa5mH_BHgwy-ttU90xA@mail.gmail.com>
 <CAPTjJmrcHxYqSMm34=UZ2Yb32cnfP=TgRWF5By3f8jbSNVOVnQ@mail.gmail.com>
 <CADiSq7et1qbTNCXQTev1jy26B6TZQBxWFeWFFf3CGr9oNy5tgw@mail.gmail.com>
 <1405812535.88058.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <CADiSq7cdfNm+N67oWHEt6c8GW2HX3TZwePJcW=QfQb40B7edRw@mail.gmail.com>
 <1405817834.46270.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <CADiSq7f2OeV8xGbQrN-5=mCK2Muhxw79u_e+tM=+afzd298UHg@mail.gmail.com>
 <CAPTjJmqchiFsNt2c0z4Mtmt=6v5=b51k-_eAN8Gu=hxXrNQgMA@mail.gmail.com>
 <1405828738.93713.YahooMailNeo@web181005.mail.ne1.yahoo.com>
 <1405903292.28722.YahooMailNeo@web181006.mail.ne1.yahoo.com>
 <CACac1F9ROd6wz_D197UP_Pz-n8zkurjhCsGLCRTOpb0jpL=1oA@mail.gmail.com>
 <87bnshnzu1.fsf@gmail.com> <87oawgmfxp.fsf@gmail.com>
 <3E31BD23-A903-4B48-82E5-6DDA4AA2E15C@yahoo.com> <87egxbm8eo.fsf@gmail.com>
 <1406312951.60505.YahooMailNeo@web181001.mail.ne1.yahoo.com>
 <87vbqklvej.fsf@gmail.com>
Message-ID: <DE349E90-23DD-4D89-8631-98BF5D379E0A@yahoo.com>

On Jul 25, 2014, at 19:13, Akira Li <4kir4.1i at gmail.com> wrote:

> I've added a patch that demonstrates "no translation" for alternative
> newlines behavior http://bugs.python.org/issue1152248#msg224016

Having taken a better look at the line buffering code, I now agree with you that this is necessary; otherwise we'd have to make a much bigger change to the implementation (which I don't think we want).

When I update the draft PEP I'll change that and add a rationale (this also makes the rationale for "no translation for binary files" and for "only readnl is exposed, not writenl" a lot simpler). 

I'll also change it in my C patch (which I hope to be able to clean up and upload this weekend).

> Andrew Barnert
> <abarnert at yahoo.com.dmarc.invalid> writes:
> 
>> On Thursday, July 24, 2014 2:08 AM, Akira Li
>> <4kir4.1i at gmail.com> wrote:
>> 
>>>> Andrew Barnert <abarnert at yahoo.com> writes:
>>> 
>>>> On Jul 23, 2014, at 5:13, Akira Li
>>>> <4kir4.1i at gmail.com> wrote:
>>>>> In order to newline="\0" case to work, it should behave 
>> 
>>>>> similar to
>>>>> newline='' or newline='\n' case instead i.e., no 
>>>>> translation should take
>>>>> place, to avoid corrupting embedded "\n\r" characters.
>>>> 
>>>> The draft PEP discusses this. I think it would be more consistent to
>>>> translate for \0, just like \r and \r\n.
>>> 
>>> I read the [draft]. No translation is a better choice here. Otherwise
>>> (at the very least) it breaks `find -print0` use case.
>> 
>> No it doesn't. The only reason it breaks your code is that you add
>> newline='\0' to your stdout wrapper as well as your stdin wrapper. If
>> you just passed '', it would not do anything. And this is exactly
>> parallel with the existing case with, e.g., trying to pass through a
>> classic-Mac file full of '\r'-delimited strings that might contain
>> embedded '\n' characters that you don't want to translate.
> 
> I won't repeat it several times but as you've already found out newline='\0'
> for stdout (at the very least) can be useful for line_buffering=True
> behavior.
> 
> ...
>>> There is also line_buffering parameter. From the docs:
>>> 
>>>   If line_buffering is True, flush() is implied when a call to write
>>>   contains a newline character.
>> 
>> The way this is actually defined seems broken to me; IIRC (I'll check
>> the code later) it flushes on any '\r', and on any translated
>> '\n'. So, it's doing the wrong thing with '\r' in most modes, and with
>> '\n' in '' mode on non-Unix systems. So my thought was, just leave it
>> broken.
> 
> Yes. I've found at least one issue http://bugs.python.org/issue22069
> 
>> But now that I think about it, the existing code can only flush
>> excessively, never insufficiently, and that's probably a property
>> worth preserving. So maybe there _is_ a reason to pass newline for
>> output without translation after all. In other words, the parameter
>> may actually conflate _four_ things, not just three...
>> 
>> I'll need to think this through (and reread the code) this weekend;
>> thanks for bringing it up.
> 
> 
> --
> Akira
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From ronaldoussoren at mac.com  Sat Jul 26 10:03:13 2014
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Sat, 26 Jul 2014 10:03:13 +0200
Subject: [Python-ideas] PEP 447 revisited
Message-ID: <5BB87CC4-F31B-4213-AAAC-0C0CE738460C@mac.com>

Hi,

After a long hiatus I've done some updates to PEP 447, which proposes a new metaclass method that is used in attribute resolution for normal and super instances. There have been two updates: the first is trivial, the proposed method has a new name (__getdescriptor__). The second adds a Python pseudo-implementation of object.__getattribute__ and super.__getattribute__ to make it easier to reason about the impact of the proposal.

I'd like to move forward with this PEP, either to rejection or (preferably) to acceptance of the feature in some form. That said, I'm not too attached to the exact proposal; it just seems to be the minimal clean change that can be used to implement my use case for this.

My use case is fairly obscure, but hopefully it is not too obscure :-).  The problem I have at the moment is basically that it is not possible to hook into the attribute resolution algorithm used by super.__getattribute__ and this PEP would solve that.
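
For readers who haven't seen the PEP: the hook is a metaclass method that
replaces the plain __dict__ lookup during the MRO walk. A much simplified
sketch of the idea (descriptor binding and caching omitted; only the names
follow the PEP, the helper function here is purely illustrative):

```python
class Meta(type):
    def __getdescriptor__(cls, name):
        # Default behaviour per the PEP: a plain namespace lookup.
        # PyObjC's metaclass would instead consult the Objective-C
        # runtime here, lazily and always up to date.
        return cls.__dict__[name]

def lookup(obj, name):
    """Simplified MRO walk, as object.__getattribute__ would perform it."""
    for klass in type(obj).__mro__:
        getdesc = getattr(type(klass), '__getdescriptor__', None)
        try:
            if getdesc is not None:
                return getdesc(klass, name)   # the proposed hook
            if name in klass.__dict__:
                return klass.__dict__[name]   # current behaviour
        except KeyError:
            pass
    raise AttributeError(name)

class C(metaclass=Meta):
    x = 42

assert lookup(C(), 'x') == 42
```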

My use case for this PEP is PyObjC; the PEP would make it possible to remove a custom "super" class used in that project. I'll try to sketch what PyObjC does and why the current super is a problem in the paragraphs below.

PyObjC is a bridge between Python and Objective-C.  The bit that's important for this discussion is that every Objective-C object and class can be proxied into Python code. That's done completely dynamically: the PyObjC bridge reads information from the Objective-C runtime (using a public API for that) to determine which classes are present there and which methods those classes have.

Accessing the information on methods is done on demand: the bridge only looks for a method when Python code tries to access it. There are two reasons for that. The first is performance: extracting method information eagerly is too expensive because there are a lot of methods and Python code typically uses only a fraction of them.  The second reason is more important: Objective-C classes are almost as dynamic as Python classes, and it is possible to add new methods at runtime, either by loading add-on bundles ("Categories") or by interacting with the Objective-C runtime. Both are actually used by Apple's frameworks.   There are no hooks that can be used to detect these modifications; the only option I've found to keep the Python representation of a class in sync with the Objective-C representation is to eagerly scan classes every time they might be accessed, for example in the __getattribute__ of the proxies for Objective-C classes and instances.

That's terribly expensive, and it still leaves a race condition when using super: in code like the snippet below, the superclass might grow a new method between the call to the Python method and the use of the superclass method:

     def myMethod(self):
          self.objectiveCMethod()
          super().otherMethod()

Because of this, the current PyObjC release doesn't even try to keep the Python representation in sync, but always looks for methods lazily (with a cache of all found methods, to avoid the overhead of repeated lookups when methods are used multiple times). As that definitely breaks the builtin super, PyObjC also includes a custom super implementation that must be used.  That works, but can lead to confusing errors when users forget to add "from objc import super" to modules that use super in subclasses of Objective-C classes.

The performance impact on CPython seemed to be minimal according to the testing I performed last year, but I have no idea what the impact would be on other implementations (in particular PyPy's JIT).

A link to the PEP: http://legacy.python.org/dev/peps/pep-0447/

I'd really appreciate further feedback on this PEP.

Regards,

   Ronald

From ncoghlan at gmail.com  Sat Jul 26 13:59:35 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 26 Jul 2014 21:59:35 +1000
Subject: [Python-ideas] PEP 447 revisited
In-Reply-To: <5BB87CC4-F31B-4213-AAAC-0C0CE738460C@mac.com>
References: <5BB87CC4-F31B-4213-AAAC-0C0CE738460C@mac.com>
Message-ID: <CADiSq7dNdCwqcMfAsmuoDUPXcFQBjq+Agv29rgB6Jzx50XnrNw@mail.gmail.com>

On 26 July 2014 18:03, Ronald Oussoren <ronaldoussoren at mac.com> wrote:
> Hi,
>
> After a long hiatus I've done some updates to PEP 447 which proposes a new metaclass method that's used in attribute resolution for normal and super instances. There have been two updates, the first one is trivial, the proposed method has a new name (__getdescriptor__).  The second change to the PEP is to add a Python pseudo-implementation of object.__getattribute__ and super.__getattribute__ to make it easier to reason about the impact of the proposal.
>
> I'd like to move forward with this PEP, either to rejection or (preferably) to acceptance of the feature in some form. That said, I'm not too attached to the exact proposal; it just seems to be the minimal clean change that can be used to implement my use case for this.
>
> My use case is fairly obscure, but hopefully it is not too obscure :-).  The problem I have at the moment is basically that it is not possible to hook into the attribute resolution algorithm used by super.__getattribute__ and this PEP would solve that.

The use case seems reasonable to me, and the new slot name seems much
easier to document and explain than the previous iteration.

I'd like to see the PEP look into the inspect module and consider the
consequences for the functions there (e.g. another way for
getattr_static to miss methods), as well as any possible implications
for dir(). We had a few issues there with the enum changes for 3.4
(and some more serious ones with Argument Clinic) - it's not a
blocker, it's just nice to have some idea of the impact going in :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From castironpi at gmail.com  Sat Jul 26 22:51:16 2014
From: castironpi at gmail.com (Aaron Brady)
Date: Sat, 26 Jul 2014 13:51:16 -0700 (PDT)
Subject: [Python-ideas] Mutating while iterating
Message-ID: <cd149d5b-5b57-4fbe-a639-a7f13fae64cf@googlegroups.com>

Hi, I asked about the inconsistency of the "RuntimeError" being raised when 
mutating a container while iterating over it here [1], "set and dict 
iteration" on Aug 16, 2012.

[1] http://www.gossamer-threads.com/lists/python/python/1004659

Continuing new from the bugs issue page [2]:

[2] http://bugs.python.org/issue22084

Other prior discussion [3] [4]:

[3] http://bugs.python.org/issue19332

[4] http://bugs.python.org/issue6017

Thanks Mr. Storchaka for your comments.  The new documentation didn't help. 
 The current behavior is still a rare but inconsistent silent error.  An 
implementation sketch in pseudocode might simplify the endeavor [5]:

[5] 
http://home.comcast.net/~castironpi-misc/irc-0168%20mutating%20while%20iterating%20markup.html

I gather we wouldn't want to pursue the "custom" data container, option 
"2e": we would still need both "malloc/free" and a reference count.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140726/f962af17/attachment.html>

From oreilldf at gmail.com  Sat Jul 26 22:59:16 2014
From: oreilldf at gmail.com (Dan O'Reilly)
Date: Sat, 26 Jul 2014 16:59:16 -0400
Subject: [Python-ideas] Better integration of multiprocessing with asyncio
Message-ID: <CAP3foK+Q-qkJygGvPjX6Wqyr_699ZsRHRpL+_WFYXiMa1RRZ-Q@mail.gmail.com>

I think it would be helpful for folks using the asyncio module to be able
to make non-blocking calls to objects in the multiprocessing module more
easily. While some use-cases for using multiprocessing can be replaced with
ProcessPoolExecutor/run_in_executor, there are others that cannot; more
advanced usages of multiprocessing.Pool aren't supported by
ProcessPoolExecutor (initializer/initargs, contexts, etc.), and other
multiprocessing classes like Lock and Queue have blocking methods that
could be made into coroutines.

Consider this (extremely contrived, but use your imagination) example of a
asyncio-friendly Queue:

import asyncio
import time
from concurrent.futures import ProcessPoolExecutor

def do_proc_work(q, val, val2):
    time.sleep(3)  # Imagine this is some expensive CPU work.
    ok = val + val2
    print("Passing {} to parent".format(ok))
    q.put(ok)  # The Queue can be used with the normal blocking API, too.
    item = q.get()
    print("got {} back from parent".format(item))

def do_some_async_io_task():
    # Imagine there's some kind of asynchronous I/O
    # going on here that utilizes asyncio.
    asyncio.sleep(5)

@asyncio.coroutine
def do_work(q):
    loop.run_in_executor(ProcessPoolExecutor(),
                         do_proc_work, q, 1, 2)
    do_some_async_io_task()
    # Non-blocking get that won't affect our io_task.
    item = yield from q.coro_get()
    print("Got {} from worker".format(item))
    item = item + 25
    yield from q.coro_put(item)


if __name__ == "__main__":
    # AsyncProcessQueue is our new asyncio-friendly version of
    # multiprocessing.Queue.
    q = AsyncProcessQueue()
    loop = asyncio.get_event_loop()
    loop.run_until_complete(do_work(q))

I have seen some rumblings about a desire to do this kind of integration on
the bug tracker (http://bugs.python.org/issue10037#msg162497 and
http://bugs.python.org/issue9248#msg221963) though that discussion is
specifically tied to merging the enhancements from the Billiard library
into multiprocessing.Pool. Are there still plans to do that? If so, should
asyncio integration with multiprocessing be rolled into those plans, or
does it make sense to pursue it separately?

Even more generally, do people think this kind of integration is a good
idea to begin with? I know using asyncio is primarily about *avoiding* the
headaches of concurrent threads/processes, but there are always going to be
cases where CPU-intensive work is going to be required in a primarily
I/O-bound application. The easier it is for developers to handle those
use-cases, the better, IMO.

Note that the same sort of integration could be done with the threading
module, though I think there's a fairly limited use-case for that; most
times you'd want to use threads over processes, you could probably just use
non-blocking I/O instead.
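
For what it's worth, an AsyncProcessQueue along these lines can already be
approximated today by funnelling the blocking calls through an executor. A
rough sketch (the class name and the coro_get/coro_put methods follow the
hypothetical example above; async/await syntax is used purely for brevity):

```python
import asyncio
import multiprocessing
from concurrent.futures import ThreadPoolExecutor

class AsyncProcessQueue:
    """Sketch: a multiprocessing.Queue with coroutine-friendly methods.

    Blocking put()/get() calls are pushed onto a private worker thread,
    so the event loop itself is never blocked.
    """
    def __init__(self, maxsize=0):
        self._queue = multiprocessing.Queue(maxsize)
        self._executor = ThreadPoolExecutor(max_workers=1)

    # The normal blocking API stays available for worker processes.
    def put(self, item):
        self._queue.put(item)

    def get(self):
        return self._queue.get()

    async def coro_put(self, item):
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(self._executor, self._queue.put, item)

    async def coro_get(self):
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self._executor, self._queue.get)
```

With a single executor thread per queue, coro_put/coro_get calls are
serialized per queue but never block the loop; a real implementation would
also want cancellation support.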

Thanks,
Dan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140726/23887195/attachment.html>

From python at 2sn.net  Sun Jul 27 01:34:16 2014
From: python at 2sn.net (Alexander Heger)
Date: Sun, 27 Jul 2014 09:34:16 +1000
Subject: [Python-ideas] adding dictionaries
Message-ID: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>

Is there a good reason for not implementing the "+" operator for dict.update()?

A = dict(a=1, b=1)
B = dict(a=2, c=2)
B += A
B
dict(a=1, b=1, c=2)

That is

B += A

should be equivalent to

B.update(A)

It would be even better if there was also a regular "addition"
operator that is equivalent to creating a shallow copy and then
calling update():

C = A + B

should equal to

C = dict(A)
C.update(B)

(obviously not the same as C = B + A, but the "+" operator is not
commutative for most operations)

class NewDict(dict):
    def __add__(self, other):
        x = dict(self)
        x.update(other)
        return x
    def __iadd__(self, other):
        self.update(other)
        return self  # without this, "B += A" would rebind B to None


My apologies if this has been posted before but with a quick google
search I could not see it; if it was, could you please point me to the
thread?  I assume this must be a design decision that has been made a
long time ago, but it is not obvious to me why.

From steve at pearwood.info  Sun Jul 27 03:17:39 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 27 Jul 2014 11:17:39 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
Message-ID: <20140727011739.GC9112@ando>

On Sun, Jul 27, 2014 at 09:34:16AM +1000, Alexander Heger wrote:

> Is there a good reason for not implementing the "+" operator for dict.update()?
[...]
> That is
> 
> B += A
> 
> should be equivalent to
> 
> B.update(A)

You're asking the wrong question. The burden is not on people to justify 
*not* adding new features, the burden is on somebody to justify adding 
them. Is there a good reason for implementing the + operator as 
dict.update? We can already write B.update(A), under what circumstances 
would you spell it B += A instead, and why?


> It would be even better if there was also a regular "addition"
> operator that is equivalent to creating a shallow copy and then
> calling update():
> 
> C = A + B
> 
> should equal to
> 
> C = dict(A)
> C.update(B)

That would be spelled C = dict(A, **B).

I'd be more inclined to enhance the dict constructor and update methods 
so you can provide multiple arguments:

dict(A, B, C, D)  # Rather than A + B + C + D
D.update(A, B, C)  # Rather than D += A + B + C
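
That multi-argument update can be emulated today with a small helper (a
hypothetical function, just to pin down the intended semantics: later
mappings win):

```python
def update_all(target, *mappings):
    """Apply each mapping to target in order; later keys overwrite earlier ones."""
    for mapping in mappings:
        target.update(mapping)
    return target

D = {'d': 4}
update_all(D, {'a': 1}, {'a': 2, 'b': 2}, {'c': 3})
assert D == {'a': 2, 'b': 2, 'c': 3, 'd': 4}
```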


> My apologies if this has been posted before but with a quick google
> search I could not see it; if it was, could you please point me to the
> thread?  I assume this must be a design decision that has been made a
> long time ago, but it is not obvious to me why.

I'm not sure it's so much a deliberate decision not to implement 
dictionary addition, as uncertainty as to what dictionary addition ought 
to mean. Given two dicts:

    A = {'a': 1, 'b': 1}
    B = {'a': 2, 'c': 2}

I can think of at least four things that C = A + B could do:

    # add values, defaulting to 0 for missing keys
    C = {'a': 3, 'b': 1, 'c': 2}

    # add values, raising KeyError if there are missing keys

    # shallow copy of A, update with B
    C = {'a': 2, 'b': 1, 'c': 2}  

    # shallow copy of A, insert keys from B only if not already in A
    C = {'a': 1, 'b': 1, 'c': 2}

Except for the second one, I've come across people suggesting that each 
of the other three is the one and only obvious thing for A+B to do.
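
Three of those four meanings are already easy to spell today, which
illustrates how ambiguous A + B would be:

```python
from collections import Counter

A = {'a': 1, 'b': 1}
B = {'a': 2, 'c': 2}

# add values, missing keys default to 0 -- Counter already uses "+" for this
assert dict(Counter(A) + Counter(B)) == {'a': 3, 'b': 1, 'c': 2}

# shallow copy of A, updated with B (B's values win)
C = dict(A)
C.update(B)
assert C == {'a': 2, 'b': 1, 'c': 2}

# shallow copy of A, keys from B inserted only if not already in A
C = dict(B)
C.update(A)
assert C == {'a': 1, 'b': 1, 'c': 2}
```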


-- 
Steven

From tjreedy at udel.edu  Sun Jul 27 03:27:04 2014
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 26 Jul 2014 21:27:04 -0400
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
Message-ID: <lr1khj$k6a$1@ger.gmane.org>

On 7/26/2014 7:34 PM, Alexander Heger wrote:
> Is there a good reason for not implementing the "+" operator for dict.update()?

As you immediately noticed, this is an incoherent request as stated. A op 
B should be a new object.

> A = dict(a=1, b=1)
> B = dict(a=2, c=2)
> B += A

Since "B op= A" is *defined* as resulting in B having the value of "B op 
A", with the operations possibly being done in-place if B is mutable, we 
would first have to define addition on dicts.

> B
> dict(a=1, b=1, c=2)
>
> That is
>
> B += A
>
> should be equivalent to
>
> B.update(A)
>
> It would be even better if there was also a regular "addition"
> operator that is equivalent to creating a shallow copy and then
> calling update():

You have this backwards. Dict addition would have to come first, and 
there are multiple possible and contextually useful definitions. The 
idea of choosing any one of them as '+' has been rejected.

As indicated, augmented dict addition would follow from the choice of 
dict addition. It would not necessarily be equivalent to .update.  The 
addition needed to make this true would be asymmetric, like catenation.

But unlike sequence catenation, information is erased in that items in 
the updated dict get subtracted. Conceptually, update is replacement 
rather than just addition.

> My apologies if this has been posted

Multiple dict additions have been proposed and discussed here on 
python-ideas and probably on python-list.

-- 
Terry Jan Reedy


From python at 2sn.net  Sun Jul 27 04:18:48 2014
From: python at 2sn.net (Alexander Heger)
Date: Sun, 27 Jul 2014 12:18:48 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <lr1khj$k6a$1@ger.gmane.org>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <lr1khj$k6a$1@ger.gmane.org>
Message-ID: <CAN3CYHwROsraP=9xDh_E5vD_AhbCpibfh9-ernLv9p1hgPJM+g@mail.gmail.com>

Dear Terry,

> As you immediately noticed, this is an incoherent request as stated. A op B
> should be a new object.
> [...]
> You have this backwards. Dict addition would have to come first, and there
> are multiple possible and contextually useful definitions. The idea of
> choosing any one of them as '+' has been rejected.

I had set out wanting to have a short form for dict.update(), hence
the apparently reversed order.
The proposed full addition does the same after first making a shallow
copy; the operator interface does define both __iadd__ and __add__.
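
As a hypothetical sketch of what is being proposed (the class name is
invented, and this is not a real dict feature): "+" as shallow copy
followed by update, and "+=" as an in-place update.

```python
# Hypothetical dict subclass where "+" means "shallow copy, then
# update" and "+=" is an in-place dict.update.  Invented for
# illustration only.

class AddableDict(dict):
    def __add__(self, other):
        result = AddableDict(self)   # shallow copy of the left operand
        result.update(other)         # right operand wins on conflicts
        return result
    def __iadd__(self, other):
        self.update(other)           # in-place, like dict.update
        return self

A = AddableDict(a=1, b=1)
B = AddableDict(a=2, c=2)
B += A   # B is updated in place, matching the example above
```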

> As indicated, augmented dict addition would follow from the choice of dict
> addition. It would not necessarily be equivalent to .update.  The addition
> needed to make this true would be asymmetric, like catenation.

Yes.  As I noted, most uses of the "+" operator in Python are not
symmetric (commutative).

> But unlike sequence catenation, information is erased in that items in the
> updated dict get subtracted. Conceptually, update is replacement rather than
> just addition.

Yes; not being able to have multiple identical keys is the nature of
dictionaries.  This does not mean that things should not be done in
the best way they can be.  I was considering the set union operator
"|", but that is also symmetric and might cause more confusion.

Another suggestion was element-wise addition in some form.  That is
the natural behaviour for fixed-length structures like arrays,
including numpy arrays, and there it is accepted.  In contrast, for
variable-length data structures like lists and strings, "addition"
means concatenation, so the most natural extension to dictionaries is
to add the keys (not the values to each other), with the common
behaviour of overwriting existing keys.  You still have the choice of
the order in which you write the operation.

It would be funny if addition of strings added their ASCII or Unicode
code point values and returned the resulting string.

Sorry for bringing up, again, the old discussion of how to add
dictionaries as part of this.



-Alexander


From guido at python.org  Sun Jul 27 04:39:23 2014
From: guido at python.org (Guido van Rossum)
Date: Sat, 26 Jul 2014 19:39:23 -0700
Subject: [Python-ideas] Better integration of multiprocessing with
	asyncio
In-Reply-To: <CAP3foK+Q-qkJygGvPjX6Wqyr_699ZsRHRpL+_WFYXiMa1RRZ-Q@mail.gmail.com>
References: <CAP3foK+Q-qkJygGvPjX6Wqyr_699ZsRHRpL+_WFYXiMa1RRZ-Q@mail.gmail.com>
Message-ID: <CAP7+vJLzRK+QS-sumP7=kZPGw1uJ9s4qp0JNSsLR3MK8J-76kw@mail.gmail.com>

I actually know very little about multiprocessing (have never used it) but
I imagine the way you normally interact with multiprocessing is using
synchronous calls that talk to the subprocesses and their work queues and
so on, right?

In the asyncio world you would put that work in a thread and then use
run_in_executor() with a thread executor -- the thread would then be
managing the subprocesses and talking to them. While you are waiting for
that thread to complete your other coroutines will still work.

Unless you want to rewrite the communication and process management as
coroutines, but that sounds like a lot of work.
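
A rough sketch of that pattern, written in modern async/await spelling
rather than the @asyncio.coroutine / yield from style current in 2014.
Here blocking_get stands in for any blocking multiprocessing call
(e.g. a queue get); the thread blocks, the event loop does not.

```python
# Keep the blocking call in a worker thread via run_in_executor so
# other coroutines keep running while it waits.
import asyncio
import queue
from concurrent.futures import ThreadPoolExecutor

work = queue.Queue()          # stand-in for a multiprocessing queue

def blocking_get():
    # Blocks its worker thread, not the event loop.
    return work.get()

async def main():
    loop = asyncio.get_running_loop()
    executor = ThreadPoolExecutor(max_workers=1)
    work.put("hello")
    # Other coroutines could be scheduled while this await is pending.
    item = await loop.run_in_executor(executor, blocking_get)
    executor.shutdown()
    return item

result = asyncio.run(main())
```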


On Sat, Jul 26, 2014 at 1:59 PM, Dan O'Reilly <oreilldf at gmail.com> wrote:

> I think it would be helpful for folks using the asyncio module to be able
> to make non-blocking calls to objects in the multiprocessing module more
> easily. While some use-cases for using multiprocessing can be replaced with
> ProcessPoolExecutor/run_in_executor, there are others that cannot; more
> advanced usages of multiprocessing.Pool aren't supported by
> ProcessPoolExecutor (initializer/initargs, contexts, etc.), and other
> multiprocessing classes like Lock and Queue have blocking methods that
> could be made into coroutines.
>
> Consider this (extremely contrived, but use your imagination) example of an
> asyncio-friendly Queue:
>
> import asyncio
> import time
>
> def do_proc_work(q, val, val2):
>     time.sleep(3)  # Imagine this is some expensive CPU work.
>     ok = val + val2
>     print("Passing {} to parent".format(ok))
>     q.put(ok) # The Queue can be used with the normal blocking API, too.
>     item = q.get()
>     print("got {} back from parent".format(item))
>
> def do_some_async_io_task():
>     # Imagine there's some kind of asynchronous I/O
>     # going on here that utilizes asyncio.
>     asyncio.sleep(5)
>
> @asyncio.coroutine
> def do_work(q):
>     loop.run_in_executor(ProcessPoolExecutor(),
>                          do_proc_work, q, 1, 2)
>     do_some_async_io_task()
>     item = yield from q.coro_get() # Non-blocking get that won't affect
> our io_task
>     print("Got {} from worker".format(item))
>     item = item + 25
>     yield from q.coro_put(item)
>
>
> if __name__  == "__main__":
>     q = AsyncProcessQueue()  # This is our new asyncio-friendly version of
> multiprocessing.Queue
>     loop = asyncio.get_event_loop()
>     loop.run_until_complete(do_work(q))
>
> I have seen some rumblings about a desire to do this kind of integration
> on the bug tracker (http://bugs.python.org/issue10037#msg162497 and
> http://bugs.python.org/issue9248#msg221963) though that discussion is
> specifically tied to merging the enhancements from the Billiard library
> into multiprocessing.Pool. Are there still plans to do that? If so, should
> asyncio integration with multiprocessing be rolled into those plans, or
> does it make sense to pursue it separately?
>
> Even more generally, do people think this kind of integration is a good
> idea to begin with? I know using asyncio is primarily about *avoiding* the
> headaches of concurrent threads/processes, but there are always going to be
> cases where CPU-intensive work is going to be required in a primarily
> I/O-bound application. The easier it is for developers to handle those
> use-cases, the better, IMO.
>
> Note that the same sort of integration could be done with the threading
> module, though I think there's a fairly limited use-case for that; most
> times you'd want to use threads over processes, you could probably just use
> non-blocking I/O instead.
>
> Thanks,
> Dan
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140726/0fd8244d/attachment-0001.html>

From oreilldf at gmail.com  Sun Jul 27 05:34:29 2014
From: oreilldf at gmail.com (Dan O'Reilly)
Date: Sat, 26 Jul 2014 23:34:29 -0400
Subject: [Python-ideas] Better integration of multiprocessing with
	asyncio
In-Reply-To: <CAP7+vJLzRK+QS-sumP7=kZPGw1uJ9s4qp0JNSsLR3MK8J-76kw@mail.gmail.com>
References: <CAP3foK+Q-qkJygGvPjX6Wqyr_699ZsRHRpL+_WFYXiMa1RRZ-Q@mail.gmail.com>
 <CAP7+vJLzRK+QS-sumP7=kZPGw1uJ9s4qp0JNSsLR3MK8J-76kw@mail.gmail.com>
Message-ID: <CAP3foKLKjtG5Nfo449H2hQZunsVRbO=NT8w01bmgWp34UmcEXA@mail.gmail.com>

Right, this is the same approach I've used myself. For example, the
AsyncProcessQueue in my example above was implemented like this:

def AsyncProcessQueue(maxsize=0):
    m = Manager()
    q = m.Queue(maxsize=maxsize)
    return _ProcQueue(q)

class _ProcQueue(object):
    def __init__(self, q):
        self._queue = q
        self._executor = self._get_executor()
        self._cancelled_join = False

    def __getstate__(self):
        # Copy so pickling does not clobber the live object's executor.
        self_dict = self.__dict__.copy()
        self_dict['_executor'] = None
        return self_dict

    def _get_executor(self):
        return ThreadPoolExecutor(max_workers=cpu_count())

    def __setstate__(self, self_dict):
        self_dict['_executor'] = self._get_executor()
        self.__dict__.update(self_dict)

    def __getattr__(self, name):
        if name in ['qsize', 'empty', 'full', 'put', 'put_nowait',
                    'get', 'get_nowait', 'close']:
            return getattr(self._queue, name)
        else:
            raise AttributeError("'%s' object has no attribute '%s'" %
                                    (self.__class__.__name__, name))

    @asyncio.coroutine
    def coro_put(self, item):
        loop = asyncio.get_event_loop()
        return (yield from loop.run_in_executor(self._executor, self.put,
item))

    @asyncio.coroutine
    def coro_get(self):
        loop = asyncio.get_event_loop()
        return (yield from loop.run_in_executor(self._executor, self.get))

    def cancel_join_thread(self):
        self._cancelled_join = True
        self._queue.cancel_join_thread()

    def join_thread(self):
        self._queue.join_thread()
        if self._executor and not self._cancelled_join:
            self._executor.shutdown()

I'm wondering if a complete library providing this kind of behavior for all
or some subset of multiprocessing is worth adding to the asyncio
module, or if you prefer users to deal with this on their own (or perhaps
just distribute something that provides this behavior as a stand-alone
library). I suppose adding asyncio-friendly methods to the existing objects
in multiprocessing is also an option, but I doubt it's desirable to add
asyncio-specific code to modules other than asyncio.

It also sort of sounds like some of the work that's gone on in Billiard
would make the alternative, more complicated approach you mentioned a
realistic possibility, at least going by this comment by Ask Solem (from
http://bugs.python.org/issue9248#msg221963):

> we have a version of multiprocessing.Pool using async IO and one pipe per process that drastically improves performance and also avoids the threads+forking issues (well, not the initial fork), but I have not yet adapted it to use the new asyncio module in 3.4.

I don't know the details there, though. Hopefully someone more
familiar with Billiard/multiprocessing than I am can provide some
additional information.





On Sat, Jul 26, 2014 at 10:39 PM, Guido van Rossum <guido at python.org> wrote:

> I actually know very little about multiprocessing (have never used it) but
> I imagine the way you normally interact with multiprocessing is using a
> synchronous calls that talk to the subprocesses and their work queues and
> so on, right?
>
> In the asyncio world you would put that work in a thread and then use
> run_in_executor() with a thread executor -- the thread would then be
> managing the subprocesses and talking to them. While you are waiting for
> that thread to complete your other coroutines will still work.
>
> Unless you want to rewrite the communication and process management as
> coroutines, but that sounds like a lot of work.
>
> --
> --Guido van Rossum (python.org/~guido)
>

From ncoghlan at gmail.com  Sun Jul 27 05:39:59 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 27 Jul 2014 13:39:59 +1000
Subject: [Python-ideas] Mutating while iterating
In-Reply-To: <cd149d5b-5b57-4fbe-a639-a7f13fae64cf@googlegroups.com>
References: <cd149d5b-5b57-4fbe-a639-a7f13fae64cf@googlegroups.com>
Message-ID: <CADiSq7cG0Cfp0AVn2xeVx-rRyqzeGUPV3XWmQQcXSeMa=02LPA@mail.gmail.com>

On 27 July 2014 06:51, Aaron Brady <castironpi at gmail.com> wrote:
> Hi, I asked about the inconsistency of the "RuntimeError" being raised when
> mutating a container while iterating over it here [1], "set and dict
> iteration" on Aug 16, 2012.

Hi,

This is clearly an issue of grave concern to you, but as Raymond
pointed out previously, you appear to have misunderstood the purpose
of those exceptions. They're there to prevent catastrophic failure of
the interpreter itself (i.e. segmentation faults), not to help find
bugs in user code. If users want to mutate containers while they're
iterating over them, they're generally free to do so. The only time
we'll actively disallow it is when such mutation will outright *break*
the iterator, rather than merely producing potentially surprising
results.
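
For example (CPython behaviour): growing a list while iterating over
it is allowed, if potentially surprising, while resizing a dict during
iteration raises RuntimeError because it would break the iterator.

```python
# Mutating a list during iteration is legal; the loop simply keeps
# going over the items that were appended.
lst = [1, 2, 3]
seen = []
for x in lst:
    seen.append(x)
    if len(lst) < 6:
        lst.append(x * 10)

# Resizing a dict during iteration would break the iterator, so
# CPython raises RuntimeError instead of crashing.
d = {'a': 1}
error = None
try:
    for k in d:
        d['b'] = 2
except RuntimeError as e:
    error = e
```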

I have closed the new issue and added a longer reply (with examples)
that will hopefully better explain why we have no intention of
changing this behaviour: http://bugs.python.org/issue22084#msg224100

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From guido at python.org  Sun Jul 27 05:43:07 2014
From: guido at python.org (Guido van Rossum)
Date: Sat, 26 Jul 2014 20:43:07 -0700
Subject: [Python-ideas] Better integration of multiprocessing with
	asyncio
In-Reply-To: <CAP3foKLKjtG5Nfo449H2hQZunsVRbO=NT8w01bmgWp34UmcEXA@mail.gmail.com>
References: <CAP3foK+Q-qkJygGvPjX6Wqyr_699ZsRHRpL+_WFYXiMa1RRZ-Q@mail.gmail.com>
 <CAP7+vJLzRK+QS-sumP7=kZPGw1uJ9s4qp0JNSsLR3MK8J-76kw@mail.gmail.com>
 <CAP3foKLKjtG5Nfo449H2hQZunsVRbO=NT8w01bmgWp34UmcEXA@mail.gmail.com>
Message-ID: <CAP7+vJKpaie4h5krUPcK=4_7qH=KZDwRoMGezi5io1Nz8xp9Fg@mail.gmail.com>

I'm going to go out on a limb here and say that it feels too early to me.
First someone has to actually solve this problem well as a 3rd party
package before we can talk about adding it to the asyncio package. It
doesn't actually sound like Billiards has adapted to asyncio yet (not that
I have any idea what Billiards is -- it sounds like a fork of
multiprocessing actually?).


On Sat, Jul 26, 2014 at 8:34 PM, Dan O'Reilly <oreilldf at gmail.com> wrote:

> Right, this is the same approach I've used myself. For example, the
> AsyncProcessQueue in my example above was implemented like this:
>
> def AsyncProcessQueue(maxsize=0):
>     m = Manager()
>     q = m.Queue(maxsize=maxsize)
>     return _ProcQueue(q)
>
> class _ProcQueue(object):
>     def __init__(self, q):
>         self._queue = q
>         self._executor = self._get_executor()
>         self._cancelled_join = False
>
>     def __getstate__(self):
>         # Copy so pickling does not clobber the live object's executor.
>         self_dict = self.__dict__.copy()
>         self_dict['_executor'] = None
>         return self_dict
>
>     def _get_executor(self):
>         return ThreadPoolExecutor(max_workers=cpu_count())
>
>     def __setstate__(self, self_dict):
>         self_dict['_executor'] = self._get_executor()
>         self.__dict__.update(self_dict)
>
>     def __getattr__(self, name):
>         if name in ['qsize', 'empty', 'full', 'put', 'put_nowait',
>                     'get', 'get_nowait', 'close']:
>             return getattr(self._queue, name)
>         else:
>             raise AttributeError("'%s' object has no attribute '%s'" %
>                                     (self.__class__.__name__, name))
>
>     @asyncio.coroutine
>     def coro_put(self, item):
>         loop = asyncio.get_event_loop()
>         return (yield from loop.run_in_executor(self._executor, self.put,
> item))
>
>     @asyncio.coroutine
>     def coro_get(self):
>         loop = asyncio.get_event_loop()
>         return (yield from loop.run_in_executor(self._executor, self.get))
>
>     def cancel_join_thread(self):
>         self._cancelled_join = True
>         self._queue.cancel_join_thread()
>
>     def join_thread(self):
>         self._queue.join_thread()
>         if self._executor and not self._cancelled_join:
>             self._executor.shutdown()
>
> I'm wondering if a complete library providing this kind of behavior for
> all or some subset of multiprocessing is worth adding to the asyncio
> module, or if you prefer users to deal with this on their own (or perhaps
> just distribute something that provides this behavior as a stand-alone
> library). I suppose adding asyncio-friendly methods to the existing objects
> in multiprocessing is also an option, but I doubt it's desirable to add
> asyncio-specific code to modules other than asyncio.
>
> It also sort of sounds like some of the work that's gone on in Billiard
> would make the alternative, more complicated approach you mentioned a
> realistic possibility, at least going by this comment by Ask Solem (from
> http://bugs.python.org/issue9248#msg221963):
>
> > we have a version of multiprocessing.Pool using async IO and one pipe per process that drastically improves performance and also avoids the threads+forking issues (well, not the initial fork), but I have not yet adapted it to use the new asyncio module in 3.4.
>
> I don't know the details there, though. Hopefully someone more familiar with Billiard/multiprocessing than I am can provide some additional information.
>


-- 
--Guido van Rossum (python.org/~guido)

From ncoghlan at gmail.com  Sun Jul 27 05:47:49 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 27 Jul 2014 13:47:49 +1000
Subject: [Python-ideas] Better integration of multiprocessing with
	asyncio
In-Reply-To: <CAP3foKLKjtG5Nfo449H2hQZunsVRbO=NT8w01bmgWp34UmcEXA@mail.gmail.com>
References: <CAP3foK+Q-qkJygGvPjX6Wqyr_699ZsRHRpL+_WFYXiMa1RRZ-Q@mail.gmail.com>
 <CAP7+vJLzRK+QS-sumP7=kZPGw1uJ9s4qp0JNSsLR3MK8J-76kw@mail.gmail.com>
 <CAP3foKLKjtG5Nfo449H2hQZunsVRbO=NT8w01bmgWp34UmcEXA@mail.gmail.com>
Message-ID: <CADiSq7cXw45HTMM2w1R2RBrjA-Hu4hZ6aRZtZ=igHnT9GOp=hg@mail.gmail.com>

On 27 July 2014 13:34, Dan O'Reilly <oreilldf at gmail.com> wrote:
>
> I'm wondering if a complete library providing this kind of behavior for all
> or some subset of multiprocessing is worth adding to the asyncio module,
> or if you prefer users to deal with this on their own (or perhaps just
> distribute something that provides this behavior as a stand-alone library).
> I suppose adding asyncio-friendly methods to the existing objects in
> multiprocessing is also an option, but I doubt it's desirable to add
> asyncio-specific code to modules other than asyncio.

Actually, having asyncio act as a "nexus" for asynchronous IO backends
is one of the reasons for its existence. The asyncio event loop is
pluggable, so making multiprocessing asyncio friendly (whether
directly, or as an addon library that bridges the two) *also* has the
effect of making it compatible with all the other asynchronous event
loops that can be plugged into the asyncio framework.

I'm inclined to agree with Guido, though - while I think making
asyncio and multiprocessing play well together is a good idea in
principle, I think we're still in the "third party exploration phase"
of that integration. Once folks figure out good ways to do it, *then*
we can start talking about making that integration a default part of
Python 3.5 or 3.6+.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ryan at ryanhiebert.com  Sun Jul 27 05:48:02 2014
From: ryan at ryanhiebert.com (Ryan Hiebert)
Date: Sat, 26 Jul 2014 22:48:02 -0500
Subject: [Python-ideas] Better integration of multiprocessing with
	asyncio
In-Reply-To: <CAP7+vJKpaie4h5krUPcK=4_7qH=KZDwRoMGezi5io1Nz8xp9Fg@mail.gmail.com>
References: <CAP3foK+Q-qkJygGvPjX6Wqyr_699ZsRHRpL+_WFYXiMa1RRZ-Q@mail.gmail.com>
 <CAP7+vJLzRK+QS-sumP7=kZPGw1uJ9s4qp0JNSsLR3MK8J-76kw@mail.gmail.com>
 <CAP3foKLKjtG5Nfo449H2hQZunsVRbO=NT8w01bmgWp34UmcEXA@mail.gmail.com>
 <CAP7+vJKpaie4h5krUPcK=4_7qH=KZDwRoMGezi5io1Nz8xp9Fg@mail.gmail.com>
Message-ID: <6D00A7DF-A35D-4608-8AEA-5C21376F909B@ryanhiebert.com>


> On Jul 26, 2014, at 10:43 PM, Guido van Rossum <guido at python.org> wrote:
> 
> I'm going to go out on a limb here and say that it feels too early to me. First someone has to actually solve this problem well as a 3rd party package before we can talk about adding it to the asyncio package. It doesn't actually sound like Billiards has adapted to asyncio yet (not that I have any idea what Billiards is -- it sounds like a fork of multiprocessing actually?).

Yep, Billiard is a fork of multiprocessing: https://pypi.python.org/pypi/billiard

From ronaldoussoren at mac.com  Sun Jul 27 09:42:02 2014
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Sun, 27 Jul 2014 09:42:02 +0200
Subject: [Python-ideas] PEP 447 revisited
In-Reply-To: <CADiSq7dNdCwqcMfAsmuoDUPXcFQBjq+Agv29rgB6Jzx50XnrNw@mail.gmail.com>
References: <5BB87CC4-F31B-4213-AAAC-0C0CE738460C@mac.com>
 <CADiSq7dNdCwqcMfAsmuoDUPXcFQBjq+Agv29rgB6Jzx50XnrNw@mail.gmail.com>
Message-ID: <06ED1B99-850E-49C1-950C-B311FEC340C8@mac.com>


On 26 Jul 2014, at 13:59, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 26 July 2014 18:03, Ronald Oussoren <ronaldoussoren at mac.com> wrote:
>> Hi,
>> 
>> After a long hiatus I've done some updates to PEP 447, which proposes a new metaclass method that's used in attribute resolution for normal and super instances. There have been two updates: the first one is trivial, the proposed method has a new name (__getdescriptor__).  The second change to the PEP is to add a Python pseudo-implementation of object.__getattribute__ and super.__getattribute__ to make it easier to reason about the impact of the proposal.
>> 
>> I'd like to move forward with this PEP, either to rejection or (preferably) to acceptance of the feature in some form. That said, I'm not too attached to the exact proposal; it just seems to be the minimal clean change that can be used to implement my use case.
>> 
>> My use case is fairly obscure, but hopefully it is not too obscure :-).  The problem I have at the moment is basically that it is not possible to hook into the attribute resolution algorithm used by super.__getattribute__, and this PEP would solve that.
> 
> The use case seems reasonable to me, and the new slot name seems much
> easier to document and explain than the previous iteration.

Some Australian guy you may know suggested the name the last time I posted 
the PEP for review, and I liked the name.  Naming is hard...

> 
> I'd like to see the PEP look into the inspect module and consider the
> consequences for the functions there (e.g. another way for
> getattr_static to miss methods), as well as any possible implications
> for dir(). We had a few issues there with the enum changes for 3.4
> (and some more serious ones with Argument Clinic) - it's not a
> blocker, it's just nice going in to have some idea of the impact going
> in :)

I agree that it is useful to explain those consequences. The consequences
for dir() should be similar to those for __getattribute__ itself: if you override
the default implementation you should implement __dir__ to match, or live
with the inconsistency.

There should be little or no impact on inspect, other than that getattr_static 
may not work as expected when using a custom implementation of __getdescriptor__,
because the class __dict__ may not contain the values you need. There's nothing
that can be done about that; the entire point of getattr_static is to avoid triggering
custom attribute lookup code.
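That limitation can already be seen with today's dynamic lookup hooks. A minimal illustration (not from the PEP itself): getattr_static only consults the relevant __dict__s, so anything produced by custom lookup code, whether __getattribute__ today or __getdescriptor__ under this proposal, is invisible to it.

```python
import inspect

class Dynamic:
    """Provides attributes dynamically; the class __dict__ stays empty."""
    def __getattribute__(self, name):
        if name.startswith('__'):
            return object.__getattribute__(self, name)
        return name.upper()

obj = Dynamic()
assert getattr(obj, 'spam') == 'SPAM'   # dynamic lookup succeeds

# getattr_static bypasses __getattribute__, so the attribute is missed:
try:
    inspect.getattr_static(obj, 'spam')
    found = True
except AttributeError:
    found = False
assert found is False
```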

inspect.getmembers and inspect.get_class_attrs look directly at the class __dict__, 
and hence might not show everything that's available through the class when using
a custom __getdescriptor__ method.  I have to think about the consequences and possible
mitigation of those consequences a bit, not just for this PEP but for the current PyObjC 
implementation as well.

Anyway, I'll add a section about introspection to the PEP that describes these issues 
and their consequences.

Ronald

> 
> Cheers,
> Nick.
> 
> -- 
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From joshua at landau.ws  Mon Jul 28 07:26:13 2014
From: joshua at landau.ws (Joshua Landau)
Date: Mon, 28 Jul 2014 06:26:13 +0100
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <20140727011739.GC9112@ando>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <20140727011739.GC9112@ando>
Message-ID: <CAN1F8qV94hoeg-P7wq+feK9L7-TL3TkV0THvFZ_F+_aCfuzgGw@mail.gmail.com>

On 27 July 2014 02:17, Steven D'Aprano <steve at pearwood.info> wrote:
> On Sun, Jul 27, 2014 at 09:34:16AM +1000, Alexander Heger wrote:
>
>> Is there a good reason for not implementing the "+" operator for dict.update()?
> [...]
>> That is
>>
>> B += A
>>
>> should be equivalent to
>>
>> B.update(A)
>
> You're asking the wrong question. The burden is not on people to justify
> *not* adding new features, the burden is on somebody to justify adding
> them. Is there a good reason for implementing the + operator as
> dict.update?

One good reason is that people are still convinced "dict(A, **B)"
makes some kind of sense.

But really, we have collections.ChainMap, dict addition is confusing
and there's already a PEP (python.org/dev/peps/pep-0448) that has a
solution I prefer ({**A, **B}).
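For reference, a small sketch of the ChainMap route (and of the PEP 448 spelling it approximates): a ChainMap is a live view where the first mapping wins on collisions, and dict() of it gives an independent merged snapshot.

```python
from collections import ChainMap

a = {'x': 1, 'y': 2}
b = {'y': 20, 'z': 30}

# Live view; the *first* mapping wins on key collisions.
view = ChainMap(b, a)
assert view['y'] == 20

# dict() of the view gives an independent merged snapshot,
# equivalent to the proposed {**a, **b} from PEP 448.
merged = dict(view)
assert merged == {'x': 1, 'y': 20, 'z': 30}
```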

From steve at pearwood.info  Mon Jul 28 16:59:51 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 29 Jul 2014 00:59:51 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CAN1F8qV94hoeg-P7wq+feK9L7-TL3TkV0THvFZ_F+_aCfuzgGw@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <20140727011739.GC9112@ando>
 <CAN1F8qV94hoeg-P7wq+feK9L7-TL3TkV0THvFZ_F+_aCfuzgGw@mail.gmail.com>
Message-ID: <20140728145951.GH9112@ando>

On Mon, Jul 28, 2014 at 06:26:13AM +0100, Joshua Landau wrote:
> On 27 July 2014 02:17, Steven D'Aprano <steve at pearwood.info> wrote:
[...]
> > Is there a good reason for implementing the + operator as
> > dict.update?
> 
> One good reason is that people are still convinced "dict(A, **B)"
> makes some kind of sense.

Explain please. dict(A, **B) makes perfect sense to me, and it works 
perfectly too. It's a normal constructor call, using the same syntax as 
any other function or method call. Are you suggesting that it does not 
make sense?


-- 
Steven

From dw+python-ideas at hmmz.org  Mon Jul 28 17:33:06 2014
From: dw+python-ideas at hmmz.org (dw+python-ideas at hmmz.org)
Date: Mon, 28 Jul 2014 15:33:06 +0000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <20140728145951.GH9112@ando>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <20140727011739.GC9112@ando>
 <CAN1F8qV94hoeg-P7wq+feK9L7-TL3TkV0THvFZ_F+_aCfuzgGw@mail.gmail.com>
 <20140728145951.GH9112@ando>
Message-ID: <20140728153306.GA5756@k2>

On Tue, Jul 29, 2014 at 12:59:51AM +1000, Steven D'Aprano wrote:

> > One good reason is that people are still convinced "dict(A, **B)"
> > makes some kind of sense.
> 
> Explain please. dict(A, **B) makes perfect sense to me, and it works
> perfectly too. It's a normal constructor call, using the same syntax
> as any other function or method call. Are you suggesting that it does
> not make sense?

It worked in Python 2, but Python 3 added code to explicitly prevent the
kwargs mechanism from being abused by passing non-string keys.
Effectively, the only reason it worked was due to a Python 2.x kwargs
implementation detail.

It took me a while to come to terms with this one too, it was really
quite a nice hack. But that's all it ever was. The domain of valid keys
accepted by **kwargs should never have exceeded the range supported by
the language syntax for declaring keyword arguments.
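Concretely (a small illustration, not from the original mail): on Python 3 the hack fails with TypeError as soon as a key is not a valid keyword name, and the general spelling is a copy plus update.

```python
a = {'x': 1}
b = {2: 'two'}                 # non-string key

try:
    merged = dict(a, **b)      # worked in Python 2, rejected in Python 3
except TypeError:
    merged = dict(a)
    merged.update(b)           # the general spelling

assert merged == {'x': 1, 2: 'two'}
```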


David

From guido at python.org  Mon Jul 28 17:40:17 2014
From: guido at python.org (Guido van Rossum)
Date: Mon, 28 Jul 2014 08:40:17 -0700
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <20140728145951.GH9112@ando>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <20140727011739.GC9112@ando>
 <CAN1F8qV94hoeg-P7wq+feK9L7-TL3TkV0THvFZ_F+_aCfuzgGw@mail.gmail.com>
 <20140728145951.GH9112@ando>
Message-ID: <CAP7+vJKQ=erVvQVji9=LeMssxzpX77NWsmUeKCxQpbwfup_Hdw@mail.gmail.com>

I'll regret jumping in here, but while dict(A, **B) as a way to merge two
dicts A and B makes some sense, it has two drawbacks: (1) slow (creates an
extra copy of B as it creates the keyword args structure for dict()) and
(2) not general enough (doesn't support key types other than str).


On Mon, Jul 28, 2014 at 7:59 AM, Steven D'Aprano <steve at pearwood.info>
wrote:

> On Mon, Jul 28, 2014 at 06:26:13AM +0100, Joshua Landau wrote:
> > On 27 July 2014 02:17, Steven D'Aprano <steve at pearwood.info> wrote:
> [...]
> > > Is there a good reason for implementing the + operator as
> > > dict.update?
> >
> > One good reason is that people are still convinced "dict(A, **B)"
> > makes some kind of sense.
>
> Explain please. dict(A, **B) makes perfect sense to me, and it works
> perfectly too. It's a normal constructor call, using the same syntax as
> any other function or method call. Are you suggesting that it does not
> make sense?
>
>
> --
> Steven
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140728/334fdb45/attachment.html>

From steve at pearwood.info  Mon Jul 28 18:04:50 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 29 Jul 2014 02:04:50 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <20140728153306.GA5756@k2>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <20140727011739.GC9112@ando>
 <CAN1F8qV94hoeg-P7wq+feK9L7-TL3TkV0THvFZ_F+_aCfuzgGw@mail.gmail.com>
 <20140728145951.GH9112@ando> <20140728153306.GA5756@k2>
Message-ID: <20140728160450.GI9112@ando>

On Mon, Jul 28, 2014 at 03:33:06PM +0000, dw+python-ideas at hmmz.org wrote:
> On Tue, Jul 29, 2014 at 12:59:51AM +1000, Steven D'Aprano wrote:
> 
> > > One good reason is that people are still convinced "dict(A, **B)"
> > > makes some kind of sense.
> > 
> > Explain please. dict(A, **B) makes perfect sense to me, and it works
> > perfectly too. It's a normal constructor call, using the same syntax
> > as any other function or method call. Are you suggesting that it does
> > not make sense?
> 
> It worked in Python 2, but Python 3 added code to explicitly prevent the
> kwargs mechanism from being abused by passing non-string keys.

/face-palm

Ah of course! You're right, using dict(A, **B) isn't general enough.

I'm still inclined to prefer allowing update() to accept multiple 
arguments:

a.update(b, c, d)

rather than a += b + c + d

which suggests that maybe there ought to be an updated() built-in. Let 
the bike-shedding begin: how should such a thing be spelled?

new_dict = a + b + c + d

Pros: + is short to type; subclasses can control the type of new_dict.
Cons: dict addition isn't obvious.

new_dict = updated(a, b, c, d)

Pros: analogous to sort/sorted, reverse/reversed.
Cons: another built-in; isn't very general, only applies to Mappings

new_dict = a.updated(b, c, d)

Pros: only applies to mappings, so it should be a method; subclasses can 
control the type of the new dict returned.
Cons: easily confused with dict.update
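For comparison, a possible updated() along these lines is only a few lines of Python (the name and signature here are hypothetical, just to make the semantics concrete):

```python
def updated(*dicts, **kwargs):
    """Return a new dict merging all arguments; later ones win on collision."""
    new = {}
    for d in dicts:
        new.update(d)
    new.update(kwargs)
    return new

assert updated({'a': 1}, {'a': 2, 'b': 3}, c=4) == {'a': 2, 'b': 3, 'c': 4}
```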




-- 
Steven

From guido at python.org  Mon Jul 28 18:08:49 2014
From: guido at python.org (Guido van Rossum)
Date: Mon, 28 Jul 2014 09:08:49 -0700
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <20140728153306.GA5756@k2>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <20140727011739.GC9112@ando>
 <CAN1F8qV94hoeg-P7wq+feK9L7-TL3TkV0THvFZ_F+_aCfuzgGw@mail.gmail.com>
 <20140728145951.GH9112@ando> <20140728153306.GA5756@k2>
Message-ID: <CAP7+vJK2thUJRC64NMFpNRLFs7+ddqpBLZN=g41FxrK66+k7+A@mail.gmail.com>

In addition, dict(A, **B) is not something you easily stumble upon when
your goal is "merge two dicts"; nor is it even clear that that's what it is
when you read it for the first time.

All signs of too-clever hacks in my book.


On Mon, Jul 28, 2014 at 8:33 AM, <dw+python-ideas at hmmz.org> wrote:

> On Tue, Jul 29, 2014 at 12:59:51AM +1000, Steven D'Aprano wrote:
>
> > > One good reason is that people are still convinced "dict(A, **B)"
> > > makes some kind of sense.
> >
> > Explain please. dict(A, **B) makes perfect sense to me, and it works
> > perfectly too. It's a normal constructor call, using the same syntax
> > as any other function or method call. Are you suggesting that it does
> > not make sense?
>
> It worked in Python 2, but Python 3 added code to explicitly prevent the
> kwargs mechanism from being abused by passing non-string keys.
> Effectively, the only reason it worked was due to a Python 2.x kwargs
> implementation detail.
>
> It took me a while to come to terms with this one too, it was really
> quite a nice hack. But that's all it ever was. The domain of valid keys
> accepted by **kwargs should never have exceeded the range supported by
> the language syntax for declaring keyword arguments.
>
>
> David
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>



-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140728/6601f5a6/attachment.html>

From ron3200 at gmail.com  Mon Jul 28 19:17:10 2014
From: ron3200 at gmail.com (Ron Adam)
Date: Mon, 28 Jul 2014 12:17:10 -0500
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <20140728160450.GI9112@ando>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <20140727011739.GC9112@ando>
 <CAN1F8qV94hoeg-P7wq+feK9L7-TL3TkV0THvFZ_F+_aCfuzgGw@mail.gmail.com>
 <20140728145951.GH9112@ando> <20140728153306.GA5756@k2>
 <20140728160450.GI9112@ando>
Message-ID: <lr60in$bn5$1@ger.gmane.org>



On 07/28/2014 11:04 AM, Steven D'Aprano wrote:
> On Mon, Jul 28, 2014 at 03:33:06PM +0000,dw+python-ideas at hmmz.org  wrote:
>> >On Tue, Jul 29, 2014 at 12:59:51AM +1000, Steven D'Aprano wrote:
>> >
>>>> > > >One good reason is that people are still convinced "dict(A, **B)"
>>>> > > >makes some kind of sense.
>>> > >
>>> > >Explain please. dict(A, **B) makes perfect sense to me, and it works
>>> > >perfectly too. It's a normal constructor call, using the same syntax
>>> > >as any other function or method call. Are you suggesting that it does
>>> > >not make sense?
>> >
>> >It worked in Python 2, but Python 3 added code to explicitly prevent the
>> >kwargs mechanism from being abused by passing non-string keys.
> /face-palm
>
> Ah of course! You're right, using dict(A, **B) isn't general enough.
> I'm still inclined to prefer allowing update() to accept multiple
> arguments:
>
> a.update(b, c, d)

To me, the constructor and the update method should behave as similarly as possible.

So I think if it's done in the update method, it should also work in the 
constructor.  And other type constructors, such as list, should work in 
similar ways as well.  I'm not sure that going in this direction would be 
good in the long term.


> rather than a += b + c + d
>
> which suggests that maybe there ought to be an updated() built-in, Let
> the bike-shedding begin: should such a thing be spelled ?
>
> new_dict = a + b + c + d
>
> Pros: + is short to type; subclasses can control the type of new_dict.
> Cons: dict addition isn't obvious.

I think it's more obvious.  It only needs __add__ and __iadd__ methods to 
make it consistent with the list type.

The con is that somewhere someone could be catching TypeError to 
differentiate dicts from other types while adding. But it's just as likely 
they are doing so in order to add them after a TypeError occurs.

I think this added consistency between lists and dicts would be useful.

But putting __add__ and __iadd__ methods on dicts seems like something 
that was probably discussed at length before, and I wonder what reasons 
were given for not doing it then.
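For what it's worth, the list-like behaviour described above can already be prototyped as a dict subclass (a sketch of the proposed semantics, not a proposal for the builtin itself):

```python
class AddableDict(dict):
    def __add__(self, other):
        new = type(self)(self)   # subclasses control the result type
        new.update(other)
        return new

    def __iadd__(self, other):
        self.update(other)       # in-place, like list.__iadd__
        return self

a = AddableDict(x=1)
b = AddableDict(x=2, y=3)
assert a + b == {'x': 2, 'y': 3}   # right operand wins on collision
a += b
assert a == {'x': 2, 'y': 3}
```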

Cheers,
   Ron


From antoine at python.org  Mon Jul 28 19:29:00 2014
From: antoine at python.org (Antoine Pitrou)
Date: Mon, 28 Jul 2014 13:29:00 -0400
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CAP7+vJK2thUJRC64NMFpNRLFs7+ddqpBLZN=g41FxrK66+k7+A@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <20140727011739.GC9112@ando>
 <CAN1F8qV94hoeg-P7wq+feK9L7-TL3TkV0THvFZ_F+_aCfuzgGw@mail.gmail.com>
 <20140728145951.GH9112@ando> <20140728153306.GA5756@k2>
 <CAP7+vJK2thUJRC64NMFpNRLFs7+ddqpBLZN=g41FxrK66+k7+A@mail.gmail.com>
Message-ID: <lr618s$jtr$1@ger.gmane.org>

On 28/07/2014 12:08, Guido van Rossum wrote:
> In addition, dict(A, **B) is not something you easily stumble upon when
> your goal is "merge two dicts"; nor is it even clear that that's what it
> is when you read it for the first time.
>
> All signs of too-clever hacks in my book.

Agreed with Guido (!).

Regards

Antoine.



From ryan at ryanhiebert.com  Mon Jul 28 20:37:03 2014
From: ryan at ryanhiebert.com (Ryan Hiebert)
Date: Mon, 28 Jul 2014 13:37:03 -0500
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <20140728160450.GI9112@ando>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <20140727011739.GC9112@ando>
 <CAN1F8qV94hoeg-P7wq+feK9L7-TL3TkV0THvFZ_F+_aCfuzgGw@mail.gmail.com>
 <20140728145951.GH9112@ando> <20140728153306.GA5756@k2>
 <20140728160450.GI9112@ando>
Message-ID: <6122DCE6-D84A-4B05-AB02-C1FD3CED82A4@ryanhiebert.com>


> On Jul 28, 2014, at 11:04 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> 
> I'm still inclined to prefer allowing update() to accept multiple 
> arguments:
> 
> a.update(b, c, d)
> 
> rather than a += b + c + d
> 
> which suggests that maybe there ought to be an updated() built-in, Let 
> the bike-shedding begin: should such a thing be spelled ?
> 
> new_dict = a + b + c + d
> 
or, to match set

new_dict = a | b | c | d


From nathan at cmu.edu  Mon Jul 28 20:58:21 2014
From: nathan at cmu.edu (Nathan Schneider)
Date: Mon, 28 Jul 2014 14:58:21 -0400
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
Message-ID: <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>

On Sat, Jul 26, 2014 at 7:34 PM, Alexander Heger <python at 2sn.net> wrote:

>
> My apologies if this has been posted before but with a quick google
> search I could not see it; if it was, could you please point me to the
> thread?
>

Here are two threads that had some discussion of this:
https://mail.python.org/pipermail/python-ideas/2011-December/013227.html
and https://mail.python.org/pipermail/python-ideas/2013-June/021140.html.

Seems like a useful feature if there could be a clean way to spell it.

Cheers,
Nathan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140728/8b473654/attachment.html>

From p.f.moore at gmail.com  Mon Jul 28 21:21:54 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 28 Jul 2014 20:21:54 +0100
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
Message-ID: <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>

On 28 July 2014 19:58, Nathan Schneider <nathan at cmu.edu> wrote:
> Here are two threads that had some discussion of this:
> https://mail.python.org/pipermail/python-ideas/2011-December/013227.html

This doesn't seem to have a use case, other than "it would be nice".

> https://mail.python.org/pipermail/python-ideas/2013-June/021140.html.

This can be handled using ChainMap, if I understand the proposal.

> Seems like a useful feature if there could be a clean way to spell it.

I've yet to see any real-world situation when I've wanted "dictionary
addition" (with any of the various semantics proposed here) and I've
never encountered a situation where using d1.update(d2) was
sufficiently awkward that having an operator seemed reasonable.

In all honesty, I'd suggest that code which looks bad enough to
warrant even considering this feature is probably badly in need of
refactoring, at which point the problem will likely go away.

Paul

From abarnert at yahoo.com  Mon Jul 28 22:20:20 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 28 Jul 2014 13:20:20 -0700
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
Message-ID: <60A434E0-C0DA-467B-B13A-BA6986C5B7B1@yahoo.com>

On Jul 28, 2014, at 12:21, Paul Moore <p.f.moore at gmail.com> wrote:

> On 28 July 2014 19:58, Nathan Schneider <nathan at cmu.edu> wrote:
>> Here are two threads that had some discussion of this:
>> https://mail.python.org/pipermail/python-ideas/2011-December/013227.html
> 
> This doesn't seem to have a use case, other than "it would be nice".
> 
>> https://mail.python.org/pipermail/python-ideas/2013-June/021140.html.
> 
> This can be handled using ChainMap, if I understand the proposal.

When the underlying dicts and desired combined dict are all going to be used immutably, ChainMap is the perfect answer. (Better than an "updated" function for performance if nothing else.) And usually, when you're looking for a non-mutating combine-dicts operation, that will be what you want.

But usually isn't always. If you want a snapshot of the combination of mutable dicts, ChainMap is wrong. If you want to be able to mutate the result, ChainMap is wrong.

All that being said, I'm not sure these use cases are sufficiently common to warrant adding an operator--especially since there are other just-as-(un)common use cases it wouldn't solve. (For example, what I often want is a mutable "overlay" ChainMap, which doesn't need to copy the entire potentially-gigantic source dicts. I wouldn't expect an operator for that, even though I need it far more often than I need a mutable snapshot copy.)

And of course, as you say, real-life use cases would be a lot more compelling than theoretical/abstract ones.

>> Seems like a useful feature if there could be a clean way to spell it.
> 
> I've yet to see any real-world situation when I've wanted "dictionary
> addition" (with any of the various semantics proposed here) and I've
> never encountered a situation where using d1.update(d2) was
> sufficiently awkward that having an operator seemed reasonable.
> 
> In all honesty, I'd suggest that code which looks bad enough to
> warrant even considering this feature is probably badly in need of
> refactoring, at which point the problem will likely go away.
> 
> Paul
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From encukou at gmail.com  Mon Jul 28 22:53:43 2014
From: encukou at gmail.com (Petr Viktorin)
Date: Mon, 28 Jul 2014 22:53:43 +0200
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <60A434E0-C0DA-467B-B13A-BA6986C5B7B1@yahoo.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <60A434E0-C0DA-467B-B13A-BA6986C5B7B1@yahoo.com>
Message-ID: <CA+=+wqBcY8KM06Mdrx99XizM4exW-4FeoE=p6QqGiKu-aJOnoQ@mail.gmail.com>

On Mon, Jul 28, 2014 at 10:20 PM, Andrew Barnert
<abarnert at yahoo.com.dmarc.invalid> wrote:

> When the underlying dicts and desired combined dict are all going to be used immutably, ChainMap is the perfect answer. (Better than an "updated" function for performance if nothing else.) And usually, when you're looking for a non-mutating combine-dicts operation, that will be what you want.
>
> But usually isn't always. If you want a snapshot of the combination of mutable dicts, ChainMap is wrong. If you want to be able to mutate the result, ChainMap is wrong.

In those cases, do dict(ChainMap(...)).

>
> All that being said, I'm not sure these use cases are sufficiently common to warrant adding an operator--especially since there are other just-as-(un)common use cases it wouldn't solve. (For example, what I often want is a mutable "overlay" ChainMap, which doesn't need to copy the entire potentially-gigantic source dicts. I wouldn't expect an operator for that, even though I need it far more often than I need a mutable snapshot copy.)
>
> And of course, as you say, real-life use cases would be a lot more compelling than theoretical/abstract ones.

From python at 2sn.net  Mon Jul 28 22:59:29 2014
From: python at 2sn.net (Alexander Heger)
Date: Tue, 29 Jul 2014 06:59:29 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CAP7+vJK2thUJRC64NMFpNRLFs7+ddqpBLZN=g41FxrK66+k7+A@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <20140727011739.GC9112@ando>
 <CAN1F8qV94hoeg-P7wq+feK9L7-TL3TkV0THvFZ_F+_aCfuzgGw@mail.gmail.com>
 <20140728145951.GH9112@ando> <20140728153306.GA5756@k2>
 <CAP7+vJK2thUJRC64NMFpNRLFs7+ddqpBLZN=g41FxrK66+k7+A@mail.gmail.com>
Message-ID: <CAN3CYHxUGLaTNPCC-dgeDpQmkNvfqUbwSHvDmWPL+N0nNo1gSA@mail.gmail.com>

On 29 July 2014 02:08, Guido van Rossum <guido at python.org> wrote:
> In addition, dict(A, **B) is not something you easily stumble upon when your
> goal is "merge two dicts"; nor is it even clear that that's what it is when
> you read it for the first time.
>
> All signs of too-clever hacks in my book.

I try to convince students to learn and *use* python.

If I tell students that to merge 2 dictionaries they have to do dict(A,
**B) or {**A, **B}, that seems less clear (not something you "stumble
across", as Guido says) than A + B; then we still have to tell them
the rules of the operation, as usual for any operation.

It does not have to be "+"; it could be the "union" operator "|" that is
used for sets, where
s.update(t)
is the same as
s |= t

... and accordingly

D = A | B | C

Maybe this operator is better, as this equivalence is already being
used (for sets).  Accordingly, "union(A, B)" could do a merge operation
and return a new dict.
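The set precedent, and the dict behaviour being proposed by analogy, can be sketched as follows (the dict | operator does not exist, so it is simulated here with update; "later operand wins" is the assumed collision rule):

```python
# For sets today, |= is the in-place spelling of update():
s = {1, 2}
t = {2, 3}
u = set(s)
u |= t
assert u == s.union(t) == {1, 2, 3}

# The proposed dict analogue: D = A | B | C, with later operands winning.
A, B, C = {'k': 1}, {'k': 2, 'm': 3}, {'n': 4}
D = dict(A)
D.update(B)
D.update(C)
assert D == {'k': 2, 'm': 3, 'n': 4}
```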

(this then still allows people who want "+" to add the values be made
happy in the long run)

-Alexander

From python at 2sn.net  Tue Jul 29 00:15:49 2014
From: python at 2sn.net (Alexander Heger)
Date: Tue, 29 Jul 2014 08:15:49 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
Message-ID: <CAN3CYHzBEV7YQixhYNwnMqYfyFQZO9Ff0NeEFeGE_rx7SGuWmA@mail.gmail.com>

> In all honesty, I'd suggest that code which looks bad enough to
> warrant even considering this feature is probably badly in need of
> refactoring, at which point the problem will likely go away.

I often want to call functions with added (or removed, replaced)
keywords from the call.

args0 = dict(...)
args1 = dict(...)

def f(**kwargs):
    g(**(args0 | kwargs | args1))

currently I have to write

args0 = dict(...)
args1 = dict(...)

def f(**kwargs):
    temp_args = dict(args0)
    temp_args.update(kwargs)
    temp_args.update(args1)
    g(**temp_args)

It would also make the proposed feature to allow multiple kw args
expansions in Python 3.5 easy to write by having

f(**a, **b, **c)
be equivalent to
f(**(a | b | c))

-Alexander

From abarnert at yahoo.com  Tue Jul 29 00:17:22 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 28 Jul 2014 15:17:22 -0700
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CAN3CYHxUGLaTNPCC-dgeDpQmkNvfqUbwSHvDmWPL+N0nNo1gSA@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <20140727011739.GC9112@ando>
 <CAN1F8qV94hoeg-P7wq+feK9L7-TL3TkV0THvFZ_F+_aCfuzgGw@mail.gmail.com>
 <20140728145951.GH9112@ando> <20140728153306.GA5756@k2>
 <CAP7+vJK2thUJRC64NMFpNRLFs7+ddqpBLZN=g41FxrK66+k7+A@mail.gmail.com>
 <CAN3CYHxUGLaTNPCC-dgeDpQmkNvfqUbwSHvDmWPL+N0nNo1gSA@mail.gmail.com>
Message-ID: <999F9DAF-E27A-46FE-A444-C2713A18BBB6@yahoo.com>

On Jul 28, 2014, at 13:59, Alexander Heger <python at 2sn.net> wrote:

> On 29 July 2014 02:08, Guido van Rossum <guido at python.org> wrote:
>> In addition, dict(A, **B) is not something you easily stumble upon when your
>> goal is "merge two dicts"; nor is it even clear that that's what it is when
>> you read it for the first time.
>> 
>> All signs of too-clever hacks in my book.
> 
> I try to convince students to learn and *use* python.
> 
> If I tell students to merge 2 dictionaries they have to do dict(A,
> **B} or {**A, **B} that seem less clear (not something you "stumble
> across" as Guidon says) than A + B; then we still have to tell them
> the rules of the operation, as usual for any operation.
> 
> It does not have to be "+", could be the "union" operator "|" that is
> used for sets where
> s.update(t)
> is the same as
> s |= t

The difference is that with sets, it (at least conceptually) doesn't matter whether you keep elements from s or t when they collide, because by definition they only collide if they're equal, but with dicts, it very much matters whether you keep items from s or t when their keys collide, because the corresponding values are generally _not_ equal. So this is a false analogy; the same problem raised in the first three replies on this thread still needs to be answered: Is it obvious that the values from b should overwrite the values from a (assuming that's the rule you're suggesting, since you didn't specify; translate to the appropriate question if you want a different rule) in all real-life use cases? If not, is this so useful that the benefits in some uses outweigh the almost certain confusion in others? Without a compelling "yes" to one of those two questions, we're still at square one here; switching from + to | and making an analogy with sets doesn't help.
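The asymmetry being pointed out is easy to demonstrate (a small illustration, not from the original mail):

```python
# Set union: colliding elements are equal by definition, so "who wins" is moot.
assert ({1, 2} | {2, 3}) == {1, 2, 3}

# Dict merge: colliding keys usually carry *different* values, so order matters.
a = {'k': 'from_a'}
b = {'k': 'from_b'}

left = dict(a); left.update(b)    # b wins
right = dict(b); right.update(a)  # a wins
assert left == {'k': 'from_b'}
assert right == {'k': 'from_a'}
assert left != right              # the operation is not symmetric
```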

> ... and accordingly
> 
> D = A | B | C
> 
> Maybe this operator is better as this equivalence is already being
> used (for sets).  Accordingly "union(A,B)" could do a merge operation
> and return the new dict().

Wouldn't you expect a top-level union function to take any two iterables and return the union of them as a set (especially given that set.union accepts any iterable for its non-self argument)? A.union(B) seems a lot better than union(A, B).

Then again, A.updated(B) or updated(A, B) might be even better, as someone suggested, because the parallel between update and updated (and between e.g. sort and sorted) is not at all problematic.

> (this then still allows people who want "+" to add the values be made
> happy in the long run)
> 
> -Alexander
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From abarnert at yahoo.com  Tue Jul 29 00:19:22 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 28 Jul 2014 15:19:22 -0700
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CAN3CYHzBEV7YQixhYNwnMqYfyFQZO9Ff0NeEFeGE_rx7SGuWmA@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <CAN3CYHzBEV7YQixhYNwnMqYfyFQZO9Ff0NeEFeGE_rx7SGuWmA@mail.gmail.com>
Message-ID: <CA030941-62B6-4B2E-BEDF-6C15B26283E4@yahoo.com>

On Jul 28, 2014, at 15:15, Alexander Heger <python at 2sn.net> wrote:

>> In all honesty, I'd suggest that code which looks bad enough to
>> warrant even considering this feature is probably badly in need of
>> refactoring, at which point the problem will likely go away.
> 
> I often want to call functions with added (or removed, replaced)
> keywords from the call.
> 
> args0 = dict(...)
> args1 = dict(...)
> 
> def f(**kwargs):
>    g(**(args0 | kwargs | args1))
> 
> currently I have to write
> 
> args0 = dict(...)
> args1 = dict(...)
> 
> def f(**kwargs):
>    temp_args = dict(args0)
>    temp_args.update(kwargs)
>    temp_args.update(args1)
>    g(**temp_args)

No, you just have to write a one-liner with ChainMap, except in the (very rare) case where you're expecting g to hold onto and later modify its kwargs.
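Spelled out, the ChainMap one-liner looks like this (assuming the ordering Alexander wanted, where args1 overrides kwargs, which overrides args0; note that ChainMap gives priority to its *first* mapping, so the order is reversed relative to the proposed | chain):

```python
from collections import ChainMap

args0 = {'a': 1, 'b': 2}
args1 = {'c': 3}

def g(**kwargs):
    return kwargs

def f(**kwargs):
    # Equivalent to the proposed g(**(args0 | kwargs | args1)):
    return g(**ChainMap(args1, kwargs, args0))

assert f(b=20) == {'a': 1, 'b': 20, 'c': 3}
```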
> 
> It would also make the proposed feature to allow multiple kw args
> expansions in Python 3.5 easy to write by having
> 
> f(**a, **b, **c)
> be equivalent to
> f(**(a | b | c))
> 
> -Alexander
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

From ncoghlan at gmail.com  Tue Jul 29 00:20:53 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 29 Jul 2014 08:20:53 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <6122DCE6-D84A-4B05-AB02-C1FD3CED82A4@ryanhiebert.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <20140727011739.GC9112@ando>
 <CAN1F8qV94hoeg-P7wq+feK9L7-TL3TkV0THvFZ_F+_aCfuzgGw@mail.gmail.com>
 <20140728145951.GH9112@ando> <20140728153306.GA5756@k2>
 <20140728160450.GI9112@ando>
 <6122DCE6-D84A-4B05-AB02-C1FD3CED82A4@ryanhiebert.com>
Message-ID: <CADiSq7cHWS7zOtRpXVOzgr6cWmwQvn7VMDf8kz4SCiSNHxVwtg@mail.gmail.com>

On 29 Jul 2014 04:40, "Ryan Hiebert" <ryan at ryanhiebert.com> wrote:
>
>
> > On Jul 28, 2014, at 11:04 AM, Steven D'Aprano <steve at pearwood.info>
wrote:
> >
> > I'm still inclined to prefer allowing update() to accept multiple
> > arguments:
> >
> > a.update(b, c, d)
> >
> > rather than a += b + c + d

Note that if update() was changed to accept multiple args, the dict()
constructor could similarly be updated.

Then:

    x = dict(a)
    x.update(b)
    x.update(c)
    x.update(d)

Would become:

    x = dict(a, b, c, d)

Aside from the general "What's the use case that wouldn't be better served
by a larger scale refactoring?" concern, my main issue with that approach
would be the asymmetry it would introduce with the set constructor (which
disallows multiple arguments to avoid ambiguity in the single argument
case).

But really, I'm not seeing a compelling argument for why this needs to be a
builtin. If someone is merging dicts often enough to care, they can already
write a function to do the dict copy-and-update as a single operation. What
makes this more special than the multitude of other three line functions in
the world?

Cheers,
Nick.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140729/65e08ae6/attachment-0001.html>

From python at 2sn.net  Tue Jul 29 00:20:53 2014
From: python at 2sn.net (Alexander Heger)
Date: Tue, 29 Jul 2014 08:20:53 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
Message-ID: <CAN3CYHzP=ZkhzDoMMdqqsKo3FztbQ-R8D=grhtJbgUpn9eCkyA@mail.gmail.com>

> https://mail.python.org/pipermail/python-ideas/2013-June/021140.html.

I see, this is a very extended thread that Google did not show me when I
started this one, and many good points were made there.
So, my apologies for restarting this without reference; this discussion
does seem to resurface, however.

It seems it would be valuable to parallel the behaviour of operators
already in place for collections. Counter:

A + B adds values (calls the __add__ or __iadd__ method of the values,
likely __iadd__ for the values of A)
A | B takes element-wise maxima; Counter's update() adds counts
etc.
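Concretely, with collections.Counter (note that | takes element-wise
maxima, while Counter.update() adds counts, unlike dict.update()):

```python
from collections import Counter

A = Counter(a=1, b=2)
B = Counter(b=3, c=4)

print(A + B)        # adds values: a=1, b=5, c=4
print(A | B)        # element-wise maximum: a=1, b=3, c=4

C = Counter(A)
C.update(B)         # Counter.update() *adds* counts rather than replacing
print(C == A + B)   # True
```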

-Alexander

On 29 July 2014 05:21, Paul Moore <p.f.moore at gmail.com> wrote:
> On 28 July 2014 19:58, Nathan Schneider <nathan at cmu.edu> wrote:
>> Here are two threads that had some discussion of this:
>> https://mail.python.org/pipermail/python-ideas/2011-December/013227.html
>
> This doesn't seem to have a use case, other than "it would be nice".
>
>> https://mail.python.org/pipermail/python-ideas/2013-June/021140.html.
>
> This can be handled using ChainMap, if I understand the proposal.
>
>> Seems like a useful feature if there could be a clean way to spell it.
>
> I've yet to see any real-world situation when I've wanted "dictionary
> addition" (with any of the various semantics proposed here) and I've
> never encountered a situation where using d1.update(d2) was
> sufficiently awkward that having an operator seemed reasonable.
>
> In all honesty, I'd suggest that code which looks bad enough to
> warrant even considering this feature is probably badly in need of
> refactoring, at which point the problem will likely go away.
>
> Paul

From python at 2sn.net  Tue Jul 29 00:21:09 2014
From: python at 2sn.net (Alexander Heger)
Date: Tue, 29 Jul 2014 08:21:09 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <60A434E0-C0DA-467B-B13A-BA6986C5B7B1@yahoo.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <60A434E0-C0DA-467B-B13A-BA6986C5B7B1@yahoo.com>
Message-ID: <CAN3CYHz8+zChss0R=_uS1XsYka8_R-z-f62=2hORAJ59Qc7SAQ@mail.gmail.com>

> When the underlying dicts and desired combined dict are all going to be used immutably, ChainMap is the perfect answer. (Better than an "updated" function for performance if nothing else.) And usually, when you're looking for a non-mutating combine-dicts operation, that will be what you want.
>
> But usually isn't always. If you want a snapshot of the combination of mutable dicts, ChainMap is wrong. If you want to be able to mutate the result, ChainMap is wrong.
>
> All that being said, I'm not sure these use cases are sufficiently common to warrant adding an operator--especially since there are other just-as-(un)common use cases it wouldn't solve. (For example, what I often want is a mutable "overlay" ChainMap, which doesn't need to copy the entire potentially-gigantic source dicts. I wouldn't expect an operator for that, even though I need it far more often than I need a mutable snapshot copy.)
>
> And of course, as you say, real-life use cases would be a lot more compelling than theoretical/abstract ones.
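The mutable "overlay" idea quoted above can be sketched with ChainMap
itself (names and values illustrative): writes land in a fresh first
dict, while reads fall through to the uncopied source dicts.

```python
from collections import ChainMap

big1 = {'x': 1, 'y': 2}      # potentially-gigantic source dicts
big2 = {'y': 20, 'z': 30}

overlay = ChainMap({}, big1, big2)   # mutable overlay, no copying
overlay['y'] = 99                    # stored only in the empty first dict

print(overlay['y'])          # 99
print(big1['y'], big2['y'])  # 2 20 -- sources untouched
print(overlay['z'])          # 30 -- read falls through to big2
```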

For many applications you may not care one way or the other; only for
some you do, and only then do you need to know the details of the operation.

My point is to make the dict() data structure easier to use for
most users and use cases, especially novices.
This is what adds power to the language: not that you can do things
(Turing machines can), but that you can do them easily and naturally.

From ncoghlan at gmail.com  Tue Jul 29 00:27:06 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 29 Jul 2014 08:27:06 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CAN3CYHzBEV7YQixhYNwnMqYfyFQZO9Ff0NeEFeGE_rx7SGuWmA@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <CAN3CYHzBEV7YQixhYNwnMqYfyFQZO9Ff0NeEFeGE_rx7SGuWmA@mail.gmail.com>
Message-ID: <CADiSq7do_4wcVuTBw0dXXn1_Rt+W0amm5-nk6152_mvi3HPEhQ@mail.gmail.com>

On 29 Jul 2014 08:16, "Alexander Heger" <python at 2sn.net> wrote:
>
> > In all honesty, I'd suggest that code which looks bad enough to
> > warrant even considering this feature is probably badly in need of
> > refactoring, at which point the problem will likely go away.
>
> I often want to call functions with added (or removed, replaced)
> keywords from the call.
>
> args0 = dict(...)
> args1 = dict(...)
>
> def f(**kwargs):
>     g(**(args0 | kwargs | args1))
>
> currently I have to write
>
> args = dict(...)
> def f(**kwargs):
>     temp_args = dict(args0)
>     temp_args.update(kwargs)
>     temp_args.update(args1)
>     g(**temp_args)

The first part of this is one of the use cases for functools.partial(), so it
isn't a compelling argument for easy dict merging. The above is largely an
awkward way of spelling:

    import functools
    f = functools.partial(g, **...)

The one difference is to also silently *override* some of the explicitly
passed arguments, but that part's downright user hostile and shouldn't be
encouraged.

Regards,
Nick.

>
> It would also make the proposed feature to allow multiple kw args
> expansions in Python 3.5 easy to write by having
>
> f(**a, **b, **c)
> be equivalent to
> f(**(a | b | c))
>
> -Alexander
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140729/eaa6a397/attachment.html>

From ncoghlan at gmail.com  Tue Jul 29 00:40:02 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 29 Jul 2014 08:40:02 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CAN3CYHz8+zChss0R=_uS1XsYka8_R-z-f62=2hORAJ59Qc7SAQ@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <60A434E0-C0DA-467B-B13A-BA6986C5B7B1@yahoo.com>
 <CAN3CYHz8+zChss0R=_uS1XsYka8_R-z-f62=2hORAJ59Qc7SAQ@mail.gmail.com>
Message-ID: <CADiSq7cDoSvz-m=xBz0dszvQgH1H9uwNFSrMZ3-jqHNUscN1YQ@mail.gmail.com>

On 29 Jul 2014 08:22, "Alexander Heger" <python at 2sn.net> wrote:

>
> My point is to make the dict() data structure easier to use for
> most users and use cases, especially novices.
> This is what adds power to the language.  Not that you can do things
> (Turing machines can) but that you can do them easily and naturally.

But why is dict merging into a *new* dict something that needs to be done
as a single expression? What's the problem with spelling out "to merge two
dicts into a new one, first make a copy, then merge in the other one":

    x = dict(a)
    x.update(b)

That's the real competitor here, not the more cryptic "x = dict(a, **b)"

You can even use it as an example of factoring out a helper function:

    def copy_and_update(a, *args):
        x = dict(a)
        for arg in args:
            x.update(arg)
        return x

My personal experience suggests that's a rare enough use case that it's
fine to leave it as a trivial helper function that people can write if they
need it. The teaching example isn't compelling, since in the teaching case,
spelling out the steps is going to be necessary anyway to explain what the
function or method call is actually doing.

Cheers,
Nick.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140729/446a86ff/attachment.html>

From python at 2sn.net  Tue Jul 29 00:35:55 2014
From: python at 2sn.net (Alexander Heger)
Date: Tue, 29 Jul 2014 08:35:55 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <999F9DAF-E27A-46FE-A444-C2713A18BBB6@yahoo.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <20140727011739.GC9112@ando>
 <CAN1F8qV94hoeg-P7wq+feK9L7-TL3TkV0THvFZ_F+_aCfuzgGw@mail.gmail.com>
 <20140728145951.GH9112@ando> <20140728153306.GA5756@k2>
 <CAP7+vJK2thUJRC64NMFpNRLFs7+ddqpBLZN=g41FxrK66+k7+A@mail.gmail.com>
 <CAN3CYHxUGLaTNPCC-dgeDpQmkNvfqUbwSHvDmWPL+N0nNo1gSA@mail.gmail.com>
 <999F9DAF-E27A-46FE-A444-C2713A18BBB6@yahoo.com>
Message-ID: <CAN3CYHxAMbjiJmzo8AbGAcwbcYgRLXn_EQyt5j4k8O6tBHBBRA@mail.gmail.com>

> The difference is that with sets, it (at least conceptually) doesn't matter whether you keep elements from s or t when they collide, because by definition they only collide if they're equal, but with dicts, it very much matters whether you keep items from s or t when their keys collide, because the corresponding values are generally _not_ equal. So this is a false analogy; the same problem raised in the first three replies on this thread still needs to be answered: Is it obvious that the values from b should overwrite the values from a (assuming that's the rule you're suggesting, since you didn't specify; translate to the appropriate question if you want a different rule) in all real-life use cases? If not, is this so useful that the benefits in some uses outweigh the almost certain confusion in others? Without a compelling "yes" to one of those two questions, we're still at square one here; switching from + to | and making an analogy with sets doesn't help.
>
>> ... and accordingly
>>
>> D = A | B | C
>>
>> Maybe this operator is better as this equivalence is already being
>> used (for sets).  Accordingly "union(A,B)" could do a merge operation
>> and return the new dict().
>
> Wouldn't you expect a top-level union function to take any two iterables and return the union of them as a set (especially given that set.union accepts any iterable for its non-self argument)? A.union(B) seems a lot better than union(A, B).
>
> Then again, A.updated(B) or updated(A, B) might be even better, as someone suggested, because the parallel between update and updated (and between e.g. sort and sorted) is not at all problematic.

yes, one does have to deal with collisions and spell out a clear rule:
same behaviour as update().
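As a sketch, such an updated() (the name is hypothetical) with
update() collision semantics could be:

```python
def updated(a, *others):
    """Return a new dict: a copy of a, updated with each mapping in turn.

    Later mappings win on key collisions, exactly as chained update() calls.
    """
    result = dict(a)
    for other in others:
        result.update(other)
    return result

print(updated({'k': 1}, {'k': 2}, {'j': 3}))   # {'k': 2, 'j': 3}
```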

I was less uneasy about the | operator:
1) it is already used in a similar way for collections.Counter [this is
quite a strong constraint]
2) in shells it is used as a "pipe", implying directionality - order matters

yes, you are wondering whether the order should be this or that; you
just *define* what it is, same as you do for subtraction.

Another way of looking at it is to say that even in sets you take the
second, but because they are identical it does not matter ;-)

-Alexander

From python at 2sn.net  Tue Jul 29 00:48:49 2014
From: python at 2sn.net (Alexander Heger)
Date: Tue, 29 Jul 2014 08:48:49 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CADiSq7cHWS7zOtRpXVOzgr6cWmwQvn7VMDf8kz4SCiSNHxVwtg@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <20140727011739.GC9112@ando>
 <CAN1F8qV94hoeg-P7wq+feK9L7-TL3TkV0THvFZ_F+_aCfuzgGw@mail.gmail.com>
 <20140728145951.GH9112@ando> <20140728153306.GA5756@k2>
 <20140728160450.GI9112@ando>
 <6122DCE6-D84A-4B05-AB02-C1FD3CED82A4@ryanhiebert.com>
 <CADiSq7cHWS7zOtRpXVOzgr6cWmwQvn7VMDf8kz4SCiSNHxVwtg@mail.gmail.com>
Message-ID: <CAN3CYHxfv+nmU+oaasKB0W3CB3zCkD+eeTBHTdcR29Yxbv-nmw@mail.gmail.com>

> But really, I'm not seeing a compelling argument for why this needs to be a
> builtin. If someone is merging dicts often enough to care, they can already
> write a function to do the dict copy-and-update as a single operation. What
> makes this more special than the multitude of other three line functions in
> the world?

We all have too many of those.

This would not add too much complexity to the language, and it would
remove some awkward constructs that are needed otherwise.
Lacking such operators, dictionaries are currently not as easy to use
as an everyday data type as they should be.

-Alexander

From python at 2sn.net  Tue Jul 29 01:04:42 2014
From: python at 2sn.net (Alexander Heger)
Date: Tue, 29 Jul 2014 09:04:42 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CA030941-62B6-4B2E-BEDF-6C15B26283E4@yahoo.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <CAN3CYHzBEV7YQixhYNwnMqYfyFQZO9Ff0NeEFeGE_rx7SGuWmA@mail.gmail.com>
 <CA030941-62B6-4B2E-BEDF-6C15B26283E4@yahoo.com>
Message-ID: <CAN3CYHwt9tg9Gsv3_+5gx18SD3w+dcyNJ_26kMp0Qi-eXc6xfg@mail.gmail.com>

>> args = dict(...)
>> def f(**kwargs):
>>    temp_args = dict(args0)
>>    temp_args.update(kwargs)
>>    temp_args.update(args1)
>>    g(**temp_args)
>
> No, you just have to write a one-liner with ChainMap, except in the (very rare) case where you're expecting g to hold onto and later modify its kwargs.

yes, this (modify) is what I do.

In any case, it would still be

g(**collections.ChainMap(dict1, kwargs, dict0))

In either case a new dict is created and passed to g as kwargs.

It's not pretty, but it does work.  Thanks.

so the general case

D = A | B | C

becomes

D = dict(collections.ChainMap(C, B, A))

(someone may suggest dict could have a "chain" constructor class
method D = dict.chain(C, B, A))
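Note the reversed argument order: ChainMap gives priority to its
*first* mapping, while the | spelling above gives priority to the
*last*. A quick check (A, B, C illustrative):

```python
from collections import ChainMap

A = {'k': 'a'}
B = {'k': 'b'}
C = {'k': 'c', 'x': 1}

# D = A | B | C would let C win on collisions; with ChainMap the
# first mapping wins, hence the reversed order ChainMap(C, B, A).
D = dict(ChainMap(C, B, A))
print(D)   # {'k': 'c', 'x': 1}
```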

From python at 2sn.net  Tue Jul 29 01:18:37 2014
From: python at 2sn.net (Alexander Heger)
Date: Tue, 29 Jul 2014 09:18:37 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CADiSq7do_4wcVuTBw0dXXn1_Rt+W0amm5-nk6152_mvi3HPEhQ@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <CAN3CYHzBEV7YQixhYNwnMqYfyFQZO9Ff0NeEFeGE_rx7SGuWmA@mail.gmail.com>
 <CADiSq7do_4wcVuTBw0dXXn1_Rt+W0amm5-nk6152_mvi3HPEhQ@mail.gmail.com>
Message-ID: <CAN3CYHy3JGZR7Bc8_0F35MuAADV4cOLHBXHkop+XuLMDD+QLqw@mail.gmail.com>

>> args0 = dict(...)
>> args1 = dict(...)
>>
>> def f(**kwargs):
>>     g(**(args0 | kwargs | args1))
>>
>> currently I have to write
>>
>> args = dict(...)
>> def f(**kwargs):
>>     temp_args = dict(args0)
>>     temp_args.update(kwargs)
>>     temp_args.update(args1)
>>     g(**temp_args)
>
> The first part of this one of the use cases for functools.partial(), so it
> isn't a compelling argument for easy dict merging. The above is largely an
> awkward way of spelling:
>
>     import functools
>     f = functools.partial(g, **...)
>
> The one difference is to also silently *override* some of the explicitly
> passed arguments, but that part's downright user hostile and shouldn't be
> encouraged.

yes, poor example due to brevity.  ;-)

In my case f would actually do something with the values of kwargs
before calling g, and args1 may not be static outside f.
(hence partial is not a solution for the full application)

def f(**kwargs):
    # do something with kwargs, create dict0 and dict1 using kwargs
    temp_args = dict(dict0)
    temp_args.update(kwargs)
    temp_args.update(dict1)
    g(**temp_args)
    # more uses of dict0

which could be

def f(**kwargs):
    # do something with kwargs, create dict0 and dict1 using kwargs
    g(**collections.ChainMap(dict1, kwargs, dict0))
    # more uses of dict0

Maybe good enough for that case; as with + or |, one still needs to
know/learn the lookup order for key replacement, and it is sort of
bulky.

 -Alexander

From python at 2sn.net  Tue Jul 29 01:45:06 2014
From: python at 2sn.net (Alexander Heger)
Date: Tue, 29 Jul 2014 09:45:06 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CADiSq7cDoSvz-m=xBz0dszvQgH1H9uwNFSrMZ3-jqHNUscN1YQ@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <60A434E0-C0DA-467B-B13A-BA6986C5B7B1@yahoo.com>
 <CAN3CYHz8+zChss0R=_uS1XsYka8_R-z-f62=2hORAJ59Qc7SAQ@mail.gmail.com>
 <CADiSq7cDoSvz-m=xBz0dszvQgH1H9uwNFSrMZ3-jqHNUscN1YQ@mail.gmail.com>
Message-ID: <CAN3CYHxuiOBqPOxERijhe9xKOPXw93TPtQaBSQkFsaZhTaCT0w@mail.gmail.com>

> But why is dict merging into a *new* dict something that needs to be done as
> a single expression? What's the problem with spelling out "to merge two
> dicts into a new one, first make a copy, then merge in the other one":
>
>     x = dict(a)
>     x.update(b)
>
> That's the real competitor here, not the more cryptic "x = dict(a, **b)"
>
> You can even use it as an example of factoring out a helper function:
>
>     def copy_and_update(a, *args):
>         x = dict(a)
>         for arg in args:
>             x.update(arg)
>         return x
>
> My personal experience suggests that's a rare enough use case that it's fine
> to leave it as a trivial helper function that people can write if they need
> it. The teaching example isn't compelling, since in the teaching case,
> spelling out the steps is going to be necessary anyway to explain what the
> function or method call is actually doing.

it is more about having easy operations for people who learn Python
for the sake of using it (besides, I teach science students, not
computer science students).

The point is that it could be done in one operation.  It seems like
asking people to write

a = 2 + 3

as

a = int(2)
a.add(3)

Turing machine vs modern programming language.

It does already work for Counters.

The discussion seems to go such that, because people can't agree on
whether the first or second occurrence of a key takes precedence, or on
what operator to use (already decided by the design of Counter), it is
not done at all.  To be fair, I am not a core Python programmer and am
asking others to implement this - or maybe even to agree it would be
useful - so maybe I am pushing too much where just an idea should be floated.

-Alexander

From stephen at xemacs.org  Tue Jul 29 02:16:08 2014
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 29 Jul 2014 09:16:08 +0900
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CAN3CYHzP=ZkhzDoMMdqqsKo3FztbQ-R8D=grhtJbgUpn9eCkyA@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <CAN3CYHzP=ZkhzDoMMdqqsKo3FztbQ-R8D=grhtJbgUpn9eCkyA@mail.gmail.com>
Message-ID: <87wqaxm33r.fsf@uwakimon.sk.tsukuba.ac.jp>

Alexander Heger writes:

 > It seems it would be valuable to parallel the behaviour of operators
 > already in place for collections.

Mappings aren't collections.  In set theory, of course, they are
represented as *appropriately restricted* collections, but the meaning
of "+" as applied to mappings in mathematics varies.  For functions on
the same domain, there's usually an element-wise meaning that's
applied.  For functions on different domains, I've seen it used to
mean "apply the appropriate function on the disjoint union of the
domains".

I don't think there's an obvious winner in the competition among the
various meanings.




From python at 2sn.net  Tue Jul 29 02:38:38 2014
From: python at 2sn.net (Alexander Heger)
Date: Tue, 29 Jul 2014 10:38:38 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <87wqaxm33r.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <CAN3CYHzP=ZkhzDoMMdqqsKo3FztbQ-R8D=grhtJbgUpn9eCkyA@mail.gmail.com>
 <87wqaxm33r.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <CAN3CYHyhU8iT4FS+sAXtysaThGs+PpiQ7xH0ZCU0dSS0ctDz5g@mail.gmail.com>

>  > It seems it would be valuable to parallel the behaviour of operators
>  > already in place for collections.
>
> Mappings aren't collections.  In set theory, of course, they are
> represented as *appropriately restricted* collections, but the meaning
> of "+" as applied to mappings in mathematics varies.  For functions on
> the same domain, there's usually an element-wise meaning that's
> applied.  For functions on different domains, I've seen it used to
> mean "apply the appropriate function on the disjoint union of the
> domains".
>
> I don't think there's an obvious winner in the competition among the
> various meanings.

I mistyped.  It should have read " ... the behaviour in place for
collections.Counter"

It does define "+" and "|" operations.

-Alexander

From tjreedy at udel.edu  Tue Jul 29 03:39:28 2014
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 28 Jul 2014 21:39:28 -0400
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <87wqaxm33r.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <CAN3CYHzP=ZkhzDoMMdqqsKo3FztbQ-R8D=grhtJbgUpn9eCkyA@mail.gmail.com>
 <87wqaxm33r.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <lr6u0i$fq0$1@ger.gmane.org>

On 7/28/2014 8:16 PM, Stephen J. Turnbull wrote:
> Alexander Heger writes:
>
>   > It seems it would be valuable to parallel the behaviour of operators
>   > already in place for collections.
>
> Mappings aren't collections.  In set theory, of course, they are
> represented as *appropriately restricted* collections, but the meaning
> of "+" as applied to mappings in mathematics varies.  For functions on
> the same domain, there's usually an element-wise meaning that's
> applied.

This assumes the same range set (of addable items) also.  If Python were 
to add d1 + d2 and d1 += d2, I think we should use this existing and 
most common definition and add values. The use cases are keyed 
collections of things that can be added, which are pretty common.
Then dict addition would have the properties of the value addition.

Example: Let sales be a mapping from salesperson to total sales (since 
whenever). Let sales_today be a mapping from salesperson to today's 
sales. Then sales = sales + sales_today, or sales += sales_today. I 
could, of course, do this today with class Sales(dict): with __add__, 
__iadd__, and probably other app-specific methods.
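A minimal sketch of such a Sales(dict) subclass (assuming numeric
values; missing keys count as 0):

```python
class Sales(dict):
    """dict subclass where + combines values key-wise instead of replacing."""

    def __add__(self, other):
        result = Sales(self)
        for key, value in other.items():
            result[key] = result.get(key, 0) + value
        return result

    __iadd__ = __add__   # sales += sales_today rebinds to a new combined Sales

sales = Sales(alice=100, bob=250)
sales_today = Sales(bob=50, carol=75)
sales += sales_today
print(sales)   # {'alice': 100, 'bob': 300, 'carol': 75}
```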

The issue is that there are two ways to update a mapping with an update 
mapping: replace values and combine values. Addition combines, so to me, 
dict addition, if defined, should combine.

>  For functions on different domains, I've seen it used to
> mean "apply the appropriate function on the disjoint union of the
> domains".

According to https://en.wikipedia.org/wiki/Disjoint_union, d_u has at 
least two meanings.

-- 
Terry Jan Reedy


From abarnert at yahoo.com  Tue Jul 29 04:09:31 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 28 Jul 2014 19:09:31 -0700
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CAN3CYHxuiOBqPOxERijhe9xKOPXw93TPtQaBSQkFsaZhTaCT0w@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <60A434E0-C0DA-467B-B13A-BA6986C5B7B1@yahoo.com>
 <CAN3CYHz8+zChss0R=_uS1XsYka8_R-z-f62=2hORAJ59Qc7SAQ@mail.gmail.com>
 <CADiSq7cDoSvz-m=xBz0dszvQgH1H9uwNFSrMZ3-jqHNUscN1YQ@mail.gmail.com>
 <CAN3CYHxuiOBqPOxERijhe9xKOPXw93TPtQaBSQkFsaZhTaCT0w@mail.gmail.com>
Message-ID: <B3981CB2-6634-4CD0-B47E-7CF65ADCC84A@yahoo.com>

On Jul 28, 2014, at 16:45, Alexander Heger <python at 2sn.net> wrote:

> The discussion seems to go such that because people can't agree
> whether the first or second occurrence of keys takes precedence, or
> what operator to use (already decided by the design of Counter) it is
> not done at all.  

Well, yeah, that happens a lot. A good idea that can't be turned into a concrete design that fits the language and makes everyone happy doesn't get added, unless it's so ridiculously compelling that nobody can imagine living without it.

But that's not necessarily a bad thing--it's why Python is a relatively small and highly consistent language, which I think is a big part of why Python is so readable and teachable.

Anyway, I think you're on to something with your idea of adding an updated or union or whatever function/method whose semantics are obvious, and then mapping the operators to that method and update. I can definitely buy that a.updated(b) or union(a, b) favors values from b for exactly the same reason a.update(b) does (although as I mentioned I have other problems with a union function).

Meanwhile, if you have use cases for which ChainMap is not appropriate, you might want to write a dict subclass that you can use in your code or in teaching students or whatever, so you can amass some concrete use cases and show how much cleaner it is than the existing alternatives.

> To be fair, I am not a core Python programmer and am
> asking others to implement this - or maybe even agree it would be
> useful -, maybe pushing too much where just an idea should be floated.

If it helps, if you can get everyone to agree on this, except that none of the core devs wants to do the work, I'll volunteer to write the C code (after I finish my io patch and my abc patch...), so you only have to add the test cases (which are easy Python code; the only hard part is deciding what to test) and the docs.

From jeanpierreda at gmail.com  Tue Jul 29 04:46:14 2014
From: jeanpierreda at gmail.com (Devin Jeanpierre)
Date: Mon, 28 Jul 2014 19:46:14 -0700
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <87wqaxm33r.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <CAN3CYHzP=ZkhzDoMMdqqsKo3FztbQ-R8D=grhtJbgUpn9eCkyA@mail.gmail.com>
 <87wqaxm33r.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <CABicbJLfGa-kn-nLOHt3TdYmzA7p63Et+t-GnpFC0B69EjaanQ@mail.gmail.com>

On Mon, Jul 28, 2014 at 5:16 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Alexander Heger writes:
>
>  > It seems it would be valuable to parallel the behaviour of operators
>  > already in place for collections.
>
> Mappings aren't collections.  In set theory, of course, they are
> represented as *appropriately restricted* collections, but the meaning
> of "+" as applied to mappings in mathematics varies.  For functions on
> the same domain, there's usually an element-wise meaning that's
> applied.  For functions on different domains, I've seen it used to
> mean "apply the appropriate function on the disjoint union of the
> domains".
>
> I don't think there's an obvious winner in the competition among the
> various meanings.

The former meaning requires that the member types support addition, so
it's the obvious loser -- dicts can contain any kind of value, not
just addable ones. Adding a method that only works if the values
satisfy certain extra optional constraints is rare in Python, and
needs justification over the alternatives.

The second suggestion works just fine; you just need to figure out
what to do with the intersection, since we won't have disjoint domains.
The obvious suggestion is to pick an ordering, just like the update
method does.

For another angle: the algorithms course I took in university
introduced dictionaries as sets where the members of the set are
tagged with values. This makes set-like operators obvious in meaning,
with the only question being, again, what to do with the tags during
collisions.  (FWIW, the meaning of + as applied to sets is generally
union -- but Python's set type uses | instead, presumably for analogy
with ints when they are treated as a set of small integers).


That said, the only reason I can think of to support this new stuff is
to stop dict(x, **y) from being such an attractive nuisance.
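The nuisance being that dict(x, **y) only accepts string keys for y
(in Python 3 it raises otherwise):

```python
x = {'a': 1}
y = {'b': 2}
print(dict(x, **y))    # works: {'a': 1, 'b': 2}

bad = {3: 'three'}     # non-string key
try:
    dict(x, **bad)
except TypeError:
    print('TypeError: keywords must be strings')   # Python 3 behaviour
```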

-- Devin

From stephen at xemacs.org  Tue Jul 29 05:13:15 2014
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 29 Jul 2014 12:13:15 +0900
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CAN3CYHyhU8iT4FS+sAXtysaThGs+PpiQ7xH0ZCU0dSS0ctDz5g@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <CAN3CYHzP=ZkhzDoMMdqqsKo3FztbQ-R8D=grhtJbgUpn9eCkyA@mail.gmail.com>
 <87wqaxm33r.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CAN3CYHyhU8iT4FS+sAXtysaThGs+PpiQ7xH0ZCU0dSS0ctDz5g@mail.gmail.com>
Message-ID: <87silkn9h0.fsf@uwakimon.sk.tsukuba.ac.jp>

Alexander Heger writes:

 > I mistyped.  It should have read " ... the behaviour in place for
 > collections.Counter"

But there *is* a *the* (ie, unique) "additive" behavior for Counter.
(At least, I find it reasonable to think so.)  What you're missing is
that there is no such agreement on what it means to add dictionaries.

True, you can "just pick one".  Python doesn't much like to do that,
though.  The problem is that on discovering that dictionaries can be
added, *everybody* is going to think that their personal application
is the obvious one to implement as "+" and/or "+=".  Some of them are
going to be wrong and write buggy code as a consequence.
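
For instance, a rough sketch of two equally defensible readings of
a + b that give different answers for the same operands:

```python
from collections import Counter

a = {'x': 1, 'y': 2}
b = {'y': 10}

# Reading 1: "merge", with the right operand winning on collisions.
merge = dict(a)
merge.update(b)                    # {'x': 1, 'y': 10}

# Reading 2: element-wise addition, which is what
# collections.Counter already spells as +.
summed = Counter(a) + Counter(b)   # Counter({'y': 12, 'x': 1})
```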



From steve at pearwood.info  Tue Jul 29 05:34:12 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 29 Jul 2014 13:34:12 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <lr60in$bn5$1@ger.gmane.org>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <20140727011739.GC9112@ando>
 <CAN1F8qV94hoeg-P7wq+feK9L7-TL3TkV0THvFZ_F+_aCfuzgGw@mail.gmail.com>
 <20140728145951.GH9112@ando> <20140728153306.GA5756@k2>
 <20140728160450.GI9112@ando> <lr60in$bn5$1@ger.gmane.org>
Message-ID: <20140729033411.GJ9112@ando>

On Mon, Jul 28, 2014 at 12:17:10PM -0500, Ron Adam wrote:
> 
> On 07/28/2014 11:04 AM, Steven D'Aprano wrote:
[...]

> >new_dict = a + b + c + d
> >
> >Pros: + is short to type; subclasses can control the type of new_dict.
> >Cons: dict addition isn't obvious.
> 
> I think it's more obvious.  It only needs __add__ and __iadd__ methods to 
> make it consistent with the list type.

What I meant was that it wasn't obvious what dict1 + dict2 should do, 
not whether or not the __add__ method exists.

> I think this added consistency between lists and dicts would be useful.

Lists and dicts aren't the same kind of object. I'm not sure it is 
helpful to force them to be consistent. Should list grow an update() 
method to make it consistent with dicts? How about setdefault()?

As for being useful, useful for what? Useful how often? I'm sure that 
one could take any piece of code, no matter how obscure, and say it is 
useful *somewhere* :-) but the question is whether it is useful enough 
to be part of the language.

I was wrong to earlier dismiss the OP's use case for dict addition by 
suggesting dict(a, **b). Such a thing only works if all the keys of b 
are valid identifiers. But that doesn't mean that just because my 
shoot-from-the-hip response missed the target that we should conclude 
that dict addition solves an important problem or that + is the correct 
way to spell it.

I'm still dubious that it's needed, but if it were, this is what I 
would prefer to see:

* should be a Mapping method, not a top-level function;

* should accept anything the dict constructor accepts, mappings or 
  lists of (key,value) pairs as well as **kwargs;

* my preferred name for this is now "merged" rather than "updated";

* it should return a new mapping, not modify in-place;

* when called from a class, it should behave like a class method: 
  MyMapping.merged(a, b, c) should return an instance of MyMapping;

* but when called from an instance, it should behave like an instance
  method, with self included in the chain of mappings to merge:
  a.merged(b, c) rather than a.merged(a, b, c).


I have a descriptor type which implements the behaviour from the last 
two bullet points, so from a technical standpoint it's not hard to 
implement this. But I can imagine a lot of push-back from the more 
conservative developers about adding a *fourth* method type (even if it 
is private) to the Python builtins, so it would take a really compelling 
use-case to justify adding a new method type and a new dict method.

(Personally, I think this hybrid class/instance method type is far more 
useful than staticmethod, since I've actually used it in production 
code, but staticmethod isn't going away.)


-- 
Steven

From stephen at xemacs.org  Tue Jul 29 07:15:44 2014
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 29 Jul 2014 14:15:44 +0900
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <lr6u0i$fq0$1@ger.gmane.org>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <CAN3CYHzP=ZkhzDoMMdqqsKo3FztbQ-R8D=grhtJbgUpn9eCkyA@mail.gmail.com>
 <87wqaxm33r.fsf@uwakimon.sk.tsukuba.ac.jp>
 <lr6u0i$fq0$1@ger.gmane.org>
Message-ID: <87ppgon3sv.fsf@uwakimon.sk.tsukuba.ac.jp>

Terry Reedy writes:


 > This assumes the same range set (of addable items) also.  If Python were 
 > to add d1 + d2 and d1 += d2, I think we should use this existing and 
 > most common definition and add values.

IMHO[1] that's way too special for the generic mapping types.  If one
wants such operations, she should define NumericValuedMapping and
StringValuedMapping etc classes for each additive set of values.

 > >  For functions on different domains, I've seen it used to
 > > mean "apply the appropriate function on the disjoint union of the
 > > domains".
 > 
 > According to https://en.wikipedia.org/wiki/Disjoint_union, d_u has at 
 > least two meaning.

Either meaning will do here, with the distinction that the set-
theoretic meaning (which I intended) applies to any two functions,
while the alternate meaning imposes a restriction on the functions
that can be added (and therefore is inappropriate for this discussion
IMHO).

Footnotes: 
[1]  I mean the "H", I'm no authority.




From abarnert at yahoo.com  Tue Jul 29 08:15:44 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 28 Jul 2014 23:15:44 -0700
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <20140729033411.GJ9112@ando>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <20140727011739.GC9112@ando>
 <CAN1F8qV94hoeg-P7wq+feK9L7-TL3TkV0THvFZ_F+_aCfuzgGw@mail.gmail.com>
 <20140728145951.GH9112@ando> <20140728153306.GA5756@k2>
 <20140728160450.GI9112@ando> <lr60in$bn5$1@ger.gmane.org>
 <20140729033411.GJ9112@ando>
Message-ID: <1406614544.48360.YahooMailNeo@web181002.mail.ne1.yahoo.com>

On Monday, July 28, 2014 8:34 PM, Steven D'Aprano <steve at pearwood.info> wrote:



[snip]

> * when called from a class, it should behave like a class method: 
>   MyMapping.merged(a, b, c) should return an instance of MyMapping;
> 
> * but when called from an instance, it should behave like an instance
>   method, with self included in the chain of mappings to merge:
>   a.merged(b, c) rather than a.merged(a, b, c).
> 
> 
> I have a descriptor type which implements the behaviour from the last 
> two bullet points, so from a technical standpoint it's not hard to 
> implement this. But I can imagine a lot of push-back from the more 
> conservative developers about adding a *fourth* method type (even if it 
> is private) to the Python builtins, so it would take a really compelling 
> use-case to justify adding a new method type and a new dict method.
> 
> (Personally, I think this hybrid class/instance method type is far more 
> useful than staticmethod, since I've actually used it in production 
> code, but staticmethod isn't going away.)


How is this different from a plain-old (builtin or normal) method?

>>> class Spam:
...     def eggs(self, a):
...         print(self, a)
>>> spam = Spam()
>>> Spam.eggs(spam, 2)
<__main__.Spam object at 0x106377080> 2
>>> spam.eggs(2)
<__main__.Spam object at 0x106377080> 2
>>> Spam.eggs
<function __main__.eggs>
>>> spam.eggs
<bound method Spam.eggs of <__main__.Spam object at 0x106377080>>
>>> s = {1, 2, 3}
>>> set.union(s, [4])
{1, 2, 3, 4}
>>> s.union([4])
{1, 2, 3, 4}
>>> set.union
<method 'union' of 'set' objects>
>>> s.union
<function union>

This is the way methods have always worked (although the details of how they worked under the covers changed in 3.0, and before that when descriptors and new-style classes were added).

From p.f.moore at gmail.com  Tue Jul 29 08:22:34 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 29 Jul 2014 07:22:34 +0100
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CAN3CYHwt9tg9Gsv3_+5gx18SD3w+dcyNJ_26kMp0Qi-eXc6xfg@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <CAN3CYHzBEV7YQixhYNwnMqYfyFQZO9Ff0NeEFeGE_rx7SGuWmA@mail.gmail.com>
 <CA030941-62B6-4B2E-BEDF-6C15B26283E4@yahoo.com>
 <CAN3CYHwt9tg9Gsv3_+5gx18SD3w+dcyNJ_26kMp0Qi-eXc6xfg@mail.gmail.com>
Message-ID: <CACac1F_jX8kjnedvFtYXJ8vpZRfUQZLAkMOOyYrBP5aEm8pyfg@mail.gmail.com>

On 29 July 2014 00:04, Alexander Heger <python at 2sn.net> wrote:
> D = A | B | C
>
> becomes
>
> D = dict(collections.ChainMap(C, B, A))

This immediately explains the key problem with this proposal. It never
even *occurred* to me that anyone would expect C to take priority over
A in the operator form. But the ChainMap form makes it immediately
clear to me that this is the intent.

An operator form will be nothing but a maintenance nightmare and a
source of bugs. Thanks for making this obvious :-)

-1.

Paul

From ncoghlan at gmail.com  Tue Jul 29 09:46:56 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 29 Jul 2014 17:46:56 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <87silkn9h0.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <CAN3CYHzP=ZkhzDoMMdqqsKo3FztbQ-R8D=grhtJbgUpn9eCkyA@mail.gmail.com>
 <87wqaxm33r.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CAN3CYHyhU8iT4FS+sAXtysaThGs+PpiQ7xH0ZCU0dSS0ctDz5g@mail.gmail.com>
 <87silkn9h0.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <CADiSq7d+ZSnubvHJaHtdWsdKCe7wBfei0faKDZ=cRux00izF6w@mail.gmail.com>

On 29 July 2014 13:13, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Alexander Heger writes:
>
>  > I mistyped.  It should have read " ... the behaviour in place for
>  > collections.Counter"
>
> But there *is* a *the* (ie, unique) "additive" behavior for Counter.
> (At least, I find it reasonable to think so.)  What you're missing is
> that there is no such agreement on what it means to add dictionaries.
>
> True, you can "just pick one".  Python doesn't much like to do that,
> though.  The problem is that on discovering that dictionaries can be
> added, *everybody* is going to think that their personal application
> is the obvious one to implement as "+" and/or "+=".  Some of them are
> going to be wrong and write buggy code as a consequence.

In fact, the existence of collections.Counter.__add__ is an argument
*against* introducing dict.__add__ with different semantics:

    >>> issubclass(collections.Counter, dict)
    True

So, if someone *wants* a dict with "addable" semantics, they can
already use collections.Counter. While some of its methods really only
work with integers, the addition part is actually usable with
arbitrary addable types.

If set-like semantics were added to dict, it would conflict with the
existing element-wise semantics of Counter.
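
A quick sketch of the conflict:

```python
from collections import Counter

# Counter is a dict subclass whose + already means element-wise
# addition of values:
c = Counter(a=2, b=1) + Counter(a=1, c=5)   # Counter(a=3, b=1, c=5)

# A hypothetical dict.__add__ with "merge" semantics would give a
# different answer for the same operands:
d = dict(a=2, b=1)
d.update(dict(a=1, c=5))                    # {'a': 1, 'b': 1, 'c': 5}
```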

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From toddrjen at gmail.com  Tue Jul 29 10:05:10 2014
From: toddrjen at gmail.com (Todd)
Date: Tue, 29 Jul 2014 10:05:10 +0200
Subject: [Python-ideas] Accept list in os.path.join
Message-ID: <CAFpSVp+oDXdfbwwugxSZS7F8nRhMfse=4=X4TC2z8D_yiS3oVQ@mail.gmail.com>

Currently, os.path.join joins strings specified in its arguments, with one
string per argument.

On its own, that is not a problem.  However, it is inconsistent with
str.join, which accepts only a list of strings.  This inconsistency can
lead to some confusion, since these operations that have similar names and
carry out similar tasks have fundamentally different syntax.

My suggestion is to allow os.path.join to accept a list of strings in
addition to existing one string per argument.  This would allow it to be
used in a manner consistent with str.join, while still allowing existing
code to run as expected.

Currently, when os.path.join is given a single list, it returns that list
exactly.  This is undocumented behavior (I am surprised it is not an
exception).  It does mean, however, that this change would break code 
that expects the list back when given a single list but a joined path 
when given multiple strings.
This is conceivable, but outside of catching the sorts of errors this
change would prevent, I would be surprised if it is a common use-case.

In the case where multiple arguments are used and one or more of those
arguments are a list, I think the best solution would be to raise an
exception, since this would avoid corner cases and be less likely to
silently propagate bugs.  However, I am not set on that, so if others
prefer it, joining all the strings in all the lists would be okay too.

So the syntax would be like this (on POSIX as an example):

>>> os.path.join('test1', 'test2', 'test3')  # current syntax
'test1/test2/test3'
>>> os.path.join(['test1', 'test2', 'test3']) # new syntax
'test1/test2/test3'
>>> os.path.join(['test1', 'test2'], 'test3')
Exception
>>> os.path.join(['test1'], 'test2', 'test3')
Exception
>>> os.path.join(['test1', 'test2'], ['test3'])
Exception
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140729/2c43d1d5/attachment.html>

From abarnert at yahoo.com  Tue Jul 29 11:12:28 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 29 Jul 2014 02:12:28 -0700
Subject: [Python-ideas] Accept list in os.path.join
In-Reply-To: <CAFpSVp+oDXdfbwwugxSZS7F8nRhMfse=4=X4TC2z8D_yiS3oVQ@mail.gmail.com>
References: <CAFpSVp+oDXdfbwwugxSZS7F8nRhMfse=4=X4TC2z8D_yiS3oVQ@mail.gmail.com>
Message-ID: <1406625148.37122.YahooMailNeo@web181004.mail.ne1.yahoo.com>

On Tuesday, July 29, 2014 1:14 AM, Todd <toddrjen at gmail.com> wrote:

>Currently, os.path.join joins strings specified in its arguments, with one string per argument.
>
>On its own, that is not a problem.? However, it is inconsistent with str.join, which accepts only a list of strings.

No, str.join accepts any iterable of strings, including a string, which is an iterable of single-character strings.

Not that you often intentionally pass a string to it, but you do very often pass a generator expression or other iterator, so treating lists specially for os.path.join to make it work more like str.join would just increase confusion, not reduce it.

Also, I don't know of anything else in Python that has special treatment for lists vs. other iterables. There are a few cases that have special treatment for _tuples_ (notably str.__mod__), but I don't think anyone wants to expand those, and I don't think it would make you happy here, either.
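
For example:

```python
# str.join takes any iterable of strings: a list, a tuple, a
# generator, or even a string (an iterable of 1-character strings).
joined_list = '/'.join(['a', 'b'])          # 'a/b'
joined_gen = '/'.join(x for x in ('a', 'b'))  # 'a/b'
joined_str = '/'.join('abc')                # 'a/b/c'
```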

From tjreedy at udel.edu  Tue Jul 29 11:30:25 2014
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 29 Jul 2014 05:30:25 -0400
Subject: [Python-ideas] Accept list in os.path.join
In-Reply-To: <CAFpSVp+oDXdfbwwugxSZS7F8nRhMfse=4=X4TC2z8D_yiS3oVQ@mail.gmail.com>
References: <CAFpSVp+oDXdfbwwugxSZS7F8nRhMfse=4=X4TC2z8D_yiS3oVQ@mail.gmail.com>
Message-ID: <lr7pjj$3ca$1@ger.gmane.org>

On 7/29/2014 4:05 AM, Todd wrote:
> Currently, os.path.join joins strings specified in its arguments, with
> one string per argument.

One typically has 2 or possibly 3 path segments, never 1000.

> On its own, that is not a problem.  However, it is inconsistent with
> str.join, which accepts only a list of strings.  This inconsistency can
> lead to some confusion, since these operations that have similar names
> and carry out similar tasks have fundamentally different syntax.

I partly agree, but think about the actual use cases.

> My suggestion is to allow os.path.join to accept a list of strings in
> addition to existing one string per argument.

os.path.join(*iterable)
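
That is, star-unpacking already covers the list case (posixpath is
used here only to make the separator deterministic in this sketch):

```python
import posixpath  # os.path on POSIX systems

segments = ['usr', 'local', 'bin']
path = posixpath.join(*segments)   # 'usr/local/bin'
```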


-- 
Terry Jan Reedy


From j.wielicki at sotecware.net  Tue Jul 29 13:37:54 2014
From: j.wielicki at sotecware.net (Jonas Wielicki)
Date: Tue, 29 Jul 2014 13:37:54 +0200
Subject: [Python-ideas] Accept list in os.path.join
In-Reply-To: <CAFpSVp+oDXdfbwwugxSZS7F8nRhMfse=4=X4TC2z8D_yiS3oVQ@mail.gmail.com>
References: <CAFpSVp+oDXdfbwwugxSZS7F8nRhMfse=4=X4TC2z8D_yiS3oVQ@mail.gmail.com>
Message-ID: <53D78792.9080401@sotecware.net>

On 29.07.2014 10:05, Todd wrote:
> In the case where multiple arguments are used and one or more of those
> arguments are a list, I think the best solution would be to raise an
> exception, since this would avoid corner cases and be less likely to
> silently propagate bugs.  However, I am not set on that, so if others
> prefer it join all the strings in all the lists that would be okay too.

From the implementation point of view, I have yet to see a duck-typing
way to distinguish a list (or any other iterable) of strings from a string.
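
A rough sketch of what the non-duck-typed workaround looks like (the
helper name and error message here are made up for illustration):

```python
import posixpath  # deterministic separator for this sketch

def join_segments(arg, *rest):
    # There is no duck-typed test that tells a string apart from an
    # iterable of strings, so an explicit isinstance check is the
    # usual escape hatch.
    if isinstance(arg, str):
        return posixpath.join(arg, *rest)
    if rest:
        raise TypeError('cannot mix a list of segments with extra arguments')
    return posixpath.join(*arg)
```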

regards,
jwi


From j.wielicki at sotecware.net  Tue Jul 29 13:56:57 2014
From: j.wielicki at sotecware.net (Jonas Wielicki)
Date: Tue, 29 Jul 2014 13:56:57 +0200
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CACac1F_jX8kjnedvFtYXJ8vpZRfUQZLAkMOOyYrBP5aEm8pyfg@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <CAN3CYHzBEV7YQixhYNwnMqYfyFQZO9Ff0NeEFeGE_rx7SGuWmA@mail.gmail.com>
 <CA030941-62B6-4B2E-BEDF-6C15B26283E4@yahoo.com>
 <CAN3CYHwt9tg9Gsv3_+5gx18SD3w+dcyNJ_26kMp0Qi-eXc6xfg@mail.gmail.com>
 <CACac1F_jX8kjnedvFtYXJ8vpZRfUQZLAkMOOyYrBP5aEm8pyfg@mail.gmail.com>
Message-ID: <53D78C09.20406@sotecware.net>

On 29.07.2014 08:22, Paul Moore wrote:
> On 29 July 2014 00:04, Alexander Heger <python at 2sn.net> wrote:
>> D = A | B | C
>>
>> becomes
>>
>> D = dict(collections.ChainMap(C, B, A))
> 
> This immediately explains the key problem with this proposal. It never
> even *occurred* to me that anyone would expect C to take priority over
> A in the operator form. But the ChainMap form makes it immediately
> clear to me that this is the intent.

FWIW, one could use an operator which inherently shows a direction: <<
and >>, for both directions respectively.

A = B >> C lets B take precedence, and A = B << C lets C take precedence.
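
A rough sketch of how such operators might be spelled on a dict
subclass (the class name and details are made up for illustration):

```python
class DirDict(dict):
    # Hypothetical: >> lets the left operand win on collisions,
    # << lets the right operand win.
    def __rshift__(self, other):
        new = DirDict(other)
        new.update(self)       # self's values overwrite other's
        return new

    def __lshift__(self, other):
        new = DirDict(self)
        new.update(other)      # other's values overwrite self's
        return new

B = DirDict(x=1)
C = DirDict(x=2, y=3)
left_wins = B >> C    # {'x': 1, 'y': 3}, B takes precedence
right_wins = B << C   # {'x': 2, 'y': 3}, C takes precedence
```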

regards,
jwi

p.s.: I'm not entirely sure what to think about my suggestion -- I'd like
to hear opinions.

> 
> An operator form will be nothing but a maintenance nightmare and a
> source of bugs. Thanks for making this obvious :-)
> 
> -1.
> 
> Paul
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
> 


From p.f.moore at gmail.com  Tue Jul 29 14:29:52 2014
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 29 Jul 2014 13:29:52 +0100
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <53D78C09.20406@sotecware.net>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <CAN3CYHzBEV7YQixhYNwnMqYfyFQZO9Ff0NeEFeGE_rx7SGuWmA@mail.gmail.com>
 <CA030941-62B6-4B2E-BEDF-6C15B26283E4@yahoo.com>
 <CAN3CYHwt9tg9Gsv3_+5gx18SD3w+dcyNJ_26kMp0Qi-eXc6xfg@mail.gmail.com>
 <CACac1F_jX8kjnedvFtYXJ8vpZRfUQZLAkMOOyYrBP5aEm8pyfg@mail.gmail.com>
 <53D78C09.20406@sotecware.net>
Message-ID: <CACac1F8T8kyLOn4F5-Po=jkVJpPjVg77XcWhqtvUGT5TGf+OYQ@mail.gmail.com>

On 29 July 2014 12:56, Jonas Wielicki <j.wielicki at sotecware.net> wrote:
> FWIW, one could use an operator which inherently shows a direction: <<
> and >>, for both directions respectively.
>
> A = B >> C lets B take precedence, and A = B << C lets C take precedence.
>
> regards,
> jwi
>
> p.s.: I'm not entirely sure what to think about my suggestion -- I'd like
> to hear opinions.

Personally, I don't like it much more than the symmetric-looking
operators. I get your point, but it feels like you're just patching
over a relatively small aspect of a fundamentally bad idea. But then
again as I've already said, I see no need for any of this, the
existing functionality seems fine to me.

Paul

From steve at pearwood.info  Tue Jul 29 15:35:56 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 29 Jul 2014 23:35:56 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <1406614544.48360.YahooMailNeo@web181002.mail.ne1.yahoo.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <20140727011739.GC9112@ando>
 <CAN1F8qV94hoeg-P7wq+feK9L7-TL3TkV0THvFZ_F+_aCfuzgGw@mail.gmail.com>
 <20140728145951.GH9112@ando> <20140728153306.GA5756@k2>
 <20140728160450.GI9112@ando> <lr60in$bn5$1@ger.gmane.org>
 <20140729033411.GJ9112@ando>
 <1406614544.48360.YahooMailNeo@web181002.mail.ne1.yahoo.com>
Message-ID: <20140729133556.GK9112@ando>

On Mon, Jul 28, 2014 at 11:15:44PM -0700, Andrew Barnert wrote:
> On Monday, July 28, 2014 8:34 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> [snip]
> > * when called from a class, it should behave like a class method: 
> >   MyMapping.merged(a, b, c) should return an instance of MyMapping;
> > 
> > * but when called from an instance, it should behave like an instance
> >   method, with self included in the chain of mappings to merge:
> >   a.merged(b, c) rather than a.merged(a, b, c).
> > 
> > 
> > I have a descriptor type which implements the behaviour from the last 
> > two bullet points, so from a technical standpoint it's not hard to 
> > implement this. But I can imagine a lot of push-back from the more 
> > conservative developers about adding a *fourth* method type (even if it 
> > is private) to the Python builtins, so it would take a really compelling 
> > use-case to justify adding a new method type and a new dict method.
> > 
> > (Personally, I think this hybrid class/instance method type is far more 
> > useful than staticmethod, since I've actually used it in production 
> > code, but staticmethod isn't going away.)
> 
> 
> How is this different from a plain-old (builtin or normal) method?

I see I failed to explain clearly, sorry about that.

With class methods, the method always receives the class as the first 
argument. Regardless of whether you write dict.fromkeys or 
{1:'a'}.fromkeys, the first argument is the class, dict.

With instance methods, the method receives the instance. If you call it 
from a class, the method is "unbound" and you are responsible for 
providing the "self" argument.

To me, this hypothetical merged() method sometimes feels like an 
alternative constructor, like fromkeys, and therefore best written as a 
class method, but sometimes like a regular method. Since it feels like a 
hybrid to me, I think a hybrid descriptor approach is best, but as I 
already said I can completely understand if conservative developers 
reject this idea.

In the hybrid form I'm referring to, the first argument provided is the 
class when called from the class, and the instance when called from an 
instance. Imagine it written in pure Python like this:

class dict:
    @hybridmethod
    def merged(this, *args, **kwargs):
        if isinstance(this, type):
            # Called from the class
            new = this()
        else:
            # Called from an instance.
            new = this.copy()
        for arg in args:
            new.update(arg)
        new.update(kwargs)
        return new
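
One possible implementation of such a hybridmethod descriptor (a
rough sketch; the actual descriptor may differ in detail):

```python
import functools

class hybridmethod:
    # Bind to the class when accessed from the class, and to the
    # instance when accessed from an instance.
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        target = objtype if obj is None else obj
        return functools.partial(self.func, target)

class MyDict(dict):
    @hybridmethod
    def merged(this, *args, **kwargs):
        new = this() if isinstance(this, type) else this.copy()
        for arg in args:
            new.update(arg)
        new.update(kwargs)
        return new

from_class = MyDict.merged({'a': 1}, b=2)      # {'a': 1, 'b': 2}
from_instance = MyDict(a=1).merged({'b': 2})   # {'a': 1, 'b': 2}
```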


If merged is a class method, we can avoid having to worry about the 
case where your "a" mapping happens to be a list of (key,item) pairs:

    a.merged(b, c, d)  # Fails if a = [(key, item), ...]
    dict.merged(a, b, c, d)  # Always succeeds.

It also allows us to easily specify a different mapping type for the 
result:

    MyMapping.merged(a, b, c, d)

although some would argue this is just as clear:

     MyMapping().merged(a, b, c, d)

albeit perhaps not quite as efficient if MyMapping is expensive to 
instantiate. (You create an empty instance, only to throw it away 
again.)

On the other hand, there are use-cases where merged() best communicates 
the intent if it is a regular instance method. Consider:

    settings = application_defaults.merged(
                       global_settings, 
                       user_settings, 
                       commandline_settings)

seems more clear to me than:

    settings = dict.merged(
                       application_defaults,
                       global_settings, 
                       user_settings, 
                       commandline_settings)

especially in the case that application_defaults is a dict literal.

tl;dr It's not often that I can't decide whether a method ought to be a 
class method or an instance method, the decision is usually easy, but 
this is one of those times.


-- 
Steven

From j.wielicki at sotecware.net  Tue Jul 29 16:03:09 2014
From: j.wielicki at sotecware.net (Jonas Wielicki)
Date: Tue, 29 Jul 2014 16:03:09 +0200
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <20140729133556.GK9112@ando>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <20140727011739.GC9112@ando>
 <CAN1F8qV94hoeg-P7wq+feK9L7-TL3TkV0THvFZ_F+_aCfuzgGw@mail.gmail.com>
 <20140728145951.GH9112@ando> <20140728153306.GA5756@k2>
 <20140728160450.GI9112@ando> <lr60in$bn5$1@ger.gmane.org>
 <20140729033411.GJ9112@ando>
 <1406614544.48360.YahooMailNeo@web181002.mail.ne1.yahoo.com>
 <20140729133556.GK9112@ando>
Message-ID: <53D7A99D.6000006@sotecware.net>

On 29.07.2014 15:35, Steven D'Aprano wrote:
> On Mon, Jul 28, 2014 at 11:15:44PM -0700, Andrew Barnert wrote:
>> On Monday, July 28, 2014 8:34 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>> [snip]
>>> * when called from a class, it should behave like a class method: 
>>>   MyMapping.merged(a, b, c) should return an instance of MyMapping;
>>>
>>> * but when called from an instance, it should behave like an instance
>>>   method, with self included in the chain of mappings to merge:
>>>   a.merged(b, c) rather than a.merged(a, b, c).
>>>
>>>
>>> I have a descriptor type which implements the behaviour from the last 
>>> two bullet points, so from a technical standpoint it's not hard to 
>>> implement this. But I can imagine a lot of push-back from the more 
>>> conservative developers about adding a *fourth* method type (even if it 
>>> is private) to the Python builtins, so it would take a really compelling 
>>> use-case to justify adding a new method type and a new dict method.
>>>
>>> (Personally, I think this hybrid class/instance method type is far more 
>>> useful than staticmethod, since I've actually used it in production 
>>> code, but staticmethod isn't going away.)
>>
>>
>> How is this different from a plain-old (builtin or normal) method?
> 
[snip]
> In the hybrid form I'm referring to, the first argument provided is the 
> class when called from the class, and the instance when called from an 
> instance. Imagine it written in pure Python like this:
> 
> class dict:
>     @hybridmethod
>     def merged(this, *args, **kwargs):
>         if isinstance(this, type):
>             # Called from the class
>             new = this()
>         else:
>             # Called from an instance.
>             new = this.copy()
>         for arg in args:
>             new.update(arg)
>         new.update(kwargs)
>         return new
[snip]

I really like the semantics of that. This allows for concise, and in my
opinion, clearly readable code.

Although I think maybe one should have two separate methods: the class
method being called ``merged`` and the instance method called
``merged_with``. I find

    result = somedict.merged(b, c)

somewhat less clear than

    result = somedict.merged_with(b, c)

regards,
jwi



From steve at pearwood.info  Tue Jul 29 16:36:05 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 30 Jul 2014 00:36:05 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <CACac1F_jX8kjnedvFtYXJ8vpZRfUQZLAkMOOyYrBP5aEm8pyfg@mail.gmail.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <CAN3CYHzBEV7YQixhYNwnMqYfyFQZO9Ff0NeEFeGE_rx7SGuWmA@mail.gmail.com>
 <CA030941-62B6-4B2E-BEDF-6C15B26283E4@yahoo.com>
 <CAN3CYHwt9tg9Gsv3_+5gx18SD3w+dcyNJ_26kMp0Qi-eXc6xfg@mail.gmail.com>
 <CACac1F_jX8kjnedvFtYXJ8vpZRfUQZLAkMOOyYrBP5aEm8pyfg@mail.gmail.com>
Message-ID: <20140729143605.GL9112@ando>

On Tue, Jul 29, 2014 at 07:22:34AM +0100, Paul Moore wrote:
> On 29 July 2014 00:04, Alexander Heger <python at 2sn.net> wrote:
> > D = A | B | C
> >
> > becomes
> >
> > D = dict(collections.ChainMap(C, B, A))
> 
> This immediately explains the key problem with this proposal. It never
> even *occurred* to me that anyone would expect C to take priority over
> A in the operator form. But the ChainMap form makes it immediately
> clear to me that this is the intent.

Hmmm. Funny you say that, because to me that is a major disadvantage of 
the ChainMap form: you have to write the arguments in reverse order.

Suppose that we want to start with a, then override it with b, then 
override that with c. Since a is the start (the root, the base), we 
start with a, something like this:

d = {}
d.update(a)
d.update(b)
d.update(c)

If update was chainable as it would be in Ruby:

d.update(a).update(b).update(c)

or even:

d.update(a, b, c)

This nicely leads us to d = a+b+c (assuming we agree that + meaning 
merge is the spelling we want).

The ChainMap, on the other hand, works backwards from this perspective: 
the last dict to be merged has to be given first:

ChainMap(c, b, a)
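
To illustrate the ordering difference:

```python
from collections import ChainMap

a = {'k': 'a', 'only_a': 1}
b = {'k': 'b'}
c = {'k': 'c'}

# ChainMap searches left to right, so the *first* mapping wins;
# hence c, b, a to make c override b override a.
chained = dict(ChainMap(c, b, a))   # {'k': 'c', 'only_a': 1}

# The update-based spelling reads in the opposite order:
d = {}
d.update(a)
d.update(b)
d.update(c)                         # same result
```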


-- 
Steven

From nathan at cmu.edu  Tue Jul 29 16:50:02 2014
From: nathan at cmu.edu (Nathan Schneider)
Date: Tue, 29 Jul 2014 10:50:02 -0400
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <53D78C09.20406@sotecware.net>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <CAN3CYHzBEV7YQixhYNwnMqYfyFQZO9Ff0NeEFeGE_rx7SGuWmA@mail.gmail.com>
 <CA030941-62B6-4B2E-BEDF-6C15B26283E4@yahoo.com>
 <CAN3CYHwt9tg9Gsv3_+5gx18SD3w+dcyNJ_26kMp0Qi-eXc6xfg@mail.gmail.com>
 <CACac1F_jX8kjnedvFtYXJ8vpZRfUQZLAkMOOyYrBP5aEm8pyfg@mail.gmail.com>
 <53D78C09.20406@sotecware.net>
Message-ID: <CADQLQrXUrz+p9w2S0A8hM28dT6VM1zMgXCBUoC7UdhjiYezfvw@mail.gmail.com>

On Tue, Jul 29, 2014 at 7:56 AM, Jonas Wielicki <j.wielicki at sotecware.net>
wrote:

>
> FWIW, one could use an operator which inherently shows a direction: <<
> and >>, for both directions respectively.
>
> A = B >> C lets B take precedence, and A = B << C lets C take precedence.
>

If there is to be an operator devoted specifically to this, I like
<< and >> as unambiguous choices. Proof:
https://mail.python.org/pipermail/python-ideas/2011-December/013232.html :)

I am also partial to the {**A, **B} proposal in
http://legacy.python.org/dev/peps/pep-0448/.

Cheers,
Nathan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20140729/20fa774b/attachment.html>

From abarnert at yahoo.com  Tue Jul 29 21:29:29 2014
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 29 Jul 2014 12:29:29 -0700
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <20140729143605.GL9112@ando>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <CAN3CYHzBEV7YQixhYNwnMqYfyFQZO9Ff0NeEFeGE_rx7SGuWmA@mail.gmail.com>
 <CA030941-62B6-4B2E-BEDF-6C15B26283E4@yahoo.com>
 <CAN3CYHwt9tg9Gsv3_+5gx18SD3w+dcyNJ_26kMp0Qi-eXc6xfg@mail.gmail.com>
 <CACac1F_jX8kjnedvFtYXJ8vpZRfUQZLAkMOOyYrBP5aEm8pyfg@mail.gmail.com>
 <20140729143605.GL9112@ando>
Message-ID: <1406662169.52281.YahooMailNeo@web181005.mail.ne1.yahoo.com>

On Tuesday, July 29, 2014 7:36 AM, Steven D'Aprano <steve at pearwood.info> wrote:

>On Tue, Jul 29, 2014 at 07:22:34AM +0100, Paul Moore wrote:
>> On 29 July 2014 00:04, Alexander Heger <python at 2sn.net> wrote:
>> > D = A | B | C
>> >
>> > becomes
>> >
>> > D = dict(collections.ChainMap(C, B, A))
>> 
>> This immediately explains the key problem with this proposal. It never
>> even *occurred* to me that anyone would expect C to take priority over
>> A in the operator form. But the ChainMap form makes it immediately
>> clear to me that this is the intent.
>
>Hmmm. Funny you say that, because to me that is a major disadvantage of 
>the ChainMap form: you have to write the arguments in reverse order.


I think that's pretty much exactly his point:

To him, it's obvious that + should be in the order of ChainMap, and he can't even conceive of the possibility that you'd want it "backward".

To you, it's obvious that + should be the other way around, and you find it annoying that ChainMap is "backward".

Which seems to imply that any attempt at setting an order is going to not only seem backward, but possibly surprisingly so, to a subset of Python's users.

And this is the kind of thing that can lead to subtle bugs. If a and b _almost never_ have duplicate keys, but very rarely do, you won't catch the problem until you think to test for it. And if one order or the other is so obvious to you that you didn't even imagine anyone would ever implement the opposite order, you probably won't think to write the test until you have a bug in the field.
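To make the ordering disagreement concrete, here is the ChainMap spelling next to the copy-and-update spelling; ChainMap searches its maps left to right, so its *first* argument wins, and you have to list the maps in the opposite order from the updates:

```python
from collections import ChainMap

A = {'k': 'from A'}
B = {'k': 'from B'}
C = {'k': 'from C'}

# ChainMap: the leftmost mapping takes precedence.
D = dict(ChainMap(C, B, A))
print(D['k'])  # 'from C'

# Copy-and-update: the *last* update takes precedence.
E = dict(A)
E.update(B)
E.update(C)
print(E['k'])  # 'from C' as well -- same winner, reversed argument order

# Same argument order as the updates, but the opposite winner:
F = dict(ChainMap(A, B, C))
print(F['k'])  # 'from A'
```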

From ron3200 at gmail.com  Wed Jul 30 01:12:16 2014
From: ron3200 at gmail.com (Ron Adam)
Date: Tue, 29 Jul 2014 18:12:16 -0500
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <20140729033411.GJ9112@ando>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <20140727011739.GC9112@ando>
 <CAN1F8qV94hoeg-P7wq+feK9L7-TL3TkV0THvFZ_F+_aCfuzgGw@mail.gmail.com>
 <20140728145951.GH9112@ando> <20140728153306.GA5756@k2>
 <20140728160450.GI9112@ando> <lr60in$bn5$1@ger.gmane.org>
 <20140729033411.GJ9112@ando>
Message-ID: <lr99oh$kpq$1@ger.gmane.org>



On 07/28/2014 10:34 PM, Steven D'Aprano wrote:
> On Mon, Jul 28, 2014 at 12:17:10PM -0500, Ron Adam wrote:
>>
>> On 07/28/2014 11:04 AM, Steven D'Aprano wrote:
> [...]
>
>>> new_dict = a + b + c + d
>>>
>>> Pros: + is short to type; subclasses can control the type of new_dict.
>>> Cons: dict addition isn't obvious.
>>
>> I think it's more obvious.  It only needs __add__ and __iadd__ methods to
>> make it consistent with the list type.
> What I meant was that it wasn't obvious what dict1 + dict2 should do,
> not whether or not the __add__ method exists.

What else could it do besides return a new copy of dict1 updated with 
dict2's contents?  A dict is an unordered container, so it wouldn't 
append, and duplicate keys would be resolved by the order of evaluation.  
I don't see any problem with that.  I also don't know of any other 
obvious way to combine two dictionaries.

The argument against it may simply be that it's a feature by design to 
keep dictionaries distinctive enough that code which handles them is 
clearly specific to them.  I'm not sure how strong that logic is, though.
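That copy-and-update behaviour, spelled out as a plain function:

```python
def merged(d1, d2):
    """Copy-and-update merge: a new dict with d1's contents,
    overwritten by d2 on duplicate keys.  Neither input is mutated."""
    new = dict(d1)
    new.update(d2)
    return new

a = {'x': 1, 'y': 2}
b = {'y': 20, 'z': 30}
print(merged(a, b))  # the duplicate key 'y' takes b's value
print(merged(b, a))  # reversing the arguments reverses the winner
```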


>> I think this added consistency between lists and dicts would be useful.
> Lists and dicts aren't the same kind of object. I'm not sure it is
> helpful to force them to be consistent. Should list grow an update()
> method to make it consistent with dicts? How about setdefault()?

Well, here is how they currently compare.

 >>> set(dir(dict)).intersection(set(dir(list)))
{'copy', '__hash__', '__format__', '__sizeof__', '__ge__', '__delitem__', 
'__getitem__', '__dir__', 'pop', '__gt__', '__repr__', '__init__', 
'__subclasshook__', '__eq__', 'clear', '__len__', '__str__', '__le__', 
'__new__', '__reduce_ex__', '__doc__', '__getattribute__', '__ne__', 
'__reduce__', '__contains__', '__delattr__', '__class__', '__lt__', 
'__setattr__', '__setitem__', '__iter__'}

 >>> set(dir(dict)).difference(set(dir(list)))
{'popitem', 'update', 'setdefault', 'items', 'values', 'fromkeys', 'get', 
'keys'}

 >>> set(dir(list)).difference(set(dir(dict)))
{'sort', '__mul__', 'remove', '__iadd__', '__reversed__', 'insert', 
'extend', 'append', 'count', '__add__', '__rmul__', 'index', '__imul__', 
'reverse'}

They do have quite a lot in common already.  The usefulness of different 
types having the same methods is that external code can be less specific to 
the objects they handle.  Of course, if those shared methods act too 
differently, they can be surprising as well.  That may be the case if '+' 
and '+=' are used to update dictionaries, but then again, maybe not. (?)


> As for being useful, useful for what? Useful how often? I'm sure that
> one could take any piece of code, no matter how obscure, and say it is
> useful *somewhere* :-)  but the question is whether it is useful enough
> to be part of the language.

That's where examples will have an advantage over an initial personal 
opinion.  Not that initial opinions aren't useful at first to express 
support or non-support.  I could have just used +1.  ;-)

Cheers,
   Ron




From steve at pearwood.info  Wed Jul 30 02:17:26 2014
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 30 Jul 2014 10:17:26 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <lr99oh$kpq$1@ger.gmane.org>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <20140727011739.GC9112@ando>
 <CAN1F8qV94hoeg-P7wq+feK9L7-TL3TkV0THvFZ_F+_aCfuzgGw@mail.gmail.com>
 <20140728145951.GH9112@ando> <20140728153306.GA5756@k2>
 <20140728160450.GI9112@ando> <lr60in$bn5$1@ger.gmane.org>
 <20140729033411.GJ9112@ando> <lr99oh$kpq$1@ger.gmane.org>
Message-ID: <20140730001726.GM9112@ando>

On Tue, Jul 29, 2014 at 06:12:16PM -0500, Ron Adam wrote on the 
similarity of lists and dicts:

[...]
> Well, here is how they currently compare.
> 
> >>> set(dir(dict)).intersection(set(dir(list)))
> {'copy', '__hash__', '__format__', '__sizeof__', '__ge__', '__delitem__', 
> '__getitem__', '__dir__', 'pop', '__gt__', '__repr__', '__init__', 
> '__subclasshook__', '__eq__', 'clear', '__len__', '__str__', '__le__', 
> '__new__', '__reduce_ex__', '__doc__', '__getattribute__', '__ne__', 
> '__reduce__', '__contains__', '__delattr__', '__class__', '__lt__', 
> '__setattr__', '__setitem__', '__iter__'}

Now strip out the methods which are common to pretty much all objects, 
in other words just look at the ones which are common to mapping and 
sequence APIs but not to objects in general:

{'copy', '__ge__', '__delitem__', '__getitem__', 'pop', '__gt__',  
'clear', '__len__', '__le__', '__contains__', '__lt__', '__setitem__', 
'__iter__'}

And now look a little more closely:

- although dicts and lists both support order comparisons like > and <,
  you cannot compare a dict to a list in Python 3;

- although dicts and lists both support a pop method, their signatures
  are different; x.pop() will fail if x is a dict, and x.pop(k, d) will
  fail if x is a list;

- although both support membership testing "a in x", what is being 
  tested is rather different; if x is a dict, then a must be a key,
  but the analog of keys for lists is the index, not the value.
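Each of those differences is easy to demonstrate at the interpreter:

```python
d = {'a': 1, 'b': 2}
lst = ['a', 'b']

# pop: the dict form requires a key (optionally with a default) ...
assert d.pop('a') == 1
assert d.pop('missing', 'default') == 'default'
# ... while the list form takes an index and defaults to the last item:
assert lst.pop() == 'b'
assert lst.pop(0) == 'a'

# membership: "a in d" tests keys, not values ...
d2 = {'a': 1}
assert 'a' in d2       # key
assert 1 not in d2     # values are not searched
# ... while "a in lst" tests values:
assert 1 in [1, 2]
```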


So the similarities between list and dict are:

* both have a length

* both are iterable

* both support subscripting operations x[i]

* although dicts don't support slicing x[i:j:k]

* both support a copy() method

* both support a clear() method

That's not a really big set of operations in common, and they're rather 
general.

The real test is, under what practical circumstances would you expect to 
freely substitute a list for a dict or vice versa, and what could you do 
with that object when you received it?

For me, the only answer that comes readily to mind is that the dict 
constructor accepts either another dict or a list of (key,item) pairs. 
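For example:

```python
pairs = [('a', 1), ('b', 2)]
d = dict(pairs)          # a list of (key, value) pairs ...
assert d == {'a': 1, 'b': 2}
assert dict(d) == d      # ... or another dict both work
# and items() round-trips the dict back to its list of pairs:
assert list(d.items()) == pairs
```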

[...]
> They do have quite a lot in common already.  The usefulness of different 
> types having the same methods is that external code can be less specific to 
> the objects they handle.

I don't think that it is reasonable to treat dicts and lists as having a 
lot in common. They have a little in common, by virtue of both being 
containers, but then a string bag and a 40ft steel shipping container 
are both containers too, so that doesn't imply much similarity :-) It 
seems to me that outside of utterly generic operations like iteration, 
conversion to string and so on, lists do not quack like dicts, and dicts 
do not swim like lists, in any significant sense.


-- 
Steven

From greg.ewing at canterbury.ac.nz  Wed Jul 30 00:46:46 2014
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 30 Jul 2014 10:46:46 +1200
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <53D78C09.20406@sotecware.net>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <CAN3CYHzBEV7YQixhYNwnMqYfyFQZO9Ff0NeEFeGE_rx7SGuWmA@mail.gmail.com>
 <CA030941-62B6-4B2E-BEDF-6C15B26283E4@yahoo.com>
 <CAN3CYHwt9tg9Gsv3_+5gx18SD3w+dcyNJ_26kMp0Qi-eXc6xfg@mail.gmail.com>
 <CACac1F_jX8kjnedvFtYXJ8vpZRfUQZLAkMOOyYrBP5aEm8pyfg@mail.gmail.com>
 <53D78C09.20406@sotecware.net>
Message-ID: <53D82456.3060102@canterbury.ac.nz>

Jonas Wielicki wrote:
> FWIW, one could use an operator which inherently shows a direction: <<
> and >>, for both directions respectively.
> 
> A = B >> C lets B take precedence, and A = B << C lets C take precedence.

While it succeeds in indicating a direction, it
fails to suggest any kind of addition or union.

-- 
Greg

From j.wielicki at sotecware.net  Wed Jul 30 10:37:23 2014
From: j.wielicki at sotecware.net (Jonas Wielicki)
Date: Wed, 30 Jul 2014 10:37:23 +0200
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <53D82456.3060102@canterbury.ac.nz>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <CAN3CYHzBEV7YQixhYNwnMqYfyFQZO9Ff0NeEFeGE_rx7SGuWmA@mail.gmail.com>
 <CA030941-62B6-4B2E-BEDF-6C15B26283E4@yahoo.com>
 <CAN3CYHwt9tg9Gsv3_+5gx18SD3w+dcyNJ_26kMp0Qi-eXc6xfg@mail.gmail.com>
 <CACac1F_jX8kjnedvFtYXJ8vpZRfUQZLAkMOOyYrBP5aEm8pyfg@mail.gmail.com>
 <53D78C09.20406@sotecware.net> <53D82456.3060102@canterbury.ac.nz>
Message-ID: <53D8AEC3.6080602@sotecware.net>

On 30.07.2014 00:46, Greg Ewing wrote:
> Jonas Wielicki wrote:
>> FWIW, one could use an operator which inherently shows a direction: <<
>> and >>, for both directions respectively.
>>
>> A = B >> C lets B take precedence, and A = B << C lets C take precedence.
> 
> While it succeeds in indicating a direction, it
> fails to suggest any kind of addition or union.
> 

As already noted elsewhere (to continue playing devil's advocate), it's
not an addition or union anyway. It's not a union because it is lossy
and not commutative, and it's not something I'd call addition either.

One can certainly see it as shifting the elements of dict A over dict B,
though.
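The lossy, non-commutative behaviour is easy to show with the {**A, **B} spelling:

```python
A = {'k': 1, 'only_a': True}
B = {'k': 2, 'only_b': True}

left = {**A, **B}   # B's value for 'k' survives
right = {**B, **A}  # A's value for 'k' survives

assert left != right           # not commutative
assert left['k'] == 2
assert right['k'] == 1
# lossy: one of the two values for 'k' is discarded either way
```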

regards,
jwi

From ncoghlan at gmail.com  Wed Jul 30 13:52:54 2014
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 30 Jul 2014 21:52:54 +1000
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <1406662169.52281.YahooMailNeo@web181005.mail.ne1.yahoo.com>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <CADQLQrVbycqOV6cJBTgQXo5RR3g_nhZOW_F2n4cBcZu-fsDFTg@mail.gmail.com>
 <CACac1F-3AG=OwS5o5u144WzKmnRoaSCQF8zZZNACz4feQaY9_g@mail.gmail.com>
 <CAN3CYHzBEV7YQixhYNwnMqYfyFQZO9Ff0NeEFeGE_rx7SGuWmA@mail.gmail.com>
 <CA030941-62B6-4B2E-BEDF-6C15B26283E4@yahoo.com>
 <CAN3CYHwt9tg9Gsv3_+5gx18SD3w+dcyNJ_26kMp0Qi-eXc6xfg@mail.gmail.com>
 <CACac1F_jX8kjnedvFtYXJ8vpZRfUQZLAkMOOyYrBP5aEm8pyfg@mail.gmail.com>
 <20140729143605.GL9112@ando>
 <1406662169.52281.YahooMailNeo@web181005.mail.ne1.yahoo.com>
Message-ID: <CADiSq7eSj3utrh-g7x=6Ftn1nJCWFGAbpNXEabMwAL2_qss+wQ@mail.gmail.com>

On 30 July 2014 05:29, Andrew Barnert <abarnert at yahoo.com.dmarc.invalid> wrote:
> On Tuesday, July 29, 2014 7:36 AM, Steven D'Aprano <steve at pearwood.info> wrote:
>
>>On Tue, Jul 29, 2014 at 07:22:34AM +0100, Paul Moore wrote:
>>> On 29 July 2014 00:04, Alexander Heger <python at 2sn.net> wrote:
>>> > D = A | B | C
>>> >
>>> > becomes
>>> >
>>> > D = dict(collections.ChainMap(C, B, A))
>>>
>>> This immediately explains the key problem with this proposal. It never
>>> even *occurred* to me that anyone would expect C to take priority over
>>> A in the operator form. But the ChainMap form makes it immediately
>>> clear to me that this is the intent.
>>
>>Hmmm. Funny you say that, because to me that is a major disadvantage of
>>the ChainMap form: you have to write the arguments in reverse order.
>
>
> I think that's pretty much exactly his point:
>
> To him, it's obvious that + should be in the order of ChainMap, and he can't even conceive of the possibility that you'd want it "backward".
>
> To you, it's obvious that + should be the other way around, and you find it annoying that ChainMap is "backward".
>
> Which seems to imply that any attempt at setting an order is going to not only seem backward, but possibly surprisingly so, to a subset of Python's users.
>
> And this is the kind of thing that can lead to subtle bugs. If a and b _almost never_ have duplicate keys, but very rarely do, you won't catch the problem until you think to test for it. And if one order or the other is so obvious to you that you didn't even imagine anyone would ever implement the opposite order, you probably won't think to write the test until you have a bug in the field.

I think this is a nice way of explaining the concern.

I'll also note that, given we turned a whole pile of similarly subtle
data driven bugs into structural type errors in the Python 3
transition, I'm not exactly enamoured of the idea of adding more :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From drekin at gmail.com  Wed Jul 30 15:58:02 2014
From: drekin at gmail.com (drekin)
Date: Wed, 30 Jul 2014 15:58:02 +0200
Subject: [Python-ideas] Redesign of Python stdio backend
Message-ID: <CACvLUa=FTCUiofxFcx_PhVteMYM1WNqvKhDo_u=2WdKFBZsiSg@mail.gmail.com>

I would expect that all standard IO in Python goes through sys.stdin,
sys.stdout and sys.stderr or the underlying buffer or raw objects. The only
exception should be error messages before the sys.std* objects are
initialized.

I was surprised that this is actually not the case: reading input in the
interactive loop doesn't go through sys.stdin (see
http://bugs.python.org/issue17620), yet it does use sys.stdin's encoding,
which doesn't make sense. My knowledge of the actual implementation is
rather poor, but I got the impression that the code path for getting
input from the user in the interactive loop is complicated. I would have
thought it consists of little more than wrapping an underlying system
call (or GNU readline, or anything else) in
sys.stdin.buffer.raw.readinto or something similar. With the current
implementation, fixing issues may be complicated, for example handling
the SIGINT produced by Ctrl-C on Windows. There is a closed issue
http://bugs.python.org/issue17619 but also an open one,
http://bugs.python.org/issue18597.

There is also a seven-year-old issue, http://bugs.python.org/issue1602,
regarding Unicode support in the Windows console. Even if that issue
isn't fixed in the interpreter itself, anyone could write their own
sys.std* objects and install them in the running interpreter. This
doesn't work now because of the problem described above.

I just wanted to bring up the idea of redesigning the stdio backend,
which would fix http://bugs.python.org/issue17620 and help with fixing
the others.
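For the output side, installing a replacement stream already works today; here is a minimal sketch (the `TaggingWriter` class and its prefix are purely illustrative) of swapping out sys.stdout, which is the kind of substitution that the input side currently doesn't honour:

```python
import io
import sys

class TaggingWriter(io.TextIOBase):
    """Illustrative replacement stream: prefixes every write() call."""

    def __init__(self, target):
        self._target = target

    def writable(self):
        return True

    def write(self, text):
        return self._target.write('[out] ' + text)

# print() honours a replaced sys.stdout ...
buffer = io.StringIO()
sys.stdout = TaggingWriter(buffer)
print('hello')
sys.stdout = sys.__stdout__   # restore the real stream
# ... but, as issue 17620 describes, the interactive loop's input
# path does not go through a replaced sys.stdin the same way.
```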

Regards, Drekin

From ron3200 at gmail.com  Wed Jul 30 16:27:00 2014
From: ron3200 at gmail.com (Ron Adam)
Date: Wed, 30 Jul 2014 09:27:00 -0500
Subject: [Python-ideas] adding dictionaries
In-Reply-To: <20140730001726.GM9112@ando>
References: <CAN3CYHzG7N1+j6=spVNos_RfA_xiABFJtGEz5DGDzjUx24S2Sw@mail.gmail.com>
 <20140727011739.GC9112@ando>
 <CAN1F8qV94hoeg-P7wq+feK9L7-TL3TkV0THvFZ_F+_aCfuzgGw@mail.gmail.com>
 <20140728145951.GH9112@ando> <20140728153306.GA5756@k2>
 <20140728160450.GI9112@ando> <lr60in$bn5$1@ger.gmane.org>
 <20140729033411.GJ9112@ando> <lr99oh$kpq$1@ger.gmane.org>
 <20140730001726.GM9112@ando>
Message-ID: <lravbl$gun$1@ger.gmane.org>



On 07/29/2014 07:17 PM, Steven D'Aprano wrote:
> On Tue, Jul 29, 2014 at 06:12:16PM -0500, Ron Adam wrote on the
> similarity of lists and dicts:
>
> [...]
>> Well, here is how they currently compare.
>>
>> >>> set(dir(dict)).intersection(set(dir(list)))
>> {'copy', '__hash__', '__format__', '__sizeof__', '__ge__', '__delitem__',
>> '__getitem__', '__dir__', 'pop', '__gt__', '__repr__', '__init__',
>> '__subclasshook__', '__eq__', 'clear', '__len__', '__str__', '__le__',
>> '__new__', '__reduce_ex__', '__doc__', '__getattribute__', '__ne__',
>> '__reduce__', '__contains__', '__delattr__', '__class__', '__lt__',
>> '__setattr__', '__setitem__', '__iter__'}
> Now strip out the methods which are common to pretty much all objects,
> in other words just look at the ones which are common to mapping and
> sequence APIs but not to objects in general:
>
> {'copy', '__ge__', '__delitem__', '__getitem__', 'pop', '__gt__',
> 'clear', '__len__', '__le__', '__contains__', '__lt__', '__setitem__',
> '__iter__'}
>
> And now look a little more closely:
>
> - although dicts and lists both support order comparisons like > and <,
>    you cannot compare a dict to a list in Python 3;

I think this would be the case we are describing with + and +=.   You would 
not be able to add a dict and some other incompatible type.

Cheers,
    Ron