From paul at colomiets.name  Tue Oct  1 21:17:11 2013
From: paul at colomiets.name (Paul Colomiets)
Date: Tue, 1 Oct 2013 22:17:11 +0300
Subject: [Python-ideas] pprint in displayhook
In-Reply-To: <l2a32u$3jc$1@ger.gmane.org>
References: <l23lbh$4el$1@ger.gmane.org>
 <0BFAAF4A-F5C8-48FA-9C82-1B60D164A033@gmail.com>
 <l2a32u$3jc$1@ger.gmane.org>
Message-ID: <CAA0gF6okYC5jkAqDWztj6UU24EyH1S92HE-H+UJsjf6vmY24wQ@mail.gmail.com>

Hi,

On Sun, Sep 29, 2013 at 11:38 PM, Serhiy Storchaka <storchaka at gmail.com> wrote:
>
> What should be changed in pprint?
>

It would be nice if it supported custom types.

Just my 2 cents

--
Paul

From robert.kern at gmail.com  Tue Oct  1 22:00:27 2013
From: robert.kern at gmail.com (Robert Kern)
Date: Tue, 01 Oct 2013 21:00:27 +0100
Subject: [Python-ideas] pprint in displayhook
In-Reply-To: <CAA0gF6okYC5jkAqDWztj6UU24EyH1S92HE-H+UJsjf6vmY24wQ@mail.gmail.com>
References: <l23lbh$4el$1@ger.gmane.org>
 <0BFAAF4A-F5C8-48FA-9C82-1B60D164A033@gmail.com> <l2a32u$3jc$1@ger.gmane.org>
 <CAA0gF6okYC5jkAqDWztj6UU24EyH1S92HE-H+UJsjf6vmY24wQ@mail.gmail.com>
Message-ID: <l2f9kj$qgi$1@ger.gmane.org>

On 2013-10-01 20:17, Paul Colomiets wrote:
> Hi,
>
> On Sun, Sep 29, 2013 at 11:38 PM, Serhiy Storchaka <storchaka at gmail.com> wrote:
>>
>> What should be changed in pprint?
>
> Would be nice if it support custom types.

For what it's worth, I would like to point out that IPython uses an adaptation 
of Armin Ronacher's pretty.py for pretty-printing as the default displayhook. It 
is a nice design that supports custom types after-the-fact.

   https://github.com/ipython/ipython/blob/master/IPython/lib/pretty.py

Armin's original code:

   http://dev.pocoo.org/hg/sandbox/file/tip/pretty/pretty.py

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco


From ncoghlan at gmail.com  Wed Oct  2 01:20:34 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 2 Oct 2013 09:20:34 +1000
Subject: [Python-ideas] pprint in displayhook
In-Reply-To: <CAA0gF6okYC5jkAqDWztj6UU24EyH1S92HE-H+UJsjf6vmY24wQ@mail.gmail.com>
References: <l23lbh$4el$1@ger.gmane.org>
 <0BFAAF4A-F5C8-48FA-9C82-1B60D164A033@gmail.com>
 <l2a32u$3jc$1@ger.gmane.org>
 <CAA0gF6okYC5jkAqDWztj6UU24EyH1S92HE-H+UJsjf6vmY24wQ@mail.gmail.com>
Message-ID: <CADiSq7dtiMYU7dcpkZgQ16rfahkyQ=FPDQHgd_Bo-VXH8JeKcA@mail.gmail.com>

On 2 Oct 2013 05:45, "Paul Colomiets" <paul at colomiets.name> wrote:
>
> Hi,
>
> On Sun, Sep 29, 2013 at 11:38 PM, Serhiy Storchaka <storchaka at gmail.com> wrote:
> >
> > What should be changed in pprint?
> >
>
> Would be nice if it support custom types.

Fixing pprint to allow customisation was a key part of the rationale for
functools.singledispatch. I guess Lukasz just hasn't had time to work on
the follow-up patch to refactor the pprint module (or else I just missed it
on the tracker, which is entirely plausible).
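
A minimal sketch of what singledispatch-based customisation could look
like. This is not pprint's actual design or any proposed patch; the
pretty() function and the Point class are hypothetical names used only
for illustration. The point is that a handler for a custom type can be
registered after the fact, without editing the printer or the type.

```python
from functools import singledispatch


@singledispatch
def pretty(obj, indent=0):
    """Fallback: use repr() for any type without a registered handler."""
    return " " * indent + repr(obj)


@pretty.register(list)
def _pretty_list(obj, indent=0):
    # One item per line, recursing so nested types get their own handlers.
    pad = " " * indent
    if not obj:
        return pad + "[]"
    inner = ",\n".join(pretty(item, indent + 4) for item in obj)
    return f"{pad}[\n{inner}\n{pad}]"


class Point:
    """Hypothetical user-defined type, unknown to the printer itself."""

    def __init__(self, x, y):
        self.x, self.y = x, y


@pretty.register(Point)
def _pretty_point(obj, indent=0):
    # Registered "after the fact", without touching Point or pretty().
    return " " * indent + f"Point(x={obj.x}, y={obj.y})"


print(pretty([1, Point(2, 3)]))
```

Dispatch happens on the type of the first argument only, so the extra
indent parameter is passed through unchanged.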

Cheers,
Nick.

>
> Just my 2 cents
>
> --
> Paul

From steve at pearwood.info  Wed Oct  2 02:56:45 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 2 Oct 2013 10:56:45 +1000
Subject: [Python-ideas] pprint in displayhook
In-Reply-To: <l2a32u$3jc$1@ger.gmane.org>
References: <l23lbh$4el$1@ger.gmane.org>
 <0BFAAF4A-F5C8-48FA-9C82-1B60D164A033@gmail.com> <l2a32u$3jc$1@ger.gmane.org>
Message-ID: <20131002005644.GI7989@ando>

On Sun, Sep 29, 2013 at 11:38:30PM +0300, Serhiy Storchaka wrote:
> 28.09.13 07:17, Raymond Hettinger wrote:
> >This might be a reasonable idea if pprint were in better shape.
> >I think substantial work needs to be done on it, before it would
> >be worthy of becoming the default method of display.
> 
> What should be changed in pprint?

I would like to see pprint be smarter about printing lists and dicts. At 
the moment, a long list is either printed all on one line, like the 
default display, or one item per line. This can end up as one long, 
narrow column, which is worse than the default. I'd like to see it be 
smarter about using multiple columns.

E.g. pprint([1, 2, 3, ... 1000])

rather than this:

[1, 
 2, 
 3, 
 ...
 998,
 999,
 1000]

something like this:

[1,    2,    3,    4,    5,    6,    7,    8,    9,   10,
 ...  
 991,  992,  993,  994,  995,  996,  997,  998,  999, 1000]
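
A rough sketch of the multi-column layout described above, assuming
every cell is padded to the width of the widest repr(). The function
name format_columns and its width parameter are made up for this
example and are not part of pprint. (Python 3.4 later gained a
compact=True option for pprint, which packs several items per line but
does not align them into columns.)

```python
def format_columns(items, width=80):
    """Lay out the repr() of each item in right-aligned columns."""
    cells = [repr(item) for item in items]
    if not cells:
        return "[]"
    cell_width = max(len(cell) for cell in cells)
    # Every cell costs its width plus ", "; the leading "[" or " " and
    # the trailing "," or "]" together cost the same as one separator.
    per_row = max(1, width // (cell_width + 2))
    rows = []
    for start in range(0, len(cells), per_row):
        row = cells[start:start + per_row]
        rows.append(", ".join(cell.rjust(cell_width) for cell in row))
    return "[" + ",\n ".join(rows) + "]"


print(format_columns(range(1, 1001)))
```

This only handles flat sequences; nested containers or very uneven item
widths would need the kind of smarter heuristics the thread is debating.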





-- 
Steven

From robert.kern at gmail.com  Wed Oct  2 17:31:58 2013
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 02 Oct 2013 16:31:58 +0100
Subject: [Python-ideas] pprint in displayhook
In-Reply-To: <20131002005644.GI7989@ando>
References: <l23lbh$4el$1@ger.gmane.org>
 <0BFAAF4A-F5C8-48FA-9C82-1B60D164A033@gmail.com> <l2a32u$3jc$1@ger.gmane.org>
 <20131002005644.GI7989@ando>
Message-ID: <l2he95$ui$1@ger.gmane.org>

On 2013-10-02 01:56, Steven D'Aprano wrote:
> On Sun, Sep 29, 2013 at 11:38:30PM +0300, Serhiy Storchaka wrote:
>> 28.09.13 07:17, Raymond Hettinger wrote:
>>> This might be a reasonable idea if pprint were in better shape.
>>> I think substantial work needs to be done on it, before it would
>>> be worthy of becoming the default method of display.
>>
>> What should be changed in pprint?
>
> I would like to see pprint be smarter about printing lists and dicts. At
> the moment, a long list is either printed all on one line, like the
> default display, or one item per line. This can end up as one long,
> narrow column, which is worse than the default. I'd like to see it be
> smarter about using multiple columns.
>
> E.g. pprint([1, 2, 3, ... 1000])
>
> rather than this:
>
> [1,
>   2,
>   3,
>   ...
>   998,
>   999,
>   1000]
>
> something like this:
>
> [1,    2,    3,    4,    5,    6,    7,    8,    9,   10,
>   ...
>   991,  992,  993,  994,  995,  996,  997,  998,  999, 1000]

As someone who has used pretty-printing as their default displayhook for a 
decade now via IPython, I have to say that this case happens much less often 
than one might expect. It *is* irritating the rare times it does come up, but 
less so than what I expect we would see from the false positives of a more 
intelligent algorithm. But I withhold final judgement until I see the actual 
results of such an algorithm.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco


From paul at colomiets.name  Wed Oct  2 22:20:47 2013
From: paul at colomiets.name (Paul Colomiets)
Date: Wed, 2 Oct 2013 23:20:47 +0300
Subject: [Python-ideas] pprint in displayhook
In-Reply-To: <CADiSq7dtiMYU7dcpkZgQ16rfahkyQ=FPDQHgd_Bo-VXH8JeKcA@mail.gmail.com>
References: <l23lbh$4el$1@ger.gmane.org>
 <0BFAAF4A-F5C8-48FA-9C82-1B60D164A033@gmail.com>
 <l2a32u$3jc$1@ger.gmane.org>
 <CAA0gF6okYC5jkAqDWztj6UU24EyH1S92HE-H+UJsjf6vmY24wQ@mail.gmail.com>
 <CADiSq7dtiMYU7dcpkZgQ16rfahkyQ=FPDQHgd_Bo-VXH8JeKcA@mail.gmail.com>
Message-ID: <CAA0gF6rtB81dDjM5y7S8kxn0kArXYkekC89zzXQrR8SG+XnsKg@mail.gmail.com>

Hi,

On Wed, Oct 2, 2013 at 2:20 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Fixing pprint to allow customisation was a key part of the rationale for
> functools.singledispatch. I guess Lukasz just hasn't had time to work on the
> follow-up patch to refactor the pprint module (or else I just missed it on
> the tracker, which is entirely plausible).
>

Nice. Any chance it will be in time for Python 3.4? We have been waiting
for it for about a decade :)

-- 
Paul

From g.rodola at gmail.com  Thu Oct  3 19:09:51 2013
From: g.rodola at gmail.com (Giampaolo Rodola')
Date: Thu, 3 Oct 2013 19:09:51 +0200
Subject: [Python-ideas] Allow from foo import bar*
Message-ID: <CAFYqXL9XtH+-Zd0v9Vr1xJGOneDKz8A3_amOq6bUfwmZgAZgfA@mail.gmail.com>

I suppose this has already been proposed in the past, but I couldn't find
any online reference, so here goes.
When it comes to importing module constants I usually like being explicit,
and that's OK with me as long as all I have to do is:

>>> from resource import (RLIMIT_CORE, RLIMIT_CPU, RLIMIT_FSIZE)

Nevertheless, when the existence of certain constants depends on the
platform in use, I end up doing:

>>> import resource
>>> if hasattr(resource, "RLIMIT_MSGQUEUE"):  # Linux only
...     from resource import RLIMIT_MSGQUEUE
...
>>> if hasattr(resource, "RLIMIT_NICE"):  # Linux only
...     from resource import RLIMIT_NICE
...


...or worse, if for simplicity I want to import all RLIMIT_* constants,
I have to do this:

>>> import resource
>>> import sys
>>> for name in dir(resource):
...     if name.startswith('RLIMIT_'):
...         setattr(sys.modules[__name__], name, getattr(resource, name))

...or just give up and use:

from resource import *

...which of course will pollute the namespace with unnecessary stuff.
So why not just allow "from resource import RLIMIT_*" syntax?
Another interesting variation might be:


>>> from socket import AF_*, SOCK_*
>>> AF_INET, AF_INET6, SOCK_STREAM, SOCK_DGRAM
(2, 10, 1, 2)


On the other hand, mixing "*" and "common" imports would be forbidden:

>>> from socket import AF_*, socket
  File "<stdin>", line 1
    from socket import AF_*, socket
                                         ^
SyntaxError: invalid syntax


Thoughts?


--- Giampaolo
https://code.google.com/p/pyftpdlib/
https://code.google.com/p/psutil/
https://code.google.com/p/pysendfile/

From guido at python.org  Thu Oct  3 19:16:37 2013
From: guido at python.org (Guido van Rossum)
Date: Thu, 3 Oct 2013 10:16:37 -0700
Subject: [Python-ideas] Allow from foo import bar*
In-Reply-To: <CAFYqXL9XtH+-Zd0v9Vr1xJGOneDKz8A3_amOq6bUfwmZgAZgfA@mail.gmail.com>
References: <CAFYqXL9XtH+-Zd0v9Vr1xJGOneDKz8A3_amOq6bUfwmZgAZgfA@mail.gmail.com>
Message-ID: <CAP7+vJ+wVaq2Gsw_DVF_r78=OXYNKLw36WPW-_pXxwQFmOO57g@mail.gmail.com>

Hm. Why not just use "import socket" and then use "socket.AF_<whatever>"?


-- 
--Guido van Rossum (python.org/~guido)

From python at mrabarnett.plus.com  Thu Oct  3 19:25:46 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 03 Oct 2013 18:25:46 +0100
Subject: [Python-ideas] Allow from foo import bar*
In-Reply-To: <CAFYqXL9XtH+-Zd0v9Vr1xJGOneDKz8A3_amOq6bUfwmZgAZgfA@mail.gmail.com>
References: <CAFYqXL9XtH+-Zd0v9Vr1xJGOneDKz8A3_amOq6bUfwmZgAZgfA@mail.gmail.com>
Message-ID: <524DA89A.6090608@mrabarnett.plus.com>

On 03/10/2013 18:09, Giampaolo Rodola' wrote:
> Nevertheless in case the existence of certain constants depends on the
> platform in use I end up doing:
>
>  >>> if hasattr(resource, "RLIMIT_MSGQUEUE"):  # linux only
> ....         import resource.RLIMIT_MSGQUEUE
> ....
>  >>> if hasattr(resource, "RLIMIT_NICE"):  # linux only
> ....         import resource.RLIMIT_NICE
> ....
If you're importing RLIMIT_MSGQUEUE, then presumably you're using it
somewhere(!), but if it's platform-specific, you'll still need to check
which platform the code is running on anyway before trying to use it...
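
A small sketch of that point, assuming the optional constant is looked
up with getattr() and a default. The fake_resource object stands in for
a platform module such as resource, and its values are invented for the
example. Whatever form the import takes, the calling code still has to
handle the case where the constant is missing.

```python
from types import SimpleNamespace

# Stand-in for a platform module such as `resource`; the attribute set
# and values here are made up and vary by platform in reality.
fake_resource = SimpleNamespace(RLIMIT_CORE=4, RLIMIT_CPU=0)

# getattr() with a default folds the hasattr() check and the lookup
# into a single step.
RLIMIT_CORE = getattr(fake_resource, "RLIMIT_CORE", None)
RLIMIT_MSGQUEUE = getattr(fake_resource, "RLIMIT_MSGQUEUE", None)


def describe_limit(name, value):
    """Callers still have to handle the 'not available here' case."""
    if value is None:
        return f"{name}: not available on this platform"
    return f"{name}: {value}"


print(describe_limit("RLIMIT_CORE", RLIMIT_CORE))
print(describe_limit("RLIMIT_MSGQUEUE", RLIMIT_MSGQUEUE))
```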


From storchaka at gmail.com  Thu Oct  3 20:42:03 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 03 Oct 2013 21:42:03 +0300
Subject: [Python-ideas] Allow from foo import bar*
In-Reply-To: <CAFYqXL9XtH+-Zd0v9Vr1xJGOneDKz8A3_amOq6bUfwmZgAZgfA@mail.gmail.com>
References: <CAFYqXL9XtH+-Zd0v9Vr1xJGOneDKz8A3_amOq6bUfwmZgAZgfA@mail.gmail.com>
Message-ID: <l2kdoj$a46$1@ger.gmane.org>

03.10.13 20:09, Giampaolo Rodola' wrote:
> Another interesting variation might be:
>
>  >>> from socket import AF_*, SOCK_*
>  >>> AF_INET, AF_INET6, SOCK_STREAM, SOCK_DGRAM
> (2, 10, 1, 2)

 >>> from socket import AddressFamily, SocketType
 >>> globals().update(AddressFamily.__members__)
 >>> globals().update(SocketType.__members__)
 >>> AF_INET, AF_INET6, SOCK_STREAM, SOCK_DGRAM
(<AddressFamily.AF_INET: 2>, <AddressFamily.AF_INET6: 10>, 
<SocketType.SOCK_STREAM: 1>, <SocketType.SOCK_DGRAM: 2>)
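
The snippet above reflects the socket enums as they stood during 3.4
development. In released Python 3.4 and later the enum of socket types
is exposed as socket.SocketKind (socket.SocketType is the socket object
type there), so an equivalent sketch would be the following. Collecting
the names into a plain dict called names is a choice made for this
example; it keeps the injection step explicit and inspectable.

```python
import socket

# Gather the enum members into a plain dict first, so the names about
# to be injected can be inspected before touching a real namespace.
names = {}
names.update(socket.AddressFamily.__members__)
names.update(socket.SocketKind.__members__)

# Uncomment to inject them into the current module, as in the original:
# globals().update(names)

print(names["AF_INET"], names["SOCK_STREAM"])
```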



From g.rodola at gmail.com  Thu Oct  3 20:43:09 2013
From: g.rodola at gmail.com (Giampaolo Rodola')
Date: Thu, 3 Oct 2013 20:43:09 +0200
Subject: [Python-ideas] Allow from foo import bar*
In-Reply-To: <CAP7+vJ+wVaq2Gsw_DVF_r78=OXYNKLw36WPW-_pXxwQFmOO57g@mail.gmail.com>
References: <CAFYqXL9XtH+-Zd0v9Vr1xJGOneDKz8A3_amOq6bUfwmZgAZgfA@mail.gmail.com>
 <CAP7+vJ+wVaq2Gsw_DVF_r78=OXYNKLw36WPW-_pXxwQFmOO57g@mail.gmail.com>
Message-ID: <CAFYqXL8yQkUKtKQw2W_uVbcbEgXS+VD4eOquL5cUoxrzTtgfBA@mail.gmail.com>

> Hm. Why not just use "import socket" and then use "socket.AF_<whatever>"?

That's what I usually do as well (because explicit is better than implicit),
but as I understand it, importing constants directly into the module
namespace is generally not considered bad practice.
I guess my specific case is a bit different, though.
I have all these constants defined in a _linux.py submodule, which I import
from __init__.py in order to expose them publicly.
This is how I do that:

    # Linux >= 2.6.36
    if _psplatform.HAS_PRLIMIT:
        from psutil._pslinux import (RLIM_INFINITY, RLIMIT_AS, RLIMIT_CORE,
                                     RLIMIT_CPU, RLIMIT_DATA, RLIMIT_FSIZE,
                                     RLIMIT_LOCKS, RLIMIT_MEMLOCK,
                                     RLIMIT_NOFILE, RLIMIT_NPROC, RLIMIT_RSS,
                                     RLIMIT_STACK)
        if hasattr(_psplatform, "RLIMIT_MSGQUEUE"):
            RLIMIT_MSGQUEUE = _psplatform.RLIMIT_MSGQUEUE
        if hasattr(_psplatform, "RLIMIT_NICE"):
            RLIMIT_NICE = _psplatform.RLIMIT_NICE
        if hasattr(_psplatform, "RLIMIT_RTPRIO"):
            RLIMIT_RTPRIO = _psplatform.RLIMIT_RTPRIO
        if hasattr(_psplatform, "RLIMIT_RTTIME"):
            RLIMIT_RTTIME = _psplatform.RLIMIT_RTTIME
        if hasattr(_psplatform, "RLIMIT_SIGPENDING"):
            RLIMIT_SIGPENDING = _psplatform.RLIMIT_SIGPENDING


In *this specific case* a "from _psplatform import RLIM*" would have solved
my problem nicely.
On one hand this might look like encouraging wildcard import usage, but I
think it's the opposite.
Sometimes people use "from foo import *" just because "from foo import
bar*" is not available.


--- Giampaolo
https://code.google.com/p/pyftpdlib/
https://code.google.com/p/psutil/
https://code.google.com/p/pysendfile/



From guido at python.org  Thu Oct  3 21:00:39 2013
From: guido at python.org (Guido van Rossum)
Date: Thu, 3 Oct 2013 12:00:39 -0700
Subject: [Python-ideas] Allow from foo import bar*
In-Reply-To: <CAFYqXL8yQkUKtKQw2W_uVbcbEgXS+VD4eOquL5cUoxrzTtgfBA@mail.gmail.com>
References: <CAFYqXL9XtH+-Zd0v9Vr1xJGOneDKz8A3_amOq6bUfwmZgAZgfA@mail.gmail.com>
 <CAP7+vJ+wVaq2Gsw_DVF_r78=OXYNKLw36WPW-_pXxwQFmOO57g@mail.gmail.com>
 <CAFYqXL8yQkUKtKQw2W_uVbcbEgXS+VD4eOquL5cUoxrzTtgfBA@mail.gmail.com>
Message-ID: <CAP7+vJLq6+DTJFMj7qd7EUODZr_v-KqB_+QaW5KCC1CmT1ybyw@mail.gmail.com>

Hm. It seems a pretty small use case for what would be a major
implementation challenge -- I'm sure there would be lots of issues
implementing this cleanly given all the special casing for import *, and
the special handling of importlib during bootstrap.


-- 
--Guido van Rossum (python.org/~guido)

From g.rodola at gmail.com  Thu Oct  3 21:13:34 2013
From: g.rodola at gmail.com (Giampaolo Rodola')
Date: Thu, 3 Oct 2013 21:13:34 +0200
Subject: [Python-ideas] Allow from foo import bar*
In-Reply-To: <CAP7+vJLq6+DTJFMj7qd7EUODZr_v-KqB_+QaW5KCC1CmT1ybyw@mail.gmail.com>
References: <CAFYqXL9XtH+-Zd0v9Vr1xJGOneDKz8A3_amOq6bUfwmZgAZgfA@mail.gmail.com>
 <CAP7+vJ+wVaq2Gsw_DVF_r78=OXYNKLw36WPW-_pXxwQFmOO57g@mail.gmail.com>
 <CAFYqXL8yQkUKtKQw2W_uVbcbEgXS+VD4eOquL5cUoxrzTtgfBA@mail.gmail.com>
 <CAP7+vJLq6+DTJFMj7qd7EUODZr_v-KqB_+QaW5KCC1CmT1ybyw@mail.gmail.com>
Message-ID: <CAFYqXL-6XmH451ByYHJ5M11hJjNev0cpovuWcJGg2cHqU1LbAQ@mail.gmail.com>

On Thu, Oct 3, 2013 at 9:00 PM, Guido van Rossum <guido at python.org> wrote:

> Hm. It seems a pretty small use case for what would be a major
> implementation challenge -- I'm sure there would be lots of issues
> implementing this cleanly given all the special casing for import *, and
> the special handling of importlib during bootstrap.
>

Fair enough.

--- Giampaolo
https://code.google.com/p/pyftpdlib/
https://code.google.com/p/psutil/
https://code.google.com/p/pysendfile/

From joshua at landau.ws  Thu Oct  3 21:59:42 2013
From: joshua at landau.ws (Joshua Landau)
Date: Thu, 3 Oct 2013 20:59:42 +0100
Subject: [Python-ideas] Allow from foo import bar*
In-Reply-To: <CAFYqXL8yQkUKtKQw2W_uVbcbEgXS+VD4eOquL5cUoxrzTtgfBA@mail.gmail.com>
References: <CAFYqXL9XtH+-Zd0v9Vr1xJGOneDKz8A3_amOq6bUfwmZgAZgfA@mail.gmail.com>
 <CAP7+vJ+wVaq2Gsw_DVF_r78=OXYNKLw36WPW-_pXxwQFmOO57g@mail.gmail.com>
 <CAFYqXL8yQkUKtKQw2W_uVbcbEgXS+VD4eOquL5cUoxrzTtgfBA@mail.gmail.com>
Message-ID: <CAN1F8qVGaqc3P_jwYMyW5abK0OyYh7S2=aCFM5s59NKSgxRV6w@mail.gmail.com>

On 3 October 2013 19:43, Giampaolo Rodola' <g.rodola at gmail.com> wrote:
>> Hm. Why not just use "import socket" and then use "socket.AF_<whatever>"?
>
> That's what I usually do as well (because explicit is better than implicit)
> but from my understanding when it comes to constants it is generally not
> considered a bad practice to import them directly into the module namespace.
> I guess my specific case is bit different though.
> I have all these constants defined in a _linux.py submodule which I import
> from __init__.py in order to expose them publicly.
> And this is how I do that:
>
>     # Linux >= 2.6.36
>     if _psplatform.HAS_PRLIMIT:
>         from psutil._pslinux import (RLIM_INFINITY, RLIMIT_AS, RLIMIT_CORE,
>                                      RLIMIT_CPU, RLIMIT_DATA, RLIMIT_FSIZE,
>                                      RLIMIT_LOCKS, RLIMIT_MEMLOCK,
...
>
> In *this specific case* a "from _psplatform import RLIM*" would have solved
> my problem nicely.

Or we change the module such that we can do

    from psutil._pslinux import RLIMIT

and then use RLIMIT.CORE, RLIMIT.CPU, RLIMIT.LOCKS, etc.
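
One way such a namespace could be built, sketched under the assumption
that the constants already exist as flat RLIMIT_* attributes on some
module. The helper make_namespace and the fake_pslinux stand-in (with
invented values) are hypothetical and are not psutil's API.

```python
from types import SimpleNamespace


def make_namespace(source, prefix):
    """Group every `prefix`-named attribute of `source` under short names."""
    return SimpleNamespace(**{
        name[len(prefix):]: getattr(source, name)
        for name in dir(source)
        if name.startswith(prefix)
    })


# Stand-in for psutil._pslinux; names and values are illustrative only.
fake_pslinux = SimpleNamespace(
    RLIMIT_CORE=4, RLIMIT_CPU=0, RLIMIT_LOCKS=10, HAS_PRLIMIT=True
)

RLIMIT = make_namespace(fake_pslinux, "RLIMIT_")
print(RLIMIT.CORE, RLIMIT.CPU, RLIMIT.LOCKS)
```

Platform-specific members simply end up absent from the namespace, so
availability can be checked with hasattr(RLIMIT, "NICE").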

From rymg19 at gmail.com  Thu Oct  3 22:43:01 2013
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Thu, 3 Oct 2013 15:43:01 -0500
Subject: [Python-ideas] Allow from foo import bar*
In-Reply-To: <CAFYqXL8yQkUKtKQw2W_uVbcbEgXS+VD4eOquL5cUoxrzTtgfBA@mail.gmail.com>
References: <CAFYqXL9XtH+-Zd0v9Vr1xJGOneDKz8A3_amOq6bUfwmZgAZgfA@mail.gmail.com>
 <CAP7+vJ+wVaq2Gsw_DVF_r78=OXYNKLw36WPW-_pXxwQFmOO57g@mail.gmail.com>
 <CAFYqXL8yQkUKtKQw2W_uVbcbEgXS+VD4eOquL5cUoxrzTtgfBA@mail.gmail.com>
Message-ID: <CAO41-mOW9KffCQfLkvpXy1emTw-XWGs8WiaOF7qoK5vgYCdJwA@mail.gmail.com>

Well, that looks painful! I agree with Joshua: If you are doing something
like that, namespaces work best. If you're really that desperate, why not
something like this:

globals().update({name: getattr(_psplatform, name) for name in
dir(_psplatform) if name.startswith('RLIMIT')})
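
The same idea as a reusable helper, as a sketch only: import_prefixed
and fake_psplatform are hypothetical names, and the values are invented.
Note that the prefix 'RLIMIT' used above would skip RLIM_INFINITY,
whereas the 'RLIM' prefix from the proposed "RLIM*" form picks it up.

```python
from types import SimpleNamespace


def import_prefixed(source, prefix, namespace):
    """Copy every attribute of `source` whose name starts with `prefix`."""
    found = {
        name: getattr(source, name)
        for name in dir(source)
        if name.startswith(prefix)
    }
    namespace.update(found)
    # Return the copied names so the caller can see what was imported.
    return sorted(found)


# Stand-in for the platform module; names and values are illustrative.
fake_psplatform = SimpleNamespace(
    RLIM_INFINITY=-1, RLIMIT_CORE=4, RLIMIT_NICE=13, HAS_PRLIMIT=True
)

target = {}  # pass globals() instead to inject into the current module
copied = import_prefixed(fake_psplatform, "RLIM", target)
print(copied)
```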


On Thu, Oct 3, 2013 at 1:43 PM, Giampaolo Rodola' <g.rodola at gmail.com>wrote:

> > Hm. Why not just use "import socket" and then use "socket.AF_<whatever>"?
>
> That's what I usually do as well (because explicit is better than
> implicit) but from my understanding when it comes to constants it is
> generally not considered a bad practice to import them directly into the
> module namespace.
> I guess my specific case is bit different though.
> I have all these constants defined in a _linux.py submodule which I import
> from __init__.py in order to expose them publicly.
> And this is how I do that:
>
>     # Linux >= 2.6.36
>     if _psplatform.HAS_PRLIMIT:
>         from psutil._pslinux import (RLIM_INFINITY, RLIMIT_AS, RLIMIT_CORE,
>                                      RLIMIT_CPU, RLIMIT_DATA, RLIMIT_FSIZE,
>                                      RLIMIT_LOCKS, RLIMIT_MEMLOCK,
> RLIMIT_NOFILE,
>                                      RLIMIT_NPROC, RLIMIT_RSS,
> RLIMIT_STACK)
>         if hasattr(_psplatform, "RLIMIT_MSGQUEUE"):
>             RLIMIT_MSGQUEUE = _psplatform.RLIMIT_MSGQUEUE
>         if hasattr(_psplatform, "RLIMIT_NICE"):
>             RLIMIT_NICE = _psplatform.RLIMIT_NICE
>         if hasattr(_psplatform, "RLIMIT_RTPRIO"):
>             RLIMIT_RTPRIO = _psplatform.RLIMIT_RTPRIO
>         if hasattr(_psplatform, "RLIMIT_RTTIME"):
>             RLIMIT_RTTIME = _psplatform.RLIMIT_RTTIME
>         if hasattr(_psplatform, "RLIMIT_SIGPENDING"):
>             RLIMIT_SIGPENDING = _psplatform.RLIMIT_SIGPENDING
>
>
> In *this specific case* a "from _psplatform import RLIM*" would have
> solved my problem nicely.
> On one hand this might look like encouraging wildcard import usage, but I
> think it's the opposite.
> Sometimes people use "from foo import *" just because "from foo import
> bar*" is not available.
>
>
> --- Giampaolo
> https://code.google.com/p/pyftpdlib/
> https://code.google.com/p/psutil/
> https://code.google.com/p/pysendfile/
>
>
> On Thu, Oct 3, 2013 at 7:16 PM, Guido van Rossum <guido at python.org> wrote:
>
>> Hm. Why not just use "import socket" and then use "socket.AF_<whatever>"?
>>
>>
>> On Thu, Oct 3, 2013 at 10:09 AM, Giampaolo Rodola' <g.rodola at gmail.com>wrote:
>>
>>> I suppose this has already been proposed in past but couldn't find
>>> any online  reference so here goes.
>>> When it comes to module constant imports I usually like being explicit
>>> it's OK with me as long as I have to do:
>>>
>>> >>> from resource import (RLIMIT_CORE, RLIMIT_CPU, RLIMIT_FSIZE)
>>>
>>> Nevertheless in case the existence of certain constants depends on the
>>> platform in use I end up doing:
>>>
>>> >>> import resource
>>> >>> if hasattr(resource, "RLIMIT_MSGQUEUE"):  # linux only
>>> ...     from resource import RLIMIT_MSGQUEUE
>>> ...
>>> >>> if hasattr(resource, "RLIMIT_NICE"):  # linux only
>>> ...     from resource import RLIMIT_NICE
>>> ...
>>>
>>>
>>> ...or worse, if for simplicity I'm willing to simply import all RLIMIT_*
>>> constants I'll have to do this:
>>>
>>> >>> import resource
>>> >>> import sys
>>> >>> for name in dir(resource):
>>> ...     if name.startswith('RLIMIT_'):
>>> ...         setattr(sys.modules[__name__], name, getattr(resource, name))
>>>
>>> ...or just give up and use:
>>>
>>> from resource import *
>>>
>>> ...which of course will pollute the namespace with unnecessary stuff.
>>> So why not just allow "from resource import RLIMIT_*" syntax?
>>> Another interesting variation might be:
>>>
>>>
>>> >>> from socket import AF_*, SOCK_*
>>> >>> AF_INET, AF_INET6, SOCK_STREAM, SOCK_DGRAM
>>> (2, 10, 1, 2)
>>>
>>>
>>> On the other hand mixing "*" and "common" imports would be forbidden:
>>>
>>> >>> from socket import AF_*, socket
>>>   File "<stdin>", line 1
>>>     from socket import AF_*, socket
>>>                              ^
>>> SyntaxError: invalid syntax
>>>
>>>
>>> Thoughts?
>>>
>>>
>>> --- Giampaolo
>>> https://code.google.com/p/pyftpdlib/
>>> https://code.google.com/p/psutil/
>>> https://code.google.com/p/pysendfile/
>>>
>>> _______________________________________________
>>> Python-ideas mailing list
>>> Python-ideas at python.org
>>> https://mail.python.org/mailman/listinfo/python-ideas
>>>
>>>
>>
>>
>> --
>> --Guido van Rossum (python.org/~guido)
>>
>
>
>
>


-- 
Ryan

From ncoghlan at gmail.com  Thu Oct  3 23:44:43 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 4 Oct 2013 07:44:43 +1000
Subject: [Python-ideas] Allow from foo import bar*
In-Reply-To: <CAN1F8qVGaqc3P_jwYMyW5abK0OyYh7S2=aCFM5s59NKSgxRV6w@mail.gmail.com>
References: <CAFYqXL9XtH+-Zd0v9Vr1xJGOneDKz8A3_amOq6bUfwmZgAZgfA@mail.gmail.com>
 <CAP7+vJ+wVaq2Gsw_DVF_r78=OXYNKLw36WPW-_pXxwQFmOO57g@mail.gmail.com>
 <CAFYqXL8yQkUKtKQw2W_uVbcbEgXS+VD4eOquL5cUoxrzTtgfBA@mail.gmail.com>
 <CAN1F8qVGaqc3P_jwYMyW5abK0OyYh7S2=aCFM5s59NKSgxRV6w@mail.gmail.com>
Message-ID: <CADiSq7e=dnL9KnER6GCX=6Gg4-gUAdFGs3+GfKKE7_ih5iO=fw@mail.gmail.com>

On 4 Oct 2013 06:01, "Joshua Landau" <joshua at landau.ws> wrote:
>
> On 3 October 2013 19:43, Giampaolo Rodola' <g.rodola at gmail.com> wrote:
> >> Hm. Why not just use "import socket" and then use "socket.AF_<whatever>"?
> >
> > That's what I usually do as well (because explicit is better than implicit)
> > but from my understanding when it comes to constants it is generally not
> > considered a bad practice to import them directly into the module namespace.
> > I guess my specific case is bit different though.
> > I have all these constants defined in a _linux.py submodule which I import
> > from __init__.py in order to expose them publicly.
> > And this is how I do that:
> >
> >     # Linux >= 2.6.36
> >     if _psplatform.HAS_PRLIMIT:
> >         from psutil._pslinux import (RLIM_INFINITY, RLIMIT_AS, RLIMIT_CORE,
> >                                      RLIMIT_CPU, RLIMIT_DATA, RLIMIT_FSIZE,
> >                                      RLIMIT_LOCKS, RLIMIT_MEMLOCK,
> ...
> >
> > In *this specific case* a "from _psplatform import RLIM*" would have solved
> > my problem nicely.
>
> Or we change the module such that we can do
>
>     from psutil._pslinux import RLIMIT
>
> and then use RLIMIT.CORE, RLIMIT.CPU, RLIMIT.LOCKS, etc.

Another Enum candidate, perhaps?
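
A rough sketch of what that could look like, assuming the enum module from
PEP 435. The fake_pslinux namespace, its values and the make_prefix_enum
helper are all made up for illustration; they stand in for a platform module
such as psutil._pslinux and are not part of psutil or the stdlib.

```python
import enum
import types

# Stand-in for a platform module such as psutil._pslinux. The values are
# illustrative only, and RLIMIT_RTTIME is deliberately left out to mimic a
# constant that is missing on the current platform.
fake_pslinux = types.SimpleNamespace(
    RLIM_INFINITY=-1,
    RLIMIT_CPU=0,
    RLIMIT_FSIZE=1,
    RLIMIT_CORE=4,
    RLIMIT_NOFILE=7,
)


def make_prefix_enum(module, prefix, name):
    """Collect every attribute of module that starts with prefix into an IntEnum."""
    members = {
        attr[len(prefix):]: getattr(module, attr)
        for attr in dir(module)
        if attr.startswith(prefix)
    }
    return enum.IntEnum(name, members)


# Only names that exist on this "platform" become members.
RLIMIT = make_prefix_enum(fake_pslinux, "RLIMIT_", "RLIMIT")
```

With that, "from somewhere import RLIMIT" exposes RLIMIT.CORE, RLIMIT.CPU and
so on through a single name, and platform-dependent members are simply absent.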

Cheers,
Nick.


From storchaka at gmail.com  Fri Oct  4 21:17:14 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 04 Oct 2013 22:17:14 +0300
Subject: [Python-ideas] pprint in displayhook
In-Reply-To: <20131002005644.GI7989@ando>
References: <l23lbh$4el$1@ger.gmane.org>
 <0BFAAF4A-F5C8-48FA-9C82-1B60D164A033@gmail.com> <l2a32u$3jc$1@ger.gmane.org>
 <20131002005644.GI7989@ando>
Message-ID: <l2n46e$fav$2@ger.gmane.org>

02.10.13 03:56, Steven D'Aprano wrote:
> I would like to see pprint be smarter about printing lists and dicts. At
> the moment, a long list is either printed all on one line, like the
> default display, or one item per line. This can end up as one long,
> narrow column, which is worse than the default. I'd like to see it be
> smarter about using multiple columns.

http://bugs.python.org/issue19132



From storchaka at gmail.com  Tue Oct  8 13:17:59 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Tue, 08 Oct 2013 14:17:59 +0300
Subject: [Python-ideas] Add "has_surrogates" flags to string object
Message-ID: <l30pl0$apd$1@ger.gmane.org>

Here is an idea: add a mark to the PyUnicode object that allows a fast
answer to the question of whether a string contains surrogate codes. This
mark has one of three possible states:

* String doesn't contain surrogates.
* String contains surrogates.
* It is still unknown.

We can combine this with the "is_ascii" flag in a 2-bit value:

* String is ASCII-only (and doesn't contain surrogates).
* String is not ASCII-only and doesn't contain surrogates.
* String is not ASCII-only and contains surrogates.
* String is not ASCII-only and it is still unknown whether it contains
  surrogates.

By default a string is created in the "unknown" state (if it is UCS2 or
UCS4). After the first request it can be switched to "has surrogates" or
"has no surrogates". The state of the result of concatenating or slicing
can be determined from the states of the input strings.

This will allow faster UTF-16 and UTF-32 encoding (and perhaps even
slightly faster UTF-8 encoding) and faster conversion to wchar_t* if the
string has no surrogates (which is true in most cases).
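
To make the state machine concrete, here is a pure-Python sketch. It only
models the idea; the real flag would live in the C struct. FlaggedStr, scan
and the three state names are invented for this illustration.

```python
UNKNOWN, NO_SURROGATES, HAS_SURROGATES = range(3)


def scan(text):
    """Full O(n) scan: the slow path that the cached flag avoids repeating."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


class FlaggedStr:
    """Hypothetical wrapper pairing a str with a lazily computed state."""

    def __init__(self, text, state=UNKNOWN):
        self.text = text
        self.state = state

    def has_surrogates(self):
        # The first request resolves UNKNOWN; later requests are O(1).
        if self.state == UNKNOWN:
            self.state = HAS_SURROGATES if scan(self.text) else NO_SURROGATES
        return self.state == HAS_SURROGATES

    def __add__(self, other):
        # Concatenation: a surrogate in either input survives, and two clean
        # inputs give a clean result. Anything else stays unknown.
        if HAS_SURROGATES in (self.state, other.state):
            state = HAS_SURROGATES
        elif self.state == other.state == NO_SURROGATES:
            state = NO_SURROGATES
        else:
            state = UNKNOWN
        return FlaggedStr(self.text + other.text, state)
```

The point of the sketch is that concatenation never needs to rescan when both
inputs are already resolved.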


From masklinn at masklinn.net  Tue Oct  8 13:38:19 2013
From: masklinn at masklinn.net (Masklinn)
Date: Tue, 8 Oct 2013 13:38:19 +0200
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <l30pl0$apd$1@ger.gmane.org>
References: <l30pl0$apd$1@ger.gmane.org>
Message-ID: <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net>

On 2013-10-08, at 13:17 , Serhiy Storchaka wrote:

> Here is an idea about adding a mark to PyUnicode object which allows fast answer to the question if a string has surrogate code. This mark has one of three possible states:
> 
> * String doesn't contain surrogates.
> * String contains surrogates.
> * It is still unknown.
> 
> We can combine this with "is_ascii" flag in 2-bit value:
> 
> * String is ASCII-only (and doesn't contain surrogates).
> * String is not ASCII-only and doesn't contain surrogates.
> * String is not ASCII-only and contains surrogates.
> * String is not ASCII-only and it is still unknown if it contains surrogate.

Isn't that redundant with the kind under shortest form representation?

From solipsis at pitrou.net  Tue Oct  8 13:43:43 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 8 Oct 2013 13:43:43 +0200
Subject: [Python-ideas] Add "has_surrogates" flags to string object
References: <l30pl0$apd$1@ger.gmane.org>
Message-ID: <20131008134343.2e084051@pitrou.net>

On Tue, 08 Oct 2013 14:17:59 +0300,
Serhiy Storchaka <storchaka at gmail.com> wrote:
> Here is an idea about adding a mark to PyUnicode object which allows 
> fast answer to the question if a string has surrogate code. This mark 
> has one of three possible states:
> 
> * String doesn't contain surrogates.
> * String contains surrogates.
> * It is still unknown.
> 
> We can combine this with "is_ascii" flag in 2-bit value:
> 
> * String is ASCII-only (and doesn't contain surrogates).
> * String is not ASCII-only and doesn't contain surrogates.
> * String is not ASCII-only and contains surrogates.
> * String is not ASCII-only and it is still unknown if it contains
> surrogate.
> 
> By default a string is created in "unknown" state (if it is UCS2 or 
> UCS4). After first request it can be switched to "has surrogates" or 
> "hasn't surrogates". State of the result of concatenating or slicing
> can be determined from states of input strings.

Not true for slicing (you can take a non-surrogates slice of a
surrogates string). Other than that, this sounds reasonable to me,
provided that the patch isn't too complex and the perf improvements are
worth it.
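
A small illustration of the slicing case, as a sketch only. The has_surrogates
and state_after_slice helpers and the state names are made up here; they are
not existing CPython APIs.

```python
UNKNOWN, NO_SURROGATES, HAS_SURROGATES = range(3)


def has_surrogates(text):
    """Slow reference check: is any code point in U+D800..U+DFFF?"""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


def state_after_slice(parent_state):
    """State a slice can inherit from its parent without rescanning."""
    if parent_state == NO_SURROGATES:
        # A slice of a clean string is always clean.
        return NO_SURROGATES
    # The parent has surrogates or is unresolved: the slice may or may not
    # contain one, so it has to fall back to "unknown".
    return UNKNOWN


s = "abc\udc80def"
clean_slice = s[:3]   # "abc": no surrogate, although s has one
dirty_slice = s[2:5]  # "c\udc80d": still contains the surrogate
```

So the only state that propagates through slicing for free is "no surrogates".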

Regards

Antoine.



From storchaka at gmail.com  Tue Oct  8 13:43:51 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Tue, 08 Oct 2013 14:43:51 +0300
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net>
References: <l30pl0$apd$1@ger.gmane.org>
 <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net>
Message-ID: <l30r5h$ssp$1@ger.gmane.org>

08.10.13 14:38, Masklinn wrote:
> On 2013-10-08, at 13:17 , Serhiy Storchaka wrote:
>
>> Here is an idea about adding a mark to PyUnicode object which allows fast answer to the question if a string has surrogate code. This mark has one of three possible states:
>>
>> * String doesn't contain surrogates.
>> * String contains surrogates.
>> * It is still unknown.
>>
>> We can combine this with "is_ascii" flag in 2-bit value:
>>
>> * String is ASCII-only (and doesn't contain surrogates).
>> * String is not ASCII-only and doesn't contain surrogates.
>> * String is not ASCII-only and contains surrogates.
>> * String is not ASCII-only and it is still unknown if it contains surrogate.
>
> Isn't that redundant with the kind under shortest form representation?

No, it isn't redundant. '\udc80' is a UCS2 string with a surrogate code, and
'\udc80\U00010000' is a UCS4 string with a surrogate code. A UCS2 string
without surrogate codes can be encoded in UTF-16 by memcpy().


From mal at egenix.com  Tue Oct  8 13:58:00 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 08 Oct 2013 13:58:00 +0200
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <l30pl0$apd$1@ger.gmane.org>
References: <l30pl0$apd$1@ger.gmane.org>
Message-ID: <5253F348.3010204@egenix.com>

On 08.10.2013 13:17, Serhiy Storchaka wrote:
> Here is an idea about adding a mark to PyUnicode object which allows fast answer to the question if
> a string has surrogate code. This mark has one of three possible states:
> 
> * String doesn't contain surrogates.
> * String contains surrogates.
> * It is still unknown.
> 
> We can combine this with "is_ascii" flag in 2-bit value:
> 
> * String is ASCII-only (and doesn't contain surrogates).
> * String is not ASCII-only and doesn't contain surrogates.
> * String is not ASCII-only and contains surrogates.
> * String is not ASCII-only and it is still unknown if it contains surrogate.
> 
> By default a string is created in "unknown" state (if it is UCS2 or UCS4). After first request it
> can be switched to "has surrogates" or "hasn't surrogates". State of the result of concatenating or
> slicing can be determined from states of input strings.
> 
> This will allow faster UTF-16 and UTF-32 encoding (and perhaps even a little faster UTF-8 encoding)
> and converting to wchar_t* if string hasn't surrogates (this is true in most cases).

I guess you could use one bit from the kind structure
for that:

        /* Character size:

           - PyUnicode_WCHAR_KIND (0):

             * character type = wchar_t (16 or 32 bits, depending on the
               platform)

           - PyUnicode_1BYTE_KIND (1):

             * character type = Py_UCS1 (8 bits, unsigned)
             * all characters are in the range U+0000-U+00FF (latin1)
             * if ascii is set, all characters are in the range U+0000-U+007F
               (ASCII), otherwise at least one character is in the range
               U+0080-U+00FF

           - PyUnicode_2BYTE_KIND (2):

             * character type = Py_UCS2 (16 bits, unsigned)
             * all characters are in the range U+0000-U+FFFF (BMP)
             * at least one character is in the range U+0100-U+FFFF

           - PyUnicode_4BYTE_KIND (4):

             * character type = Py_UCS4 (32 bits, unsigned)
             * all characters are in the range U+0000-U+10FFFF
             * at least one character is in the range U+10000-U+10FFFF
         */
        unsigned int kind:3;


For some reason, it allocates 3 bits, but only 2 bits are
used.

Then again, the state struct is an unsigned int, so there's still plenty
of room for extra flags.
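
To see how much room is left, the bitfield can be mirrored with ctypes. This
is a sketch: the first five fields follow the layout of the 3.3 header as far
as I recall it, and the "surrogates" field is purely hypothetical.

```python
import ctypes


class State(ctypes.Structure):
    """Approximate mirror of the PyASCIIObject state bitfield."""

    _fields_ = [
        ("interned", ctypes.c_uint, 2),
        ("kind", ctypes.c_uint, 3),
        ("compact", ctypes.c_uint, 1),
        ("ascii", ctypes.c_uint, 1),
        ("ready", ctypes.c_uint, 1),
        # Hypothetical addition: 0 = unknown, 1 = no surrogates, 2 = has surrogates.
        ("surrogates", ctypes.c_uint, 2),
    ]


state = State()
state.kind = 4          # PyUnicode_4BYTE_KIND
state.surrogates = 2    # "has surrogates" in this sketch
```

The extra two bits do not grow the struct: all fields still fit in a single
unsigned int.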

-- 
Marc-Andre Lemburg
eGenix.com


From masklinn at masklinn.net  Tue Oct  8 13:58:20 2013
From: masklinn at masklinn.net (Masklinn)
Date: Tue, 8 Oct 2013 13:58:20 +0200
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <l30r5h$ssp$1@ger.gmane.org>
References: <l30pl0$apd$1@ger.gmane.org>
 <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net>
 <l30r5h$ssp$1@ger.gmane.org>
Message-ID: <D03B546D-34BA-410E-B8D6-5F1917ACE0BB@masklinn.net>


On 2013-10-08, at 13:43 , Serhiy Storchaka wrote:

> 08.10.13 14:38, Masklinn wrote:
>> On 2013-10-08, at 13:17 , Serhiy Storchaka wrote:
>> 
>>> Here is an idea about adding a mark to PyUnicode object which allows fast answer to the question if a string has surrogate code. This mark has one of three possible states:
>>> 
>>> * String doesn't contain surrogates.
>>> * String contains surrogates.
>>> * It is still unknown.
>>> 
>>> We can combine this with "is_ascii" flag in 2-bit value:
>>> 
>>> * String is ASCII-only (and doesn't contain surrogates).
>>> * String is not ASCII-only and doesn't contain surrogates.
>>> * String is not ASCII-only and contains surrogates.
>>> * String is not ASCII-only and it is still unknown if it contains surrogate.
>> 
>> Isn't that redundant with the kind under shortest form representation?
> 
> No, it isn't redundant. '\udc80' is UCS2 string with surrogate code, and '\udc80\U00010000' is UCS4 string with surrogate code.

I don't know the details of the flexible string representation, but I
believed the names fit what was actually in memory. UCS2 does not
have surrogate pairs, thus surrogate codes make no sense in UCS2,
they're a UTF-16 concept. Likewise for UCS4. Surrogate codes are not
codepoints, they have no reason to appear in either UCS2 or UCS4
outside of encoding errors.

> UCS2 string without surrogate codes can be encoded in UTF-16 by memcpy().

Surrogate codes prevent that (modulo objections above) for slicing (not
that it's a big issue I think, a guard can just check whether it's
slicing within a surrogate pair, that only requires checking the first
and last 2 bytes of the range) but not for concatenation right?

From victor.stinner at gmail.com  Tue Oct  8 14:23:09 2013
From: victor.stinner at gmail.com (Victor Stinner)
Date: Tue, 8 Oct 2013 14:23:09 +0200
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <l30pl0$apd$1@ger.gmane.org>
References: <l30pl0$apd$1@ger.gmane.org>
Message-ID: <CAMpsgwZCZx2UJ2posN0ebBw0BX1OsBmq88K2JHCXX-_i2+b2PA@mail.gmail.com>

I like the idea. I prefer to add another flag (1 bit), instead of
having a complex field with 4 different values.

Your idea looks specific to PEP 393, so I prefer to keep the flag
private. Otherwise it would be hard for other implementations of
Python to implement the function getting the flag value.

Victor

2013/10/8 Serhiy Storchaka <storchaka at gmail.com>:
> Here is an idea about adding a mark to PyUnicode object which allows fast
> answer to the question if a string has surrogate code. This mark has one of
> three possible states:
>
> * String doesn't contain surrogates.
> * String contains surrogates.
> * It is still unknown.
>
> We can combine this with "is_ascii" flag in 2-bit value:
>
> * String is ASCII-only (and doesn't contain surrogates).
> * String is not ASCII-only and doesn't contain surrogates.
> * String is not ASCII-only and contains surrogates.
> * String is not ASCII-only and it is still unknown if it contains surrogate.
>
> By default a string is created in "unknown" state (if it is UCS2 or UCS4).
> After first request it can be switched to "has surrogates" or "hasn't
> surrogates". State of the result of concatenating or slicing can be
> determined from states of input strings.
>
> This will allow faster UTF-16 and UTF-32 encoding (and perhaps even a little
> faster UTF-8 encoding) and converting to wchar_t* if string hasn't
> surrogates (this is true in most cases).

From steve at pearwood.info  Tue Oct  8 15:02:08 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 9 Oct 2013 00:02:08 +1100
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <D03B546D-34BA-410E-B8D6-5F1917ACE0BB@masklinn.net>
References: <l30pl0$apd$1@ger.gmane.org>
 <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net>
 <l30r5h$ssp$1@ger.gmane.org>
 <D03B546D-34BA-410E-B8D6-5F1917ACE0BB@masklinn.net>
Message-ID: <20131008130208.GX7989@ando>

On Tue, Oct 08, 2013 at 01:58:20PM +0200, Masklinn wrote:
> 
> On 2013-10-08, at 13:43 , Serhiy Storchaka wrote:
> 
> > 08.10.13 14:38, Masklinn wrote:
> >> On 2013-10-08, at 13:17 , Serhiy Storchaka wrote:
> >> 
> >>> Here is an idea about adding a mark to PyUnicode object which 
> >>> allows fast answer to the question if a string has surrogate code. 
> >>> This mark has one of three possible states:
[...]
> >> Isn't that redundant with the kind under shortest form representation?
> > 
> > No, it isn't redundant. '\udc80' is UCS2 string with surrogate code, and '\udc80\U00010000' is UCS4 string with surrogate code.
> 
> I don't know the details of the flexible string representation, but I
> believed the names fit what was actually in memory. UCS2 does not
> have surrogate pairs, thus surrogate codes make no sense in UCS2,
> they're a UTF-16 concept. Likewise for UCS4. Surrogate codes are not
> codepoints, they have no reason to appear in either UCS2 or UCS4
> outside of encoding errors.

I welcome correction, but I think you're mistaken. Python 3.3 strings 
don't have surrogate *pairs*, but they can contain surrogate *code 
points*. Unicode states:

"Isolated surrogate code points have no interpretation; consequently, no 
character code charts or names lists are provided for this range."

http://www.unicode.org/charts/PDF/UDC00.pdf
http://www.unicode.org/charts/PDF/UD800.pdf
 
So technically surrogates are "non-characters". That doesn't mean they 
are forbidden though; you can certainly create them, and encode them to 
UTF-16 and -32:

py> surr = '\udc80'
py> import unicodedata as ud
py> ud.category(surr)
'Cs'
py> surr.encode('utf-16')
b'\xff\xfe\x80\xdc'
py> surr.encode('utf-32')
b'\xff\xfe\x00\x00\x80\xdc\x00\x00'


However, you cannot encode single surrogates to UTF-8:

py> surr.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in 
position 0: surrogates not allowed

as per the standard:

http://www.unicode.org/faq/utf_bom.html#utf8-5

I *think* you are supposed to be able to encode surrogate *pairs* to 
UTF-8, if I'm reading the FAQ correctly, but it seems Python 3.3 doesn't 
support that. In any case, it is certainly legal to have Unicode strings 
containing non-characters, including surrogates, and you can encode them 
to UTF-16 and -32.

However, it looks like surrogates won't round trip in UTF-16, but they 
will in UTF-32:

py> surr.encode('utf-16').decode('utf-16') == surr
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 2-3: 
unexpected end of data
py> surr.encode('utf-32').decode('utf-32') == surr
True



So... I'm not sure why this will be useful. Presumably Unicode strings 
containing surrogate code points will be rare, and you can't encode them 
to UTF-8 at all, and you can't round trip them from UTF-16.



-- 
Steven

From stephen at xemacs.org  Tue Oct  8 15:31:07 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 08 Oct 2013 22:31:07 +0900
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <D03B546D-34BA-410E-B8D6-5F1917ACE0BB@masklinn.net>
References: <l30pl0$apd$1@ger.gmane.org>
 <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net>
 <l30r5h$ssp$1@ger.gmane.org>
 <D03B546D-34BA-410E-B8D6-5F1917ACE0BB@masklinn.net>
Message-ID: <87vc17khyc.fsf@uwakimon.sk.tsukuba.ac.jp>

Masklinn writes:

 > I don't know the details of the flexible string representation, but I
 > believed the names fit what was actually in memory. UCS2 does not
 > have surrogate pairs, thus surrogate codes make no sense in UCS2,
 > they're a UTF-16 concept. Likewise for UCS4. Surrogate codes are not
 > codepoints, they have no reason to appear in either UCS2 or UCS4
 > outside of encoding errors.

True, but Python doesn't actually use UCS2 or UCS4 internally.  It
uses UCS2 or UCS4 plus a row of codes from the surrogate area to
represent undecodable bytes.  This feature is optional (enabled by
using the appropriate error= setting in the codec), but I don't
suppose it's going to go away.
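
For the record, a short example of that feature using the "surrogateescape"
error handler from PEP 383. The sample bytes are arbitrary.

```python
raw = b"abc\xffdef"  # 0xFF can never appear in well-formed UTF-8

# Each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler turns the surrogate back into the byte.
restored = text.encode("utf-8", errors="surrogateescape")

# Without the handler the string cannot be encoded at all.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

Strings produced this way are ordinary str objects, so any "has surrogates"
flag would have to cope with them.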


From masklinn at masklinn.net  Tue Oct  8 15:48:18 2013
From: masklinn at masklinn.net (Masklinn)
Date: Tue, 8 Oct 2013 15:48:18 +0200
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <20131008130208.GX7989@ando>
References: <l30pl0$apd$1@ger.gmane.org>
 <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net>
 <l30r5h$ssp$1@ger.gmane.org>
 <D03B546D-34BA-410E-B8D6-5F1917ACE0BB@masklinn.net>
 <20131008130208.GX7989@ando>
Message-ID: <A2E1DB4A-703E-4AAE-8D70-C1E4621423A5@masklinn.net>

On 2013-10-08, at 15:02 , Steven D'Aprano wrote:

[snipped early part as any response would be superseded by or redundant
with the stuff below]

> However, you cannot encode single surrogates to UTF-8:
> 
> py> surr.encode('utf-8')
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in 
> position 0: surrogates not allowed
> 
> as per the standard:
> 
> http://www.unicode.org/faq/utf_bom.html#utf8-5
> 
> I *think* you are supposed to be able to encode surrogate *pairs* to 
> UTF-8, if I'm reading the FAQ correctly

I'm reading the opposite, from http://www.unicode.org/faq/utf_bom.html#utf8-4:

> there is a widespread practice of generating pairs of three byte
> sequences in older software, especially software which pre-dates the
> introduction of UTF-16 or that is interoperating with UTF-16
> environments under particular constraints. Such an encoding is not
> conformant to UTF-8 as defined.

Pairs of 3-byte sequences would be encoding each surrogate directly to
UTF-8, whereas a single 4-byte sequence would be decoding the surrogate
pair to a codepoint and encoding that codepoint to UTF-8. My reading
of the FAQ makes the second interpretation the only valid one.

So you can't encode surrogates (either lone or paired) to UTF-8,
you can encode the codepoint encoded by a surrogate pair.
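
The difference between the two forms is easy to show in Python. This is a
sketch that uses the "surrogatepass" error handler to deliberately produce
the non-conformant pair of 3-byte sequences.

```python
ch = "\U00010001"       # one supplementary code point
pair = "\ud800\udc01"   # the two surrogates UTF-16 would use for it

# Conformant: the code point itself becomes a single 4-byte sequence.
well_formed = ch.encode("utf-8")

# Non-conformant: each surrogate is encoded on its own, giving 3 + 3 bytes.
ill_formed = pair.encode("utf-8", errors="surrogatepass")

# A strict UTF-8 decoder refuses the 6-byte form.
try:
    ill_formed.decode("utf-8")
    accepted = True
except UnicodeDecodeError:
    accepted = False
```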

> In any case, it is certainly legal to have Unicode strings 
> containing non-characters, including surrogates, and you can encode them 
> to UTF-16 and -32.

The UTF-32 section has similar note to UTF-8:
http://www.unicode.org/faq/utf_bom.html#utf32-7

> A: If an unpaired surrogate is encountered when converting ill-formed
> UTF-16 data, any conformant converter must treat this as an error. By
> representing such an unpaired surrogate on its own, the resulting UTF-32
> data stream would become ill-formed. While it faithfully reflects the
> nature of the input, Unicode conformance requires that encoding form
> conversion always results in valid data stream.

and the UTF-16 section points out:
http://www.unicode.org/faq/utf_bom.html#utf16-7

> Q: Are there any 16-bit values that are invalid?

> A: Unpaired surrogates are invalid in UTFs. These include any value in
> the range 0xD800 to 0xDBFF not followed by a value in the range 0xDC00
> to 0xDFFF, or any value in the range 0xDC00 to 0xDFFF not preceded by a
> value in the range 0xD800 to 0xDBFF.

As far as I can read the FAQ, it is always invalid to encode a
surrogate, surrogates are not to be considered codepoints (they're not
just noncharacters[0], noncharacters are codepoints), and a lone
surrogate in a UTF-16 stream means the stream is corrupted, which should
result in an error during transcoding to anything (unless some recovery
mode is used to replace corrupted characters by some mark during
decoding I guess).
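
The pairing rule from the UTF-16 answer can be written as a small checker
over 16-bit code units. unpaired_surrogates is a name made up for this
sketch.

```python
def unpaired_surrogates(units):
    """Return the indexes of code units that are surrogates without a partner."""
    bad = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed directly by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            bad.append(i)
        elif 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate reached here was not preceded by a high one.
            bad.append(i)
        i += 1
    return bad
```

A conformant UTF-16 decoder has to treat any index this function reports as
an error (or hand it to an error handler).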

> So... I'm not sure why this will be useful. Presumably Unicode strings 
> containing surrogate code points will be rare

And they're a sign of corrupted stream.

The FAQ reads a bit strangely, I think because it's written from the
viewpoint that the "internal encoding" will be UTF-16, and UTF-8 and
UTF-32 are transcoding from that. Which does not apply to CPython and
the FSR.

Parsing the FAQ with that viewpoint, I believe a CPython string (unicode)
must not contain surrogate codes: a surrogate pair should have been
decoded from UTF-16 to a codepoint (then identity-encoded to UCS4) and a
single surrogate should have been caught by the UTF-16 decoder and
should have triggered the error handler at that point. A surrogate code
in a CPython string means the string is corrupted[1].

Surrogates *may* appear in binary data, while building a UTF-16
bytestream by hand.

[0] since "noncharacter" has a well-defined meaning in unicode, and only
    applies to 66 codepoints, a much smaller range than surrogates:
    http://www.unicode.org/faq/private_use.html#noncharacters

[1] note that this hinges on my understanding of "UCS2" in FSR being
    actual UCS2, if it's UCS2-with-surrogates with a heuristic for
    switching between UCS2 and UCS4 depending on the number of
    surrogate pairs in the string it does not apply

From steve at pearwood.info  Tue Oct  8 16:20:09 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 9 Oct 2013 01:20:09 +1100
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <A2E1DB4A-703E-4AAE-8D70-C1E4621423A5@masklinn.net>
References: <l30pl0$apd$1@ger.gmane.org>
 <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net>
 <l30r5h$ssp$1@ger.gmane.org>
 <D03B546D-34BA-410E-B8D6-5F1917ACE0BB@masklinn.net>
 <20131008130208.GX7989@ando>
 <A2E1DB4A-703E-4AAE-8D70-C1E4621423A5@masklinn.net>
Message-ID: <20131008142009.GY7989@ando>

On Tue, Oct 08, 2013 at 03:48:18PM +0200, Masklinn wrote:
> On 2013-10-08, at 15:02 , Steven D'Aprano wrote:

> > py> surr.encode('utf-8')
> > Traceback (most recent call last):
> >  File "<stdin>", line 1, in <module>
> > UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in 
> > position 0: surrogates not allowed
> > 
> > as per the standard:
> > 
> > http://www.unicode.org/faq/utf_bom.html#utf8-5
> > 
> > I *think* you are supposed to be able to encode surrogate *pairs* to 
> > UTF-8, if I'm reading the FAQ correctly
> 
> I'm reading the opposite, from http://www.unicode.org/faq/utf_bom.html#utf8-4:
> 
> > there is a widespread practice of generating pairs of three byte
> > sequences in older software, especially software which pre-dates the
> > introduction of UTF-16 or that is interoperating with UTF-16
> > environments under particular constraints. Such an encoding is not
> > conformant to UTF-8 as defined.
> 
> Pairs of 3-byte sequences would be encoding each surrogate directly to
> UTF-8, whereas a single 4-byte sequence would be decoding the surrogate
> pair to a codepoint and encoding that codepoint to UTF-8. My reading
> of the FAQ makes the second interpretation the only valid one.

It's not that clear to me. I fear the Unicode FAQs don't distinguish 
between Unicode strings and bytes well enough for my liking :(

But for the record, my interpretation is that if you have a pair of code
points consisting of the same values as a valid surrogate pair, you
should be able to encode to UTF-8. To give a concrete example:

Given:

c = '\N{LINEAR B SYLLABLE B038 E}'  # \U00010001
c.encode('utf-8')
=> b'\xf0\x90\x80\x81'

and:

c.encode('utf-16BE')  # encodes as a surrogate pair
=> b'\xd8\x00\xdc\x01'

then those same surrogates, taken as codepoints, should be encodable as 
UTF-8:

'\ud800\udc01'.encode('utf-8')
=> b'\xf0\x90\x80\x81'


I'd actually be disappointed if that were the case; I think that would 
be a poor design. But if that's what the Unicode standard demands, 
Python ought to support it.

But hopefully somebody will explain to me why my interpretation is wrong 
:-)
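
For reference, the arithmetic that maps a surrogate pair to its code point,
plus one way to get the same result from the codecs. This is a sketch:
combine_surrogates is a made-up helper, and the "surrogatepass" handler for
the UTF-16 codecs is assumed to be available (Python 3.4 or later, as far as
I know).

```python
def combine_surrogates(high, low):
    """Map a UTF-16 high/low surrogate pair to the code point it stands for."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


pair = "\ud800\udc01"
code_point = combine_surrogates(ord(pair[0]), ord(pair[1]))

# Writing the surrogates out as raw UTF-16 code units and decoding them again
# merges the pair into the single code point.
as_utf16 = pair.encode("utf-16-be", errors="surrogatepass")
merged = as_utf16.decode("utf-16-be")
```

After the merge, merged.encode('utf-8') gives the 4-byte sequence from the
example above; the strict UTF-8 codec never does the merge on its own.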



[...]
> The FAQ reads a bit strangely, I think because it's written from the
> viewpoint that the "internal encoding" will be UTF-16, and UTF-8 and
> UTF-32 are transcoding from that. Which does not apply to CPython and
> the FSR.

Hmmm... well, that might explain it. If it's written by Java programmers 
for Java programmers, they may very well decide that having spent 20 
years trying to convince people that string != ASCII, they're now 
going to convince them that string == UTF-16 instead :/


> Parsing the FAQ with that viewpoint, I believe a CPython string (unicode)
> must not contain surrogate codes: a surrogate pair should have been
> decoded from UTF-16 to a codepoint (then identity-encoded to UCS4) and a
> single surrogate should have been caught by the UTF-16 decoder and
> should have triggered the error handler at that point. A surrogate code
> in a CPython string means the string is corrupted[1].

I think that interpretation is a bit strong. I think it would be fair to 
say that CPython strings may contain surrogates, but you can't encode 
them to bytes using the UTFs. Nor are there any byte sequences that can 
be decoded to surrogates using the UTFs.

This essentially means that you can only get surrogates in a string 
using (e.g.) chr() or \u escapes, and you can't then encode them to 
bytes using UTF encodings.
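
A quick way to check that rule, assuming an interpreter where the strict
UTF-16 and UTF-32 encoders also reject surrogates (Python 3.4 or later, as
far as I know; the 3.3 session earlier in the thread still accepted them).
can_encode is a helper made up for this sketch.

```python
import unicodedata


def can_encode(text, codec):
    """True if text encodes with the given codec under the strict error handler."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


surr = chr(0xDC80)                      # same string as the literal '\udc80'
category = unicodedata.category(surr)   # general category "Cs" (surrogate)

results = {codec: can_encode(surr, codec)
           for codec in ("utf-8", "utf-16", "utf-32")}
```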




> Surrogates *may* appear in binary data, while building a UTF-16
> bytestream by hand.

But there you're talking about bytes, not text strings. Byte strings can
contain any bytes you like :-)



-- 
Steven

From turnbull at sk.tsukuba.ac.jp  Tue Oct  8 16:31:25 2013
From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Tue, 08 Oct 2013 23:31:25 +0900
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <A2E1DB4A-703E-4AAE-8D70-C1E4621423A5@masklinn.net>
References: <l30pl0$apd$1@ger.gmane.org>
 <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net>
 <l30r5h$ssp$1@ger.gmane.org>
 <D03B546D-34BA-410E-B8D6-5F1917ACE0BB@masklinn.net>
 <20131008130208.GX7989@ando>
 <A2E1DB4A-703E-4AAE-8D70-C1E4621423A5@masklinn.net>
Message-ID: <87txgrkf5u.fsf@uwakimon.sk.tsukuba.ac.jp>

Masklinn writes:

 > The FAQ reads a bit strangely, I think because it's written from the
 > viewpoint that the "internal encoding" will be UTF-16, and UTF-8 and
 > UTF-32 are transcoding from that. Which does not apply to CPython and
 > the FSR.

No, it's written from the viewpoint that it says *nothing* about
internal encodings, only about the encodings used in interchange of
textual data, and about certain aspects of the processes that may
receive and generate such data (eg, when data matches a Unicode
regular expression, or how bidirectional text should appear visually).

 > Parsing the FAQ with that viewpoint, I believe a CPython string (unicode)
 > must not contain surrogate codes:

No, it says no such thing.  All the Unicode Standard (and the FAQ)
says is that if Python generates output that purports to be text
encoded in Unicode, it may not contain surrogate codes except where
those codes are used according to UTF-16 to encode characters in
planes 2 to 17, and if it receives data alleged to be Unicode in some
transformation format, it must raise an error if it receives
surrogates other than a correctly formed surrogate pair in text known
to be encoded as UTF-16.

In fact (as I wrote before without proper citation), the internal
encoding of Python has been extended by PEP 383 to use a subset of the
surrogate space to represent undecodable bytes in an octet stream,
when the error handler is set to "surrogateescape".
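
A short sketch of the PEP 383 round trip, assuming CPython 3; the sample bytes are made up for illustration:

```python
raw = b'caf\xe9'  # Latin-1 bytes, not valid UTF-8

# The undecodable byte 0xE9 is smuggled into the str as U+DCE9.
text = raw.decode('utf-8', errors='surrogateescape')
assert text == 'caf\udce9'
assert ord(text[-1]) & 0xFF == 0xE9

# Encoding with the same handler restores the original bytes.
assert text.encode('utf-8', errors='surrogateescape') == raw

# Without the handler, the escaped surrogate cannot be encoded.
try:
    text.encode('utf-8')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```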

Furthermore, there is nothing to stop a Python unicode from containing
any code unit (including both surrogates and other non-characters like
0xFFFF).  Checking of the rules you cite is done by codecs, at
encoding and decoding time.


From masklinn at masklinn.net  Tue Oct  8 16:40:58 2013
From: masklinn at masklinn.net (Masklinn)
Date: Tue, 8 Oct 2013 16:40:58 +0200
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <20131008142009.GY7989@ando>
References: <l30pl0$apd$1@ger.gmane.org>
 <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net>
 <l30r5h$ssp$1@ger.gmane.org>
 <D03B546D-34BA-410E-B8D6-5F1917ACE0BB@masklinn.net>
 <20131008130208.GX7989@ando>
 <A2E1DB4A-703E-4AAE-8D70-C1E4621423A5@masklinn.net>
 <20131008142009.GY7989@ando>
Message-ID: <B71D17F2-CC29-4C6C-83B6-8F78C7D52498@masklinn.net>

On 2013-10-08, at 16:20 , Steven D'Aprano wrote:
> I'd actually be disappointed if that were the case; I think that would 
> be a poor design. But if that's what the Unicode standard demands, 
> Python ought to support it.

That would be really weird, it'd mean an *encoder* has to translate a
surrogate pair into the actual codepoint in some sort of weird
UTF-specific normalization pass.

> But hopefully somebody will explain to me why my interpretation is wrong 
> :-)
> 
> [...]
>> The FAQ reads a bit strangely, I think because it's written from the
>> viewpoint that the "internal encoding" will be UTF-16, and UTF-8 and
>> UTF-32 are transcoding from that. Which does not apply to CPython and
>> the FSR.
> 
> Hmmm... well, that might explain it. If it's written by Java programmers 
> for Java programmers, they may very well decide that having spent 20 
> years trying to convince people that string != ASCII, they're now 
> going to convince them that string == UTF-16 instead :/

To be fair, it's not just Java programmers, IIRC ICU uses UTF-16 as the
internal encoding.

>> Parsing the FAQ with that viewpoint, I believe a CPython string (unicode)
>> must not contain surrogate codes: a surrogate pair should have been
>> decoded from UTF-16 to a codepoint (then identity-encoded to UCS4) and a
>> single surrogate should have been caught by the UTF-16 decoder and
>> should have triggered the error handler at that point. A surrogate code
>> in a CPython string means the string is corrupted[1].
> 
> I think that interpretation is a bit strong. I think it would be fair to 
> say that CPython strings may contain surrogates, but you can't encode 
> them to bytes using the UTFs. Nor are there any byte sequences that can 
> be decoded to surrogates using the UTFs.
> 
> This essentially means that you can only get surrogates in a string 
> using (e.g.) chr() or \u escapes, and you can't then encode them to 
> bytes using UTF encodings.
> 
>> Surrogates *may* appear in binary data, while building a UTF-16
>> bytestream by hand.
> 
> But there you're talking about bytes, not byte strings. Byte strings can 
> contain any bytes you like :-)

Yes, that's basically what I mean: I think surrogates only make sense
in a bytestream, not in a unicode stream.

Although I did not remember/was not aware of PEP 383 (thank you Stephen)
which makes the Unicode spec irrelevant to what a Python string contains.


On 2013-10-08, at 16:31 , Stephen J. Turnbull wrote:
> Furthermore, there is nothing to stop a Python unicode from containing
> any code unit (including both surrogates and other non-characters like
> 0xFFFF).  Checking of the rules you cite is done by codecs, at
> encoding and decoding time.

Noncharacters are a very different case, for what it's worth: their own
FAQ clearly notes that they are valid full-fledged codepoints and must
be encoded and preserved by UTFs:
 http://www.unicode.org/faq/private_use.html#nonchar7

From random832 at fastmail.us  Tue Oct  8 17:27:52 2013
From: random832 at fastmail.us (random832 at fastmail.us)
Date: Tue, 08 Oct 2013 11:27:52 -0400
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <D03B546D-34BA-410E-B8D6-5F1917ACE0BB@masklinn.net>
References: <l30pl0$apd$1@ger.gmane.org>
 <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net>
 <l30r5h$ssp$1@ger.gmane.org>
 <D03B546D-34BA-410E-B8D6-5F1917ACE0BB@masklinn.net>
Message-ID: <1381246072.12709.31490813.0B5674DF@webmail.messagingengine.com>

On Tue, Oct 8, 2013, at 7:58, Masklinn wrote:
> I don't know the details of the flexible string representation, but I
> believed the names fit what was actually in memory. UCS2 does not
> have surrogate pairs, thus surrogate codes make no sense in UCS2,
> they're a UTF-16 concept. Likewise for UCS4. Surrogate codes are not
> codepoints, they have no reason to appear in either UCS2 or UCS4
> outside of encoding errors.

They can also occur due to slicing a ctypes unicode buffer, due to PEP
383, or due to native UTF-16 filenames that contain invalid surrogates.
The latter two also create situations where you need to generate them.

From storchaka at gmail.com  Tue Oct  8 17:55:25 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Tue, 08 Oct 2013 18:55:25 +0300
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <20131008130208.GX7989@ando>
References: <l30pl0$apd$1@ger.gmane.org>
 <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net>
 <l30r5h$ssp$1@ger.gmane.org>
 <D03B546D-34BA-410E-B8D6-5F1917ACE0BB@masklinn.net>
 <20131008130208.GX7989@ando>
Message-ID: <l31aue$4vb$1@ger.gmane.org>

08.10.13 16:02, Steven D'Aprano wrote:
> So... I'm not sure why this will be useful.

This is a bug. http://bugs.python.org/issue12892


From storchaka at gmail.com  Tue Oct  8 18:16:57 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Tue, 08 Oct 2013 19:16:57 +0300
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <5253F348.3010204@egenix.com>
References: <l30pl0$apd$1@ger.gmane.org> <5253F348.3010204@egenix.com>
Message-ID: <l31b5g$4vb$2@ger.gmane.org>

08.10.13 14:58, M.-A. Lemburg wrote:
> I guess you could use one bit from the kind structure
> for that:

The kind of string should be equal to the size of character unit. This 
assumption is used in a lot of code.



From storchaka at gmail.com  Tue Oct  8 18:21:57 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Tue, 08 Oct 2013 19:21:57 +0300
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <CAMpsgwZCZx2UJ2posN0ebBw0BX1OsBmq88K2JHCXX-_i2+b2PA@mail.gmail.com>
References: <l30pl0$apd$1@ger.gmane.org>
 <CAMpsgwZCZx2UJ2posN0ebBw0BX1OsBmq88K2JHCXX-_i2+b2PA@mail.gmail.com>
Message-ID: <l31bev$dce$1@ger.gmane.org>

08.10.13 15:23, Victor Stinner wrote:
> I like the idea. I prefer to add another flag (1 bit), instead of
> having a complex with 4 different values.

We need at least a three-state value: yes, no, maybe. But combined with
the is_ascii flag, we need only one additional bit. I think that it
shouldn't be more complex.

> Your idea looks specific to the PEP 393, so I prefer to keep the flag
> private. Otherwise it would be hard for other implementations of
> Python to implement the function getting the flag value.

Yes, of course.



From mal at egenix.com  Tue Oct  8 18:28:51 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 08 Oct 2013 18:28:51 +0200
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <l31b5g$4vb$2@ger.gmane.org>
References: <l30pl0$apd$1@ger.gmane.org> <5253F348.3010204@egenix.com>
 <l31b5g$4vb$2@ger.gmane.org>
Message-ID: <525432C3.3070905@egenix.com>

On 08.10.2013 18:16, Serhiy Storchaka wrote:
> 08.10.13 14:58, M.-A. Lemburg wrote:
>> I guess you could use one bit from the kind structure
>> for that:
> 
> The kind of string should be equal to the size of character unit. This assumption is used in a lot
> of code.

Ok, then just add the flag to the end of the list... we'd still
have at least 7 bits left on most platforms, IICC.

PS: I guess this use of kind should be documented clearly somewhere.
The unicodeobject.h file only hints at this and for
PyUnicode_WCHAR_KIND this interpretation cannot be used.

-- 
Marc-Andre Lemburg
eGenix.com


From bruce at leapyear.org  Tue Oct  8 22:37:54 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Tue, 8 Oct 2013 13:37:54 -0700
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <20131008142009.GY7989@ando>
References: <l30pl0$apd$1@ger.gmane.org>
 <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net>
 <l30r5h$ssp$1@ger.gmane.org>
 <D03B546D-34BA-410E-B8D6-5F1917ACE0BB@masklinn.net>
 <20131008130208.GX7989@ando>
 <A2E1DB4A-703E-4AAE-8D70-C1E4621423A5@masklinn.net>
 <20131008142009.GY7989@ando>
Message-ID: <CAGu0Anvmb8O73p4a5M9Kj6PsWCA+q-u-UxTgd7S8yO6iav4XeQ@mail.gmail.com>

On Tue, Oct 8, 2013 at 7:20 AM, Steven D'Aprano <steve at pearwood.info> wrote:

> Given:
>
> c = '\N{LINEAR B SYLLABLE B038 E}'  # \U00010001
> c.encode('utf-8')
> => b'\xf0\x90\x80\x81'
>
> and:
>
> c.encode('utf-16BE')  # encodes as a surrogate pair
> => b'\xd8\x00\xdc\x01'
>
> then those same surrogates, taken as codepoints, should be encodable as
> UTF-8:
>
> '\ud800\udc01'.encode('utf-8')
> => b'\xf0\x90\x80\x81'
>
>
> I'd actually be disappointed if that were the case; I think that would
> be a poor design. But if that's what the Unicode standard demands,
> Python ought to support it.
>

The FAQ is explicit that this is wrong: "The definition of UTF-8 requires
that supplementary characters (those using surrogate pairs in UTF-16) be
encoded with a single four byte sequence."
http://www.unicode.org/faq/utf_bom.html#utf8-4

It goes on to say that there is a widespread practice of doing it anyway in
older software. Therefore, it might be acceptable to accept these
mis-encoded characters when *decoding* but they should never be generated
when *encoding*. I'd prefer not to have that on by default given the
history of overlong UTF-8 bugs (e.g., see
http://blogs.msdn.com/b/michael_howard/archive/2008/08/22/overlong-utf-8-escapes-bite.aspx).
Essentially if different decoders follow different rules, then you can
sometimes sneak stuff through the permissive decoders.

Notwithstanding that, there is a different unicode encoding CESU-8 which
does the opposite: it always encodes those characters requiring surrogate
pairs as 6 bytes consisting of two UTF-8-style encodings of the individual
surrogate codepoints. Python doesn't support this and the request to
support it was rejected: http://bugs.python.org/issue12742
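
To make the difference concrete, here is a sketch contrasting the two byte forms. Python has no CESU-8 codec, so this only imitates the 6-byte form with the existing 'surrogatepass' error handler; it is an illustration, not a CESU-8 implementation:

```python
ch = '\U00010001'

# Proper UTF-8: one four-byte sequence.
utf8 = ch.encode('utf-8')

# CESU-8-style: split into the UTF-16 surrogate code points, then encode
# each surrogate separately in UTF-8 style (three bytes each).
units = ch.encode('utf-16-be')
high = int.from_bytes(units[:2], 'big')
low = int.from_bytes(units[2:], 'big')
cesu_like = (chr(high) + chr(low)).encode('utf-8', 'surrogatepass')

print(utf8.hex(), cesu_like.hex())
```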


--- Bruce
I'm hiring: http://www.cadencemd.com/info/jobs
Latest blog post: Alice's Puzzle Page http://www.vroospeak.com
Learn how hackers think: http://j.mp/gruyere-security

From greg.ewing at canterbury.ac.nz  Wed Oct  9 00:49:29 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 09 Oct 2013 11:49:29 +1300
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <CAGu0Anvmb8O73p4a5M9Kj6PsWCA+q-u-UxTgd7S8yO6iav4XeQ@mail.gmail.com>
References: <l30pl0$apd$1@ger.gmane.org>
 <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net>
 <l30r5h$ssp$1@ger.gmane.org>
 <D03B546D-34BA-410E-B8D6-5F1917ACE0BB@masklinn.net>
 <20131008130208.GX7989@ando>
 <A2E1DB4A-703E-4AAE-8D70-C1E4621423A5@masklinn.net>
 <20131008142009.GY7989@ando>
 <CAGu0Anvmb8O73p4a5M9Kj6PsWCA+q-u-UxTgd7S8yO6iav4XeQ@mail.gmail.com>
Message-ID: <52548BF9.4070802@canterbury.ac.nz>

Bruce Leban wrote:
> The FAQ is explicit that this is wrong: "The definition of UTF-8 
> requires that supplementary characters (those using surrogate pairs in 
> UTF-16) be encoded with a single four byte 
> sequence." http://www.unicode.org/faq/utf_bom.html#utf8-4

Python's internal string representation is not UTF-16, though,
so this doesn't apply directly.

Seems to me it hinges on whether a pair of surrogate code
points appearing in a Python string are meant to represent
a single character or not. I would say not, because otherwise
they would have been stored as a single code unit.

-- 
Greg

From steve at pearwood.info  Wed Oct  9 02:55:07 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 9 Oct 2013 11:55:07 +1100
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <CAGu0Anvmb8O73p4a5M9Kj6PsWCA+q-u-UxTgd7S8yO6iav4XeQ@mail.gmail.com>
References: <l30pl0$apd$1@ger.gmane.org>
 <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net>
 <l30r5h$ssp$1@ger.gmane.org>
 <D03B546D-34BA-410E-B8D6-5F1917ACE0BB@masklinn.net>
 <20131008130208.GX7989@ando>
 <A2E1DB4A-703E-4AAE-8D70-C1E4621423A5@masklinn.net>
 <20131008142009.GY7989@ando>
 <CAGu0Anvmb8O73p4a5M9Kj6PsWCA+q-u-UxTgd7S8yO6iav4XeQ@mail.gmail.com>
Message-ID: <20131009005507.GB7989@ando>

On Tue, Oct 08, 2013 at 01:37:54PM -0700, Bruce Leban wrote:
> On Tue, Oct 8, 2013 at 7:20 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> 
> > Given:
> >
> > c = '\N{LINEAR B SYLLABLE B038 E}'  # \U00010001
> > c.encode('utf-8')
> > => b'\xf0\x90\x80\x81'
> >
> > and:
> >
> > c.encode('utf-16BE')  # encodes as a surrogate pair
> > => b'\xd8\x00\xdc\x01'
> >
> > then those same surrogates, taken as codepoints, should be encodable as
> > UTF-8:
> >
> > '\ud800\udc01'.encode('utf-8')
> > => b'\xf0\x90\x80\x81'
> >
> >
> > I'd actually be disappointed if that were the case; I think that would
> > be a poor design. But if that's what the Unicode standard demands,
> > Python ought to support it.
> >
> 
> The FAQ is explicit that this is wrong: "The definition of UTF-8 requires
> that supplementary characters (those using surrogate pairs in UTF-16) be
> encoded with a single four byte sequence."
> http://www.unicode.org/faq/utf_bom.html#utf8-4

And if you count the number of bytes, you will see four of them:

'\ud800\udc01'.encode('utf-8')
=> b'\xf0' b'\x90' b'\x80' b'\x81'

I stress that Python 3.3 doesn't actually do this, but my reading of the 
FAQ suggests that it should.

The question isn't what UTF-8 should do with supplementary characters 
(those outside the BMP). That is well-defined, and Python 3.3 gets it 
right. The question is what it should do with pairs of surrogates. 
Ill-formed surrogates are rightly illegal when encoding to UTF-8:

# a lone surrogate is illegal
'\ud800'.encode('utf-8') must be treated as an error

# two high surrogates, or two low surrogates
'\udc01\udc01'.encode('utf-8') must be treated as an error
'\ud800\ud800'.encode('utf-8') must be treated as an error

# if they're in the wrong order
'\udc01\ud800'.encode('utf-8') must be treated as an error
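
The four ill-formed cases above can be checked directly; a small sketch, assuming a recent CPython 3 and the default strict error handler:

```python
ill_formed = [
    '\ud800',        # lone high surrogate
    '\udc01\udc01',  # two low surrogates
    '\ud800\ud800',  # two high surrogates
    '\udc01\ud800',  # low before high (wrong order)
]

outcomes = {}
for s in ill_formed:
    try:
        s.encode('utf-8')
        outcomes[s] = 'encoded'
    except UnicodeEncodeError:
        outcomes[s] = 'error'

print(outcomes)
```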


The only thing that I'm not sure about is how to deal with *valid* 
pairs of surrogates:

'\ud800\udc01'.encode('utf-8') should do what?

I personally would hope that this too should raise, which is Python's 
current behaviour, but my reading of the FAQs is that it should be 
treated as if there were an implicit UTF-16 conversion. (I hope I'm 
wrong!) That is:

1) treat the sequence of code points as if it were a sequence of two 
16-bit values b'\xd8\x00' b'\xdc\x01'

2) implicitly decode it using UTF-16 to get U+10001

3) encode U+10001 using UTF-8 to get b'\xf0\x90\x80\x81'
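
Spelled out as code, the three steps would look roughly like this. It is a sketch of the interpretation described here, not of what str.encode() does; the function name is made up, and it relies on the 'surrogatepass' handler accepting UTF-16, which holds on recent CPython 3:

```python
def encode_via_implicit_utf16(s):
    """Hypothetical helper following steps 1-3 above."""
    # 1) treat the code points as a sequence of 16-bit values
    units = s.encode('utf-16-be', 'surrogatepass')
    # 2) decode those values as UTF-16, which pairs up the surrogates
    text = units.decode('utf-16-be')
    # 3) encode the resulting code point(s) as UTF-8
    return text.encode('utf-8')

result = encode_via_implicit_utf16('\ud800\udc01')
print(result)
```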

That would be (in my opinion) *horrible*, but that's my reading of the 
Unicode FAQ. The question asks:

"How do I convert a UTF-16 surrogate pair such as <D800 DC00> to UTF-8?"

and the answer seems to be:

"The definition of UTF-8 requires that supplementary characters (those 
using surrogate pairs in UTF-16) be encoded with a single four byte 
sequence."

which doesn't actually answer the question (the question is about 
SURROGATE PAIRS, the answer is about SUPPLEMENTARY CHARACTERS) but 
suggests the above horrible interpretation.

What I'm hoping for is a definite source that explains what the UTF-8 
encoder is supposed to do with a Unicode string containing surrogates.

(And presumably the other UTF encoders as well, although I haven't tried 
thinking about them yet.)



> It goes on to say that there is a widespread practice of doing it anyway in
> older software. Therefore, it might be acceptable to accept these
> mis-encoded characters when *decoding* but they should never be generated
> when *encoding*. 

They are talking about the practice of generating six bytes, two 
three-byte sequences. You should notice that I'm not generating six 
bytes anywhere.



-- 
Steven

From stephen at xemacs.org  Wed Oct  9 04:03:46 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 09 Oct 2013 11:03:46 +0900
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <52548BF9.4070802@canterbury.ac.nz>
References: <l30pl0$apd$1@ger.gmane.org>
 <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net>
 <l30r5h$ssp$1@ger.gmane.org>
 <D03B546D-34BA-410E-B8D6-5F1917ACE0BB@masklinn.net>
 <20131008130208.GX7989@ando>
 <A2E1DB4A-703E-4AAE-8D70-C1E4621423A5@masklinn.net>
 <20131008142009.GY7989@ando>
 <CAGu0Anvmb8O73p4a5M9Kj6PsWCA+q-u-UxTgd7S8yO6iav4XeQ@mail.gmail.com>
 <52548BF9.4070802@canterbury.ac.nz>
Message-ID: <87eh7vjj3x.fsf@uwakimon.sk.tsukuba.ac.jp>

Greg Ewing writes:
 > Bruce Leban wrote:
 > > The FAQ is explicit that this is wrong: "The definition of UTF-8 
 > > requires that supplementary characters (those using surrogate pairs in 
 > > UTF-16) be encoded with a single four byte 
 > > sequence." http://www.unicode.org/faq/utf_bom.html#utf8-4
 > 
 > Python's internal string representation is not UTF-16, though,
 > so this doesn't apply directly.

It applies directly to Steven's examples, since they use .encode() and
.decode().

 > Seems to me it hinges on whether a pair of surrogate code
 > points appearing in a Python string are meant to represent
 > a single character or not.

Only (a subset of) low surrogates is valid in a Python string, so a
pair can't possibly represent a supplementary character in UTF-16
encoding.


From tjreedy at udel.edu  Wed Oct  9 04:43:54 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 08 Oct 2013 22:43:54 -0400
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <20131009005507.GB7989@ando>
References: <l30pl0$apd$1@ger.gmane.org>
 <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net>
 <l30r5h$ssp$1@ger.gmane.org>
 <D03B546D-34BA-410E-B8D6-5F1917ACE0BB@masklinn.net>
 <20131008130208.GX7989@ando>
 <A2E1DB4A-703E-4AAE-8D70-C1E4621423A5@masklinn.net>
 <20131008142009.GY7989@ando>
 <CAGu0Anvmb8O73p4a5M9Kj6PsWCA+q-u-UxTgd7S8yO6iav4XeQ@mail.gmail.com>
 <20131009005507.GB7989@ando>
Message-ID: <l32ft3$v97$1@ger.gmane.org>

On 10/8/2013 8:55 PM, Steven D'Aprano wrote:

> '\ud800\udc01'.encode('utf-8')
> => b'\xf0' b'\x90' b'\x80' b'\x81'
>
> I stress that Python 3.3 doesn't actually do this, but my reading of the
> FAQ suggests that it should.

And I already explained on python-list why that reading is wrong; 
transcoding a utf-16 string (sequence of 2-byte words, subject to 
validity rules) is different from encoding unicode text (character 
sequence, and surrogates are not characters). A utf-16 to utf-8 
transcoder should (must) do the above, but in 3.3+, the utf-8 codec is 
no longer the utf-16 transcoder that it effectively was for narrow builds.

Each utf form defines a one to one mapping between unicode texts and 
valid code unit sequences. (Unicode Standard, Chapter 3, definition 
D79.) Having both '\U00010001' and '\ud800\udc01' map to 
b'\xf0\x90\x80\x81' would violate that important property. 
'\ud800\udc01' represents a character in utf-16 but not in python's 
flexible string representation. The latter uses one code unit (of 
variable size per string) per character, instead of a variable number of 
code units (of one size for all strings) per character.
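
A quick sketch of the difference, assuming CPython 3.3 or later. sys.getsizeof() is CPython-specific and the exact byte counts vary by version, so only the ordering is meaningful:

```python
import sys

astral = '\U00010001'   # one character, one code unit in the FSR
pair = '\ud800\udc01'   # two surrogate code points, not one character

assert len(astral) == 1
assert len(pair) == 2
assert astral != pair

# The code unit size is chosen per string: 1, 2 or 4 bytes per character.
sizes = [sys.getsizeof(ch * 1000) for ch in ('a', '\u0100', '\U00010001')]
print(sizes)
```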

Because machines have no conceptual, visual, or aural memory, but only 
byte memory, they must re-encode abstract characters to bytes to 
remember them. In pre 3.3 narrow builds, where utf-16 was used 
internally, decoding and encoding amounted to transcoding bytes 
encodings into the utf-16 encoding, and vice versa. So utf-8 
b'\xf0\x90\x80\x81' and utf-16 '\ud800\udc01' were mapped into each 
other. Whether the mapping was done directly or indirectly, via the 
character codepoint value, did not matter to the user.

In any case FSR no longer uses multiple-code-unit encodings internally, 
and '\ud800\udc01', even though allowed for practical reasons, does not 
represent and is not the same as '\U00010001'. The proposed 
'has_surrogates' flag amounts to a 'not strictly valid' flag. Only the 
FSR implementors can decide if it is worth the trouble.

-- 
Terry Jan Reedy


From stephen at xemacs.org  Wed Oct  9 06:29:04 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 09 Oct 2013 13:29:04 +0900
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <20131009005507.GB7989@ando>
References: <l30pl0$apd$1@ger.gmane.org>
 <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net>
 <l30r5h$ssp$1@ger.gmane.org>
 <D03B546D-34BA-410E-B8D6-5F1917ACE0BB@masklinn.net>
 <20131008130208.GX7989@ando>
 <A2E1DB4A-703E-4AAE-8D70-C1E4621423A5@masklinn.net>
 <20131008142009.GY7989@ando>
 <CAGu0Anvmb8O73p4a5M9Kj6PsWCA+q-u-UxTgd7S8yO6iav4XeQ@mail.gmail.com>
 <20131009005507.GB7989@ando>
Message-ID: <87a9ijjcdr.fsf@uwakimon.sk.tsukuba.ac.jp>

Steven D'Aprano writes:

 > What I'm hoping for is a definite source that explains what the UTF-8 
 > encoder is supposed to do with a Unicode string containing
 > surrogates.

According to PEP 383, which provides a special mechanism for
roundtripping input that claims to be a particular encoding but does
not conform to that encoding, when encoding to UTF-8, if the errors=
parameter *is* surrogateescape *and* the value is in the first row of
the low surrogate range, it is masked by 0xff and emitted as a single
byte.

In all other cases of surrogates, it should raise an error.  A
conforming Unicode codec must not emit UTF-8 which would decode to a
surrogate.  These cases can occur in valid Python programs because
chr() is unconstrained (for example).

On input, Unicode conformance means that when using the
surrogateescape handler, an alleged UTF-8 stream containing a 6-byte
sequence that would algorithmically decode to a surrogate pair should
be represented internally as a sequence of 6 surrogates from the first
row of the low surrogate range.  If the surrogateescape handler is not
in use, it should raise an error.
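
A sketch of that input case, assuming CPython 3; the six bytes are the UTF-8-style encodings of U+D800 and U+DC01:

```python
six = b'\xed\xa0\x80\xed\xb0\x81'

# With surrogateescape, every offending byte becomes its own low surrogate.
escaped = six.decode('utf-8', 'surrogateescape')
assert len(escaped) == 6
assert all(0xDC80 <= ord(c) <= 0xDCFF for c in escaped)

# The escaped form round-trips to the original bytes.
assert escaped.encode('utf-8', 'surrogateescape') == six

# Without the handler, the strict decoder raises instead.
try:
    six.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
```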

Sorry about not testing actual behavior, gotta run to a meeting.

I forget what PEP 383 says about other Unicode codecs.

From bruce at leapyear.org  Wed Oct  9 04:09:11 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Tue, 8 Oct 2013 19:09:11 -0700
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <20131009005507.GB7989@ando>
References: <l30pl0$apd$1@ger.gmane.org>
 <70EDEE78-A85F-4558-A940-32E72DAC8F2C@masklinn.net>
 <l30r5h$ssp$1@ger.gmane.org>
 <D03B546D-34BA-410E-B8D6-5F1917ACE0BB@masklinn.net>
 <20131008130208.GX7989@ando>
 <A2E1DB4A-703E-4AAE-8D70-C1E4621423A5@masklinn.net>
 <20131008142009.GY7989@ando>
 <CAGu0Anvmb8O73p4a5M9Kj6PsWCA+q-u-UxTgd7S8yO6iav4XeQ@mail.gmail.com>
 <20131009005507.GB7989@ando>
Message-ID: <CAGu0AnsgTphReLAV0ZLUKykqw1pWTXVc0XM6o8LmRuga1BddWA@mail.gmail.com>

Sorry. I don't think what I said contributed to the conversation very well.
Let me try again.

On Tue, Oct 8, 2013 at 5:55 PM, Steven D'Aprano <steve at pearwood.info> wrote:

> On Tue, Oct 08, 2013 at 01:37:54PM -0700, Bruce Leban wrote:
>
> The question isn't what UTF-8 should do with supplementary characters
> (those outside the BMP). That is well-defined, and Python 3.3 gets it
> right. The question is what it should do with pairs of surrogates.
> Ill-formed surrogates are rightly illegal when encoding to UTF-8:
>
> The only thing that I'm not sure is how to deal with *valid*
> pairs of surrogates:
>
> '\ud800\udc01'.encode('utf-8') should do what?
>

I don't think that's valid. While it is a sequence of Unicode *codepoints*
(Python definition of unicode string) it is not a sequence of Unicode
*characters*. Arguably, Python should insist that a Unicode string be a
sequence of Unicode characters and reject '\ud800\udc01' at compile time
just as it does '\U01010101' as those are all not valid Unicode characters.
However, I concede that is unlikely to happen.

Here's how I read the FAQ. Most of this FAQ is written in terms of
converting one representation to another. Python strings are not one of
those representations.

A *Unicode transformation format* (UTF) is an algorithmic mapping from
every Unicode code point (except surrogate code points) to a unique byte
sequence.
http://www.unicode.org/faq/utf_bom.html#gen2


To convert UTF-X to UTF-Y, you convert the UTF-X to a sequence of
characters and then convert that to UTF-Y. Note that this excludes
surrogate code points -- they are not representable in the sequence of code
points that a UTF defines.
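
In Python terms, a minimal sketch of such a conversion goes through a str as the intermediate sequence of characters (the helper name is made up for illustration, and the rejection of the lone surrogate assumes a recent CPython 3):

```python
def transcode(data, src, dst):
    """Hypothetical helper: convert bytes from one UTF to another."""
    return data.decode(src).encode(dst)

# A well-formed UTF-16 surrogate pair becomes one four-byte UTF-8 sequence.
converted = transcode(b'\xd8\x00\xdc\x01', 'utf-16-be', 'utf-8')

# A lone surrogate is rejected at the decoding step.
try:
    transcode(b'\xd8\x00', 'utf-16-be', 'utf-8')
    lone_ok = True
except UnicodeDecodeError:
    lone_ok = False
```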

The definition of UTF-32 says:

Any Unicode character can be represented as a single 32-bit unit in UTF-32.
This single 4 code unit corresponds to the Unicode scalar value, which is
the abstract number associated with a Unicode character.
http://www.unicode.org/faq/utf_bom.html#utf32-1


Thus a surrogate codepoint is NOT allowed in UTF-32 as it is not a
character and if it is encountered it should be treated as an error.

--- Bruce
I'm hiring: http://www.cadencemd.com/info/jobs
Latest blog post: Alice's Puzzle Page http://www.vroospeak.com
Learn how hackers think: http://j.mp/gruyere-security

From victor.stinner at gmail.com  Fri Oct 11 14:12:37 2013
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 11 Oct 2013 14:12:37 +0200
Subject: [Python-ideas] Add "has_surrogates" flags to string object
In-Reply-To: <l30pl0$apd$1@ger.gmane.org>
References: <l30pl0$apd$1@ger.gmane.org>
Message-ID: <CAMpsgwZgmHjKKPv8+4r0vU=EGqY2h=eVWqaukTV=r5oiRuc4ZQ@mail.gmail.com>

2013/10/8 Serhiy Storchaka <storchaka at gmail.com>:
> Here is an idea about adding a mark to PyUnicode object which allows fast
> answer to the question if a string has surrogate code. This mark has one of
> three possible states:
>
> * String doesn't contain surrogates.
> * String contains surrogates.
> * It is still unknown.
>
> We can combine this with "is_ascii" flag in 2-bit value:
>
> * String is ASCII-only (and doesn't contain surrogates).
> * String is not ASCII-only and doesn't contain surrogates.
> * String is not ASCII-only and contains surrogates.
> * String is not ASCII-only and it is still unknown if it contains surrogate.
>
> By default a string is created in "unknown" state (if it is UCS2 or UCS4).
> After first request it can be switched to "has surrogates" or "hasn't
> surrogates". State of the result of concatenating or slicing can be
> determined from states of input strings.
>
> This will allow faster UTF-16 and UTF-32 encoding (and perhaps even a little
> faster UTF-8 encoding) and converting to wchar_t* if string hasn't
> surrogates (this is true in most cases).

Knowing if a string contains any surrogate character would also
speedup marshal and pickle modules:
http://bugs.python.org/issue19219#msg199465
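
A pure-Python toy model of the quoted proposal, to show how the three states could propagate. Everything here (class names, methods) is hypothetical; the real flag would live in the C PyUnicode object, and str.isascii() needs Python 3.7+:

```python
import enum

class Surrogates(enum.Enum):
    UNKNOWN = 'unknown'
    ABSENT = 'absent'
    PRESENT = 'present'

def _scan(text):
    """Linear scan for any code point in the surrogate range."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)

class FlaggedText:
    """Toy string wrapper carrying a lazily computed surrogate flag."""

    def __init__(self, text, state=Surrogates.UNKNOWN):
        self.text = text
        # ASCII-only text cannot contain surrogates, so no scan is needed.
        self.state = Surrogates.ABSENT if text.isascii() else state

    def has_surrogates(self):
        if self.state is Surrogates.UNKNOWN:  # scan once, then cache
            found = _scan(self.text)
            self.state = Surrogates.PRESENT if found else Surrogates.ABSENT
        return self.state is Surrogates.PRESENT

    def __add__(self, other):
        if Surrogates.PRESENT in (self.state, other.state):
            state = Surrogates.PRESENT
        elif self.state is other.state is Surrogates.ABSENT:
            state = Surrogates.ABSENT
        else:
            state = Surrogates.UNKNOWN
        return FlaggedText(self.text + other.text, state)

    def __getitem__(self, index):
        # A slice of surrogate-free text is surrogate-free; a slice of text
        # with surrogates may or may not still contain one.
        absent = self.state is Surrogates.ABSENT
        state = Surrogates.ABSENT if absent else Surrogates.UNKNOWN
        return FlaggedText(self.text[index], state)

clean = FlaggedText('h\xe9llo')
dirty = FlaggedText('x\udc80')
```

Until a string has been asked once, concatenation can only report "unknown"; after the first has_surrogates() call the answer is cached and inherited by results.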

Victor

From mistersheik at gmail.com  Fri Oct 11 20:29:43 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Fri, 11 Oct 2013 11:29:43 -0700 (PDT)
Subject: [Python-ideas] An exhaust() function for iterators
In-Reply-To: <l2a3a8$7p8$1@ger.gmane.org>
References: <CANW+cAXMjP2KrPuF21CFQy16iaaHzOs-GM=jkPyPxODpOWpmsA@mail.gmail.com>
 <l2a3a8$7p8$1@ger.gmane.org>
Message-ID: <5a7a21a5-bd7e-4bc7-a80f-e6d6154f0e13@googlegroups.com>

This was also my thought.

On Sunday, September 29, 2013 4:42:20 PM UTC-4, Serhiy Storchaka wrote:
>
> 29.09.13 07:06, Clay Sweetser wrote:
> > I would like to propose that this function, or one very similar to it,
> > be added to the standard library, either in the itertools module, or
> > the standard namespace.
> > If nothing else, doing so would at least give a single *obvious* way
> > to exhaust an iterator, instead of the several miscellaneous methods
> > available.
>
> I would prefer to optimize the for loop so that it is the most efficient
> way (it is already the most obvious way).
>
> _______________________________________________
> Python-ideas mailing list
> Python... at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
>
>

From mertz at gnosis.cx  Fri Oct 11 20:51:20 2013
From: mertz at gnosis.cx (David Mertz)
Date: Fri, 11 Oct 2013 11:51:20 -0700
Subject: [Python-ideas] An exhaust() function for iterators
In-Reply-To: <5a7a21a5-bd7e-4bc7-a80f-e6d6154f0e13@googlegroups.com>
References: <CANW+cAXMjP2KrPuF21CFQy16iaaHzOs-GM=jkPyPxODpOWpmsA@mail.gmail.com>
 <l2a3a8$7p8$1@ger.gmane.org>
 <5a7a21a5-bd7e-4bc7-a80f-e6d6154f0e13@googlegroups.com>
Message-ID: <CAEbHw4ZDSaRpf4Yawt1KBsGmdzO+temasPUHhYqGtZ8zhJUoaQ@mail.gmail.com>

It is hard to imagine that doing this:

  for _ in side_effect_iter: pass

could EVER realistically spend a significant share of its time in the loop
code.  Side effects almost surely need to do something that vastly
overpowers the cost of the loop itself (maybe some I/O, maybe some
computation), or there's no point in using a side-effect iterator.

I know you *could* technically write:

  def side_effect_iter(N, obj):
      for n in range(N):
          obj.val = n
          yield True

or something similar whose only side effect is changing some value without
any real computation.  But surely writing that and exhausting that
iterator is NEVER the best way to code such a thing.

On the other hand, a more realistic one like this:

  def side_effect_iter(N):
      for n in range(N):
          val = complex_computation(n)
          write_to_slow_disk(val)
          yield True

is going to take a long time in each iteration, and there's no reason to
care that the loop itself doesn't run at absolutely optimal speed.





-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

From storchaka at gmail.com  Fri Oct 11 21:02:42 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 11 Oct 2013 22:02:42 +0300
Subject: [Python-ideas] An exhaust() function for iterators
In-Reply-To: <CAEbHw4ZDSaRpf4Yawt1KBsGmdzO+temasPUHhYqGtZ8zhJUoaQ@mail.gmail.com>
References: <CANW+cAXMjP2KrPuF21CFQy16iaaHzOs-GM=jkPyPxODpOWpmsA@mail.gmail.com>
 <l2a3a8$7p8$1@ger.gmane.org>
 <5a7a21a5-bd7e-4bc7-a80f-e6d6154f0e13@googlegroups.com>
 <CAEbHw4ZDSaRpf4Yawt1KBsGmdzO+temasPUHhYqGtZ8zhJUoaQ@mail.gmail.com>
Message-ID: <l39i08$le8$1@ger.gmane.org>

11.10.13 21:51, David Mertz wrote:
> It is hard to imagine that doing this:
>
>    for _ in side_effect_iter: pass
>
> Could EVER realistically spend a significant share of its time in the
> loop code.

When I wrote a test for tee() (issue #13454), I needed to exhaust an
iterator very quickly. There were one or two other similar cases.
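
For reference, one commonly used fast idiom is the "consume" recipe from
the itertools documentation, which feeds the iterator into a zero-length
deque. A minimal sketch follows; the names exhaust() and side_effects()
are only illustrative and are not an existing API.

```python
from collections import deque


def exhaust(iterable):
    """Run an iterator to completion, discarding every item.

    Hypothetical helper name; the technique is the "consume" recipe
    from the itertools documentation.
    """
    # A deque with maxlen=0 drops each item as soon as it is appended,
    # so the loop runs inside the deque constructor instead of in a
    # Python-level for statement.
    deque(iterable, maxlen=0)


def side_effects(log, n):
    # Illustrative generator whose only purpose is its side effect.
    for i in range(n):
        log.append(i)
        yield True


log = []
it = side_effects(log, 3)
exhaust(it)
print(log)
```

After the call, the generator is finished and all of its side effects
have happened, without any list of results being built.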



From mistersheik at gmail.com  Fri Oct 11 20:38:33 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Fri, 11 Oct 2013 11:38:33 -0700 (PDT)
Subject: [Python-ideas] Extremely weird itertools.permutations
Message-ID: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>

"It is universally agreed that a list of n distinct symbols has n! 
permutations. However, when the symbols are not distinct, the most common 
convention, in mathematics and elsewhere, seems to be to count only 
distinct permutations." --
http://stackoverflow.com/questions/6534430/why-does-pythons-itertools-permutations-contain-duplicates-when-the-original.


Should we consider fixing itertools.permutations to output only unique
permutations (if possible, although I realize that would break code)? It is
completely non-obvious that permutations returns duplicates. As a
non-breaking compromise, what about adding a flag?
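
To make the behaviour concrete, here is a small sketch (the variable
names are only illustrative). itertools.permutations treats elements as
unique by position, so repeated values produce repeated tuples:

```python
from itertools import permutations

# "aab" has three positions but only two distinct values.
all_perms = list(permutations("aab"))

# Removing duplicates after the fact leaves the distinct orderings.
distinct_perms = sorted(set(all_perms))

print(len(all_perms))
print(distinct_perms)
```

The first count is 3! = 6, while only 3 of those orderings are distinct.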

Best,
Neil

From storchaka at gmail.com  Fri Oct 11 21:29:35 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 11 Oct 2013 22:29:35 +0300
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
Message-ID: <l39jil$6gd$1@ger.gmane.org>

11.10.13 21:38, Neil Girdhar wrote:
> Should we consider fixing itertools.permutations to output only
> unique permutations (if possible, although I realize that would break
> code)? It is completely non-obvious that permutations returns
> duplicates. As a non-breaking compromise, what about adding a flag?

I think this should be a separate function.



From mertz at gnosis.cx  Fri Oct 11 22:02:11 2013
From: mertz at gnosis.cx (David Mertz)
Date: Fri, 11 Oct 2013 13:02:11 -0700
Subject: [Python-ideas] Fwd:  Extremely weird itertools.permutations
In-Reply-To: <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
Message-ID: <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>

What would you like this hypothetical function to output here:

>>> from itertools import permutations
>>> from decimal import Decimal as D
>>> from fractions import Fraction as F
>>> items = (3, 3.0, D(3), F(3,1), "aa", "AA".lower(), "a"+"a")
>>> list(permutations(items))

It's neither QUITE equality nor identity you are looking for, I think, in
nonredundant_permutation():

>>> "aa" == "AA".lower(), "aa" is "AA".lower()
(True, False)
>>> "aa" == "a"+"a", "aa" is "a"+"a"
(True, True)
>>> D(3) == 3.0, D(3) is 3.0
(True, False)


From abarnert at yahoo.com  Fri Oct 11 22:19:22 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 11 Oct 2013 13:19:22 -0700
Subject: [Python-ideas] Fwd:  Extremely weird itertools.permutations
In-Reply-To: <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
Message-ID: <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>

I think equality is perfectly reasonable here. The fact that {3.0, 3} only has one member seems like the obvious precedent to follow here.
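
A small sketch of that precedent, using the numeric types from the
earlier example (the variable names are only illustrative). Equal
numbers also hash equal across these types, which is why a set collapses
them:

```python
from decimal import Decimal
from fractions import Fraction

values = [3, 3.0, Decimal(3), Fraction(3, 1)]

# All four compare equal and share one hash value, so the set keeps
# a single element.
unique = set(values)

print(len(unique))
print(len({hash(v) for v in values}))
```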

Sent from a random iPhone


From jon.brandvein at gmail.com  Fri Oct 11 23:19:38 2013
From: jon.brandvein at gmail.com (Jonathan Brandvein)
Date: Fri, 11 Oct 2013 17:19:38 -0400
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
Message-ID: <CAE+E=KBdjft9hrgMFQs4yN2CEsOc4=LUfu5YSU73UZdoJz6chg@mail.gmail.com>

I think it's fair to use {3.0, 3} as precedent. But note that transitivity
is not required by the __eq__() method. In cases of intransitive equality
(A == B == C but not A == C), I imagine the result should be ill-defined in
the same way that sorting is when the key function is inconsistent.
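
As a sketch of why the result would be ill-defined, here is a toy class
with deliberately intransitive equality and a naive equality-based
de-duplication. Both Approx and dedupe() are hypothetical names made up
for this illustration.

```python
class Approx:
    """Toy value whose equality is intransitive (illustrative only)."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        # "Equal" means within 1 of each other, so 0 == 1 and 1 == 2
        # hold while 0 == 2 does not.
        return abs(self.value - other.value) <= 1

    def __repr__(self):
        return f"Approx({self.value})"


def dedupe(items):
    """Keep each item that is not == to an item already kept."""
    kept = []
    for item in items:
        if not any(item == k for k in kept):
            kept.append(item)
    return kept


a, b, c = Approx(0), Approx(1), Approx(2)
first = dedupe([a, b, c])    # keeps a, drops b, keeps c
second = dedupe([b, a, c])   # keeps b, drops both a and c
print(first, second)
```

The same three objects give two survivors in one input order and one
survivor in the other, so the output depends on the order of the input.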

Jon



From python at mrabarnett.plus.com  Fri Oct 11 23:25:56 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Fri, 11 Oct 2013 22:25:56 +0100
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <l39jil$6gd$1@ger.gmane.org>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <l39jil$6gd$1@ger.gmane.org>
Message-ID: <52586CE4.9030002@mrabarnett.plus.com>

On 11/10/2013 20:29, Serhiy Storchaka wrote:
> 11.10.13 21:38, Neil Girdhar wrote:
>> Should we consider fixing itertools.permutations to output only
>> unique permutations (if possible, although I realize that would break
>> code)? It is completely non-obvious that permutations returns
>> duplicates. As a non-breaking compromise, what about adding a flag?
>
> I think this should be a separate function.
>
+1



From mertz at gnosis.cx  Fri Oct 11 22:27:34 2013
From: mertz at gnosis.cx (David Mertz)
Date: Fri, 11 Oct 2013 13:27:34 -0700
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
Message-ID: <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>

Andrew & Neil (or whoever):

Is this *really* what you want:

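>>> from itertools import permutations
>>> from pprint import pprint
>>> from decimal import Decimal as D
>>> from fractions import Fraction as F
>>> def nonredundant_permutations(seq):
...     return list(set(permutations(seq)))
...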
>>> pprint(list(permutations([F(3,1), D(3.0), 3.0])))
[(Fraction(3, 1), Decimal('3'), 3.0),
 (Fraction(3, 1), 3.0, Decimal('3')),
 (Decimal('3'), Fraction(3, 1), 3.0),
 (Decimal('3'), 3.0, Fraction(3, 1)),
 (3.0, Fraction(3, 1), Decimal('3')),
 (3.0, Decimal('3'), Fraction(3, 1))]

>>> pprint(list(nonredundant_permutations([F(3,1), D(3.0), 3.0])))
[(Fraction(3, 1), Decimal('3'), 3.0)]

It seems odd to me to want that.  On the other hand, I provide a one-line
implementation of the desired behavior if anyone wants it.  Moreover, I
don't think the runtime behavior of my one-liner is particularly costly...
maybe not the best possible, but the best big-O possible.




From mistersheik at gmail.com  Fri Oct 11 23:35:41 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Fri, 11 Oct 2013 17:35:41 -0400
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
Message-ID: <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>

> Moreover, I don't think the runtime behavior of my one-liner is
> particularly costly...

It is *extremely* costly.  There can be n! permutations, so even for, say,
12 elements, you are looking at many gigabytes of memory needlessly used.
One big motivator for itertools is not having to do this.  I'm curious
how you would solve this problem efficiently in Python:
https://www.kattis.com/problems/industrialspy
I did it by using a unique-ifying generator, but ideally this would not be
necessary.  Ideally, Python would do exactly what C++ does with
next_permutation.
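
To put rough numbers on this, here is a sketch of the counting involved.
The helper name distinct_permutation_count() is made up for this
example; it computes the standard multinomial count n! / (k1! * k2! *
...), where each k is how often one value repeats.

```python
import math
from collections import Counter


def distinct_permutation_count(items):
    """n! divided by the factorial of each value's multiplicity."""
    counts = Counter(items)
    total = math.factorial(sum(counts.values()))
    for k in counts.values():
        total //= math.factorial(k)
    return total


# Orderings that permutations() generates for 12 elements.
print(math.factorial(12))                          # 479001600
# Distinct orderings when those 12 elements are 4 a's, 4 b's and 4 c's.
print(distinct_permutation_count("aaaabbbbcccc"))  # 34650
```

With repeated elements, an approach that generates every ordering first
does far more work than there are distinct results; with 12 distinct
elements, a set of all orderings has to hold 479001600 tuples.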

Best,

Neil



From python at mrabarnett.plus.com  Fri Oct 11 23:38:41 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Fri, 11 Oct 2013 22:38:41 +0100
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
Message-ID: <52586FE1.8040803@mrabarnett.plus.com>

On 11/10/2013 21:27, David Mertz wrote:
> Andrew & Neil (or whoever):
>
> Is this *really* what you want:
>
>  >>> from itertools import permutations
>  >>> def nonredundant_permutations(seq):
> ...     return list(set(permutations(seq)))
> ...
>  >>> pprint(list(permutations([F(3,1), D(3.0), 3.0])))
> [(Fraction(3, 1), Decimal('3'), 3.0),
>   (Fraction(3, 1), 3.0, Decimal('3')),
>   (Decimal('3'), Fraction(3, 1), 3.0),
>   (Decimal('3'), 3.0, Fraction(3, 1)),
>   (3.0, Fraction(3, 1), Decimal('3')),
>   (3.0, Decimal('3'), Fraction(3, 1))]
>
>  >>> pprint(list(nonredundant_permutations([F(3,1), D(3.0), 3.0])))
> [(Fraction(3, 1), Decimal('3'), 3.0)]
>
> It seems odd to me to want that.  On the other hand, I provide a
> one-line implementation of the desired behavior if anyone wants it.
>   Moreover, I don't think the runtime behavior of my one-liner is
> particularly costly... maybe not the best possible, but the best big-O
> possible.
>
n! gets very big very fast, so that can be a very big set.

If you sort the original items first, then it's much easier to yield
unique permutations without having to remember them. (Each would be
greater than the previous one, although you might have to map them to
orderable keys if they're not orderable themselves, e.g. a mixture of
integers and strings.)
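
A minimal sketch of that idea, assuming the items are mutually orderable
with "<". The function name unique_permutations() is hypothetical, and
this is the classic "next lexicographic permutation" step (the same idea
as C++'s std::next_permutation), not an existing itertools function.

```python
def unique_permutations(iterable):
    """Yield each distinct ordering once, in lexicographic order."""
    items = sorted(iterable)
    n = len(items)
    while True:
        yield tuple(items)
        # Find the rightmost position i with items[i] < items[i + 1].
        i = n - 2
        while i >= 0 and not items[i] < items[i + 1]:
            i -= 1
        if i < 0:
            return  # non-increasing order: that was the last permutation
        # Find the rightmost item greater than items[i] and swap them.
        j = n - 1
        while not items[i] < items[j]:
            j -= 1
        items[i], items[j] = items[j], items[i]
        # Reverse the tail so it becomes the smallest possible suffix.
        items[i + 1:] = reversed(items[i + 1:])


print(list(unique_permutations("aab")))
```

Only the current arrangement is kept in memory, so nothing has to be
remembered between results. Items that compare equal (such as 3 and 3.0)
are treated as interchangeable here.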


From mistersheik at gmail.com  Fri Oct 11 23:38:27 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Fri, 11 Oct 2013 17:38:27 -0400
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
Message-ID: <CAA68w_nbtc5GQdvE8QG6-zTqG6e3LYryc5Va7A9kSJRrsoBx3A@mail.gmail.com>

My code, which was the motivation for this suggestion:

import itertools as it
import math

def is_prime(n):
    for i in range(2, int(math.floor(math.sqrt(n))) + 1):
        if n % i == 0:
            return False
    return n >= 2

def unique(iterable):    # Should not be necessary in my opinion
    seen = set()
    for x in iterable:
        if x not in seen:
            seen.add(x)
            yield x

n = int(input())
for _ in range(n):
    x = input()
    print(sum(is_prime(int("".join(y)))
              for len_ in range(1, len(x) + 1)
              for y in unique(it.permutations(x, len_))
              if y[0] != '0'))



On Fri, Oct 11, 2013 at 5:35 PM, Neil Girdhar <mistersheik at gmail.com> wrote:

> > Moreover, I don't think the runtime behavior of my one-liner is
> particularly costly?
>
> It is *extremely* costly.  There can be n! permutations, so for even, say,
> 12 elements, you are looking at many gigabytes of memory needlessly used.
>  One big motivator for itertools is not to have to do this.  I'm curious
> how you would solve this problem:
> https://www.kattis.com/problems/industrialspy  efficiently in Python.  I
> did it by using a unique-ifying generator, but ideally this would not be
> necessary.  Ideally, Python would do exactly what C++ does with
> next_permutation.
>
> Best,
>
> Neil
>
>
> On Fri, Oct 11, 2013 at 4:27 PM, David Mertz <mertz at gnosis.cx> wrote:
>
>> Andrew & Neil (or whoever):
>>
>> Is this *really* what you want:
>>
>> >>> from itertools import permutations
>> >>> def nonredundant_permutations(seq):
>> ...     return list(set(permutations(seq)))
>> ...
>> >>> pprint(list(permutations([F(3,1), D(3.0), 3.0])))
>> [(Fraction(3, 1), Decimal('3'), 3.0),
>>  (Fraction(3, 1), 3.0, Decimal('3')),
>>  (Decimal('3'), Fraction(3, 1), 3.0),
>>  (Decimal('3'), 3.0, Fraction(3, 1)),
>>  (3.0, Fraction(3, 1), Decimal('3')),
>>  (3.0, Decimal('3'), Fraction(3, 1))]
>>
>> >>> pprint(list(nonredundant_permutations([F(3,1), D(3.0), 3.0])))
>> [(Fraction(3, 1), Decimal('3'), 3.0)]
>>
>> It seems odd to me to want that.  On the other hand, I provide a one-line
>> implementation of the desired behavior if anyone wants it.  Moreover, I
>> don't think the runtime behavior of my one-liner is particularly costly...
>> maybe not the best possible, but the best big-O possible.
>>
>>
>>
>> On Fri, Oct 11, 2013 at 1:19 PM, Andrew Barnert <abarnert at yahoo.com> wrote:
>>
>>> I think equality is perfectly reasonable here. The fact that {3.0, 3}
>>> only has one member seems like the obvious precedent to follow here.
>>>
>>> Sent from a random iPhone
>>>
>>> On Oct 11, 2013, at 13:02, David Mertz <mertz at gnosis.cx> wrote:
>>>
>>> What would you like this hypothetical function to output here:
>>>
>>> >>> from itertools import permutations
>>> >>> from decimal import Decimal as D
>>> >>> from fractions import Fraction as F
>>> >>> items = (3, 3.0, D(3), F(3,1), "aa", "AA".lower(), "a"+"a")
>>> >>> list(permutations(items))
>>>
>>> It's neither QUITE equality nor identity you are looking for, I think,
>>> in nonredundant_permutation():
>>>
>>> >>> "aa" == "AA".lower(), "aa" is "AA".lower()
>>> (True, False)
>>> >>> "aa" == "a"+"a", "aa" is "a"+"a"
>>> (True, True)
>>> >>> D(3) == 3.0, D(3) is 3.0
>>> (True, False)
>>>
>>> On Fri, Oct 11, 2013 at 11:38 AM, Neil Girdhar <mistersheik at gmail.com> wrote:
>>>
>>>> "It is universally agreed that a list of n distinct symbols has n!
>>>> permutations. However, when the symbols are not distinct, the most common
>>>> convention, in mathematics and elsewhere, seems to be to count only
>>>> distinct permutations."
>>>> (http://stackoverflow.com/questions/6534430/why-does-pythons-itertools-permutations-contain-duplicates-when-the-original)
>>>>
>>>>
>>>> Should we consider fixing itertools.permutations to output only
>>>> unique permutations (if possible, although I realize that would break
>>>> code)? It is completely non-obvious to have permutations returning
>>>> duplicates. For a non-breaking compromise, what about adding a flag?
>>>>
>>>> Best,
>>>> Neil
>>>>
>>>> _______________________________________________
>>>> Python-ideas mailing list
>>>> Python-ideas at python.org
>>>> https://mail.python.org/mailman/listinfo/python-ideas
>>>>
>>>>
>>>
>>>
>>> --
>>> Keeping medicines from the bloodstreams of the sick; food
>>> from the bellies of the hungry; books from the hands of the
>>> uneducated; technology from the underdeveloped; and putting
>>> advocates of freedom in prisons.  Intellectual property is
>>> to the 21st century what the slave trade was to the 16th.
>>>

From mistersheik at gmail.com  Fri Oct 11 20:50:00 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Fri, 11 Oct 2013 11:50:00 -0700 (PDT)
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
Message-ID: <26ae5ce9-6709-41a1-9dc2-6f8d5bc2f0bd@googlegroups.com>

Note that if permutations is made to return only unique permutations, the 
behaviour of defining unique elements by index can be recovered using:

([it[index] for index in indexes] for indexes in 
itertools.permutations(range(len(it))))
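
A minimal sketch of that index-based recovery, written as a generator. The
name permutations_by_index is hypothetical (it is not part of itertools), and
the sketch only relies on itertools.permutations as it exists today: because
the indexes 0..n-1 are always distinct, permuting them treats every position
as unique regardless of whether the values compare equal.

```python
import itertools

def permutations_by_index(seq):
    """Yield permutations of seq, treating every position as distinct."""
    seq = list(seq)
    # range(len(seq)) never contains equal elements, so all n! orderings
    # are produced even when seq itself has repeated values.
    for indexes in itertools.permutations(range(len(seq))):
        yield tuple(seq[index] for index in indexes)

# 'aab' has two equal elements but still gives 3! = 6 results.
print(len(list(permutations_by_index("aab"))))  # 6
```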

On Friday, October 11, 2013 2:38:33 PM UTC-4, Neil Girdhar wrote:
>
> "It is universally agreed that a list of n distinct symbols has n!
> permutations. However, when the symbols are not distinct, the most common
> convention, in mathematics and elsewhere, seems to be to count only
> distinct permutations."
> (http://stackoverflow.com/questions/6534430/why-does-pythons-itertools-permutations-contain-duplicates-when-the-original)
>
>
> Should we consider fixing itertools.permutations to output only unique
> permutations (if possible, although I realize that would break code)? It is
> completely non-obvious to have permutations returning duplicates. For a
> non-breaking compromise, what about adding a flag?
>
> Best,
> Neil
>

From mertz at gnosis.cx  Fri Oct 11 23:48:25 2013
From: mertz at gnosis.cx (David Mertz)
Date: Fri, 11 Oct 2013 14:48:25 -0700
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
Message-ID: <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>

OK, you're right.  Just using set() has bad worst case memory costs.  I was
thinking of the case where there actually WERE lots of equalities, and
hence the resulting list would be much smaller than N!.  But of course
that's not general.  It takes more than one line, but here's an incremental
version:

def nonredundant_permutations(seq):
    seq = sorted(seq)
    last = None
    for perm in permutations(seq):
        if perm != last:
            yield perm
            last = perm


On Fri, Oct 11, 2013 at 2:35 PM, Neil Girdhar <mistersheik at gmail.com> wrote:

> > Moreover, I don't think the runtime behavior of my one-liner is
> particularly costly...
>
> It is *extremely* costly.  There can be n! permutations, so for even, say,
> 12 elements, you are looking at many gigabytes of memory needlessly used.
>  One big motivator for itertools is not to have to do this.  I'm curious
> how you would solve this problem:
> https://www.kattis.com/problems/industrialspy  efficiently in Python.  I
> did it by using a unique-ifying generator, but ideally this would not be
> necessary.  Ideally, Python would do exactly what C++ does with
> next_permutation.
>
> Best,
>
> Neil
>


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

From mistersheik at gmail.com  Fri Oct 11 23:51:06 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Fri, 11 Oct 2013 17:51:06 -0400
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
Message-ID: <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>

Unfortunately, that doesn't quite work...

list("".join(x) for x in it.permutations('aaabb', 3))
['aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', 'aba', 'aba', 'abb', 'aba',
'aba', 'abb', 'aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', 'aba', 'aba',
'abb', 'aba', 'aba', 'abb', 'aaa', 'aab', 'aab', 'aaa', 'aab', 'aab',
'aba', 'aba', 'abb', 'aba', 'aba', 'abb', 'baa', 'baa', 'bab', 'baa',
'baa', 'bab', 'baa', 'baa', 'bab', 'bba', 'bba', 'bba', 'baa', 'baa',
'bab', 'baa', 'baa', 'bab', 'baa', 'baa', 'bab', 'bba', 'bba', 'bba']
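
A smaller case may make the failure easier to see. The sketch below repeats
the "compare with the previous permutation" filter under a hypothetical name,
adjacent_filtered_permutations, and runs it on 'aab'. Equal permutations are
not always adjacent in the output of permutations(), even for sorted input,
so some duplicates get through.

```python
from itertools import permutations

def adjacent_filtered_permutations(seq):
    """Drop a permutation only if it equals the one yielded just before it."""
    seq = sorted(seq)
    last = None
    for perm in permutations(seq):
        if perm != last:
            yield perm
            last = perm

result = ["".join(p) for p in adjacent_filtered_permutations("aab")]
# Only the final repeated 'baa' is removed; 'aab' and 'aba' still occur twice.
print(result)            # ['aab', 'aba', 'aab', 'aba', 'baa']
print(len(set(result)))  # 3
```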


On Fri, Oct 11, 2013 at 5:48 PM, David Mertz <mertz at gnosis.cx> wrote:

> OK, you're right.  Just using set() has bad worst case memory costs.  I
> was thinking of the case where there actually WERE lots of equalities, and
> hence the resulting list would be much smaller than N!.  But of course
> that's not general.  It takes more than one line, but here's an incremental
> version:
>
> def nonredundant_permutations(seq):
>     seq = sorted(seq)
>     last = None
>     for perm in permutations(seq):
>         if perm != last:
>             yield perm
>             last = perm
>

From mertz at gnosis.cx  Sat Oct 12 00:03:48 2013
From: mertz at gnosis.cx (David Mertz)
Date: Fri, 11 Oct 2013 15:03:48 -0700
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
 <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>
Message-ID: <CAEbHw4YXk_+Xb2miDb4c46gy_kRMDRm8sQ+nuC7goRK-3ob_yA@mail.gmail.com>

Bummer.  You are right, Neil.  I saw MRAB's suggestion about sorting, and
falsely thought that would be general; but obviously it's not.

So I guess the question is whether there is ANY way to do this without
having to accumulate a 'seen' set (which can grow to size N!).  The answer
isn't jumping out at me, but that doesn't mean there's not a way.

I don't want itertools.permutations() to do "equality filtering", but
assuming some other function in itertools were to do that, how could it do
so algorithmically? The same question applies if the API were
itertools.permutations(seq, distinct=True).
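
One possible approach, sketched under stated assumptions rather than offered
as a proposed itertools implementation: step from each permutation to its
lexicographic successor, in the manner of C++'s std::next_permutation
mentioned earlier in the thread. The name distinct_permutations is
hypothetical. The sketch assumes the elements can be ordered with "<" and
handles only full-length permutations (no length argument). It keeps one
working list of size N and no 'seen' set.

```python
def distinct_permutations(iterable):
    """Yield each distinct full-length permutation once, in sorted order."""
    items = sorted(iterable)
    n = len(items)
    while True:
        yield tuple(items)
        # Find the rightmost position whose element is smaller than its
        # right-hand neighbour; everything after it is non-increasing.
        i = n - 2
        while i >= 0 and not items[i] < items[i + 1]:
            i -= 1
        if i < 0:
            return  # items is in non-increasing order: last permutation
        # Find the rightmost element strictly greater than items[i].
        j = n - 1
        while not items[i] < items[j]:
            j -= 1
        items[i], items[j] = items[j], items[i]
        # Reverse the tail so it becomes the smallest possible suffix.
        items[i + 1:] = items[i + 1:][::-1]

print(["".join(p) for p in distinct_permutations("aab")])
# ['aab', 'aba', 'baa']
```

Equal elements are never swapped past each other because both comparisons
are strict, which is what prevents duplicates from being generated at all.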


On Fri, Oct 11, 2013 at 2:51 PM, Neil Girdhar <mistersheik at gmail.com> wrote:

> Unfortunately, that doesn't quite work?
>
> list("".join(x) for x in it.permutations('aaabb', 3))
> ['aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', 'aba', 'aba', 'abb', 'aba',
> 'aba', 'abb', 'aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', 'aba', 'aba',
> 'abb', 'aba', 'aba', 'abb', 'aaa', 'aab', 'aab', 'aaa', 'aab', 'aab',
> 'aba', 'aba', 'abb', 'aba', 'aba', 'abb', 'baa', 'baa', 'bab', 'baa',
> 'baa', 'bab', 'baa', 'baa', 'bab', 'bba', 'bba', 'bba', 'baa', 'baa',
> 'bab', 'baa', 'baa', 'bab', 'baa', 'baa', 'bab', 'bba', 'bba', 'bba']
>


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

From python at mrabarnett.plus.com  Sat Oct 12 00:19:34 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Fri, 11 Oct 2013 23:19:34 +0100
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAEbHw4YXk_+Xb2miDb4c46gy_kRMDRm8sQ+nuC7goRK-3ob_yA@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
 <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>
 <CAEbHw4YXk_+Xb2miDb4c46gy_kRMDRm8sQ+nuC7goRK-3ob_yA@mail.gmail.com>
Message-ID: <52587976.1000901@mrabarnett.plus.com>

On 11/10/2013 23:03, David Mertz wrote:
> Bummer.  You are right, Neil.  I saw MRAB's suggestion about sorting,
> and falsely thought that would be general; but obviously it's not.
>
> So I guess the question is whether there is ANY way to do this without
> having to accumulate a 'seen' set (which can grow to size N!).  The
> answer isn't jumping out at me, but that doesn't mean there's not a way.
>
> I don't want itertools.permutations() to do "equality filtering", but
> assuming some other function in itertools were to do that, how could it
> do so algorithmically? Or whatever, same question if it is
> itertools.permutations(seq, distinct=True) as the API.
>
Here's an implementation:

def unique_permutations(iterable, count=None, key=None):
     def perm(items, count):
         if count:
             prev_item = object()

             for i, item in enumerate(items):
                 if item != prev_item:
                     for p in perm(items[ : i] + items[i + 1 : ], count - 1):
                         yield [item] + p

                 prev_item = item

         else:
             yield []

     if key is None:
         key = lambda item: item

     items = sorted(iterable, key=key)

     if count is None:
         count = len(items)

     yield from perm(items, count)


And some results:

 >>> print(list("".join(x) for x in unique_permutations('aaabb', 3)))
['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba']
 >>> print(list(unique_permutations([0, 'a', 0], key=str)))
[[0, 0, 'a'], [0, 'a', 0], ['a', 0, 0]]
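
A minimal sketch of a sanity check for the generator above, assuming Python 3.3+
(for "yield from"). The helper name brute_force is hypothetical and not part of
the posted code; it builds every permutation and deduplicates with a set, which
is the memory-hungry approach the thread wants to avoid.

```python
from itertools import permutations


def unique_permutations(iterable, count=None, key=None):
    """Transcription of the generator above, with the wrapped line rejoined."""
    def perm(items, count):
        if count:
            prev_item = object()  # sentinel that compares unequal to real items
            for i, item in enumerate(items):
                # Equal items are adjacent after sorting, so skipping an item
                # equal to its predecessor avoids emitting a duplicate prefix.
                if item != prev_item:
                    for p in perm(items[:i] + items[i + 1:], count - 1):
                        yield [item] + p
                prev_item = item
        else:
            yield []

    items = sorted(iterable, key=key)
    if count is None:
        count = len(items)
    yield from perm(items, count)


def brute_force(iterable, count=None):
    """Hypothetical reference: generate everything, then deduplicate."""
    return sorted(set(permutations(iterable, count)))


fast = [tuple(p) for p in unique_permutations('aaabb', 3)]
slow = brute_force('aaabb', 3)
print(fast == slow, len(fast))
```

Both approaches agree here; the difference is that the recursive version never
stores the permutations it has already produced.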


From mistersheik at gmail.com  Sat Oct 12 00:23:36 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Fri, 11 Oct 2013 18:23:36 -0400
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <52587976.1000901@mrabarnett.plus.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
 <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>
 <CAEbHw4YXk_+Xb2miDb4c46gy_kRMDRm8sQ+nuC7goRK-3ob_yA@mail.gmail.com>
 <52587976.1000901@mrabarnett.plus.com>
Message-ID: <CAA68w_=6ZFzgEws1Z12Yh_=FzLwBjOn2nAZp8s+sPZyQrQ9AmQ@mail.gmail.com>

Beautiful!!



From mertz at gnosis.cx  Sat Oct 12 00:45:04 2013
From: mertz at gnosis.cx (David Mertz)
Date: Fri, 11 Oct 2013 15:45:04 -0700
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAA68w_=6ZFzgEws1Z12Yh_=FzLwBjOn2nAZp8s+sPZyQrQ9AmQ@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
 <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>
 <CAEbHw4YXk_+Xb2miDb4c46gy_kRMDRm8sQ+nuC7goRK-3ob_yA@mail.gmail.com>
 <52587976.1000901@mrabarnett.plus.com>
 <CAA68w_=6ZFzgEws1Z12Yh_=FzLwBjOn2nAZp8s+sPZyQrQ9AmQ@mail.gmail.com>
Message-ID: <CAEbHw4ZASOwc0dnWkwOGPTr0j_HpNwwjT+7WuP+EZDBZAO_XLA@mail.gmail.com>

I realize after reading
http://stackoverflow.com/questions/6284396/permutations-with-unique-values that
my version was ALMOST right:

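from decimal import Decimal as D
from fractions import Fraction as F
from itertools import permutations

def nonredundant_permutations(seq, r=None):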
    last = ()
    for perm in permutations(sorted(seq), r):
        if perm > last:
            yield perm
            last = perm

I can't look only for inequality, but must use the actual comparison.

>>> ["".join(x) for x in nonredundant_permutations('aaabb',3)]
['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba']
>>> list(nonredundant_permutations([F(3,1), D(3.0), 3.0]))
[(Fraction(3, 1), Decimal('3'), 3.0)]

Of course, this approach DOES rely on the order in which
itertools.permutations() returns values.  However, it's a bit more compact
than MRAB's version.
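
A rough sketch of the trade-off in this filter, under the assumption that the
input is sortable. The helper count_candidates is hypothetical and only there to
count how many tuples itertools.permutations() produces before filtering. The
filter needs no 'seen' set, but it still walks every candidate permutation.

```python
from itertools import permutations


def nonredundant_permutations(seq, r=None):
    last = ()
    # With sorted input, permutations() emits tuples in non-decreasing
    # lexicographic order, so a repeat never compares greater than `last`.
    for perm in permutations(sorted(seq), r):
        if perm > last:
            yield perm
            last = perm


def count_candidates(seq, r=None):
    """Hypothetical helper: how many tuples the filter has to look at."""
    return sum(1 for _ in permutations(seq, r))


kept = list(nonredundant_permutations('aaabb', 3))
examined = count_candidates('aaabb', 3)
print(len(kept), "kept out of", examined, "examined")
```

So the memory use is constant, while the running time is still proportional to
the full number of permutations.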






-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

From ncoghlan at gmail.com  Sat Oct 12 01:49:55 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 12 Oct 2013 09:49:55 +1000
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAEbHw4ZASOwc0dnWkwOGPTr0j_HpNwwjT+7WuP+EZDBZAO_XLA@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
 <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>
 <CAEbHw4YXk_+Xb2miDb4c46gy_kRMDRm8sQ+nuC7goRK-3ob_yA@mail.gmail.com>
 <52587976.1000901@mrabarnett.plus.com>
 <CAA68w_=6ZFzgEws1Z12Yh_=FzLwBjOn2nAZp8s+sPZyQrQ9AmQ@mail.gmail.com>
 <CAEbHw4ZASOwc0dnWkwOGPTr0j_HpNwwjT+7WuP+EZDBZAO_XLA@mail.gmail.com>
Message-ID: <CADiSq7ewK9yRJJ6is58XcikfdWMAspSO0jv0TALV-4ggx2bDLg@mail.gmail.com>

On 12 Oct 2013 08:45, "David Mertz" <mertz at gnosis.cx> wrote:
>
>
> I realize after reading
> http://stackoverflow.com/questions/6284396/permutations-with-unique-values that
> my version was ALMOST right:
>
> def nonredundant_permutations(seq, r=None):
>     last = ()
>     for perm in permutations(sorted(seq), r):
>         if perm > last:
>             yield perm
>             last = perm
>
> I can't look only for inequality, but must use the actual comparison.
>
> >>> ["".join(x) for x in nonredundant_permutations('aaabb',3)]
> ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba']
> >>> list(nonredundant_permutations([F(3,1), D(3.0), 3.0]))
> [(Fraction(3, 1), Decimal('3'), 3.0)]
>
> Of course, this approach DOES rely on the order in which
> itertools.permutations() returns values.  However, it's a bit more compact
> than MRAB's version.

There is no requirement that entries in a sequence handled by
itertools.permutations be sortable, so the original question of why this
isn't done by default has been answered (the general solution risks
consuming too much memory, while the memory-efficient solution constrains
the domain to only sortable sequences).
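
As an illustration of that constraint (a sketch, assuming Python 3 semantics):
complex numbers support == and hashing but not <, so the sort-based approaches
reject them, while the set-based deduplication still works.

```python
from itertools import permutations

items = [1j, 2j, 1j]

# The sort-based approaches start with sorted(), which needs "<".
try:
    sorted(items)
    sortable = True
except TypeError:
    sortable = False

# The general approach only needs equality and hashing, but it holds
# every distinct permutation in memory at once.
distinct = set(permutations(items))
print(sortable, len(distinct))
```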

Cheers,
Nick.


From mistersheik at gmail.com  Sat Oct 12 01:53:31 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Fri, 11 Oct 2013 19:53:31 -0400
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CADiSq7ewK9yRJJ6is58XcikfdWMAspSO0jv0TALV-4ggx2bDLg@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
 <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>
 <CAEbHw4YXk_+Xb2miDb4c46gy_kRMDRm8sQ+nuC7goRK-3ob_yA@mail.gmail.com>
 <52587976.1000901@mrabarnett.plus.com>
 <CAA68w_=6ZFzgEws1Z12Yh_=FzLwBjOn2nAZp8s+sPZyQrQ9AmQ@mail.gmail.com>
 <CAEbHw4ZASOwc0dnWkwOGPTr0j_HpNwwjT+7WuP+EZDBZAO_XLA@mail.gmail.com>
 <CADiSq7ewK9yRJJ6is58XcikfdWMAspSO0jv0TALV-4ggx2bDLg@mail.gmail.com>
Message-ID: <CAA68w_n9M6Hhg-S-TBrobQcH4qTiiidw3ShQSi4F-1UL-yK+Og@mail.gmail.com>

Yes, that's all true.  I want to suggest that the efficient unique
permutations solution is very important to have.  Sortable sequences are
very common. There are itertools routines that only work with ==-comparable
elements (e.g. groupby), so it's not a stretch to have a permutations variant
that is restricted to <-comparable elements.
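
A small sketch of the groupby comparison, using complex numbers as an example of
elements that support == but not <. It only shows that groupby relies on
equality between neighbouring elements.

```python
from itertools import groupby

# groupby() compares neighbouring elements with ==; it never needs "<".
# It only merges adjacent runs, so the second 1j starts a new group.
data = [1j, 1j, 2j, 1j]
runs = [(key, len(list(group))) for key, group in groupby(data)]
print(runs)
```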

Best,
Neil



From abarnert at yahoo.com  Sat Oct 12 02:20:17 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 11 Oct 2013 17:20:17 -0700
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAA68w_n9M6Hhg-S-TBrobQcH4qTiiidw3ShQSi4F-1UL-yK+Og@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
 <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>
 <CAEbHw4YXk_+Xb2miDb4c46gy_kRMDRm8sQ+nuC7goRK-3ob_yA@mail.gmail.com>
 <52587976.1000901@mrabarnett.plus.com>
 <CAA68w_=6ZFzgEws1Z12Yh_=FzLwBjOn2nAZp8s+sPZyQrQ9AmQ@mail.gmail.com>
 <CAEbHw4ZASOwc0dnWkwOGPTr0j_HpNwwjT+7WuP+EZDBZAO_XLA@mail.gmail.com>
 <CADiSq7ewK9yRJJ6is58XcikfdWMAspSO0jv0TALV-4ggx2bDLg@mail.gmail.com>
 <CAA68w_n9M6Hhg-S-TBrobQcH4qTiiidw3ShQSi4F-1UL-yK+Og@mail.gmail.com>
Message-ID: <D1466D34-427E-4E2A-B35B-89D4F1B0177C@yahoo.com>

I think this is worth having even for 3.3 and 2.x, so I'd suggest sending a patch to more-itertools (https://github.com/erikrose/more-itertools) as well as here.

Sent from a random iPhone


From python at mrabarnett.plus.com  Sat Oct 12 03:55:23 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 12 Oct 2013 02:55:23 +0100
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CADiSq7ewK9yRJJ6is58XcikfdWMAspSO0jv0TALV-4ggx2bDLg@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
 <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>
 <CAEbHw4YXk_+Xb2miDb4c46gy_kRMDRm8sQ+nuC7goRK-3ob_yA@mail.gmail.com>
 <52587976.1000901@mrabarnett.plus.com>
 <CAA68w_=6ZFzgEws1Z12Yh_=FzLwBjOn2nAZp8s+sPZyQrQ9AmQ@mail.gmail.com>
 <CAEbHw4ZASOwc0dnWkwOGPTr0j_HpNwwjT+7WuP+EZDBZAO_XLA@mail.gmail.com>
 <CADiSq7ewK9yRJJ6is58XcikfdWMAspSO0jv0TALV-4ggx2bDLg@mail.gmail.com>
Message-ID: <5258AC0B.1090603@mrabarnett.plus.com>

On 12/10/2013 00:49, Nick Coghlan wrote:
>
> On 12 Oct 2013 08:45, "David Mertz" <mertz at gnosis.cx> wrote:
>  >
>  >
>  > I realize after reading
> http://stackoverflow.com/questions/6284396/permutations-with-unique-values
> that my version was ALMOST right:
>  >
>  > def nonredundant_permutations(seq, r=None):
>  >     last = ()
>  >     for perm in permutations(sorted(seq), r):
>  >         if perm > last:
>  >             yield perm
>  >             last = perm
>  >
>  > I can't look only for inequality, but must use the actual comparison.
>  >
>  > >>> ["".join(x) for x in nonredundant_permutations('aaabb',3)]
>  > ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba']
>  > >>> list(nonredundant_permutations([F(3,1), D(3.0), 3.0]))
>  > [(Fraction(3, 1), Decimal('3'), 3.0)]
>  >
>  > Of course, this approach DOES rely on the order in which
> itertools.permutations() returns values.  However, it's a bit more
> compact than MRAB's version.
>
> As there is no requirement that entries in a sequence handled by
> itertools.permutations be sortable, so the original question of why this
> isn't done by default has been answered (the general solution risks
> consuming too much memory, while the memory efficient solution
> constrains the domain to only sortable sequences).
>
OK, here's a new implementation:

def unique_permutations(iterable, count=None):
     def perm(items, count):
         if count:
             prev_item = object()

             for i, item in enumerate(items):
                 if item != prev_item:
                     for p in perm(items[ : i] + items[i + 1 : ], count - 1):
                         yield [item] + p

                 prev_item = item

         else:
             yield []

     items = list(iterable)

     keys = {}

     for item in items:
         keys.setdefault(item, len(keys))

     items.sort(key=keys.get)

     if count is None:
         count = len(items)

     yield from perm(items, count)


From steve at pearwood.info  Sat Oct 12 04:06:48 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 12 Oct 2013 13:06:48 +1100
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
Message-ID: <20131012020647.GH7989@ando>

On Fri, Oct 11, 2013 at 11:38:33AM -0700, Neil Girdhar wrote:
> "It is universally agreed that a list of n distinct symbols has n! 
> permutations. However, when the symbols are not distinct, the most common 
> convention, in mathematics and elsewhere, seems to be to count only 
> distinct permutations." ? 

I dispute this entire premise. Take a simple (and stereotypical) 
example, picking balls from an urn. 

Say that you have three red and two black balls, and randomly select 
without replacement. If you count only unique permutations, you get only 
four possibilities:

py> set(''.join(t) for t in itertools.permutations('RRRBB', 2))
{'BR', 'RB', 'RR', 'BB'}

which implies that drawing RR is no more likely than drawing BB, which 
is incorrect. The right way to model this experiment is not to count 
distinct permutations, but actual permutations:

py> list(''.join(t) for t in itertools.permutations('RRRBB', 2))
['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 
'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB']

which makes it clear that there are two ways of drawing BB compared to 
six ways of drawing RR. If that's not obvious enough, consider the case 
where you have two thousand red balls and two black balls -- do you 
really conclude that there are the same number of ways to pick RR as BB?
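
The same counts can be read off directly with collections.Counter. This
is only a small sketch of the tally described above; the name `tally` is
illustrative and nothing else is assumed beyond itertools.permutations.

```python
from collections import Counter
from itertools import permutations

# Tally every ordered draw of two balls from three red and two black.
tally = Counter(''.join(t) for t in permutations('RRRBB', 2))

for draw, ways in sorted(tally.items()):
    print(draw, ways)
# BB 2
# BR 6
# RB 6
# RR 6
```

Out of the 20 ordered draws, RR turns up three times as often as BB, which
is the information a set of distinct permutations throws away.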

So I disagree that counting only distinct permutations is the most 
useful or common convention. If you're permuting a collection of 
non-distinct values, you should expect non-distinct permutations.

I'm trying to think of a realistic, physical situation where you would 
only want distinct permutations, and I can't.


> Should we consider fixing itertools.permutations and to output only unique 
> permutations (if possible, although I realize that would break code). 

Absolutely not. Even if you were right that it should return unique 
permutations, and I strongly disagree that you are, the fact that it 
would break code is a deal-breaker.



-- 
Steven

From python at mrabarnett.plus.com  Sat Oct 12 04:34:33 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 12 Oct 2013 03:34:33 +0100
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <5258AC0B.1090603@mrabarnett.plus.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
 <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>
 <CAEbHw4YXk_+Xb2miDb4c46gy_kRMDRm8sQ+nuC7goRK-3ob_yA@mail.gmail.com>
 <52587976.1000901@mrabarnett.plus.com>
 <CAA68w_=6ZFzgEws1Z12Yh_=FzLwBjOn2nAZp8s+sPZyQrQ9AmQ@mail.gmail.com>
 <CAEbHw4ZASOwc0dnWkwOGPTr0j_HpNwwjT+7WuP+EZDBZAO_XLA@mail.gmail.com>
 <CADiSq7ewK9yRJJ6is58XcikfdWMAspSO0jv0TALV-4ggx2bDLg@mail.gmail.com>
 <5258AC0B.1090603@mrabarnett.plus.com>
Message-ID: <5258B539.10307@mrabarnett.plus.com>

On 12/10/2013 02:55, MRAB wrote:
> On 12/10/2013 00:49, Nick Coghlan wrote:
>>
>> On 12 Oct 2013 08:45, "David Mertz" <mertz at gnosis.cx
>> <mailto:mertz at gnosis.cx>> wrote:
>>  >
>>  >
>>  > I realize after reading
>> http://stackoverflow.com/questions/6284396/permutations-with-unique-values
>> that my version was ALMOST right:
>>  >
>>  > def nonredundant_permutations(seq, r=None):
>>  >     last = ()
>>  >     for perm in permutations(sorted(seq), r):
>>  >         if perm > last:
>>  >             yield perm
>>  >             last = perm
>>  >
>>  > I can't look only for inequality, but must use the actual comparison.
>>  >
>>  > >>> ["".join(x) for x in nonredundant_permutations('aaabb',3)]
>>  > ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba']
>>  > >>> list(nonredundant_permutations([F(3,1), D(3.0), 3.0]))
>>  > [(Fraction(3, 1), Decimal('3'), 3.0)]
>>  >
>>  > Of course, this approach DOES rely on the order in which
>> itertools.permutations() returns values.  However, it's a bit more
>> compact than MRAB's version.
>>
>> As there is no requirement that entries in a sequence handled by
>> itertools.permutations be sortable, so the original question of why this
>> isn't done by default has been answered (the general solution risks
>> consuming too much memory, while the memory efficient solution
>> constrains the domain to only sortable sequences).
>>
> OK, here's a new implementation:
>
[snip]
I've just realised that I don't need to sort them at all.

Here's a new improved implementation:

def unique_permutations(iterable, count=None):
     def perm(items, count):
         if count:
             seen = set()

             for i, item in enumerate(items):
                 if item not in seen:
                     for p in perm(items[ : i] + items[i + 1 : ], count - 1):
                         yield [item] + p

                     seen.add(item)
         else:
             yield []

     items = list(iterable)

     if count is None:
         count = len(items)

     yield from perm(items, count)


From mertz at gnosis.cx  Sat Oct 12 04:36:26 2013
From: mertz at gnosis.cx (David Mertz)
Date: Fri, 11 Oct 2013 19:36:26 -0700
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <5258AC0B.1090603@mrabarnett.plus.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
 <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>
 <CAEbHw4YXk_+Xb2miDb4c46gy_kRMDRm8sQ+nuC7goRK-3ob_yA@mail.gmail.com>
 <52587976.1000901@mrabarnett.plus.com>
 <CAA68w_=6ZFzgEws1Z12Yh_=FzLwBjOn2nAZp8s+sPZyQrQ9AmQ@mail.gmail.com>
 <CAEbHw4ZASOwc0dnWkwOGPTr0j_HpNwwjT+7WuP+EZDBZAO_XLA@mail.gmail.com>
 <CADiSq7ewK9yRJJ6is58XcikfdWMAspSO0jv0TALV-4ggx2bDLg@mail.gmail.com>
 <5258AC0B.1090603@mrabarnett.plus.com>
Message-ID: <CAEbHw4Yv9rCEz7LOz_123x2Z1X136S1gADkwTsUPwZgWBkxB7Q@mail.gmail.com>

Hi MRAB,

I'm confused by your implementation.  In particular, what do these lines do?

    # [...]
    items = list(iterable)
    keys = {}
    for item in items:
        keys.setdefault(item, len(keys))
    items.sort(key=keys.get)

I cannot understand how these can possibly have any effect (other than the
first line that makes a concrete list out of an iterable).

We loop through the list in its natural order.  E.g. say the list is '[a,
b, c]' (where those names are any types of objects whatsoever).  The loop
gives us:

    keys == {a:0, b:1, c:2}

When we do a sort on 'key=keys.get', how can that ever possibly change the
order of 'items'?
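
One possible reading, sketched below: with all-distinct items the sort is
indeed a no-op, but when the input contains repeats, sorting by
first-appearance index moves equal items next to each other without ever
comparing the items themselves. The helper name `group_equal_items` is
made up for illustration; the body is just the lines quoted above.

```python
def group_equal_items(iterable):
    """Reorder items so equal ones are adjacent, keeping first-seen order."""
    items = list(iterable)
    keys = {}
    for item in items:
        # The first occurrence of each distinct value gets the next index;
        # later equal items map to that same index.
        keys.setdefault(item, len(keys))
    # Stable sort on the first-seen index; only the integer keys are
    # compared, so the items need to be hashable but not orderable.
    items.sort(key=keys.get)
    return items

print(group_equal_items('abc'))        # ['a', 'b', 'c'] (unchanged)
print(group_equal_items('abcab'))      # ['a', 'a', 'b', 'b', 'c']
print(group_equal_items([3, 'x', 3]))  # [3, 3, 'x']
```

That adjacency is what an "item != prev_item" test needs in order to skip
repeats.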

There's also a bit of a flaw in that your implementation blows up if
anything yielded by iterable isn't hashable:

    >>> list(unique_permutations([ [1,2],[3,4],[5,6] ]))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unhashable type: 'list'

There's no problem doing this with itertools.permutations:

    >>> list(permutations([[1,2],[3,4],[5,6]]))
    [([1, 2], [3, 4], [5, 6]), ([1, 2], [5, 6], [3, 4]),
     ([3, 4], [1, 2], [5, 6]), ([3, 4], [5, 6], [1, 2]),
     ([5, 6], [1, 2], [3, 4]), ([5, 6], [3, 4], [1, 2])]

This particular one also succeeds with my nonredundant_permutations:

    >>> list(nonredundant_permutations([[1,2],[3,4],[5,6]]))
    [([1, 2], [3, 4], [5, 6]), ([1, 2], [5, 6], [3, 4]),
     ([3, 4], [1, 2], [5, 6]), ([3, 4], [5, 6], [1, 2]),
     ([5, 6], [1, 2], [3, 4]), ([5, 6], [3, 4], [1, 2])]

However, my version *DOES* fail when things cannot be compared under
inequality:

    >>> list(nonredundant_permutations([[1,2],3,4]))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 3, in nonredundant_permutations
    TypeError: unorderable types: int() < list()

This also doesn't afflict itertools.permutations.


Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

From mistersheik at gmail.com  Sat Oct 12 04:37:02 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Fri, 11 Oct 2013 22:37:02 -0400
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <20131012020647.GH7989@ando>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
Message-ID: <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>

I think it's pretty indisputable that permutations are formally defined
this way (and I challenge you to find a source that doesn't agree with
that).  I'm sure you know that your idea of using permutations to evaluate
a multinomial distribution is not efficient.  A nicer way to evaluate
probabilities is to pass your set through a collections.Counter, and then
use the resulting dictionary with scipy.stats.multinomial (if it exists
yet).
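
As a rough sketch of the Counter idea using only the standard library (no
scipy is assumed here, and the function name `ordered_draw_probability` is
hypothetical), the probability of a particular ordered draw without
replacement can be computed from the counts without enumerating any
permutations:

```python
from collections import Counter
from fractions import Fraction

def ordered_draw_probability(pool, draw):
    """Probability of drawing `draw` in that order, without replacement.

    Assumes len(draw) <= len(pool).
    """
    remaining = Counter(pool)
    total = sum(remaining.values())
    prob = Fraction(1)
    for item in draw:
        # Chance of picking `item` next, given what is still in the pool.
        prob *= Fraction(remaining[item], total)
        remaining[item] -= 1
        total -= 1
    return prob

print(ordered_draw_probability('RRRBB', 'RR'))  # 3/10
print(ordered_draw_probability('RRRBB', 'BB'))  # 1/10
```

These agree with the 6-in-20 and 2-in-20 counts from the enumerated
permutations.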

I believe most people will be surprised that len(permutations(iterable))
does not count only unique permutations.

Best,

Neil


On Fri, Oct 11, 2013 at 10:06 PM, Steven D'Aprano <steve at pearwood.info>wrote:

> On Fri, Oct 11, 2013 at 11:38:33AM -0700, Neil Girdhar wrote:
> > "It is universally agreed that a list of n distinct symbols has n!
> > permutations. However, when the symbols are not distinct, the most common
> > convention, in mathematics and elsewhere, seems to be to count only
> > distinct permutations." ?
>
> I dispute this entire premise. Take a simple (and stereotypical)
> example, picking balls from an urn.
>
> Say that you have three Red and Two black balls, and randomly select
> without replacement. If you count only unique permutations, you get only
> four possibilities:
>
> py> set(''.join(t) for t in itertools.permutations('RRRBB', 2))
> {'BR', 'RB', 'RR', 'BB'}
>
> which implies that drawing RR is no more likely than drawing BB, which
> is incorrect. The right way to model this experiment is not to count
> distinct permutations, but actual permutations:
>
> py> list(''.join(t) for t in itertools.permutations('RRRBB', 2))
> ['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB',
> 'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB']
>
> which makes it clear that there are two ways of drawing BB compared to
> six ways of drawing RR. If that's not obvious enough, consider the case
> where you have two thousand red balls and two black balls -- do you
> really conclude that there are the same number of ways to pick RR as BB?
>
> So I disagree that counting only distinct permutations is the most
> useful or common convention. If you're permuting a collection of
> non-distinct values, you should expect non-distinct permutations.
>
> I'm trying to think of a realistic, physical situation where you would
> only want distinct permutations, and I can't.
>
>
> > Should we consider fixing itertools.permutations and to output only
> unique
> > permutations (if possible, although I realize that would break code).
>
> Absolutely not. Even if you were right that it should return unique
> permutations, and I strongly disagree that you were, the fact that it
> would break code is a deal-breaker.
>
>
>
> --
> Steven

From mertz at gnosis.cx  Sat Oct 12 04:48:23 2013
From: mertz at gnosis.cx (David Mertz)
Date: Fri, 11 Oct 2013 19:48:23 -0700
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
Message-ID: <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>

Related to, but not quite the same as Steven D'Aprano's point, I would find
it very strange for itertools.permutations() to return a list that was
narrowed to equal-but-not-identical items.

This is why I've raised the example of 'items=[Fraction(3,1), Decimal(3.0),
3.0]' several times.  I've created the Fraction, Decimal, and float for
distinct reasons to get different behaviors and available methods.  When I
want to look for the permutations of those I don't want "any old random
choice of equal values" since presumably I've given them a type for a
reason.

On the other hand, I can see a little bit of sense that
'itertools.permutations([3,3,3,3,3,3,3])' doesn't *really* need to tell me
a list of 7!==5040 things that are exactly the same as each other.  Even so,
I don't know how to generalize that, since my feeling is far
less clear for 'itertools.permutations([1,2,3,4,5,6,6])' ... there's
redundancy, but there's also important information in the probability and
count of specific sequences.

My feeling, however, is that if one were to trim down the results from a
permutations-related function, it is more interesting to me to only
eliminate IDENTICAL items, not to eliminate merely EQUAL ones.
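
To make the IDENTICAL-versus-EQUAL distinction concrete, here is a sketch
of a filter keyed on id(). The name `identity_unique_permutations` is
made up for illustration, and the approach stores every key it has seen,
so it is not memory efficient.

```python
from decimal import Decimal
from fractions import Fraction
from itertools import permutations

def identity_unique_permutations(iterable, r=None):
    """Yield permutations, skipping only those built from the same objects."""
    seen = set()
    for perm in permutations(iterable, r):
        # Key on object identity rather than equality.  permutations()
        # keeps the pool alive, so the ids stay valid while we iterate.
        key = tuple(id(x) for x in perm)
        if key not in seen:
            seen.add(key)
            yield perm

a = [1, 2]
c = [1, 2]          # equal to `a`, but a different object
print(len(list(identity_unique_permutations([a, a, c]))))   # 3

items = [Fraction(3, 1), Decimal(3), 3.0]   # equal, but distinct objects
print(len(list(identity_unique_permutations(items))))       # 6
```

One caveat with this sketch: whether two equal immutable values such as
small ints happen to be the same object is an implementation detail, so
the results for such inputs can vary.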


On Fri, Oct 11, 2013 at 7:37 PM, Neil Girdhar <mistersheik at gmail.com> wrote:

> I think it's pretty indisputable that permutations are formally defined
> this way (and I challenge you to find a source that doesn't agree with
> that).  I'm sure you know that your idea of using permutations to evaluate
> a multinomial distribution is not efficient.  A nicer way to evaluate
> probabilities is to pass your set through a collections.Counter, and then
> use the resulting dictionary with scipy.stats.multinomial (if it exists
> yet).
>
> I believe most people will be surprised that len(permutations(iterable))
> does count unique permutations.
>
> Best,
>
> Neil
>
>
> On Fri, Oct 11, 2013 at 10:06 PM, Steven D'Aprano <steve at pearwood.info>wrote:
>
>> On Fri, Oct 11, 2013 at 11:38:33AM -0700, Neil Girdhar wrote:
>> > "It is universally agreed that a list of n distinct symbols has n!
>> > permutations. However, when the symbols are not distinct, the most
>> common
>> > convention, in mathematics and elsewhere, seems to be to count only
>> > distinct permutations." ?
>>
>> I dispute this entire premise. Take a simple (and stereotypical)
>> example, picking balls from an urn.
>>
>> Say that you have three Red and Two black balls, and randomly select
>> without replacement. If you count only unique permutations, you get only
>> four possibilities:
>>
>> py> set(''.join(t) for t in itertools.permutations('RRRBB', 2))
>> {'BR', 'RB', 'RR', 'BB'}
>>
>> which implies that drawing RR is no more likely than drawing BB, which
>> is incorrect. The right way to model this experiment is not to count
>> distinct permutations, but actual permutations:
>>
>> py> list(''.join(t) for t in itertools.permutations('RRRBB', 2))
>> ['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB',
>> 'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB']
>>
>> which makes it clear that there are two ways of drawing BB compared to
>> six ways of drawing RR. If that's not obvious enough, consider the case
>> where you have two thousand red balls and two black balls -- do you
>> really conclude that there are the same number of ways to pick RR as BB?
>>
>> So I disagree that counting only distinct permutations is the most
>> useful or common convention. If you're permuting a collection of
>> non-distinct values, you should expect non-distinct permutations.
>>
>> I'm trying to think of a realistic, physical situation where you would
>> only want distinct permutations, and I can't.
>>
>>
>> > Should we consider fixing itertools.permutations and to output only
>> unique
>> > permutations (if possible, although I realize that would break code).
>>
>> Absolutely not. Even if you were right that it should return unique
>> permutations, and I strongly disagree that you were, the fact that it
>> would break code is a deal-breaker.
>>
>>
>>
>> --
>> Steven


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

From mistersheik at gmail.com  Sat Oct 12 04:55:06 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Fri, 11 Oct 2013 22:55:06 -0400
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
Message-ID: <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>

I honestly think that Python should stick to the mathematical definition of
permutations rather than some kind of consensus of the tiny minority of
people here.  When next_permutation was added to C++, I believe the whole
standards committee discussed it and they came up with the thing that makes
the most sense.  The fact that dict and set use equality is I think the
reason that permutations should use equality.

Neil


On Fri, Oct 11, 2013 at 10:48 PM, David Mertz <mertz at gnosis.cx> wrote:

> Related to, but not quite the same as Steven D'Aprano's point, I would
> find it very strange for itertools.permutations() to return a list that was
> narrowed to equal-but-not-identical items.
>
> This is why I've raised the example of 'items=[Fraction(3,1),
> Decimal(3.0), 3.0]' several times.  I've created the Fraction, Decimal, and
> float for distinct reasons to get different behaviors and available
> methods.  When I want to look for the permutations of those I don't want
> "any old random choice of equal values" since presumably I've given them a
> type for a reason.
>
> On the other hand, I can see a little bit of sense that
> 'itertools.permutations([3,3,3,3,3,3,3])' doesn't *really* need to tell me
> a list of 7!==5040 things that are exactly the same as each other.  On the
> other hand, I don't know how to generalize that, since my feeling is far
> less clear for 'itertools.permutations([1,2,3,4,5,6,6])' ... there's
> redundancy, but there's also important information in the probability and
> count of specific sequences.
>
> My feeling, however, is that if one were to trim down the results from a
> permutations-related function, it is more interesting to me to only
> eliminate IDENTICAL items, not to eliminate merely EQUAL ones.
>
>
> On Fri, Oct 11, 2013 at 7:37 PM, Neil Girdhar <mistersheik at gmail.com>wrote:
>
>> I think it's pretty indisputable that permutations are formally defined
>> this way (and I challenge you to find a source that doesn't agree with
>> that).  I'm sure you know that your idea of using permutations to evaluate
>> a multinomial distribution is not efficient.  A nicer way to evaluate
>> probabilities is to pass your set through a collections.Counter, and then
>> use the resulting dictionary with scipy.stats.multinomial (if it exists
>> yet).
>>
>> I believe most people will be surprised that len(permutations(iterable))
>> does count unique permutations.
>>
>> Best,
>>
>> Neil
>>
>>
>> On Fri, Oct 11, 2013 at 10:06 PM, Steven D'Aprano <steve at pearwood.info>wrote:
>>
>>> On Fri, Oct 11, 2013 at 11:38:33AM -0700, Neil Girdhar wrote:
>>> > "It is universally agreed that a list of n distinct symbols has n!
>>> > permutations. However, when the symbols are not distinct, the most
>>> common
>>> > convention, in mathematics and elsewhere, seems to be to count only
>>> > distinct permutations." ?
>>>
>>> I dispute this entire premise. Take a simple (and stereotypical)
>>> example, picking balls from an urn.
>>>
>>> Say that you have three Red and Two black balls, and randomly select
>>> without replacement. If you count only unique permutations, you get only
>>> four possibilities:
>>>
>>> py> set(''.join(t) for t in itertools.permutations('RRRBB', 2))
>>> {'BR', 'RB', 'RR', 'BB'}
>>>
>>> which implies that drawing RR is no more likely than drawing BB, which
>>> is incorrect. The right way to model this experiment is not to count
>>> distinct permutations, but actual permutations:
>>>
>>> py> list(''.join(t) for t in itertools.permutations('RRRBB', 2))
>>> ['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB',
>>> 'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB']
>>>
>>> which makes it clear that there are two ways of drawing BB compared to
>>> six ways of drawing RR. If that's not obvious enough, consider the case
>>> where you have two thousand red balls and two black balls -- do you
>>> really conclude that there are the same number of ways to pick RR as BB?
>>>
>>> So I disagree that counting only distinct permutations is the most
>>> useful or common convention. If you're permuting a collection of
>>> non-distinct values, you should expect non-distinct permutations.
>>>
>>> I'm trying to think of a realistic, physical situation where you would
>>> only want distinct permutations, and I can't.
>>>
>>>
>>> > Should we consider fixing itertools.permutations and to output only
>>> unique
>>> > permutations (if possible, although I realize that would break code).
>>>
>>> Absolutely not. Even if you were right that it should return unique
>>> permutations, and I strongly disagree that you were, the fact that it
>>> would break code is a deal-breaker.
>>>
>>>
>>>
>>> --
>>> Steven

From abarnert at yahoo.com  Sat Oct 12 04:57:08 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Fri, 11 Oct 2013 19:57:08 -0700
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
Message-ID: <6123349B-FFCB-42FE-B973-B3C8251302C4@yahoo.com>

On Oct 11, 2013, at 19:48, David Mertz <mertz at gnosis.cx> wrote:

> My feeling, however, is that if one were to trim down the results from a permutations-related function, it is more interesting to me to only eliminate IDENTICAL items, not to eliminate merely EQUAL ones.

I agree with the rest of your message, but I still think you're wrong here. Anyone who is surprised by distinct_permutations((3.0, 3)) treating the two values the same would be equally surprised by {3.0, 3} having only one member. Or by groupby([(3.0, 'a'), (3, 'b')], key=lambda pair: pair[0]) only having one group. And so on.

In Python, sets, dict keys, groups, etc. work by ==. That was a choice that could have been made differently, but Python made that choice long ago, and has applied it completely consistently, and it would be very strange to choose differently in this case.
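
A short sketch of the == behaviour described above, using only the
standard library; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import groupby

# Equal numbers of different types collapse to one set element.
numbers = {3, 3.0, Fraction(3, 1)}
print(len(numbers))          # 1

# Equal keys of different types land in one dict slot.
d = {3.0: 'float'}
d[3] = 'int'
print(len(d))                # 1

# groupby compares consecutive keys with ==, so 3.0 and 3 share a group.
pairs = [(3.0, 'a'), (3, 'b')]
groups = [(key, [label for _, label in grp])
          for key, grp in groupby(pairs, key=lambda pair: pair[0])]
print(len(groups))           # 1
```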

From ncoghlan at gmail.com  Sat Oct 12 06:35:13 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 12 Oct 2013 14:35:13 +1000
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
Message-ID: <CADiSq7dvKoZgUb7Xr6oewGcQ+hYM-MBnQARJ2=usoeX4gnnZaw@mail.gmail.com>

On 12 Oct 2013 12:56, "Neil Girdhar" <mistersheik at gmail.com> wrote:
>
> I honestly think that Python should stick to the mathematical definition
of permutations rather than some kind of consensus of the tiny minority of
people here.  When next_permutation was added to C++, I believe the whole
standards committee discussed it and they came up with the thing that makes
the most sense.  The fact that dict and set use equality is I think the
reason that permutations should use equality.

Why should the behaviour of hash based containers limit the behaviour of
itertools?

Python required a permutation solution that is memory efficient and works
with arbitrary objects, so that's what itertools provides.

However, you'd also like a memory efficient iterator for *mathematical*
permutations that pays attention to object values and filters out
equivalent results.

I *believe* the request is equivalent to giving a name to the following
genexp:

    (k for k, grp in groupby(permutations(sorted(input))))

That's a reasonable enough request (although perhaps more suited to the
recipes section in the itertools docs), but conflating it with complaints
about the way the existing iterator works is a good way to get people to
ignore you (especially now the language specific reasons for the current
behaviour have been pointed out, along with confirmation of the fact that
backwards compatibility requirements would prohibit changing it even if we
wanted to).
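
A quick check of the genexp above, offered as a sketch rather than a
definitive answer: for input with repeats, equal permutations are not
always adjacent in the output of permutations(sorted(...)), so groupby on
its own appears to leave some duplicates. The comparison-based filter
posted earlier in the thread (nonredundant_permutations) handles this
case. The wrapper name `groupby_permutations` is made up for illustration.

```python
from itertools import groupby, permutations

def groupby_permutations(seq, r=None):
    # The genexp from the message, wrapped in a function for testing.
    return (k for k, _ in groupby(permutations(sorted(seq), r)))

def nonredundant_permutations(seq, r=None):
    # The filter quoted earlier in the thread: keep a permutation only
    # if it compares greater than the last one kept.
    last = ()
    for perm in permutations(sorted(seq), r):
        if perm > last:
            yield perm
            last = perm

print(["".join(p) for p in groupby_permutations('aab')])
# ['aab', 'aba', 'aab', 'aba', 'baa']
print(["".join(p) for p in nonredundant_permutations('aab')])
# ['aab', 'aba', 'baa']
```

Both versions still require the items to be mutually orderable, because
of the sorted() call.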

Cheers,
Nick.

>
> Neil
>
>
> On Fri, Oct 11, 2013 at 10:48 PM, David Mertz <mertz at gnosis.cx> wrote:
>>
>> Related to, but not quite the same as Steven D'Aprano's point, I would
find it very strange for itertools.permutations() to return a list that was
narrowed to equal-but-not-identical items.
>>
>> This is why I've raised the example of 'items=[Fraction(3,1),
Decimal(3.0), 3.0]' several times.  I've created the Fraction, Decimal, and
float for distinct reasons to get different behaviors and available
methods.  When I want to look for the permutations of those I don't want
"any old random choice of equal values" since presumably I've given them a
type for a reason.
>>
>> On the other hand, I can see a little bit of sense that
'itertools.permutations([3,3,3,3,3,3,3])' doesn't *really* need to tell me
a list of 7!==5040 things that are exactly the same as each other.  On the
other hand, I don't know how to generalize that, since my feeling is far
less clear for 'itertools.permutations([1,2,3,4,5,6,6])' ... there's
redundancy, but there's also important information in the probability and
count of specific sequences.
>>
>> My feeling, however, is that if one were to trim down the results from a
permutations-related function, it is more interesting to me to only
eliminate IDENTICAL items, not to eliminate merely EQUAL ones.
>>
>>
>> On Fri, Oct 11, 2013 at 7:37 PM, Neil Girdhar <mistersheik at gmail.com>
wrote:
>>>
>>> I think it's pretty indisputable that permutations are formally defined
this way (and I challenge you to find a source that doesn't agree with
that).  I'm sure you know that your idea of using permutations to evaluate
a multinomial distribution is not efficient.  A nicer way to evaluate
probabilities is to pass your set through a collections.Counter, and then
use the resulting dictionary with scipy.stats.multinomial (if it exists
yet).
>>>
>>> I believe most people will be surprised that
len(permutations(iterable)) does count unique permutations.
>>>
>>> Best,
>>>
>>> Neil
>>>
>>>
>>> On Fri, Oct 11, 2013 at 10:06 PM, Steven D'Aprano <steve at pearwood.info>
wrote:
>>>>
>>>> On Fri, Oct 11, 2013 at 11:38:33AM -0700, Neil Girdhar wrote:
>>>> > "It is universally agreed that a list of n distinct symbols has n!
>>>> > permutations. However, when the symbols are not distinct, the most
common
>>>> > convention, in mathematics and elsewhere, seems to be to count only
>>>> > distinct permutations." ?
>>>>
>>>> I dispute this entire premise. Take a simple (and stereotypical)
>>>> example, picking balls from an urn.
>>>>
>>>> Say that you have three Red and Two black balls, and randomly select
>>>> without replacement. If you count only unique permutations, you get
only
>>>> four possibilities:
>>>>
>>>> py> set(''.join(t) for t in itertools.permutations('RRRBB', 2))
>>>> {'BR', 'RB', 'RR', 'BB'}
>>>>
>>>> which implies that drawing RR is no more likely than drawing BB, which
>>>> is incorrect. The right way to model this experiment is not to count
>>>> distinct permutations, but actual permutations:
>>>>
>>>> py> list(''.join(t) for t in itertools.permutations('RRRBB', 2))
>>>> ['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB',
'RB',
>>>> 'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB']
>>>>
>>>> which makes it clear that there are two ways of drawing BB compared to
>>>> six ways of drawing RR. If that's not obvious enough, consider the case
>>>> where you have two thousand red balls and two black balls -- do you
>>>> really conclude that there are the same number of ways to pick RR as
BB?
>>>>
>>>> So I disagree that counting only distinct permutations is the most
>>>> useful or common convention. If you're permuting a collection of
>>>> non-distinct values, you should expect non-distinct permutations.
>>>>
>>>> I'm trying to think of a realistic, physical situation where you would
>>>> only want distinct permutations, and I can't.
>>>>
>>>>
>>>> > Should we consider fixing itertools.permutations and to output only
unique
>>>> > permutations (if possible, although I realize that would break code).
>>>>
>>>> Absolutely not. Even if you were right that it should return unique
>>>> permutations, and I strongly disagree that you were, the fact that it
>>>> would break code is a deal-breaker.
>>>>
>>>>
>>>>
>>>> --
>>>> Steven
>>>> _______________________________________________
>>>> Python-ideas mailing list
>>>> Python-ideas at python.org
>>>> https://mail.python.org/mailman/listinfo/python-ideas
>>>>
>>>> --
>>>>
>>>> ---
>>>> You received this message because you are subscribed to a topic in the
Google Groups "python-ideas" group.
>>>> To unsubscribe from this topic, visit
https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe.
>>>> To unsubscribe from this group and all its topics, send an email to
python-ideas+unsubscribe at googlegroups.com.
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Keeping medicines from the bloodstreams of the sick; food
>> from the bellies of the hungry; books from the hands of the
>> uneducated; technology from the underdeveloped; and putting
>> advocates of freedom in prisons.  Intellectual property is
>> to the 21st century what the slave trade was to the 16th.
>
>
>
>

From stephen at xemacs.org  Sat Oct 12 07:10:21 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 12 Oct 2013 14:10:21 +0900
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
Message-ID: <87pprbgjlu.fsf@uwakimon.sk.tsukuba.ac.jp>

Neil Girdhar writes:

 > I honestly think that Python should stick to the mathematical
 > definition of permutations rather than some kind of consensus of
 > the tiny minority of people here.

Is there an agreed mathematical definition of permutations of
*sequences*?  Every definition I can find refers to permutations of
*sets*.  I think any categorist would agree that there are a large
number of maps of _Sequence_ to _Set_, in particular the two obviously
useful ones[1]: the one that takes each element of the sequence to a
*different* element of the corresponding set, and the one that takes
equal elements of the sequence to the *same* element of the
corresponding set.  The corresponding set need not be the underlying
set of the sequence, and which one is appropriate presumably depends
on applications.

 > When next_permutation was added to C++, I believe the whole
 > standards committee discussed it and they came up with the thing
 > that makes the most sense.

To the negligible (in several senses of the word) fraction of humanity
that participates in C++ standardization.  Python is not C++ (thanking
all the Roman and Greek gods, and refusing to identify Zeus with
Jupiter, nor Aphrodite with Venus<wink/>).

 > The fact that dict and set use equality is I think the reason that
 > permutations should use equality.

Sequences are not sets, and dict is precisely the wrong example for
you to use, since it makes exactly the point that values that are
identical may be bound to several different keys.  We don't unify keys
in a dict just because the values are identical (or equal).  Similarly,
in representing a sequence as a set, we use a set of ordered pairs,
with the first component a unique integer indicating position, and
the second the sequence element.

Since there are several useful mathematical ways to convert sequences
to sets, and in particular one very similar, if not identical, to the
one you like is enshrined in the very convenient constructor set(), I
think it's useful to leave it as it is.
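
To make the two maps concrete, here is a minimal Python sketch (the
variable names are illustrative and not from the thread).  It assumes the
"position" map can be modelled with enumerate() and the "equality" map
with the set() constructor:

```python
from itertools import permutations

seq = ['a', 'b', 'a']

# Map 1: every position becomes its own element (ordered pairs).
by_position = set(enumerate(seq))
# Map 2: equal elements collapse to one, as with the set() constructor.
by_equality = set(seq)

# itertools.permutations treats positions as distinct, like map 1,
# so a 3-item sequence always yields 3! = 6 tuples.
all_perms = list(permutations(seq))
# Collapsing equal tuples afterwards leaves 3!/2! = 3 arrangements.
distinct_perms = set(all_perms)
```

Which of the two counts is "right" depends on which map the application
has in mind, which is the point being made above.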

 > It is universally agreed that a list of n distinct symbols has n!
 > permutations.

But that's because there's really no sensible definition of
"underlying set" for such a list except the set containing exactly the
same elements as the list.[2]  But there is no universal agreement
that "permutations of a list" is a sensible phrase.

For example, although the Wikipedia article Permutation refers to
lists of permutations, linked list representations of data, to the
"list of objects" for use in Cauchy's notation, and to the cycle
representation as a list of sequences, it doesn't once refer to
permutation of a list.

They're obviously not averse to discussing lists, but the word used for
the entity being permuted is invariably "set".


Footnotes: 
[1]  And some maps not terribly useful for our purposes, such as one
that maps all sequences to a singleton.

[2]  A categorist would disagree, but that's not interesting.



From mertz at gnosis.cx  Sat Oct 12 07:26:07 2013
From: mertz at gnosis.cx (David Mertz)
Date: Fri, 11 Oct 2013 22:26:07 -0700
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CADiSq7dvKoZgUb7Xr6oewGcQ+hYM-MBnQARJ2=usoeX4gnnZaw@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <CADiSq7dvKoZgUb7Xr6oewGcQ+hYM-MBnQARJ2=usoeX4gnnZaw@mail.gmail.com>
Message-ID: <CAEbHw4ZtKL3VF7BchWae6uxt9rKasgOyruoEK_LGamhhG=YTEQ@mail.gmail.com>

What you propose, Nick, is definitely different from the several functions
that have been bandied about here.  I.e.

>>> def nick_permutations(items, r=None):
...     return (k for k, grp in groupby(permutations(sorted(items),r)))

>>> ["".join(p) for p in nonredundant_permutations('aaabb', 3)]
['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba']

>>> ["".join(p) for p in nick_permutations('aaabb', 3)]
['aaa', 'aab', 'aaa', 'aab', 'aba', 'abb', 'aba', 'abb', 'aaa', 'aab',
'aaa', 'aab', 'aba', 'abb', 'aba', 'abb', 'aaa', 'aab', 'aaa', 'aab',
'aba', 'abb', 'aba', 'abb', 'baa', 'bab', 'baa', 'bab', 'baa', 'bab',
'bba', 'baa', 'bab', 'baa', 'bab', 'baa', 'bab', 'bba']

>>> ["".join(p) for p in permutations('aaabb', 3)]
['aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', 'aba', 'aba', 'abb', 'aba',
'aba', 'abb', 'aaa', 'aab', 'aab', 'aaa', 'aab', 'aab', 'aba', 'aba',
'abb', 'aba', 'aba', 'abb', 'aaa', 'aab', 'aab', 'aaa', 'aab', 'aab',
'aba', 'aba', 'abb', 'aba', 'aba', 'abb', 'baa', 'baa', 'bab', 'baa',
'baa', 'bab', 'baa', 'baa', 'bab', 'bba', 'bba', 'bba', 'baa', 'baa',
'bab', 'baa', 'baa', 'bab', 'baa', 'baa', 'bab', 'bba', 'bba', 'bba']

If I'm thinking of this right, what you give is equivalent to the initial
flawed version of 'nonredundant_permutations()' that I suggested, which
used '!=' rather than the correct '>' in comparing to the 'last' tuple.
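
The body of 'nonredundant_permutations()' is not shown in this message, so
the following is only a reconstruction under that assumption: a filter that
remembers the last tuple it yielded and keeps a new tuple only when it
compares greater.  The '!=' variant (the name 'adjacent_only' is made up
here) is included for contrast.

```python
from itertools import groupby, permutations

def nonredundant_permutations(items, r=None):
    # Assumed reconstruction, not the original code from the thread.
    # Keep a tuple only if it is greater than the last one yielded.
    last = None
    for p in permutations(sorted(items), r):
        if last is None or p > last:
            last = p
            yield p

def adjacent_only(items, r=None):
    # The '!=' variant: drops only runs of adjacent equal tuples,
    # which is also what the groupby() genexp does.
    last = None
    for p in permutations(sorted(items), r):
        if last is None or p != last:
            last = p
            yield p

good = ["".join(p) for p in nonredundant_permutations('aaabb', 3)]
flawed = list(adjacent_only('aaabb', 3))
via_groupby = [k for k, grp in groupby(permutations(sorted('aaabb'), 3))]
```

Under these assumptions 'good' holds the seven results shown above, while
'flawed' and 'via_groupby' are the same longer list with repeats.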

FWIW, I deliberately chose the name 'nonredundant_permutations' rather than
MRAB's choice of 'unique_permutations' because I think what the filtering
does is precisely NOT to give unique ones. Or rather, not to give ALL
unique ones, but only those defined by equivalence (i.e. rather than
identity).

My name is ugly, and if there were to be a function like it in itertools, a
better name should be found. But such a name should emphasize that it is
"filter by equivalence classes" ... actually, maybe this suggests another
function which is instead "filter by identity of tuples", potentially also
added to itertools.


On Fri, Oct 11, 2013 at 9:35 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

>
> On 12 Oct 2013 12:56, "Neil Girdhar" <mistersheik at gmail.com> wrote:
> >
> > I honestly think that Python should stick to the mathematical definition
> of permutations rather than some kind of consensus of the tiny minority of
> people here.  When next_permutation was added to C++, I believe the whole
> standards committee discussed it and they came up with the thing that makes
> the most sense.  The fact that dict and set use equality is I think the
> reason that permutations should use equality.
>
> Why should the behaviour of hash based containers limit the behaviour of
> itertools?
>
> Python required a permutation solution that is memory efficient and works
> with arbitrary objects, so that's what itertools provides.
>
> However, you'd also like a memory efficient iterator for *mathematical*
> permutations that pays attention to object values and filters out
> equivalent results.
>
> I *believe* the request is equivalent to giving a name to the following
> genexp:
>
>     (k for k, grp in groupby(permutations(sorted(input))))
>
> That's a reasonable enough request (although perhaps more suited to the
> recipes section in the itertools docs), but conflating it with complaints
> about the way the existing iterator works is a good way to get people to
> ignore you (especially now the language specific reasons for the current
> behaviour have been pointed out, along with confirmation of the fact that
> backwards compatibility requirements would prohibit changing it even if we
> wanted to).
>
> Cheers,
> Nick.
>
> >
> > Neil
> >
> >
> > On Fri, Oct 11, 2013 at 10:48 PM, David Mertz <mertz at gnosis.cx> wrote:
> >>
> >> Related to, but not quite the same as Steven D'Aprano's point, I would
> find it very strange for itertools.permutations() to return a list that was
> narrowed to equal-but-not-identical items.
> >>
> >> This is why I've raised the example of 'items=[Fraction(3,1),
> Decimal(3.0), 3.0]' several times.  I've created the Fraction, Decimal, and
> float for distinct reasons to get different behaviors and available
> methods.  When I want to look for the permutations of those I don't want
> "any old random choice of equal values" since presumably I've given them a
> type for a reason.
> >>
> >> On the other hand, I can see a little bit of sense that
> 'itertools.permutations([3,3,3,3,3,3,3])' doesn't *really* need to tell me
> a list of 7!==5040 things that are exactly the same as each other.  On the
> other hand, I don't know how to generalize that, since my feeling is far
> less clear for 'itertools.permutations([1,2,3,4,5,6,6])' ... there's
> redundancy, but there's also important information in the probability and
> count of specific sequences.
> >>
> >> My feeling, however, is that if one were to trim down the results from
> a permutations-related function, it is more interesting to me to only
> eliminate IDENTICAL items, not to eliminate merely EQUAL ones.
> >>
> >>
> >> On Fri, Oct 11, 2013 at 7:37 PM, Neil Girdhar <mistersheik at gmail.com>
> wrote:
> >>>
> >>> I think it's pretty indisputable that permutations are formally
> defined this way (and I challenge you to find a source that doesn't agree
> with that).  I'm sure you know that your idea of using permutations to
> evaluate a multinomial distribution is not efficient.  A nicer way to
> evaluate probabilities is to pass your set through a collections.Counter,
> and then use the resulting dictionary with scipy.stats.multinomial (if it
> exists yet).
> >>>
> >>> I believe most people will be surprised that
> len(permutations(iterable)) does count unique permutations.
> >>>
> >>> Best,
> >>>
> >>> Neil
> >>>
> >>>
> >>> On Fri, Oct 11, 2013 at 10:06 PM, Steven D'Aprano <steve at pearwood.info>
> wrote:
> >>>>
> >>>> On Fri, Oct 11, 2013 at 11:38:33AM -0700, Neil Girdhar wrote:
> >>>> > "It is universally agreed that a list of n distinct symbols has n!
> >>>> > permutations. However, when the symbols are not distinct, the most
> common
> >>>> > convention, in mathematics and elsewhere, seems to be to count only
> >>>> > distinct permutations." ?
> >>>>
> >>>> I dispute this entire premise. Take a simple (and stereotypical)
> >>>> example, picking balls from an urn.
> >>>>
> >>>> Say that you have three Red and Two black balls, and randomly select
> >>>> without replacement. If you count only unique permutations, you get
> only
> >>>> four possibilities:
> >>>>
> >>>> py> set(''.join(t) for t in itertools.permutations('RRRBB', 2))
> >>>> {'BR', 'RB', 'RR', 'BB'}
> >>>>
> >>>> which implies that drawing RR is no more likely than drawing BB, which
> >>>> is incorrect. The right way to model this experiment is not to count
> >>>> distinct permutations, but actual permutations:
> >>>>
> >>>> py> list(''.join(t) for t in itertools.permutations('RRRBB', 2))
> >>>> ['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB',
> 'RB',
> >>>> 'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB']
> >>>>
> >>>> which makes it clear that there are two ways of drawing BB compared to
> >>>> six ways of drawing RR. If that's not obvious enough, consider the
> case
> >>>> where you have two thousand red balls and two black balls -- do you
> >>>> really conclude that there are the same number of ways to pick RR as
> BB?
> >>>>
> >>>> So I disagree that counting only distinct permutations is the most
> >>>> useful or common convention. If you're permuting a collection of
> >>>> non-distinct values, you should expect non-distinct permutations.
> >>>>
> >>>> I'm trying to think of a realistic, physical situation where you would
> >>>> only want distinct permutations, and I can't.
> >>>>
> >>>>
> >>>> > Should we consider fixing itertools.permutations and to output only
> unique
> >>>> > permutations (if possible, although I realize that would break
> code).
> >>>>
> >>>> Absolutely not. Even if you were right that it should return unique
> >>>> permutations, and I strongly disagree that you were, the fact that it
> >>>> would break code is a deal-breaker.
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Steven
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> Keeping medicines from the bloodstreams of the sick; food
> >> from the bellies of the hungry; books from the hands of the
> >> uneducated; technology from the underdeveloped; and putting
> >> advocates of freedom in prisons.  Intellectual property is
> >> to the 21st century what the slave trade was to the 16th.
> >
> >
> >
> >
>



-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

From mertz at gnosis.cx  Sat Oct 12 07:38:19 2013
From: mertz at gnosis.cx (David Mertz)
Date: Fri, 11 Oct 2013 22:38:19 -0700
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <6123349B-FFCB-42FE-B973-B3C8251302C4@yahoo.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <6123349B-FFCB-42FE-B973-B3C8251302C4@yahoo.com>
Message-ID: <CAEbHw4Y3i7VRpwrSV47wxK5eau7H2oUJvc09FCu8ubSBefdYAQ@mail.gmail.com>

Hi Andrew,

I've sort of said as much in my last reply to Nick.  But maybe I can
clarify further.  I can imagine *someone* wanting a filtering of
permutations by either identity or equality.  Maybe, in fact, by other
comparisons also for generality.

This might suggest an API like the following:

  equal_perms = distinct_permutations(items, r, filter_by=operator.eq)
  ident_perms = distinct_permutations(items, r, filter_by=operator.is_)

Or even perhaps, in some use-case that isn't clear to me, e.g.

  start_same_perms = distinct_permutations(
      items, r, filter_by=lambda a, b: a[0] == b[0])

Or perhaps more plausibly, some predicate that, e.g. tests if two returned
tuples are the same under case normalization of the strings within them.

I guess the argument then would be what the default value of 'filter_by'
might be... but that seems less important to me if there were an option to
pass a predicate as you liked.
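
A rough sketch of how such a predicate-driven function might behave,
purely as an illustration.  The thread gives only the call signatures, so
this assumes that 'filter_by' receives two candidate tuples and returns
True when they should count as the same result.  The helper 'identical' is
hypothetical: an identity test has to work element by element, because the
result tuples being compared are separate objects.

```python
import operator
from fractions import Fraction
from itertools import permutations

def distinct_permutations(items, r=None, filter_by=operator.eq):
    # Hypothetical sketch: skip any permutation that filter_by() says
    # matches one already yielded.  Quadratic time, and it remembers
    # every yielded tuple, so it is not memory efficient.
    seen = []
    for p in permutations(items, r):
        if not any(filter_by(p, q) for q in seen):
            seen.append(p)
            yield p

def identical(a, b):
    # Element-wise identity comparison of two tuples.
    return all(x is y for x, y in zip(a, b))

items = [Fraction(3, 1), 3.0, 3]
equal_perms = list(distinct_permutations(items, 2))
ident_perms = list(distinct_permutations(items, 2, filter_by=identical))
start_same_perms = list(distinct_permutations(
    'abc', 2, filter_by=lambda a, b: a[0] == b[0]))
```

With three equal-but-not-identical numbers, filtering by equality leaves a
single pair, while filtering by identity keeps all six.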



On Fri, Oct 11, 2013 at 7:57 PM, Andrew Barnert <abarnert at yahoo.com> wrote:

> On Oct 11, 2013, at 19:48, David Mertz <mertz at gnosis.cx> wrote:
>
> > My feeling, however, is that if one were to trim down the results from a
> permutations-related function, it is more interesting to me to only
> eliminate IDENTICAL items, not to eliminate merely EQUAL ones.
>
> I agree with the rest of your message, but I still think you're wrong
> here. Anyone who is surprised by distinct_permutations((3.0, 3)) treating
> the two values the same would be equally surprised by {3.0, 3} having only
> one member. Or by groupby((3.0, 'a'), (3, 'b')) only having one group. And
> so on.
>
> In Python, sets, dict keys, groups, etc. work by ==. That was a choice
> that could have been made differently, but Python made that choice long
> ago, and has applied it completely consistently, and it would be very
> strange to choose differently in this case.




-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

From mertz at gnosis.cx  Sat Oct 12 07:48:25 2013
From: mertz at gnosis.cx (David Mertz)
Date: Fri, 11 Oct 2013 22:48:25 -0700
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAEbHw4Y3i7VRpwrSV47wxK5eau7H2oUJvc09FCu8ubSBefdYAQ@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <6123349B-FFCB-42FE-B973-B3C8251302C4@yahoo.com>
 <CAEbHw4Y3i7VRpwrSV47wxK5eau7H2oUJvc09FCu8ubSBefdYAQ@mail.gmail.com>
Message-ID: <CAEbHw4Y+soJs7xGXV=keubBFUVZnXngMwxxa+P34QnmH=SLoHg@mail.gmail.com>

Btw. My implementation of nonredundant_permutations *IS* guaranteed to work
by the docs for Python 3.4.  Actually for Python 2.7+.  That is, it's not
just an implementation accident (as I thought before I checked), but a
promised API of itertools.permutations that:

  Permutations are emitted in lexicographic sort order. So, if the
  input iterable is sorted, the permutation tuples will be produced in
  sorted order.

As long as that holds, my function will indeed behave correctly (but of
course, with the limitation that it blows up if different items in the
argument iterable cannot be compared using operator.lt()).
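
A small check of both points, assuming Python 3 (where unorderable types
raise TypeError when sorted).  The variable names are illustrative only.
It also shows that with repeated values the tuples follow positions rather
than values, so a value-sorted order holds only for distinct items.

```python
from itertools import permutations

# Distinct, sorted input: the tuples come out in sorted order.
perms = list(permutations(sorted([3, 1, 2])))
in_order = perms == sorted(perms)

# Repeated values: ordering is by position, so values can step backwards.
dup = ["".join(p) for p in permutations(sorted("aab"))]

# Items with no mutual ordering cannot be sorted in the first place.
try:
    sorted([1, "a"])
    sortable = True
except TypeError:
    sortable = False
```

Here 'dup' goes from 'aba' back to 'aab', which is why a filter that only
compares neighbours for equality is not enough.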

On Fri, Oct 11, 2013 at 10:38 PM, David Mertz <mertz at gnosis.cx> wrote:

> Hi Andrew,
>
> I've sort of said as much in my last reply to Nick.  But maybe I can
> clarify further.  I can imagine *someone* wanting a filtering of
> permutations by either identity or equality.  Maybe, in fact, by other
> comparisons also for generality.
>
> This might suggest an API like the following:
>
>   equal_perms = distinct_permutations(items, r, filter_by=operator.eq)
>   ident_perms = distinct_permutations(items, r, filter_by=operator.is_)
>
> Or even perhaps, in some use-case that isn't clear to me, e.g.
>
>   start_same_perms = distinct_permutations(items, r, filter_by=lambda a,b:
> a[0]==b[0])
>
> Or perhaps more plausibly, some predicate that, e.g. tests if two returned
> tuples are the same under case normalization of the strings within them.
>
> I guess the argument then would be what the default value of 'filter_by'
> might be... but that seems less important to me if there were an option to
> pass a predicate as you liked.
>
>
>
> On Fri, Oct 11, 2013 at 7:57 PM, Andrew Barnert <abarnert at yahoo.com>wrote:
>
>> On Oct 11, 2013, at 19:48, David Mertz <mertz at gnosis.cx> wrote:
>>
>> > My feeling, however, is that if one were to trim down the results from
>> a permutations-related function, it is more interesting to me to only
>> eliminate IDENTICAL items, not to eliminate merely EQUAL ones.
>>
>> I agree with the rest of your message, but I still think you're wrong
>> here. Anyone who is surprised by distinct_permutations((3.0, 3)) treating
>> the two values the same would be equally surprised by {3.0, 3} having only
>> one member. Or by groupby((3.0, 'a'), (3, 'b')) only having one group. And
>> so on.
>>
>> In Python, sets, dict keys, groups, etc. work by ==. That was a choice
>> that could have been made differently, but Python made that choice long
>> ago, and has applied it completely consistently, and it would be very
>> strange to choose differently in this case.
>
>
>
>
> --
> Keeping medicines from the bloodstreams of the sick; food
> from the bellies of the hungry; books from the hands of the
> uneducated; technology from the underdeveloped; and putting
> advocates of freedom in prisons.  Intellectual property is
> to the 21st century what the slave trade was to the 16th.
>



-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

From steve at pearwood.info  Sat Oct 12 08:34:46 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 12 Oct 2013 17:34:46 +1100
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
Message-ID: <20131012063445.GI7989@ando>

On Fri, Oct 11, 2013 at 10:55:06PM -0400, Neil Girdhar wrote:
> I honestly think that Python should stick to the mathematical definition of
> permutations rather than some kind of consensus of the tiny minority of
> people here.

So do I. And that is exactly what itertools.permutations already does.



-- 
Steven

From mistersheik at gmail.com  Sat Oct 12 08:55:25 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Sat, 12 Oct 2013 02:55:25 -0400
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CADiSq7dvKoZgUb7Xr6oewGcQ+hYM-MBnQARJ2=usoeX4gnnZaw@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <CADiSq7dvKoZgUb7Xr6oewGcQ+hYM-MBnQARJ2=usoeX4gnnZaw@mail.gmail.com>
Message-ID: <CAA68w_=dUBcBALcoqiVb3hoeWUxX6W4=evOtLe9P9=+Hd57JUg@mail.gmail.com>

Hi Nick,

Rereading my messages, I feel like I haven't been as diplomatic as I
wanted.  Like everyone here, I care a lot about Python and I want to see it
become as perfect as it can be made.  If my wording has been too strong,
it's only out of passion for Python.

I acknowledged in my initial request that it would be impossible to change
the default behaviour of itertools.permutations.  I understand that that
ship has sailed.  I think my best proposal is to have an efficient
distinct_permutations function in itertools.  It should be in itertools so
that it is discoverable.  It should be a function, rather than one of the
recipes proposed, to make it as efficient as possible.  (Correct me if I'm
wrong, but like the set solution, groupby is also not so efficient.)
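
One direction an efficient version could take, sketched under assumptions:
the classic "next lexicographic permutation" step, which is the idea behind
the C++ next_permutation mentioned earlier in the thread.  This is not a
proposed itertools implementation.  The function name is hypothetical, it
handles only full-length permutations, and it requires the items to be
orderable.

```python
def lexicographic_distinct_permutations(iterable):
    # Yield each distinct arrangement once, without generating duplicates.
    items = sorted(iterable)
    n = len(items)
    while True:
        yield tuple(items)
        # Find the rightmost position i with items[i] < items[i + 1].
        for i in range(n - 2, -1, -1):
            if items[i] < items[i + 1]:
                break
        else:
            return  # items are in non-increasing order: done
        # Find the rightmost j > i with items[j] > items[i] and swap.
        for j in range(n - 1, i, -1):
            if items[i] < items[j]:
                break
        items[i], items[j] = items[j], items[i]
        # Reverse the tail so it becomes the smallest possible suffix.
        items[i + 1:] = items[i + 1:][::-1]

small = list(lexicographic_distinct_permutations('aab'))
count = len(list(lexicographic_distinct_permutations('aaabb')))
```

Each step does work proportional to the length of the input and only the
current arrangement is stored, so no set of seen results is needed.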

I welcome the discussion and hope that the most efficient implementation
someone here comes up with will be added one day to itertools.

Best,

Neil


On Sat, Oct 12, 2013 at 12:35 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

>
> On 12 Oct 2013 12:56, "Neil Girdhar" <mistersheik at gmail.com> wrote:
> >
> > I honestly think that Python should stick to the mathematical definition
> of permutations rather than some kind of consensus of the tiny minority of
> people here.  When next_permutation was added to C++, I believe the whole
> standards committee discussed it and they came up with the thing that makes
> the most sense.  The fact that dict and set use equality is I think the
> reason that permutations should use equality.
>
> Why should the behaviour of hash based containers limit the behaviour of
> itertools?
>
> Python required a permutation solution that is memory efficient and works
> with arbitrary objects, so that's what itertools provides.
>
> However, you'd also like a memory efficient iterator for *mathematical*
> permutations that pays attention to object values and filters out
> equivalent results.
>
> I *believe* the request is equivalent to giving a name to the following
> genexp:
>
>     (k for k, grp in groupby(permutations(sorted(input))))
>
> That's a reasonable enough request (although perhaps more suited to the
> recipes section in the itertools docs), but conflating it with complaints
> about the way the existing iterator works is a good way to get people to
> ignore you (especially now the language specific reasons for the current
> behaviour have been pointed out, along with confirmation of the fact that
> backwards compatibility requirements would prohibit changing it even if we
> wanted to).
>
> Cheers,
> Nick.
>
> >
> > Neil
> >
> >
> > On Fri, Oct 11, 2013 at 10:48 PM, David Mertz <mertz at gnosis.cx> wrote:
> >>
> >> Related to, but not quite the same as Steven D'Aprano's point, I would
> find it very strange for itertools.permutations() to return a list that was
> narrowed to equal-but-not-identical items.
> >>
> >> This is why I've raised the example of 'items=[Fraction(3,1),
> Decimal(3.0), 3.0]' several times.  I've created the Fraction, Decimal, and
> float for distinct reasons to get different behaviors and available
> methods.  When I want to look for the permutations of those I don't want
> "any old random choice of equal values" since presumably I've given them a
> type for a reason.
> >>
> >> On the other hand, I can see a little bit of sense that
> 'itertools.permutations([3,3,3,3,3,3,3])' doesn't *really* need to tell me
> a list of 7!==5040 things that are exactly the same as each other.  On the
> other hand, I don't know how to generalize that, since my feeling is far
> less clear for 'itertools.permutations([1,2,3,4,5,6,6])' ... there's
> redundancy, but there's also important information in the probability and
> count of specific sequences.
> >>
> >> My feeling, however, is that if one were to trim down the results from
> a permutations-related function, it is more interesting to me to only
> eliminate IDENTICAL items, not to eliminate merely EQUAL ones.
> >>
> >>
> >> On Fri, Oct 11, 2013 at 7:37 PM, Neil Girdhar <mistersheik at gmail.com>
> wrote:
> >>>
> >>> I think it's pretty indisputable that permutations are formally
> defined this way (and I challenge you to find a source that doesn't agree
> with that).  I'm sure you know that your idea of using permutations to
> evaluate a multinomial distribution is not efficient.  A nicer way to
> evaluate probabilities is to pass your set through a collections.Counter,
> and then use the resulting dictionary with scipy.stats.multinomial (if it
> exists yet).
> >>>
> >>> I believe most people will be surprised that
> >>> len(permutations(iterable)) does not count only unique permutations.
> >>>
> >>> Best,
> >>>
> >>> Neil
> >>>
> >>>
> >>> On Fri, Oct 11, 2013 at 10:06 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> >>>>
> >>>> On Fri, Oct 11, 2013 at 11:38:33AM -0700, Neil Girdhar wrote:
> >>>> > "It is universally agreed that a list of n distinct symbols has n!
> >>>> > permutations. However, when the symbols are not distinct, the most
> >>>> > common convention, in mathematics and elsewhere, seems to be to count
> >>>> > only distinct permutations." ?
> >>>>
> >>>> I dispute this entire premise. Take a simple (and stereotypical)
> >>>> example, picking balls from an urn.
> >>>>
> >>>> Say that you have three red and two black balls, and randomly select
> >>>> without replacement. If you count only unique permutations, you get
> >>>> only four possibilities:
> >>>>
> >>>> py> set(''.join(t) for t in itertools.permutations('RRRBB', 2))
> >>>> {'BR', 'RB', 'RR', 'BB'}
> >>>>
> >>>> which implies that drawing RR is no more likely than drawing BB, which
> >>>> is incorrect. The right way to model this experiment is not to count
> >>>> distinct permutations, but actual permutations:
> >>>>
> >>>> py> list(''.join(t) for t in itertools.permutations('RRRBB', 2))
> >>>> ['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB',
> >>>> 'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB']
> >>>>
> >>>> which makes it clear that there are two ways of drawing BB compared to
> >>>> six ways of drawing RR. If that's not obvious enough, consider the
> >>>> case where you have two thousand red balls and two black balls -- do
> >>>> you really conclude that there are the same number of ways to pick RR
> >>>> as BB?
> >>>> So I disagree that counting only distinct permutations is the most
> >>>> useful or common convention. If you're permuting a collection of
> >>>> non-distinct values, you should expect non-distinct permutations.
> >>>>
> >>>> I'm trying to think of a realistic, physical situation where you would
> >>>> only want distinct permutations, and I can't.
> >>>>
> >>>>
> >>>> > Should we consider fixing itertools.permutations to output only
> >>>> > unique permutations (if possible, although I realize that would
> >>>> > break code).
> >>>>
> >>>> Absolutely not. Even if you were right that it should return unique
> >>>> permutations, and I strongly disagree that you were, the fact that it
> >>>> would break code is a deal-breaker.
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Steven
> >>>> _______________________________________________
> >>>> Python-ideas mailing list
> >>>> Python-ideas at python.org
> >>>> https://mail.python.org/mailman/listinfo/python-ideas
> >>>>
> >>>> --
> >>>>
> >>>> ---
> >>>> You received this message because you are subscribed to a topic in
> the Google Groups "python-ideas" group.
> >>>> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/python-ideas/dDttJfkyu2k/unsubscribe.
> >>>> To unsubscribe from this group and all its topics, send an email to
> python-ideas+unsubscribe at googlegroups.com.
> >>>> For more options, visit https://groups.google.com/groups/opt_out.
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> Keeping medicines from the bloodstreams of the sick; food
> >> from the bellies of the hungry; books from the hands of the
> >> uneducated; technology from the underdeveloped; and putting
> >> advocates of freedom in prisons.  Intellectual property is
> >> to the 21st century what the slave trade was to the 16th.
> >
> >
> >
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20131012/5f5b266a/attachment-0001.html>

From mistersheik at gmail.com  Sat Oct 12 09:02:47 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Sat, 12 Oct 2013 03:02:47 -0400
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAEbHw4Y3i7VRpwrSV47wxK5eau7H2oUJvc09FCu8ubSBefdYAQ@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <6123349B-FFCB-42FE-B973-B3C8251302C4@yahoo.com>
 <CAEbHw4Y3i7VRpwrSV47wxK5eau7H2oUJvc09FCu8ubSBefdYAQ@mail.gmail.com>
Message-ID: <CAA68w_=gVSEOo=+WOm-D4YaF+EAA05gPa4V86gojEv=kC57seg@mail.gmail.com>

Why not just use the standard Python way to generalize this: "key" rather
than the nonstandard "filter_by"?
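
A minimal sketch of what a "key"-based signature might look like. The name
distinct_permutations and its parameters are assumptions carried over from
this thread, not an existing itertools API, and the set-based filtering is
only the simplest thing that works (it needs hashable keys and still walks
every permutation, so it is not efficient):

```python
from itertools import permutations


def distinct_permutations(iterable, r=None, key=None):
    """Yield r-length permutations, skipping ones whose keyed form was seen.

    Hypothetical helper for illustration only: 'key' is applied to each
    element, in the same spirit as the 'key' argument of sorted().
    """
    seen = set()
    for perm in permutations(iterable, r):
        # Compare permutations by their keyed elements (must be hashable).
        marker = perm if key is None else tuple(key(x) for x in perm)
        if marker not in seen:
            seen.add(marker)
            yield perm


print(list(distinct_permutations("aba")))
print(len(list(distinct_permutations([3.0, 3]))))          # equality: 3.0 == 3
print(len(list(distinct_permutations([3.0, 3], key=id))))  # identity
```

With no key the filtering is by equality, so 3.0 and 3 collapse; passing
key=id gives the identity-based behaviour instead.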


On Sat, Oct 12, 2013 at 1:38 AM, David Mertz <mertz at gnosis.cx> wrote:

> Hi Andrew,
>
> I've sort of said as much in my last reply to Nick.  But maybe I can
> clarify further.  I can imagine *someone* wanting a filtering of
> permutations by either identity or equality.  Maybe, in fact, by other
> comparisons also for generality.
>
> This might suggest an API like the following:
>
>   equal_perms = distinct_permutations(items, r, filter_by=operator.eq)
>   ident_perms = distinct_permutations(items, r, filter_by=operator.is_)
>
> Or even perhaps, in some use-case that isn't clear to me, e.g.
>
>   start_same_perms = distinct_permutations(items, r,
>                                            filter_by=lambda a, b: a[0]==b[0])
>
> Or perhaps more plausibly, some predicate that, e.g. tests if two returned
> tuples are the same under case normalization of the strings within them.
>
> I guess the argument then would be what the default value of 'filter_by'
> might be... but that seems less important to me if there were an option to
> pass a predicate as you liked.
>
>
>
> On Fri, Oct 11, 2013 at 7:57 PM, Andrew Barnert <abarnert at yahoo.com> wrote:
>
>> On Oct 11, 2013, at 19:48, David Mertz <mertz at gnosis.cx> wrote:
>>
>> > My feeling, however, is that if one were to trim down the results from
>> a permutations-related function, it is more interesting to me to only
>> eliminate IDENTICAL items, not to eliminate merely EQUAL ones.
>>
>> I agree with the rest of your message, but I still think you're wrong
>> here. Anyone who is surprised by distinct_permutations((3.0, 3)) treating
>> the two values the same would be equally surprised by {3.0, 3} having only
>> one member. Or by groupby((3.0, 'a'), (3, 'b')) only having one group. And
>> so on.
>>
>> In Python, sets, dict keys, groups, etc. work by ==. That was a choice
>> that could have been made differently, but Python made that choice long
>> ago, and has applied it completely consistently, and it would be very
>> strange to choose differently in this case.
>
>
>
>
> --
> Keeping medicines from the bloodstreams of the sick; food
> from the bellies of the hungry; books from the hands of the
> uneducated; technology from the underdeveloped; and putting
> advocates of freedom in prisons.  Intellectual property is
> to the 21st century what the slave trade was to the 16th.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20131012/4ee5dd57/attachment.html>

From mertz at gnosis.cx  Sat Oct 12 09:09:32 2013
From: mertz at gnosis.cx (David Mertz)
Date: Sat, 12 Oct 2013 00:09:32 -0700
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAA68w_=gVSEOo=+WOm-D4YaF+EAA05gPa4V86gojEv=kC57seg@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <6123349B-FFCB-42FE-B973-B3C8251302C4@yahoo.com>
 <CAEbHw4Y3i7VRpwrSV47wxK5eau7H2oUJvc09FCu8ubSBefdYAQ@mail.gmail.com>
 <CAA68w_=gVSEOo=+WOm-D4YaF+EAA05gPa4V86gojEv=kC57seg@mail.gmail.com>
Message-ID: <CAEbHw4a_n0io+Diq5K5T-6PAVTDRZx3H6sgSHQRp4xc5GRxu4g@mail.gmail.com>

On Sat, Oct 12, 2013 at 12:02 AM, Neil Girdhar <mistersheik at gmail.com> wrote:

> Why not just use the standard python way to generalize this: "key" rather
> than the nonstandard "filter_by".
>

Yes, 'key' is a much better name than what I suggested.

I'm still not quite sure how best to implement this.  I guess MRAB's
recursive approach should work, even though I like the simplicity of my
style that takes full advantage of the existing itertools.permutations()
(and uses 1/3 as many lines of--I think clearer--code).  His has the
advantage that it doesn't require operator.lt() to work.  Without
benchmarking, though, I have a pretty strong feeling that my suggestion
will be faster since it avoids all that recursive call overhead.  Maybe
I'm wrong about that though.
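
For concreteness, here is one way a recursive version that needs only ==
(no ordering, no hashing) could be sketched.  This is an illustration
written for this message, not MRAB's actual code, and the function name is
made up:

```python
def distinct_permutations_recursive(items, r=None):
    """Yield distinct r-length permutations using only == on elements.

    Hypothetical sketch: at each position, every *distinct* remaining value
    is tried once, so duplicate permutations are never generated at all.
    """
    pool = list(items)
    r = len(pool) if r is None else r

    def helper(remaining, k):
        if k == 0:
            yield ()
            return
        used = []  # values already placed at this position
        for i, x in enumerate(remaining):
            if any(x == u for u in used):
                continue  # an equal value was already tried here
            used.append(x)
            rest = remaining[:i] + remaining[i + 1:]
            for tail in helper(rest, k - 1):
                yield (x,) + tail

    return helper(pool, r)


print(list(distinct_permutations_recursive("aba")))
print(len(list(distinct_permutations_recursive("RRRBB"))))
```

Whether this is faster or slower than filtering the output of
itertools.permutations() would need actual benchmarking.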


> On Sat, Oct 12, 2013 at 1:38 AM, David Mertz <mertz at gnosis.cx> wrote:
>
>> Hi Andrew,
>>
>> I've sort of said as much in my last reply to Nick.  But maybe I can
>> clarify further.  I can imagine *someone* wanting a filtering of
>> permutations by either identity or equality.  Maybe, in fact, by other
>> comparisons also for generality.
>>
>> This might suggest an API like the following:
>>
>>   equal_perms = distinct_permutations(items, r, filter_by=operator.eq)
>>   ident_perms = distinct_permutations(items, r, filter_by=operator.is_)
>>
>> Or even perhaps, in some use-case that isn't clear to me, e.g.
>>
>>   start_same_perms = distinct_permutations(items, r,
>>                                            filter_by=lambda a, b: a[0]==b[0])
>>
>> Or perhaps more plausibly, some predicate that, e.g. tests if two
>> returned tuples are the same under case normalization of the strings within
>> them.
>>
>> I guess the argument then would be what the default value of 'filter_by'
>> might be... but that seems less important to me if there were an option to
>> pass a predicate as you liked.
>>
>>
>>
>> On Fri, Oct 11, 2013 at 7:57 PM, Andrew Barnert <abarnert at yahoo.com> wrote:
>>
>>> On Oct 11, 2013, at 19:48, David Mertz <mertz at gnosis.cx> wrote:
>>>
>>> > My feeling, however, is that if one were to trim down the results from
>>> a permutations-related function, it is more interesting to me to only
>>> eliminate IDENTICAL items, not to eliminate merely EQUAL ones.
>>>
>>> I agree with the rest of your message, but I still think you're wrong
>>> here. Anyone who is surprised by distinct_permutations((3.0, 3)) treating
>>> the two values the same would be equally surprised by {3.0, 3} having only
>>> one member. Or by groupby((3.0, 'a'), (3, 'b')) only having one group. And
>>> so on.
>>>
>>> In Python, sets, dict keys, groups, etc. work by ==. That was a choice
>>> that could have been made differently, but Python made that choice long
>>> ago, and has applied it completely consistently, and it would be very
>>> strange to choose differently in this case.
>>
>>
>>
>>
>> --
>> Keeping medicines from the bloodstreams of the sick; food
>> from the bellies of the hungry; books from the hands of the
>> uneducated; technology from the underdeveloped; and putting
>> advocates of freedom in prisons.  Intellectual property is
>> to the 21st century what the slave trade was to the 16th.
>>
>
>


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20131012/5c3479f2/attachment.html>

From mistersheik at gmail.com  Sat Oct 12 09:17:43 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Sat, 12 Oct 2013 03:17:43 -0400
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <20131012063445.GI7989@ando>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
Message-ID: <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>

I'm sorry, but I can't find a reference supporting the statement that the
current permutations function is consistent with the mathematical
definition.  Perhaps you would like to find a reference? A quick search
yielded the book "the Combinatorics of Permutations":
http://books.google.ca/books?id=Op-nF-mBR7YC&lpg=PP1   Please look in the
chapter "Permutation of multisets".

Best,

Neil


On Sat, Oct 12, 2013 at 2:34 AM, Steven D'Aprano <steve at pearwood.info> wrote:

> On Fri, Oct 11, 2013 at 10:55:06PM -0400, Neil Girdhar wrote:
> > I honestly think that Python should stick to the mathematical definition
> > of permutations rather than some kind of consensus of the tiny minority
> > of people here.
>
> So do I. And that is exactly what itertools.permutations already does.
>
>
>
> --
> Steven
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20131012/aa038cdb/attachment-0001.html>

From steve at pearwood.info  Sat Oct 12 09:35:31 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 12 Oct 2013 18:35:31 +1100
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
Message-ID: <20131012073531.GJ7989@ando>

On Fri, Oct 11, 2013 at 10:37:02PM -0400, Neil Girdhar wrote:
> I think it's pretty indisputable that permutations are formally defined
> this way (and I challenge you to find a source that doesn't agree with
> that).

If by "this way" you mean "unique permutations only", then yes it is
*completely* disputable, and I am doing so right now. I'm not arguing 
one way or the other for a separate "unique_permutations" generator, 
just that the existing permutations generator does the right thing. If 
you're satisfied with that answer, you can stop reading now, because the 
rest of my post is going to be rather long:

TL;DR:

If you want a unique_permutations generator, that's a reasonable 
request. If you insist on changing permutations, that's unreasonable, 
firstly because the current behaviour is correct, and secondly because 
backwards compatibility would constrain it to keep the existing 
behaviour even if it were wrong.

.
.
.

Still here? Okay then, let me justify why I say the current behaviour is 
correct.

Speaking as a math tutor who has taught High School level combinatorics 
for 20+ years, I've never come across any text book or source that 
defines permutations in terms of unique permutations only. In every case 
that I can remember, or that I still have access to, unique permutations 
is considered a different kind of operation ("permutations ignoring 
duplicates", if you like) rather than the default. E.g. "Modern 
Mathematics 6" by Fitzpatrick and Galbraith has a separate section for 
permutations with repetition, gives the example of taking permutations 
from the word "MAMMAL", and explicitly contrasts situations where you 
consider the three letters M as "different" from when you consider them 
"the same". But in all such cases, such a situation is discussed as a 
restriction on permutations, not an expansion, that is:

* there are permutations;
* sometimes you want to only consider unique permutations;

rather than:

* there are permutations, which are always unique;
* sometimes you want to consider things which are like permutations 
  except they're not necessarily unique.


I'd even turn this around and challenge you to find a source that *does* 
define them as always unique. Here's a typical example, from the Collins 
Dictionary of Mathematics:


[quote]
**permutation** or **ordered arrangement** n. 1 an ordered arrangement 
of a specified number of objects selected from a set. The number of 
distinct permutations of r objects from n is

    n!/(n-r)!

usually written <subscript>n P <subscript>r or <superscript>n P 
<subscript>r. For example there are six distinct permutations of two 
objects selected out of three: <1,2>, <1,3>, <2,1>, <2,3>, <3,1>, <3,2>. 
Compare COMBINATION.

2. any rearrangement of all the elements of a finite sequence, such as 
(1,3,2) and (3,1,2). It is *odd* or *even* according as the number of 
exchanges of position yielding it from the original order is odd or 
even. It is a *cyclic permutation* if it merely advances all the 
elements a fixed number of places; that is, if it is a CYCLE of maximal 
LENGTH. A *transposition* is a cycle of degree two, and all permutations 
factor as products of transpositions. See also SIGNATURE.

3. any BIJECTION of a set to itself, where the set may be finite or 
infinite.
[end quote]



The definition makes no comment about how to handle duplicate elements, 
but we can derive an answer for that:

1) We're told how many permutations there are. Picking r elements out of 
n gives us n!/(n-r)!. If you throw away duplicate permutations, you will 
fall short.

2) The number of permutations shouldn't depend on the specific 
entities being permuted. Permutations of (1, 2, 3, 4) and (A, B, C, D) 
should be identical. If your set of elements contains duplicates, such 
as (Red ball, Red ball, Red ball, Black ball, Black ball), we can put 
the balls into 1:1 correspondence with integers (1, 2, 3, 4, 5), permute 
the integers, then reverse the mapping to get balls again. If we do 
this, we ought to get the same result as just permuting the balls 
directly.

(That's not to say that there are never cases where we don't care to 
distinguish between one red ball and another. But in general we do 
distinguish between them.)
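
The two points above can be checked with a short sketch (the variable names
are just for illustration): permute the integer labels, map them back to
balls, and compare with permuting the balls directly.

```python
from itertools import permutations
from math import factorial

balls = "RRRBB"
n, r = len(balls), 2

# Permute the balls directly.
direct = list(permutations(balls, r))

# Put the balls into 1:1 correspondence with indices, permute the indices,
# then reverse the mapping to get balls again.
via_indices = [tuple(balls[i] for i in p) for p in permutations(range(n), r)]

print(direct == via_indices)                            # same result both ways
print(len(direct), factorial(n) // factorial(n - r))    # 20 20
print(len(set(direct)))                                 # 4 once duplicates go
```

Discarding duplicates leaves 4 results, which falls short of the
n!/(n-r)! = 20 the definition calls for.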

I think this argument may hinge on what you consider *distinct*. In this 
context, if I permute the string "RRRBB", I consider all three 
characters to be distinct. Object identity is an implementation detail 
(not all programming languages have "objects"); even equality is an 
irrelevant detail. If I'm choosing to permute "RRRBB" rather than 
"RB", then clearly *to me* there must be some distinguishing factor 
between the three Rs and two Bs.

Another source is Wolfram Mathworld:

http://mathworld.wolfram.com/Permutation.html

which likewise says nothing about discarding repeated permutations when 
there are repeated elements. See also their page on "Ball Picking":

http://mathworld.wolfram.com/BallPicking.html

Last but not least, here's a source which clearly distinguishes 
permutations from "permutations with duplicates":

http://mathcentral.uregina.ca/QQ/database/QQ.09.07/h/beth3.html

and even gives a distinct formula for calculating the number of 
permutations. Neither Wolfram Mathworld nor the Collins Dictionary of 
Maths consider this formula important enough to mention, which suggests 
strongly that it should be considered separate from the default 
permutations.

(A little like cyclic permutations, which are different again.)
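
For reference, the usual count for "permutations with duplicates" of a
whole multiset is n!/(n1! * n2! * ... * nk!), where the ni are the
multiplicities of each distinct item.  A small sketch (the helper name is
made up for this message) using the "MAMMAL" example from earlier:

```python
from collections import Counter
from itertools import permutations
from math import factorial


def count_distinct_permutations(items):
    """Count distinct full-length arrangements of a multiset of items."""
    counts = Counter(items)
    total = factorial(sum(counts.values()))
    for multiplicity in counts.values():
        total //= factorial(multiplicity)  # exact at every step
    return total


word = "MAMMAL"
print(count_distinct_permutations(word))   # 6!/(3! * 2! * 1!) = 60
print(len(set(permutations(word))))        # brute-force check: 60
print(len(list(permutations(word))))       # all permutations: 6! = 720
```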


-- 
Steven

From steve at pearwood.info  Sat Oct 12 09:39:30 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sat, 12 Oct 2013 18:39:30 +1100
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <20131012073531.GJ7989@ando>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <20131012073531.GJ7989@ando>
Message-ID: <20131012073930.GK7989@ando>

On Sat, Oct 12, 2013 at 06:35:31PM +1100, Steven D'Aprano wrote:

> I think this argument may hinge on what you consider *distinct*. In this 
> context, if I permute the string "RRRBB", I consider all three 
> characters to be distinct.

/s/three/five/


-- 
Steven

From bauertomer at gmail.com  Sat Oct 12 10:18:35 2013
From: bauertomer at gmail.com (TB)
Date: Sat, 12 Oct 2013 11:18:35 +0300
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <20131012073531.GJ7989@ando>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <20131012073531.GJ7989@ando>
Message-ID: <525905DB.8070509@gmail.com>

On 10/12/2013 10:35 AM, Steven D'Aprano wrote:
> If you want a unique_permutations generator, that's a reasonable
> request. If you insist on changing permutations, that's unreasonable,
> firstly because the current behaviour is correct, and secondly because
> backwards compatibility would constrain it to keep the existing
> behaviour even if it were wrong.
>
I agree that backwards compatibility should be kept, but the current 
behaviour of itertools.permutations is (IMHO) surprising.

So here are my 2c: Until I tried it myself, I was sure that it would be 
like the corresponding permutations functions in Sage:

sage: list(Permutations("aba"))
[['a', 'a', 'b'], ['a', 'b', 'a'], ['b', 'a', 'a']]

or Mathematica: 
http://www.wolframalpha.com/input/?i=permutations+of+{a%2C+b%2C+a}

Currently the docstring of itertools.permutations just says "Return 
successive r-length permutations of elements in the iterable", without 
telling what happens with input of repeated elements. The full doc in 
the reference manual is better in that regard, but I think at least one 
example with repeated elements would be nice.
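
Something along these lines is what I have in mind (just a sketch of a
possible example, not proposed wording for the docs):

```python
from itertools import permutations

# Elements are treated as unique by position, not by value, so repeated
# input values produce repeated output tuples.
result = list(permutations("aba"))
print(result)
# [('a', 'b', 'a'), ('a', 'a', 'b'), ('b', 'a', 'a'),
#  ('b', 'a', 'a'), ('a', 'a', 'b'), ('a', 'b', 'a')]

# Removing duplicates afterwards gives the three results Sage shows.
distinct = sorted(set(result))
print(distinct)
# [('a', 'a', 'b'), ('a', 'b', 'a'), ('b', 'a', 'a')]
```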

Regards,
TB

From mistersheik at gmail.com  Sat Oct 12 10:20:24 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Sat, 12 Oct 2013 04:20:24 -0400
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <525905DB.8070509@gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <20131012073531.GJ7989@ando> <525905DB.8070509@gmail.com>
Message-ID: <CAA68w_khWvQy0zG6BZOS8HHCksKzE7cbA8Y0N-s=3Os9NY5RFA@mail.gmail.com>

+1


On Sat, Oct 12, 2013 at 4:18 AM, TB <bauertomer at gmail.com> wrote:

> On 10/12/2013 10:35 AM, Steven D'Aprano wrote:
>
>> If you want a unique_permutations generator, that's a reasonable
>> request. If you insist on changing permutations, that's unreasonable,
>> firstly because the current behaviour is correct, and secondly because
>> backwards compatibility would constrain it to keep the existing
>> behaviour even if it were wrong.
>>
> I agree that backwards compatibility should be kept, but the current
> behaviour of itertools.permutations is (IMHO) surprising.
>
> So here are my 2c: Until I tried it myself, I was sure that it would be
> like the corresponding permutations functions in Sage:
>
> sage: list(Permutations("aba"))
> [['a', 'a', 'b'], ['a', 'b', 'a'], ['b', 'a', 'a']]
>
> or Mathematica:
> http://www.wolframalpha.com/input/?i=permutations+of+%7Ba%2C+b%2C+a%7D
>
> Currently the docstring of itertools.permutations just says "Return
> successive r-length permutations of elements in the iterable", without
> telling what happens with input of repeated elements. The full doc in the
> reference manual is better in that regard, but I think at least one example
> with repeated elements would be nice.
>
> Regards,
> TB
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20131012/710b24c5/attachment.html>

From abarnert at yahoo.com  Sat Oct 12 10:22:59 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Sat, 12 Oct 2013 01:22:59 -0700
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAA68w_=dUBcBALcoqiVb3hoeWUxX6W4=evOtLe9P9=+Hd57JUg@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <CADiSq7dvKoZgUb7Xr6oewGcQ+hYM-MBnQARJ2=usoeX4gnnZaw@mail.gmail.com>
 <CAA68w_=dUBcBALcoqiVb3hoeWUxX6W4=evOtLe9P9=+Hd57JUg@mail.gmail.com>
Message-ID: <E063450C-AFA1-43C8-8902-FC5A99CB4A85@yahoo.com>

On Oct 11, 2013, at 23:55, Neil Girdhar <mistersheik at gmail.com> wrote:

> I think my best proposal is to have an efficient distinct_permutations function in itertools.  It should be in itertools so that it is discoverable.  It should be a function rather than one of the recipes proposed to make it as efficient as possible.  (Correct me if I'm wrong, but like the set solution, groupby is also not so efficient.)
> 
> I welcome the discussion and hope that the most efficient implementation someone here comes up with will be added one day to itertools.

I think getting something onto PyPI (whether as part of more-itertools or elsewhere) and/or the ActiveState recipes (and maybe StackOverflow and CodeReview) is the best way to get from here to there. Continuing to discuss it here, you've only got the half dozen or so people who are on this list and haven't tuned out this thread to come up with the most efficient implementation. Put it out in the world and people will begin giving you comments/bug reports/rants calling you an idiot for missing the obvious more efficient way to do it, and then you can use their code. And then, when you're satisfied with it, you have a concrete proposal for something to add to itertools in python X.Y+1 instead of some implementation to be named later to add one day.

I was also going to suggest that you drop the argument about whether this is the one true definition of sequence permutation and just focus on whether it's a useful thing to have, but it looks like you're way ahead of me there, so never mind.

From breamoreboy at yahoo.co.uk  Sat Oct 12 10:28:55 2013
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Sat, 12 Oct 2013 09:28:55 +0100
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <525905DB.8070509@gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <20131012073531.GJ7989@ando> <525905DB.8070509@gmail.com>
Message-ID: <l3b17u$e4a$1@ger.gmane.org>

On 12/10/2013 09:18, TB wrote:
> Currently the docstring of itertools.permutations just says "Return
> successive r-length permutations of elements in the iterable", without
> telling what happens with input of repeated elements. The full doc in
> the reference manual is better in that regard, but I think at least one
> example with repeated elements would be nice.
>
> Regards,
> TB

I look forward to seeing your suggested doc patch on the Python bug tracker.

-- 
Roses are red,
Violets are blue,
Most poems rhyme,
But this one doesn't.

Mark Lawrence


From tjreedy at udel.edu  Sat Oct 12 10:41:48 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 12 Oct 2013 04:41:48 -0400
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <20131012073531.GJ7989@ando>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <20131012073531.GJ7989@ando>
Message-ID: <l3b206$kkv$1@ger.gmane.org>

On 10/12/2013 3:35 AM, Steven D'Aprano wrote:

> I'd even turn this around and challenge you to find a source that *does*
> define them as always unique. Here's a typical example, from the Collins
> Dictionary of Mathematics:
>
>
> [quote]
> **permutation** or **ordered arrangement** n. 1 an ordered arrangement
> of a specified number of objects selected from a set. The number of
> distinct permutations of r objects from n is
>
>      n!/(n-r)!
>
> usually written <subscript>n P <subscript>r or <superscript>n P
> <subscript>r. For example there are six distinct permutations of two
> objects selected out of three: <1,2>, <1,3>, <2,1>, <2,3>, <3,1>, <3,2>.
> Compare COMBINATION.

The items of a set are, by definition of a set, distinct, so the 
question of different but equal permutations does not arise.

> 2. any rearrangement of all the elements of a finite sequence, such as
> (1,3,2) and (3,1,2). It is *odd* or *even* according as the number of
> exchanges of position yielding it from the original order is odd or
> even. It is a *cyclic permutation* if it merely advances all the
> elements a fixed number of places; that is, if it is a CYCLE of maximal
> LENGTH. A *transposition* is a cycle of degree two, and all permutations
> factor as products of transpositions. See also SIGNATURE.

The items of a sequence may be duplicates. But in the treatments of 
permutations I have seen (admittedly not all of them), they are 
considered to be distinguished by position, so that one may replace the 
item by counts 1 to n and vice versa.

> 3. any BIJECTION of a set to itself, where the set may be finite or
> infinite.
> [end quote]

Back to a set of distinct items again.

You are correct that itertools.permutations does the right thing by 
standard definition.

> Last but not least, here's a source which clearly distinguishes
> permutations from "permutations with duplicates":
>
> http://mathcentral.uregina.ca/QQ/database/QQ.09.07/h/beth3.html
>
> and even gives a distinct formula for calculating the number of
> permutations. Neither Wolfram Mathworld nor the Collins Dictionary of
> Maths consider this formula important enough to mention, which suggests
> strongly that it should be considered separate from the default
> permutations.

The question is whether this particular variation is important enough 
to put in itertools. It is not a combinatorics module and did not 
start with permutations.

-- 
Terry Jan Reedy


From ncoghlan at gmail.com  Sat Oct 12 17:07:58 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 13 Oct 2013 01:07:58 +1000
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
Message-ID: <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>

On 12 Oct 2013 17:18, "Neil Girdhar" <mistersheik at gmail.com> wrote:
>
> I'm sorry, but I can't find a reference supporting the statement that the
current permutations function is consistent with the mathematical
definition.  Perhaps you would like to find a reference? A quick search
yielded the book "the Combinatorics of Permutations":
http://books.google.ca/books?id=Op-nF-mBR7YC&lpg=PP1   Please look in the
chapter "Permutation of multisets".

Itertools effectively produces the permutation of (index, value) pairs.
Hence Steven's point about the permutations of a list not being
mathematically defined, so you have to decide what set to map it to in
order to decide what counts as a unique value. The mapping itertools uses
considers position in the iterable relevant so exchanging two values that
are themselves equivalent is still considered a distinct permutation since
their original position is taken into account. Like a lot of mathematics,
it's a matter of paying close attention to which entities are actually
being manipulated and how the equivalence classes are being defined :)

Hence the current proposal amounts to adding another variant that provides
the permutations of an unordered multiset instead of those of a set of
(index, value) 2-tuples (with the indices stripped from the results).
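
The (index, value) description above can be checked directly. This is a
minimal sketch; the sample list and variable names are illustrative and
are not taken from the itertools implementation:

```python
from itertools import permutations

values = ["a", "a", "b"]

# Permute (index, value) pairs, then strip the indices from each result.
indexed = permutations(enumerate(values))
stripped = [tuple(v for _, v in perm) for perm in indexed]

# This is exactly what itertools.permutations yields for the bare values.
assert stripped == list(permutations(values))

# The multiset view treats the two "a" items as interchangeable.
multiset_view = sorted(set(stripped))
print(len(stripped), len(multiset_view))  # 6 3
```

Collapsing the results with set() only works here because the values are
hashable and the input is tiny.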

One interesting point is that combining collections.Counter.elements() with
itertools.permutations() currently does the wrong thing, since
itertools.permutations() *always* considers iterable order significant,
while for collections.Counter.elements() it's explicitly arbitrary.

Cheers,
Nick.

>
> Best,
>
> Neil
>
>
> On Sat, Oct 12, 2013 at 2:34 AM, Steven D'Aprano <steve at pearwood.info>
wrote:
>>
>> On Fri, Oct 11, 2013 at 10:55:06PM -0400, Neil Girdhar wrote:
>> > I honestly think that Python should stick to the mathematical
definition of
>> > permutations rather than some kind of consensus of the tiny minority of
>> > people here.
>>
>> So do I. And that is exactly what itertools.permutations already does.
>>
>>
>>
>> --
>> Steven
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas

From python at mrabarnett.plus.com  Sat Oct 12 18:55:31 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 12 Oct 2013 17:55:31 +0100
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAEbHw4Yv9rCEz7LOz_123x2Z1X136S1gADkwTsUPwZgWBkxB7Q@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
 <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>
 <CAEbHw4YXk_+Xb2miDb4c46gy_kRMDRm8sQ+nuC7goRK-3ob_yA@mail.gmail.com>
 <52587976.1000901@mrabarnett.plus.com>
 <CAA68w_=6ZFzgEws1Z12Yh_=FzLwBjOn2nAZp8s+sPZyQrQ9AmQ@mail.gmail.com>
 <CAEbHw4ZASOwc0dnWkwOGPTr0j_HpNwwjT+7WuP+EZDBZAO_XLA@mail.gmail.com>
 <CADiSq7ewK9yRJJ6is58XcikfdWMAspSO0jv0TALV-4ggx2bDLg@mail.gmail.com>
 <5258AC0B.1090603@mrabarnett.plus.com>
 <CAEbHw4Yv9rCEz7LOz_123x2Z1X136S1gADkwTsUPwZgWBkxB7Q@mail.gmail.com>
Message-ID: <52597F03.90509@mrabarnett.plus.com>

On 12/10/2013 03:36, David Mertz wrote:
> Hi MRAB,
>
> I'm confused by your implementation.  In particular, what do these lines do?
>
>      # [...]
>      items = list(iterable)
>      keys = {}
>      for item in items:
>          keys.setdefault(item, len(keys))
>      items.sort(key=keys.get)
>
> I cannot understand how these can possibly have any effect (other than
> the first line that makes a concrete list out of an iterable).
>
> We loop through the list in its natural order.  E.g. say the list is
> '[a, b, c]' (where those names are any types of objects whatsoever).
>   The loop gives us:
>
>      keys == {a:0, b:1, c:2}
>
> When we do a sort on 'key=keys.get()' how can that ever possibly change
> the order of 'items'?
>
You're assuming that no item is equal to any other.

Try this:

keys = {}
for item in [1, 2, 2.0]:
     keys.setdefault(item, len(keys))

You'll get:

keys == {1: 0, 2: 1}

because 2 == 2.0.
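
A slightly longer input shows why the sort matters. This is a sketch with
made-up sample data: equal items share one key, so the stable sort moves
them next to each other while keeping first-seen order.

```python
items = [1, 2, 1.0, 3, 2.0]

keys = {}
for item in items:
    # Items that compare (and hash) equal reuse the key of the first one seen.
    keys.setdefault(item, len(keys))

# Stable sort by first-seen position: equal items become adjacent.
items.sort(key=keys.get)

print(keys)   # {1: 0, 2: 1, 3: 2}
print(items)  # [1, 1.0, 2, 2.0, 3]
```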

> There's also a bit of a flaw in that your implementation blows up if
> anything yielded by iterable isn't hashable:
>
>      >>> list(unique_permutations([ [1,2],[3,4],[5,6] ]))
>      Traceback (most recent call last):
>        File "<stdin>", line 1, in <module>
>      TypeError: unhashable type: 'list'
>
That is true, so here is yet another implementation:

----8<----------------------------------------8<----

def distinct_permutations(iterable, count=None):
     def perm(items, count):
         if count:
             prev_item = object()

             for i, item in enumerate(items):
                 if item != prev_item:
                     for p in perm(items[ : i] + items[i + 1 : ], count - 1):
                         yield [item] + p

                     prev_item = item
         else:
             yield []

     hashable_items = {}
     unhashable_items = []

     for item in iterable:
         try:
             hashable_items[item].append(item)
         except KeyError:
             hashable_items[item] = [item]
         except TypeError:
             for key, values in unhashable_items:
                 if key == item:
                     values.append(item)
                     break
             else:
                 unhashable_items.append((item, [item]))

     items = []

     for values in hashable_items.values():
         items.extend(values)

     for key, values in unhashable_items:
         items.extend(values)

     if count is None:
         count = len(items)

     yield from perm(items, count)

----8<----------------------------------------8<----

It uses a dict for speed, with the fallback of a list for unhashable
items.

>
>      >>> list(permutations([[1,2],[3,4],[5,6]]))
>      [([1, 2], [3, 4], [5, 6]), ([1, 2], [5, 6], [3, 4]), ([3, 4], [1,
> 2], [5, 6]),
>      ([3, 4], [5, 6], [1, 2]), ([5, 6], [1, 2], [3, 4]), ([5, 6], [3,
> 4], [1, 2])]
>
> This particular one also succeeds with my nonredundant_permutations:
>
>      >>> list(nonredundant_permutations([[1,2],[3,4],[5,6]]))
>      [([1, 2], [3, 4], [5, 6]), ([1, 2], [5, 6], [3, 4]), ([3, 4], [1,
> 2], [5, 6]),
>      ([3, 4], [5, 6], [1, 2]), ([5, 6], [1, 2], [3, 4]), ([5, 6], [3,
> 4], [1, 2])]
>
My result is:

 >>> list(distinct_permutations([[1,2],[3,4],[5,6]]))
[[[1, 2], [3, 4], [5, 6]], [[1, 2], [5, 6], [3, 4]], [[3, 4], [1, 2], 
[5, 6]], [[3, 4], [5, 6], [1, 2]], [[5, 6], [1, 2], [3, 4]], [[5, 6], 
[3, 4], [1, 2]]]

> However, my version *DOES* fail when things cannot be compared under
> inequality:
>
>      >>> list(nonredundant_permutations([[1,2],3,4]))
>      Traceback (most recent call last):
>        File "<stdin>", line 1, in <module>
>        File "<stdin>", line 3, in nonredundant_permutations
>      TypeError: unorderable types: int() < list()
>
> This also doesn't afflict itertools.permutations.
>
My result is:

 >>> list(distinct_permutations([[1,2],3,4]))
[[3, 4, [1, 2]], [3, [1, 2], 4], [4, 3, [1, 2]], [4, [1, 2], 3], [[1, 
2], 3, 4], [[1, 2], 4, 3]]


From mertz at gnosis.cx  Sat Oct 12 18:56:13 2013
From: mertz at gnosis.cx (David Mertz)
Date: Sat, 12 Oct 2013 09:56:13 -0700
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
Message-ID: <CAEbHw4byqODXVV=V+c0SJv4evVpPjoNmt8wyVgv+B2+n6zO66Q@mail.gmail.com>

On Sat, Oct 12, 2013 at 8:07 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> One interesting point is that combining collections.Counter.elements()
> with itertools.permutations() currently does the wrong thing, since
> itertools.permutations() *always* considers iterable order significant,
> while for collections.Counter.elements() it's explicitly arbitrary.
>
I hadn't thought about it, but as I read the docs for 3.4 (and it's the
same back through 2.7), not only would both of these be permissible in a
Python implementation:

  >>> list(collections.Counter({'a':2,'b':1}).elements())
  ['a', 'a', 'b']

Or:

  >>> list(collections.Counter({'a':2,'b':1}).elements())
  ['b', 'a', 'a']

But even this would be per documentation (although really unlikely as an
implementation):

  >>> list(collections.Counter({'a':2,'b':1}).elements())
  ['a', 'b', 'a']

-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

From raymond.hettinger at gmail.com  Sat Oct 12 19:34:26 2013
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Sat, 12 Oct 2013 10:34:26 -0700
Subject: [Python-ideas] An exhaust() function for iterators
In-Reply-To: <CANW+cAXMjP2KrPuF21CFQy16iaaHzOs-GM=jkPyPxODpOWpmsA@mail.gmail.com>
References: <CANW+cAXMjP2KrPuF21CFQy16iaaHzOs-GM=jkPyPxODpOWpmsA@mail.gmail.com>
Message-ID: <1657273B-685C-4335-A9E4-5DF5775DE620@gmail.com>


On Sep 28, 2013, at 9:06 PM, Clay Sweetser <clay.sweetser at gmail.com> wrote:

> 
> As it turns out, the fastest and most efficient method available in
> the standard library is collections.deque's __init__ and extend
> methods.


That technique is shown in the itertools docs in the consume() recipe.
It is the fastest way in CPython (in PyPy, a straight for-loop will
likely be the fastest).
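
For reference, the recipe looks roughly like the following. It is
reproduced from memory of the itertools documentation, so treat it as a
sketch and check the docs for the canonical version:

```python
from collections import deque
from itertools import islice


def consume(iterator, n=None):
    """Advance the iterator n steps ahead; if n is None, consume it entirely."""
    if n is None:
        # A zero-length deque discards every item as it is fed in.
        deque(iterator, maxlen=0)
    else:
        # Advance to the empty slice starting at position n.
        next(islice(iterator, n, n), None)


it = iter(range(10))
consume(it, 3)
print(next(it))  # 3
consume(it)
print(list(it))  # []
```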

I didn't immortalize it as a real itertool because I think most code is
better-off with a straight for-loop.

The itertools were inspired by functional languages and intended
to be used in a functional style where iterators with side-effects
would be considered bad form.

A regular for-loop is only a little bit slower, but it has a number of virtues
including clarity, signal checking, and thread switching.

In a real application, the speed difference of consume() vs a for-loop
is likely to be insignificant if the iterator is doing anything interesting at all.


Raymond




From mistersheik at gmail.com  Sat Oct 12 20:56:55 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Sat, 12 Oct 2013 14:56:55 -0400
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
Message-ID: <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>

Yes, you're right, and I understand what's been done. But like the 30
upvoters of the linked stackoverflow question, I find the current behaviour
surprising and would like to see a distinct_permutations function.  How do
I start to submit a patch?

Neil



From raymond.hettinger at gmail.com  Sun Oct 13 02:44:38 2013
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Sat, 12 Oct 2013 17:44:38 -0700
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
Message-ID: <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com>


On Oct 12, 2013, at 11:56 AM, Neil Girdhar <mistersheik at gmail.com> wrote:

> , I find the current behaviour surprising and would like to see a distinct_permutations function.
>  How do I start to submit a patch?

You can submit your patch at http://bugs.python.org and assign it to me (the module designer and maintainer).

That said, the odds of it being accepted are slim.
There are many ways to write combinatoric functions
(Knuth has a whole book on the subject) and I don't
aspire to include multiple variants unless there are
strong motivating use cases.

In general, if someone wants to eliminate duplicates
from the population, they can do so easily with:

   permutations(set(population), n)

The current design solves the most common use cases
and it has some nice properties such as:
 * permutations is a subsequence of product
 * no assumptions are made about the comparability
   or orderability of members of the population
 * len(list(permutations(range(n), r))) == n! / (n-r)! 
   just like you were taught in school
 * it is fast
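
The first and third properties can be checked with a short sketch. The
population and the value of r are arbitrary choices for illustration:

```python
from itertools import permutations, product
from math import factorial

population = "ABCD"
n, r = len(population), 2

perms = list(permutations(population, r))

# The count matches n! / (n-r)!.
assert len(perms) == factorial(n) // factorial(n - r)

# permutations() is product() with the tuples that reuse a position
# filtered out; the remaining tuples come out in the same order.
filtered = [
    tuple(population[i] for i in idx)
    for idx in product(range(n), repeat=r)
    if len(set(idx)) == r
]
assert perms == filtered
print(len(perms))  # 12
```

Note that the filter is on positions, not values, which is why equal
values at different positions still produce separate results.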

For more exotic needs, I think it is appropriate to look
outside the standard library to more full-featured
combinatoric libraries (there are several listed at
pypi.python.org).

  
Raymond

From mistersheik at gmail.com  Sun Oct 13 03:24:36 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Sat, 12 Oct 2013 21:24:36 -0400
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com>
Message-ID: <CAA68w_ke--We-+Ri3jEsGNgH7O54XdNxT2JORe45KFywHLeMYw@mail.gmail.com>

Hi Raymond,

I agree with you on the consistency point with itertools.product.  That's a
great point.

However, permutations(set(population)) is not the correct way to take the
permutations of a multiset.  Please take a look at how permutations are
taken from a multiset from any of the papers I linked or any paper that you
can find on the internet.  The number of permutations of a multiset is
n! / \prod a_i!, where the a_i are the element counts, just like I was
taught in school.
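
As a quick check of that formula, here is a sketch that compares it with
a brute-force count. The helper name multiset_permutation_count is
hypothetical, and the items are assumed to be hashable:

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod


def multiset_permutation_count(items):
    """Return n! / (a_1! * a_2! * ...) for the element counts a_i."""
    counts = Counter(items)
    n = sum(counts.values())
    return factorial(n) // prod(factorial(a) for a in counts.values())


word = "RRRBB"
formula = multiset_permutation_count(word)   # 5! / (3! * 2!)
brute_force = len(set(permutations(word)))   # deduplicate all 120 orderings
print(formula, brute_force)  # 10 10
```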

There is currently no fast way to find these permutations of a multiset and
it is a common operation for solving problems.  What is needed, I think, is
a function multiset_permutations that accepts an iterable.

Best,

Neil



From ethan at stoneleaf.us  Sun Oct 13 03:11:18 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Sat, 12 Oct 2013 18:11:18 -0700
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com>
Message-ID: <5259F336.9070203@stoneleaf.us>

On 10/12/2013 05:44 PM, Raymond Hettinger wrote:
>
> On Oct 12, 2013, at 11:56 AM, Neil Girdhar <mistersheik at gmail.com <mailto:mistersheik at gmail.com>> wrote:
>
>> , I find the current behaviour surprising and would like to see a distinct_permutations function.
>>  How do I start to submit a patch?
>
> You can submit your patch at http://bugs.python.org and assign it to me (the module designer and maintainer).
>
> That said, the odds of it being accepted are slim.

+1

About the only improvement I can see would be a footnote in the itertools doc table that lists the different 
combinatorics.  Being a naive permutations user myself I would have made the mistake of thinking that "r-length tuples, 
all possible orderings, no repeated elements" meant no repeated values.  The longer text for permutations makes it clear 
how it works.

My rst-foo is not good enough to link from the table down into the permutation text where the distinction is made clear. 
  If no one beats me to a proposed patch I'll see if I can figure it out.

--
~Ethan~

From steve at pearwood.info  Sun Oct 13 03:47:42 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Sun, 13 Oct 2013 12:47:42 +1100
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com>
Message-ID: <20131013014742.GR7989@ando>

On Sat, Oct 12, 2013 at 05:44:38PM -0700, Raymond Hettinger wrote:

> In general, if someone wants to eliminate duplicates
> from the population, they can do so easily with:
> 
>    permutations(set(population), n)

In fairness Raymond, the proposal is not to eliminate duplicates from 
the population, but from the permutations themselves. Consider the 
example I gave earlier, where you're permuting "RRRBB" two items at a 
time. There are 20 permutations including duplicates, but sixteen of 
them are repeated:

py> list(''.join(t) for t in permutations("RRRBB", 2))
['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 
'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB']
py> set(''.join(t) for t in permutations("RRRBB", 2))
{'BR', 'RB', 'RR', 'BB'}

But if you eliminate duplicates from the population first, you get 
only two permutations:

py> list(''.join(t) for t in permutations(set("RRRBB"), 2))
['BR', 'RB']


If it were just a matter of calling set() on the output of permutations, 
that would be trivial enough. But, you might care about order, or 
elements might not be hashable, or you might have a LOT of permutations 
to generate before discarding:

population = "R"*1000 + "B"*500
set(''.join(t) for t in permutations(population, 2))  # takes a while...

In my opinion, if unique_permutations is no more efficient than calling 
set on the output of permutations, it's not worth it. But if somebody 
can come up with an implementation which is significantly more 
efficient, without making unreasonable assumptions about orderability, 
hashability or even comparability, then in my opinion that might be 
worthwhile.
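
One possible shape for such an implementation is sketched below. It is
not the approach proposed in the thread: the name unique_permutations and
the grouping strategy are assumptions for illustration. It relies only on
==, so items need not be hashable or orderable, but grouping costs time
proportional to (number of items) x (number of distinct items), and the
recursion depth equals r.

```python
def unique_permutations(iterable, r=None):
    """Yield each distinct r-length ordering exactly once (hypothetical helper)."""
    # Group equal items using == only. Each group is
    # [representative_item, remaining_count].
    groups = []
    for item in iterable:
        for group in groups:
            if group[0] == item:
                group[1] += 1
                break
        else:
            groups.append([item, 1])

    if r is None:
        r = sum(count for _, count in groups)

    prefix = []

    def extend():
        if len(prefix) == r:
            yield tuple(prefix)
            return
        # Try each distinct value at most once per position, so no
        # duplicate orderings are ever generated.
        for group in groups:
            if group[1] > 0:
                group[1] -= 1
                prefix.append(group[0])
                yield from extend()
                prefix.pop()
                group[1] += 1

    yield from extend()


population = "R" * 1000 + "B" * 500
print([''.join(p) for p in unique_permutations(population, 2)])
# ['RR', 'RB', 'BR', 'BB']
```

With this population the sketch yields four results directly instead of
generating over two million tuples and discarding most of them.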


-- 
Steven

From raymond.hettinger at gmail.com  Sun Oct 13 05:03:43 2013
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Sat, 12 Oct 2013 20:03:43 -0700
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <20131013014742.GR7989@ando>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com> <20131013014742.GR7989@ando>
Message-ID: <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>


On Oct 12, 2013, at 6:47 PM, Steven D'Aprano <steve at pearwood.info> wrote:

> the proposal is not to eliminate duplicates from 
> the population, but from the permutations themselves.

I'm curious about the use cases for this.
Other than red/blue marble examples and some puzzle problems,
does this come up in any real problems?  Do we actually need this?


Raymond

From mistersheik at gmail.com  Sun Oct 13 09:38:40 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Sun, 13 Oct 2013 03:38:40 -0400
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com> <20131013014742.GR7989@ando>
 <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
Message-ID: <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>

My intuition is that we want Python to be "complete".  Many other languages
can find the permutations of a multiset.  Python has a permutations
function.  Many people on stackoverflow expected that function to be able
to find those permutations.

One suggestion: Why not make it so that itertools.permutations checks if
its argument is an instance of collections.Mapping?  If it is, we could
interpret the items as a mapping from elements to positive integers, which
is a compact representation of a multiset.  Then, it could do the right
thing for that case.
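
For context, this sketch shows what happens today when a mapping is
passed in. The Counter contents are made up for illustration:

```python
from collections import Counter
from itertools import permutations

bag = Counter({"a": 2, "b": 1})

# A mapping is simply iterated, so only its keys are permuted.
print(list(permutations(bag)))  # [('a', 'b'), ('b', 'a')]

# Expanding the counts first gives sequence permutations, with repeats.
expanded = list(permutations(bag.elements()))
print(len(expanded), len(set(expanded)))  # 6 3
```

Since existing code may rely on the keys-only behaviour, special-casing
mappings would change results for current callers. In current Python
versions the ABC is spelled collections.abc.Mapping.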

Best,
Neil





From ncoghlan at gmail.com  Sun Oct 13 11:27:54 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 13 Oct 2013 19:27:54 +1000
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com>
 <20131013014742.GR7989@ando>
 <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
 <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>
Message-ID: <CADiSq7c=LcD_iikHt6GCvFthvSi08rwg_xJ71Etonhk1zvh5Jw@mail.gmail.com>

On 13 October 2013 17:38, Neil Girdhar <mistersheik at gmail.com> wrote:
> My intuition is that we want Python to be "complete".  Many other languages
> can find the permutations of a multiset.  Python has a permutations
> function.  Many people on stackoverflow expected that function to be able to
> find those permutations.

Nope, we expressly *don't* want the standard library to be "complete",
because that would mean growing to the size of PyPI (or larger).
There's always going to be scope for applications to adopt new domain
specific dependencies with more in-depth support than that provided by
the standard library.

Many standard library modules are in fact deliberately designed as
"stepping stone" modules that will meet the needs of code which has
an incidental relationship to that task, but will need to be replaced
with something more sophisticated for code directly related to that
domain. Many times, that means they will ignore as irrelevant
distinctions that are critical in certain contexts, simply because
they don't come up all that often outside those specific domains, and
addressing them involves making the core module more complicated to
use for more typical cases.

In this case, the proposed alternate permutations mechanism only makes
a difference when:

1. The data set contains equivalent values
2. Input order is not considered significant, so exchanging equivalent
values should *not* create a new permutation (i.e. multiset
permutations rather than sequence permutations).

If users aren't likely to encounter situations where that makes a
difference, then providing both in the standard library isn't being
helpful, it's being actively user hostile by asking them to make a
decision they're not yet qualified to make for the sake of the few
experts that specifically need it. Hence Raymond's request for data
modelling problems outside the "learning or studying combinatorics"
context to make the case for standard library inclusion.
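
To make the two conditions above concrete, here is a minimal sketch (the
input "aab" is only an illustrative choice) of where sequence permutations
and multiset permutations diverge. The multiset result is emulated by
discarding duplicates after the fact, which is simple but still generates
every tuple first:

```python
from itertools import permutations

data = "aab"  # illustrative input containing two equivalent values

# Sequence permutations: positions are distinct, so exchanging the two
# 'a' values counts as a separate result.
seq_perms = list(permutations(data))

# Multiset permutations, emulated by de-duplicating afterwards.
multiset_perms = sorted(set(seq_perms))

print(len(seq_perms))    # 6
print(multiset_perms)    # [('a', 'a', 'b'), ('a', 'b', 'a'), ('b', 'a', 'a')]
```

With no repeated values in the input, the two results are identical, which
is why the distinction rarely shows up in typical code.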

Interestingly, I just found another language which has the equivalent
of the current behaviour of itertools.permutations: Haskell has it as
Data.List.permutations. As far as I can tell, Haskell doesn't offer
support for multiset permutations in the core, you need an additional
package like Math.Combinatorics (see:
http://hackage.haskell.org/package/multiset-comb-0.2.3/docs/Math-Combinatorics-Multiset.html#g:4).

Since iterator based programming in Python is heavily inspired by
Haskell, this suggests that the current behaviour of
itertools.permutations is appropriate and that Raymond is right to be
dubious about including multiset permutations support directly in the
standard library.

Those interested in improving the experience of writing combinatorics
code in Python may wish to look into helping out with the
combinatorics package on PyPI:
http://phillipmfeldman.org/Python/for_developers.html (For example,
politely approach Phillip to see if he is interested in hosting it on
GitHub or BitBucket, providing Sphinx docs on ReadTheDocs, improving
the PyPI metadata, etc - note I have no experience with this package,
it's just the first hit for "python combinatorics")

> One suggestion: Why not make it so that itertools.permutations checks if its
> argument is an instance of collections.Mapping?  If it is, we could
> interpret the items as a mapping from elements to positive integers, which
> is a compact representation of a multiset.  Then, it could do the right
> thing for that case.

If you want to go down the path of only caring about hashable values,
you may want to argue for a permutations method on collections.Counter
(it's conceivable that approach has the potential to be even faster
than an approach based on accepting and processing an arbitrary
iterable, since it can avoid generating repeated values in the first
place).
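
As a rough illustration of that idea (this is a sketch only, not the recipe
linked below; the name multiset_permutations and its signature are
assumptions made for the example), a count-based generator can emit each
distinct arrangement exactly once by choosing among distinct elements at
every position instead of among positions:

```python
from collections import Counter

def multiset_permutations(counts, r=None):
    """Yield each distinct r-length arrangement of a multiset once.

    `counts` maps hashable elements to positive multiplicities
    (hypothetical helper, for illustration only).
    """
    remaining = dict(counts)
    if r is None:
        r = sum(remaining.values())
    prefix = []

    def recurse():
        if len(prefix) == r:
            yield tuple(prefix)
            return
        # Only values change during iteration, never keys, so iterating
        # over the dict here is safe.
        for elem in remaining:
            if remaining[elem] > 0:
                remaining[elem] -= 1
                prefix.append(elem)
                yield from recurse()
                prefix.pop()
                remaining[elem] += 1

    yield from recurse()

print(list(multiset_permutations(Counter("aab"))))
# [('a', 'a', 'b'), ('a', 'b', 'a'), ('b', 'a', 'a')]
```

Because equal elements are never distinguished, no repeated tuple is
generated and nothing needs to be filtered out afterwards.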

A Counter based multiset permutation algorithm was actually posted to
python-list back in 2009, just after collections.Counter was
introduced: https://mail.python.org/pipermail/python-list/2009-January/521685.html

I just created an updated version of that recipe and posted it as
https://bitbucket.org/ncoghlan/misc/src/default/multiset_permutations.py

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From breamoreboy at yahoo.co.uk  Sun Oct 13 13:05:30 2013
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Sun, 13 Oct 2013 12:05:30 +0100
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com> <20131013014742.GR7989@ando>
 <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
 <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>
Message-ID: <l3dupi$9rt$1@ger.gmane.org>

On 13/10/2013 08:38, Neil Girdhar wrote:
> My intuition is that we want Python to be "complete".

No thank you.  I much prefer "Python in a Nutshell" the size it is now, 
I'm not interested in competing with (say) "Java in a Nutshell".

-- 
Roses are red,
Violets are blue,
Most poems rhyme,
But this one doesn't.

Mark Lawrence


From oscar.j.benjamin at gmail.com  Sun Oct 13 17:54:16 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Sun, 13 Oct 2013 16:54:16 +0100
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAA68w_nbtc5GQdvE8QG6-zTqG6e3LYryc5Va7A9kSJRrsoBx3A@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAA68w_nbtc5GQdvE8QG6-zTqG6e3LYryc5Va7A9kSJRrsoBx3A@mail.gmail.com>
Message-ID: <CAHVvXxQ0n=a-LPM3LDUoQPsVpshMgRzxb1O0LmcBH=eGkebDmg@mail.gmail.com>

On 11 October 2013 22:38, Neil Girdhar <mistersheik at gmail.com> wrote:
> My code, which was the motivation for this suggestion:
>
> import itertools as it
> import math
>
> def is_prime(n):
>     for i in range(2, int(math.floor(math.sqrt(n))) + 1):
>         if n % i == 0:
>             return False
>     return n >= 2

I don't really understand what your code is doing but I just wanted to
point out that the above will fail for large integers (maybe not
relevant in your case):

>>> is_prime(2**19937-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "tmp.py", line 3, in is_prime
    for i in range(2, int(math.floor(math.sqrt(n))) + 1):
OverflowError: long int too large to convert to float

Even without the OverflowError I suspect that there are primes p >
~1e16 such that is_prime(p**2) would incorrectly return True. This is
a consequence of depending on FP arithmetic in what should be exact
computation. The easy fix is to break when i**2 > n avoiding the
tricky sqrt operation. Alternatively you can use an exact integer sqrt
function to fix this:

def sqrt_floor(y):
    try:
        x = int(math.sqrt(y))
    except OverflowError:
        x = y
    while not (x ** 2 <= y < (x+1) ** 2):
        x = (x + y // x) // 2
    return x
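
For comparison, the "easy fix" mentioned above (stop once i**2 > n) might
look like the sketch below. It keeps the original function name but uses
only integer arithmetic, so no float conversion can overflow or round. It
is still plain trial division, so it does not make very large inputs
practical to test:

```python
def is_prime(n):
    """Trial division using exact integer arithmetic only."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:  # stop as soon as i**2 exceeds n; no sqrt needed
        if n % i == 0:
            return False
        i += 1
    return True

print(is_prime(97))       # True
print(is_prime(9973**2))  # False
```

On Python 3.8 and later, math.isqrt is another exact way to get the loop
bound.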


Oscar

From mistersheik at gmail.com  Sun Oct 13 20:29:38 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Sun, 13 Oct 2013 14:29:38 -0400
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAHVvXxQ0n=a-LPM3LDUoQPsVpshMgRzxb1O0LmcBH=eGkebDmg@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAA68w_nbtc5GQdvE8QG6-zTqG6e3LYryc5Va7A9kSJRrsoBx3A@mail.gmail.com>
 <CAHVvXxQ0n=a-LPM3LDUoQPsVpshMgRzxb1O0LmcBH=eGkebDmg@mail.gmail.com>
Message-ID: <CAA68w_m5k7R=h+uHmH29f5MEjv=F28C+Jc9K=MmOmdrmzcRq3A@mail.gmail.com>

Did you read the problem?   Anyway, let's not get off topic (permutations).

Neil


On Sun, Oct 13, 2013 at 11:54 AM, Oscar Benjamin
<oscar.j.benjamin at gmail.com> wrote:

> On 11 October 2013 22:38, Neil Girdhar <mistersheik at gmail.com> wrote:
> > My code, which was the motivation for this suggestion:
> >
> > import itertools as it
> > import math
> >
> > def is_prime(n):
> >     for i in range(2, int(math.floor(math.sqrt(n))) + 1):
> >         if n % i == 0:
> >             return False
> >     return n >= 2
>
> I don't really understand what your code is doing but I just wanted to
> point out that the above will fail for large integers (maybe not
> relevant in your case):
>
> >>> is_prime(2**19937-1)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "tmp.py", line 3, in is_prime
>     for i in range(2, int(math.floor(math.sqrt(n))) + 1):
> OverflowError: long int too large to convert to float
>
> Even without the OverflowError I suspect that there are primes p >
> ~1e16 such that is_prime(p**2) would incorrectly return True. This is
> a consequence of depending on FP arithmetic in what should be exact
> computation. The easy fix is to break when i**2 > n avoiding the
> tricky sqrt operation. Alternatively you can use an exact integer sqrt
> function to fix this:
>
> def sqrt_floor(y):
>     try:
>         x = int(math.sqrt(y))
>     except OverflowError:
>         x = y
>     while not (x ** 2 <= y < (x+1) ** 2):
>         x = (x + y // x) // 2
>     return x
>
>
> Oscar
>

From tim.peters at gmail.com  Sun Oct 13 21:02:56 2013
From: tim.peters at gmail.com (Tim Peters)
Date: Sun, 13 Oct 2013 14:02:56 -0500
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <5258B539.10307@mrabarnett.plus.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
 <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>
 <CAEbHw4YXk_+Xb2miDb4c46gy_kRMDRm8sQ+nuC7goRK-3ob_yA@mail.gmail.com>
 <52587976.1000901@mrabarnett.plus.com>
 <CAA68w_=6ZFzgEws1Z12Yh_=FzLwBjOn2nAZp8s+sPZyQrQ9AmQ@mail.gmail.com>
 <CAEbHw4ZASOwc0dnWkwOGPTr0j_HpNwwjT+7WuP+EZDBZAO_XLA@mail.gmail.com>
 <CADiSq7ewK9yRJJ6is58XcikfdWMAspSO0jv0TALV-4ggx2bDLg@mail.gmail.com>
 <5258AC0B.1090603@mrabarnett.plus.com> <5258B539.10307@mrabarnett.plus.com>
Message-ID: <CAExdVNnSdbQ2JozhTUkJLxx=bc6=24iebb5hEuPKrXW_1072Mg@mail.gmail.com>

[MRAB, posts a beautiful solution]

I don't really have a use for this, but it was a lovely programming
puzzle, so I'll include an elaborate elaboration of MRAB's algorithm
below.  And that's the end of my interest in this ;-)

It doesn't require that elements be orderable or even hashable.  It
does require that they can be compared for equality, but it's pretty
clear that if we _do_ include something like this, "equality" has to
be pluggable.  By default, this uses `operator.__eq__`, but any
2-argument function can be used.  E.g., use `operator.is_` to make it
believe that only identical objects are equal.  Or pass a lambda to
distinguish by type too (e.g., if you don't want 3 and 3.0 to be
considered equal).  Etc.

The code is much lower-level, to make it closer to an efficient C
implementation.  No dicts, no sets, no slicing or concatenation of
lists, etc.  It sticks to using little integers (indices) as far as
possible, which can be native in C (avoiding mounds of increfs and
decrefs).

Also, because "equality" is pluggable, it may be a slow operation.
The `equal()` function is only called here during initial setup, to
partition the elements into equivalence classes.  Where N is
len(iterable), at best `equal()` is called N-1 times (if all elements
happen to be equal), and at worst N*(N-1)/2 times (if no elements
happen to be equal), all independent of `count`.  It assumes `equal()`
is transitive.
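
The call-count bounds above can be checked with a small standalone sketch.
It uses plain lists of indices rather than the linked-list nodes in the
code further down, and the name count_equal_calls is made up for this
illustration:

```python
import operator

def count_equal_calls(items, equal=operator.eq):
    """Partition items into equivalence classes.

    Returns (classes, calls), where each class is a list of indices
    into `items` and `calls` is how often `equal` was invoked.
    """
    calls = 0
    classes = []
    for i, elt in enumerate(items):
        for cls in classes:
            calls += 1
            # Compare against one representative per class only; this
            # is where transitivity of `equal` is assumed.
            if equal(elt, items[cls[0]]):
                cls.append(i)
                break
        else:
            classes.append([i])
    return classes, calls

print(count_equal_calls("aaaa"))  # ([[0, 1, 2, 3]], 3)    -> N-1 calls
print(count_equal_calls("abcd"))  # ([[0], [1], [2], [3]], 6) -> N*(N-1)/2 calls
```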

It doesn't always return permutations in the same order as MRAB's
function, because - to avoid any searching - it iterates over
equivalence classes instead of over the original iterables.  This is
the simplest differing example I can think of:

>>> list(unique_permutations("aba", 2))
[('a', 'b'), ('a', 'a'), ('b', 'a')]

For the first result, MRAB's function first picks the first 'a', then
removes it from the iterables and recurses on ("ba", 1).  So it finds
'b' next, and yields ('a', 'b') (note:  this is the modified
unique_permutations() below - MRAB's original actually yielded lists,
not tuples).

But:

>>> list(up("aba", 2))
[('a', 'a'), ('a', 'b'), ('b', 'a')]

Different order!  That's because "up" is iterating over (conceptually)

    [EquivClass(first 'a', second 'a'), EquivClass('b')]

It first picks the first `a`, then adjusts list pointers (always a
fast, constant-time operation) so that it recurses on

    [EquivClass(second 'a'), EquivClass('b')]

So it next finds the second 'a', and yields (first 'a', second 'a') as
its first result.  Maybe this will make it clearer:

>>> list(up(["a1", "b", "a2"], 2, lambda x, y: x[0]==y[0]))
[('a1', 'a2'), ('a1', 'b'), ('b', 'a1')]

No, I guess that didn't make it clearer - LOL ;-)  Do I care?  No.

Anyway, here's the code.  Have fun :-)

# MRAB's beautiful solution, modified in two ways to be
# more like itertools.permutations:
# 1. Yield tuples instead of lists.
# 2. When count > len(iterable), don't yield anything.

def unique_permutations(iterable, count=None):
    def perm(items, count):
        if count:
            seen = set()
            for i, item in enumerate(items):
                if item not in seen:
                    for p in perm(items[:i] + items[i+1:], count - 1):
                        yield [item] + p
                    seen.add(item)
        else:
            yield []

    items = list(iterable)
    if count is None:
        count = len(items)
    if count > len(items):
        return
    for p in perm(items, count):
        yield tuple(p)

# New code, ending in generator `up()`.
import operator

# In C, this would be a struct of native C types,
# and the brief methods would be coded inline.
class ENode:
    def __init__(self, initial_index=None):
        self.indices = [initial_index] # list of equivalent indices
        self.current = 0
        self.prev = self.next = self

    def index(self):
        "Return current index."
        return self.indices[self.current]

    def unlink(self):
        "Remove self from list."
        self.prev.next = self.next
        self.next.prev = self.prev

    def insert_after(self, x):
        "Insert node x after self."
        x.prev = self
        x.next = self.next
        self.next.prev = x
        self.next = x

    def advance(self):
        """Advance the current index.

        If we're already at the end, remove self from list.

        .restore() undoes everything .advance() did."""

        assert self.current < len(self.indices)
        self.current += 1
        if self.current == len(self.indices):
            self.unlink()

    def restore(self):
        "Undo what .advance() did."
        assert self.current <= len(self.indices)
        if self.current == len(self.indices):
            self.prev.insert_after(self)
        self.current -= 1

def build_equivalence_classes(items, equal):
    ehead = ENode() # headed, doubly-linked circular list of equiv classes
    for i, elt in enumerate(items):
        e = ehead.next
        while e is not ehead:
            if equal(elt, items[e.indices[0]]):
                # Add (index of) elt to this equivalence class.
                e.indices.append(i)
                break
            e = e.next
        else:
            # elt not equal to anything seen so far:  append
            # new equivalence class.
            e = ENode(i)
            ehead.prev.insert_after(e)
    return ehead

def up(iterable, count=None, equal=operator.__eq__):
    def perm(i):
        if i:
            e = ehead.next
            assert e is not ehead
            while e is not ehead:
                result[count - i] = e.index()
                e.advance()
                yield from perm(i-1)
                e.restore()
                e = e.next
        else:
            yield tuple(items[j] for j in result)

    items = tuple(iterable)
    if count is None:
        count = len(items)
    if count > len(items):
        return

    ehead = build_equivalence_classes(items, equal)
    result = [None] * count
    yield from perm(count)

From python at mrabarnett.plus.com  Sun Oct 13 21:30:42 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Sun, 13 Oct 2013 20:30:42 +0100
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAExdVNnSdbQ2JozhTUkJLxx=bc6=24iebb5hEuPKrXW_1072Mg@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
 <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>
 <CAEbHw4YXk_+Xb2miDb4c46gy_kRMDRm8sQ+nuC7goRK-3ob_yA@mail.gmail.com>
 <52587976.1000901@mrabarnett.plus.com>
 <CAA68w_=6ZFzgEws1Z12Yh_=FzLwBjOn2nAZp8s+sPZyQrQ9AmQ@mail.gmail.com>
 <CAEbHw4ZASOwc0dnWkwOGPTr0j_HpNwwjT+7WuP+EZDBZAO_XLA@mail.gmail.com>
 <CADiSq7ewK9yRJJ6is58XcikfdWMAspSO0jv0TALV-4ggx2bDLg@mail.gmail.com>
 <5258AC0B.1090603@mrabarnett.plus.com> <5258B539.10307@mrabarnett.plus.com>
 <CAExdVNnSdbQ2JozhTUkJLxx=bc6=24iebb5hEuPKrXW_1072Mg@mail.gmail.com>
Message-ID: <525AF4E2.6010301@mrabarnett.plus.com>

On 13/10/2013 20:02, Tim Peters wrote:
> [MRAB, posts a beautiful solution]
>
> I don't really have a use for this, but it was a lovely programming
> puzzle, so I'll include an elaborate elaboration of MRAB's algorithm
> below.  And that's the end of my interest in this ;-)
>
> It doesn't require that elements be orderable or even hashable.  It
> does require that they can be compared for equality, but it's pretty
> clear that if we _do_ include something like this, "equality" has to
> be pluggable.  By default, this uses `operator.__eq__`, but any
> 2-argument function can be used.  E.g., use `operator.is_` to make it
> believe that only identical objects are equal.  Or pass a lambda to
> distinguish by type too (e.g., if you don't want 3 and 3.0 to be
> considered equal).  Etc.
>
[snip]
I posted yet another implementation after that one.


From oscar.j.benjamin at gmail.com  Sun Oct 13 21:34:09 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Sun, 13 Oct 2013 20:34:09 +0100
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAA68w_m5k7R=h+uHmH29f5MEjv=F28C+Jc9K=MmOmdrmzcRq3A@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAA68w_nbtc5GQdvE8QG6-zTqG6e3LYryc5Va7A9kSJRrsoBx3A@mail.gmail.com>
 <CAHVvXxQ0n=a-LPM3LDUoQPsVpshMgRzxb1O0LmcBH=eGkebDmg@mail.gmail.com>
 <CAA68w_m5k7R=h+uHmH29f5MEjv=F28C+Jc9K=MmOmdrmzcRq3A@mail.gmail.com>
Message-ID: <CAHVvXxRk4MKMMHKEMDo31vc9DDuULh+8s1w7QnXjxA30ZcKHWA@mail.gmail.com>

On 13 October 2013 19:29, Neil Girdhar <mistersheik at gmail.com> wrote:
> Did you read the problem?

I did but since you showed some code that you said you were working on
I thought you'd be interested to know that it could be improved.

> Anyway, let's not get off topic (permutations).

Getting back to your proposal, I disagree that permutations should be
"fixed". The current behaviour is correct. If I was asked to define a
permutation I would have given definition #3 from Steven's list: a
bijection from a set to itself. Formally a permutation of a collection
of non-unique elements is not defined.

There may also be uses for a function like the one that you proposed
but I've never needed it (and I have used permutations a few times)
and no one in this thread (including you) has given a use-case for
this.


Oscar

From mistersheik at gmail.com  Sun Oct 13 21:39:19 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Sun, 13 Oct 2013 15:39:19 -0400
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAHVvXxRk4MKMMHKEMDo31vc9DDuULh+8s1w7QnXjxA30ZcKHWA@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAA68w_nbtc5GQdvE8QG6-zTqG6e3LYryc5Va7A9kSJRrsoBx3A@mail.gmail.com>
 <CAHVvXxQ0n=a-LPM3LDUoQPsVpshMgRzxb1O0LmcBH=eGkebDmg@mail.gmail.com>
 <CAA68w_m5k7R=h+uHmH29f5MEjv=F28C+Jc9K=MmOmdrmzcRq3A@mail.gmail.com>
 <CAHVvXxRk4MKMMHKEMDo31vc9DDuULh+8s1w7QnXjxA30ZcKHWA@mail.gmail.com>
Message-ID: <CAA68w_mkQ-m=db-jkD=jmhKYxqnjWZ2PvGZykVtSBM94U=w2mQ@mail.gmail.com>

On Sun, Oct 13, 2013 at 3:34 PM, Oscar Benjamin
<oscar.j.benjamin at gmail.com> wrote:

> On 13 October 2013 19:29, Neil Girdhar <mistersheik at gmail.com> wrote:
> > Did you read the problem?
>
> I did but since you showed some code that you said you were working on
> I thought you'd be interested to know that it could be improved.
>

The code solves the problem according to its specification :)  (The numbers
are less than 1e8.)


> > Anyway, let's not get off topic (permutations).
>
> Getting back to your proposal, I disagree that permutations should be
> "fixed". The current behaviour is correct. If I was asked to define a
> permutation I would have given definition #3 from Steven's list: a
> bijection from a set to itself. Formally a permutation of a collection
> of non-unique elements is not defined.
>
> There may also be uses for a function like the one that you proposed
> but I've never needed it (and I have used permutations a few times)
> and no one in this thread (including you) has given a use-case for
> this.
>
>
> Oscar
>

The problem is a use-case.  Did you read it?  Did you try solving it?

From python at mrabarnett.plus.com  Sun Oct 13 22:04:21 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Sun, 13 Oct 2013 21:04:21 +0100
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAHVvXxRk4MKMMHKEMDo31vc9DDuULh+8s1w7QnXjxA30ZcKHWA@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAA68w_nbtc5GQdvE8QG6-zTqG6e3LYryc5Va7A9kSJRrsoBx3A@mail.gmail.com>
 <CAHVvXxQ0n=a-LPM3LDUoQPsVpshMgRzxb1O0LmcBH=eGkebDmg@mail.gmail.com>
 <CAA68w_m5k7R=h+uHmH29f5MEjv=F28C+Jc9K=MmOmdrmzcRq3A@mail.gmail.com>
 <CAHVvXxRk4MKMMHKEMDo31vc9DDuULh+8s1w7QnXjxA30ZcKHWA@mail.gmail.com>
Message-ID: <525AFCC5.4070309@mrabarnett.plus.com>

On 13/10/2013 20:34, Oscar Benjamin wrote:
> On 13 October 2013 19:29, Neil Girdhar <mistersheik at gmail.com> wrote:
>> Did you read the problem?
>
> I did but since you showed some code that you said you were working on
> I thought you'd be interested to know that it could be improved.
>
>> Anyway, let's not get off topic (permutations).
>
> Getting back to your proposal, I disagree that permutations should be
> "fixed". The current behaviour is correct. If I was asked to define a
> permutation I would have given definition #3 from Steven's list: a
> bijection from a set to itself. Formally a permutation of a collection
> of non-unique elements is not defined.
>
> There may also be uses for a function like the one that you proposed
> but I've never needed it (and I have used permutations a few times)
> and no one in this thread (including you) has given a use-case for
> this.
>
Here's a use case: anagrams.
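
To gauge how much redundant work plain permutations does for anagrams, the
number of distinct arrangements of a word is n! / (k1! * k2! * ...), where
the k values are the letter multiplicities. A minimal sketch (the helper
name count_anagrams is an assumption for this example):

```python
from collections import Counter
from math import factorial

def count_anagrams(word):
    """Number of distinct letter arrangements of `word`."""
    total = factorial(len(word))
    for k in Counter(word).values():
        total //= factorial(k)  # each division is exact (multinomial)
    return total

print(count_anagrams("banana"))  # 60
print(factorial(len("banana")))  # 720 tuples from itertools.permutations
```

For "banana", itertools.permutations yields 720 tuples, of which only 60
are distinct.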


From mistersheik at gmail.com  Sun Oct 13 22:56:55 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Sun, 13 Oct 2013 16:56:55 -0400
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CADiSq7c=LcD_iikHt6GCvFthvSi08rwg_xJ71Etonhk1zvh5Jw@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com> <20131013014742.GR7989@ando>
 <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
 <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>
 <CADiSq7c=LcD_iikHt6GCvFthvSi08rwg_xJ71Etonhk1zvh5Jw@mail.gmail.com>
Message-ID: <CAA68w_mw_2+Uip9Ab2hSEyePisOM4=bm9ucu=CqUrUZ=4qVjVg@mail.gmail.com>

Executive summary:  Thanks for the discussion, everyone.  I'm now convinced that
itertools.permutations is fine as it is.  I am not totally convinced that
multiset_permutations doesn't belong in itertools, or else there should be
a standard combinatorics library.

On Sun, Oct 13, 2013 at 5:27 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 13 October 2013 17:38, Neil Girdhar <mistersheik at gmail.com> wrote:
> > My intuition is that we want Python to be "complete".  Many other languages
> > can find the permutations of a multiset.  Python has a permutations
> > function.  Many people on stackoverflow expected that function to be able to
> > find those permutations.
>
> Nope, we expressly *don't* want the standard library to be "complete",
> because that would mean growing to the size of PyPI (or larger).
> There's always going to be scope for applications to adopt new domain
> specific dependencies with more in-depth support than that provided by
> the standard library.
>

By complete I meant that just as if you were to add the "error function,
erf" to math, you would want to add an equivalent version to cmath.  When I
saw the permutation function in itertools, I expected that it would work on
both sets and multisets, or else there would be another function that did.

>
> Many standard library modules are in fact deliberately designed as
> "stepping stone" modules that will meet the needs of code which have
> an incidental relationship to that task, but will need to be replaced
> with something more sophisticated for code directly related to that
> domain. Many times, that means they will ignore as irrelevant
> distinctions that are critical in certain contexts, simply because
> they don't come up all that often outside those specific domains, and
> addressing them involves making the core module more complicated to
> use for more typical cases.
>

Good point.

>
> In this case, the proposed alternate permutations mechanism only makes
> a difference when:
>
> 1. The data set contains equivalent values
> 2. Input order is not considered significant, so exchanging equivalent
> values should *not* create a new permutation (i.e. multiset
> permutations rather than sequence permutations).
>
> If users aren't likely to encounter situations where that makes a
> difference, then providing both in the standard library isn't being
> helpful, it's being actively user hostile by asking them to make a
> decision they're not yet qualified to make for the sake of the few
> experts that specifically need it. Hence Raymond's request for data
> modelling problems outside the "learning or studying combinatorics"
> context to make the case for standard library inclusion.
>
> Interestingly, I just found another language which has the equivalent
> of the current behaviour of itertools.permutations: Haskell has it as
> Data.List.permutations. As far as I can tell, Haskell doesn't offer
> support for multiset permutations in the core, you need an additional
> package like Math.Combinatorics (see:
> http://hackage.haskell.org/package/multiset-comb-0.2.3/docs/Math-Combinatorics-Multiset.html#g:4).
>
> Since iterator based programming in Python is heavily inspired by
> Haskell, this suggests that the current behaviour of
> itertools.permutations is appropriate and that Raymond is right to be
> dubious about including multiset permutations support directly in the
> standard library.
>
>
You've convinced me that itertools.permutations is doing the right thing :)
I'm not sure if multiset permutations should be in the standard library or
not.  It is very useful.


> Those interested in improving the experience of writing combinatorics
> code in Python may wish to look into helping out with the
> combinatorics package on PyPI:
> http://phillipmfeldman.org/Python/for_developers.html (For example,
> politely approach Phillip to see if he is interested in hosting it on
> GitHub or BitBucket, providing Sphinx docs on ReadTheDocs, improving
> the PyPI metadata, etc - note I have no experience with this package,
> it's just the first hit for "python combinatorics")
>
> > One suggestion: Why not make it so that itertools.permutations checks if its
> > argument is an instance of collections.Mapping?  If it is, we could
> > interpret the items as a mapping from elements to positive integers, which
> > is a compact representation of a multiset.  Then, it could do the right
> > thing for that case.
>
> If you want to go down the path of only caring about hashable values,
> you may want to argue for a permutations method on collections.Counter
> (it's conceivable that approach has the potential to be even faster
> than an approach based on accepting and processing an arbitrary
> iterable, since it can avoid generating repeated values in the first
> place).
>
> A Counter based multiset permutation algorithm was actually posted to
> python-list back in 2009, just after collections.Counter was
> introduced:
> https://mail.python.org/pipermail/python-list/2009-January/521685.html
>
>
Nice find!


> I just created an updated version of that recipe and posted it as
> https://bitbucket.org/ncoghlan/misc/src/default/multiset_permutations.py
>
>
Why not just define multiset_permutations to accept a dict (dict is a
base class of Counter)?  Since you're going to convert from an iterable
(with duplicates) to a dict (via Counter) anyway, why not accept it as
such?  Users who want an interface similar to itertools.permutations can
pass their iterable through Counter first.
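
A minimal sketch of that interface, assuming a hypothetical
multiset_permutations function that takes a mapping from elements to
positive counts.  This is an illustration only, not the recipe linked
above:

```python
from collections import Counter


def multiset_permutations(counts):
    """Yield each distinct permutation of the multiset described by counts.

    counts maps element -> positive multiplicity (a dict or a Counter).
    """
    remaining = dict(counts)          # private copy; the caller's mapping is untouched
    total = sum(remaining.values())
    prefix = []

    def recurse():
        if len(prefix) == total:
            yield tuple(prefix)
            return
        # Only the counts change during iteration, never the set of keys.
        for element in list(remaining):
            if remaining[element] == 0:
                continue
            remaining[element] -= 1
            prefix.append(element)
            yield from recurse()
            prefix.pop()
            remaining[element] += 1

    return recurse()


# Callers holding an iterable with duplicates go through Counter first.
perms = list(multiset_permutations(Counter("aab")))
```

Because each distinct element is tried once per position, repeated
permutations are never generated in the first place.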

> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>

From tim.peters at gmail.com  Sun Oct 13 23:22:06 2013
From: tim.peters at gmail.com (Tim Peters)
Date: Sun, 13 Oct 2013 16:22:06 -0500
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <525AF4E2.6010301@mrabarnett.plus.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
 <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>
 <CAEbHw4YXk_+Xb2miDb4c46gy_kRMDRm8sQ+nuC7goRK-3ob_yA@mail.gmail.com>
 <52587976.1000901@mrabarnett.plus.com>
 <CAA68w_=6ZFzgEws1Z12Yh_=FzLwBjOn2nAZp8s+sPZyQrQ9AmQ@mail.gmail.com>
 <CAEbHw4ZASOwc0dnWkwOGPTr0j_HpNwwjT+7WuP+EZDBZAO_XLA@mail.gmail.com>
 <CADiSq7ewK9yRJJ6is58XcikfdWMAspSO0jv0TALV-4ggx2bDLg@mail.gmail.com>
 <5258AC0B.1090603@mrabarnett.plus.com> <5258B539.10307@mrabarnett.plus.com>
 <CAExdVNnSdbQ2JozhTUkJLxx=bc6=24iebb5hEuPKrXW_1072Mg@mail.gmail.com>
 <525AF4E2.6010301@mrabarnett.plus.com>
Message-ID: <CAExdVNmPLHq-zXv4Rwhs72cSCC9cBQGC2UfRF99_ZQjhfUmH=w@mail.gmail.com>

[Tim]
>> [MRAB, posts a beautiful solution]
>>
>> I don't really have a use for this, but it was a lovely programming
>> puzzle, so I'll include an elaborate elaboration of MRAB's algorithm
>> below.  And that's the end of my interest in this ;-)
>>
>> It doesn't require that elements be orderable or even hashable.  It
>> does require that they can be compared for equality, but it's pretty
>> clear that if we _do_ include something like this, "equality" has to
>> be pluggable.
>> ...

[MRAB]
> I posted yet another implementation after that one.

I know.  I was talking about the beautiful one ;-)  The later one
could build equivalence classes faster (than mine) in many cases, but
I don't care much about the startup costs.  I care a lot more about:

1. Avoiding searches in the recursive function; i.e., this:

         for i, item in enumerate(items):
            if item != prev_item:

Making such tests millions (billions ...) of times adds up - and
equality testing may not be cheap.  The algorithm I posted does no
item testing after the setup is done (none in its recursive function).

2. Making "equality" pluggable.  Your later algorithm bought "find
equivalence classes" speed for hashable elements by using a dict, but
a dict's notion of equality can't be changed.  So, make equality
pluggable, and that startup-time speed advantage vanishes for all but
operator.__eq__'s idea of equality.
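
A rough sketch of the two points, under stated assumptions: the name
unique_permutations and its `equal` parameter are hypothetical, and this
is not the algorithm posted earlier in the thread.  Items are compared
only while building equivalence classes; the recursive part works on
class indices and counts alone.

```python
import operator


def unique_permutations(items, equal=operator.eq):
    """Yield distinct permutations of items under a pluggable equality.

    Each output position holds the first item seen from its class.
    """
    items = list(items)

    # Setup: the only place where `equal` is ever called.
    representatives = []   # first item seen from each equivalence class
    counts = []            # how many items fell into each class
    for item in items:
        for index, rep in enumerate(representatives):
            if equal(item, rep):
                counts[index] += 1
                break
        else:
            representatives.append(item)
            counts.append(1)

    total = len(items)
    chosen = []            # class indices picked so far

    def recurse():
        if len(chosen) == total:
            yield tuple(representatives[i] for i in chosen)
            return
        for index in range(len(counts)):
            if counts[index]:          # integer test, no item comparison
                counts[index] -= 1
                chosen.append(index)
                yield from recurse()
                chosen.pop()
                counts[index] += 1

    return recurse()


default_perms = list(unique_permutations([1, 1.0, 2]))
strict_perms = list(unique_permutations(
    [1, 1.0, 2], equal=lambda a, b: type(a) is type(b) and a == b))
```

The setup loop is quadratic in the number of classes, which is the
startup cost being traded for a comparison-free inner loop.  Elements
need not be hashable or orderable.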

From oscar.j.benjamin at gmail.com  Mon Oct 14 01:10:51 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Mon, 14 Oct 2013 00:10:51 +0100
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAExdVNmPLHq-zXv4Rwhs72cSCC9cBQGC2UfRF99_ZQjhfUmH=w@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
 <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>
 <CAEbHw4YXk_+Xb2miDb4c46gy_kRMDRm8sQ+nuC7goRK-3ob_yA@mail.gmail.com>
 <52587976.1000901@mrabarnett.plus.com>
 <CAA68w_=6ZFzgEws1Z12Yh_=FzLwBjOn2nAZp8s+sPZyQrQ9AmQ@mail.gmail.com>
 <CAEbHw4ZASOwc0dnWkwOGPTr0j_HpNwwjT+7WuP+EZDBZAO_XLA@mail.gmail.com>
 <CADiSq7ewK9yRJJ6is58XcikfdWMAspSO0jv0TALV-4ggx2bDLg@mail.gmail.com>
 <5258AC0B.1090603@mrabarnett.plus.com> <5258B539.10307@mrabarnett.plus.com>
 <CAExdVNnSdbQ2JozhTUkJLxx=bc6=24iebb5hEuPKrXW_1072Mg@mail.gmail.com>
 <525AF4E2.6010301@mrabarnett.plus.com>
 <CAExdVNmPLHq-zXv4Rwhs72cSCC9cBQGC2UfRF99_ZQjhfUmH=w@mail.gmail.com>
Message-ID: <CAHVvXxS-6EVuzB71Kr6ntCGZ-5LTkWwk9K0KNXtgXkoED-28tw@mail.gmail.com>

On 13 October 2013 22:22, Tim Peters <tim.peters at gmail.com> wrote:
> 2. Making "equality" pluggable.  Your later algorithm bought "find
> equivalence classes" speed for hashable elements by using a dict, but
> a dict's notion of equality can't be changed.  So, make equality
> pluggable, and that startup-time speed advantage vanishes for all but
> operator.__eq__'s idea of equality.

It sounds like you want Antoine's TransformDict:
http://www.python.org/dev/peps/pep-0455/


Oscar

From oscar.j.benjamin at gmail.com  Mon Oct 14 01:32:34 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Mon, 14 Oct 2013 00:32:34 +0100
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAExdVNn2QwtVvDzSwgOwk9JjCTPyhYcS5TGRNmg3HQEBCMDAdA@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
 <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>
 <CAEbHw4YXk_+Xb2miDb4c46gy_kRMDRm8sQ+nuC7goRK-3ob_yA@mail.gmail.com>
 <52587976.1000901@mrabarnett.plus.com>
 <CAA68w_=6ZFzgEws1Z12Yh_=FzLwBjOn2nAZp8s+sPZyQrQ9AmQ@mail.gmail.com>
 <CAEbHw4ZASOwc0dnWkwOGPTr0j_HpNwwjT+7WuP+EZDBZAO_XLA@mail.gmail.com>
 <CADiSq7ewK9yRJJ6is58XcikfdWMAspSO0jv0TALV-4ggx2bDLg@mail.gmail.com>
 <5258AC0B.1090603@mrabarnett.plus.com> <5258B539.10307@mrabarnett.plus.com>
 <CAExdVNnSdbQ2JozhTUkJLxx=bc6=24iebb5hEuPKrXW_1072Mg@mail.gmail.com>
 <525AF4E2.6010301@mrabarnett.plus.com>
 <CAExdVNmPLHq-zXv4Rwhs72cSCC9cBQGC2UfRF99_ZQjhfUmH=w@mail.gmail.com>
 <CAHVvXxS-6EVuzB71Kr6ntCGZ-5LTkWwk9K0KNXtgXkoED-28tw@mail.gmail.com>
 <CAExdVNn2QwtVvDzSwgOwk9JjCTPyhYcS5TGRNmg3HQEBCMDAdA@mail.gmail.com>
Message-ID: <CAHVvXxSBO7Zg0tCju6UXL3qAU671kKN+6jbmKHU=UU6xCjXw7A@mail.gmail.com>

On 14 October 2013 00:23, Tim Peters <tim.peters at gmail.com> wrote:
> [Tim]
>>> 2. Making "equality" pluggable.  Your later algorithm bought "find
>>> equivalence classes" speed for hashable elements by using a dict, but
>>> a dict's notion of equality can't be changed.  So, make equality
>>> pluggable, and that startup-time speed advantage vanishes for all but
>>> operator.__eq__'s idea of equality.
>
> [Oscar Benjamin]
>> It sounds like you want Antoine's TransformDict:
>> http://www.python.org/dev/peps/pep-0455/
>
> Not really in this case - I want a two-argument function ("are A and B
> equal?").  Not all plausible cases of that can be mapped to a
> canonical hashable key.  For example, consider permutations of a list
> of lists, where the user doesn't want int and float elements of the
> lists to be considered equal when they happen to have the same value.
> Is that a stretch?  Oh ya ;-)

Will this do?
d = TransformDict(lambda x: (type(x), x))


Oscar

From tim.peters at gmail.com  Mon Oct 14 01:44:14 2013
From: tim.peters at gmail.com (Tim Peters)
Date: Sun, 13 Oct 2013 18:44:14 -0500
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAHVvXxSBO7Zg0tCju6UXL3qAU671kKN+6jbmKHU=UU6xCjXw7A@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
 <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>
 <CAEbHw4YXk_+Xb2miDb4c46gy_kRMDRm8sQ+nuC7goRK-3ob_yA@mail.gmail.com>
 <52587976.1000901@mrabarnett.plus.com>
 <CAA68w_=6ZFzgEws1Z12Yh_=FzLwBjOn2nAZp8s+sPZyQrQ9AmQ@mail.gmail.com>
 <CAEbHw4ZASOwc0dnWkwOGPTr0j_HpNwwjT+7WuP+EZDBZAO_XLA@mail.gmail.com>
 <CADiSq7ewK9yRJJ6is58XcikfdWMAspSO0jv0TALV-4ggx2bDLg@mail.gmail.com>
 <5258AC0B.1090603@mrabarnett.plus.com> <5258B539.10307@mrabarnett.plus.com>
 <CAExdVNnSdbQ2JozhTUkJLxx=bc6=24iebb5hEuPKrXW_1072Mg@mail.gmail.com>
 <525AF4E2.6010301@mrabarnett.plus.com>
 <CAExdVNmPLHq-zXv4Rwhs72cSCC9cBQGC2UfRF99_ZQjhfUmH=w@mail.gmail.com>
 <CAHVvXxS-6EVuzB71Kr6ntCGZ-5LTkWwk9K0KNXtgXkoED-28tw@mail.gmail.com>
 <CAExdVNn2QwtVvDzSwgOwk9JjCTPyhYcS5TGRNmg3HQEBCMDAdA@mail.gmail.com>
 <CAHVvXxSBO7Zg0tCju6UXL3qAU671kKN+6jbmKHU=UU6xCjXw7A@mail.gmail.com>
Message-ID: <CAExdVN=ThUZ=kG-R=+eeknmK-rKbsKfCYNKqNUKMvZYpbqr+oA@mail.gmail.com>

[Oscar Benjamin]
> Will this do?
> d = TransformDict(lambda x: (type(x), x))

No.  In the example I gave, *lists* will be passed as x (it was a list
of lists:  the lists are the elements of the permutations, and they
happen to have internal structure of their own).  So the `type(x)`
there is useless (it will always be the list type), while the lists
themselves would still be compared by operator.__eq__.

Not to mention that the constructed tuple isn't hashable anyway (x is
a list), so can't be used by TransformDict.

So that idea doesn't work several times over ;-)
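
Both objections can be checked without TransformDict itself.  A small
sketch, where type_tagged_key is just a hypothetical name for the lambda
quoted above:

```python
def type_tagged_key(x):
    # Same transform as the lambda quoted above.
    return (type(x), x)


key = type_tagged_key([1, 2.0])

# The type tag is always `list`, so it says nothing about the elements.
tag_is_list = key[0] is list

# The list inside the tuple still compares with ==, so int and float match.
lists_compare_equal = ([1] == [1.0])

# A tuple that contains a list cannot be hashed, so it cannot be a dict key.
try:
    hash(key)
    key_is_hashable = True
except TypeError:
    key_is_hashable = False
```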

From oscar.j.benjamin at gmail.com  Mon Oct 14 01:55:29 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Mon, 14 Oct 2013 00:55:29 +0100
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAExdVN=ThUZ=kG-R=+eeknmK-rKbsKfCYNKqNUKMvZYpbqr+oA@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
 <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>
 <CAEbHw4YXk_+Xb2miDb4c46gy_kRMDRm8sQ+nuC7goRK-3ob_yA@mail.gmail.com>
 <52587976.1000901@mrabarnett.plus.com>
 <CAA68w_=6ZFzgEws1Z12Yh_=FzLwBjOn2nAZp8s+sPZyQrQ9AmQ@mail.gmail.com>
 <CAEbHw4ZASOwc0dnWkwOGPTr0j_HpNwwjT+7WuP+EZDBZAO_XLA@mail.gmail.com>
 <CADiSq7ewK9yRJJ6is58XcikfdWMAspSO0jv0TALV-4ggx2bDLg@mail.gmail.com>
 <5258AC0B.1090603@mrabarnett.plus.com> <5258B539.10307@mrabarnett.plus.com>
 <CAExdVNnSdbQ2JozhTUkJLxx=bc6=24iebb5hEuPKrXW_1072Mg@mail.gmail.com>
 <525AF4E2.6010301@mrabarnett.plus.com>
 <CAExdVNmPLHq-zXv4Rwhs72cSCC9cBQGC2UfRF99_ZQjhfUmH=w@mail.gmail.com>
 <CAHVvXxS-6EVuzB71Kr6ntCGZ-5LTkWwk9K0KNXtgXkoED-28tw@mail.gmail.com>
 <CAExdVNn2QwtVvDzSwgOwk9JjCTPyhYcS5TGRNmg3HQEBCMDAdA@mail.gmail.com>
 <CAHVvXxSBO7Zg0tCju6UXL3qAU671kKN+6jbmKHU=UU6xCjXw7A@mail.gmail.com>
 <CAExdVN=ThUZ=kG-R=+eeknmK-rKbsKfCYNKqNUKMvZYpbqr+oA@mail.gmail.com>
Message-ID: <CAHVvXxS+Veo2USS+CV+P6HrGh-wW3Ztfw4=zB81XzC0_BfhXRg@mail.gmail.com>

On 14 October 2013 00:44, Tim Peters <tim.peters at gmail.com> wrote:
> [Oscar Benjamin]
>> Will this do?
>> d = TransformDict(lambda x: (type(x), x))
>
> No.  In the example I gave, *lists* will be passed as x (it was a list
> of lists:  the lists are the elements of the permutations, and they
> happen to have internal structure of their own).  So the `type(x)`
> there is useless (it will always be the list type), while the lists
> themselves would still be compared by operator.__eq__.
>
> Not to mention that the constructed tuple isn't hashable anyway (x is
> a list), so can't be used by TransformDict.
>
> So that idea doesn't work several times over ;-)

Damn, you're right. I obviously didn't think that one through hard
enough. Okay how about this?
d = TransformDict(lambda x: (tuple(map(type, x)), tuple(x)))


Oscar

From tim.peters at gmail.com  Mon Oct 14 01:23:51 2013
From: tim.peters at gmail.com (Tim Peters)
Date: Sun, 13 Oct 2013 18:23:51 -0500
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAHVvXxS-6EVuzB71Kr6ntCGZ-5LTkWwk9K0KNXtgXkoED-28tw@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
 <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>
 <CAEbHw4YXk_+Xb2miDb4c46gy_kRMDRm8sQ+nuC7goRK-3ob_yA@mail.gmail.com>
 <52587976.1000901@mrabarnett.plus.com>
 <CAA68w_=6ZFzgEws1Z12Yh_=FzLwBjOn2nAZp8s+sPZyQrQ9AmQ@mail.gmail.com>
 <CAEbHw4ZASOwc0dnWkwOGPTr0j_HpNwwjT+7WuP+EZDBZAO_XLA@mail.gmail.com>
 <CADiSq7ewK9yRJJ6is58XcikfdWMAspSO0jv0TALV-4ggx2bDLg@mail.gmail.com>
 <5258AC0B.1090603@mrabarnett.plus.com> <5258B539.10307@mrabarnett.plus.com>
 <CAExdVNnSdbQ2JozhTUkJLxx=bc6=24iebb5hEuPKrXW_1072Mg@mail.gmail.com>
 <525AF4E2.6010301@mrabarnett.plus.com>
 <CAExdVNmPLHq-zXv4Rwhs72cSCC9cBQGC2UfRF99_ZQjhfUmH=w@mail.gmail.com>
 <CAHVvXxS-6EVuzB71Kr6ntCGZ-5LTkWwk9K0KNXtgXkoED-28tw@mail.gmail.com>
Message-ID: <CAExdVNn2QwtVvDzSwgOwk9JjCTPyhYcS5TGRNmg3HQEBCMDAdA@mail.gmail.com>

[Tim]
>> 2. Making "equality" pluggable.  Your later algorithm bought "find
>> equivalence classes" speed for hashable elements by using a dict, but
>> a dict's notion of equality can't be changed.  So, make equality
>> pluggable, and that startup-time speed advantage vanishes for all but
>> operator.__eq__'s idea of equality.

[Oscar Benjamin]
> It sounds like you want Antoine's TransformDict:
> http://www.python.org/dev/peps/pep-0455/

Not really in this case - I want a two-argument function ("are A and B
equal?").  Not all plausible cases of that can be mapped to a
canonical hashable key.  For example, consider permutations of a list
of lists, where the user doesn't want int and float elements of the
lists to be considered equal when they happen to have the same value.
Is that a stretch?  Oh ya ;-)

From oscar.j.benjamin at gmail.com  Mon Oct 14 02:20:19 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Mon, 14 Oct 2013 01:20:19 +0100
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAExdVN=nd7DQNvNH=1XFAf7yG=UZjZDSz4T8oUOib2B2wWR8ng@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
 <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>
 <CAEbHw4YXk_+Xb2miDb4c46gy_kRMDRm8sQ+nuC7goRK-3ob_yA@mail.gmail.com>
 <52587976.1000901@mrabarnett.plus.com>
 <CAA68w_=6ZFzgEws1Z12Yh_=FzLwBjOn2nAZp8s+sPZyQrQ9AmQ@mail.gmail.com>
 <CAEbHw4ZASOwc0dnWkwOGPTr0j_HpNwwjT+7WuP+EZDBZAO_XLA@mail.gmail.com>
 <CADiSq7ewK9yRJJ6is58XcikfdWMAspSO0jv0TALV-4ggx2bDLg@mail.gmail.com>
 <5258AC0B.1090603@mrabarnett.plus.com> <5258B539.10307@mrabarnett.plus.com>
 <CAExdVNnSdbQ2JozhTUkJLxx=bc6=24iebb5hEuPKrXW_1072Mg@mail.gmail.com>
 <525AF4E2.6010301@mrabarnett.plus.com>
 <CAExdVNmPLHq-zXv4Rwhs72cSCC9cBQGC2UfRF99_ZQjhfUmH=w@mail.gmail.com>
 <CAHVvXxS-6EVuzB71Kr6ntCGZ-5LTkWwk9K0KNXtgXkoED-28tw@mail.gmail.com>
 <CAExdVNn2QwtVvDzSwgOwk9JjCTPyhYcS5TGRNmg3HQEBCMDAdA@mail.gmail.com>
 <CAHVvXxSBO7Zg0tCju6UXL3qAU671kKN+6jbmKHU=UU6xCjXw7A@mail.gmail.com>
 <CAExdVN=ThUZ=kG-R=+eeknmK-rKbsKfCYNKqNUKMvZYpbqr+oA@mail.gmail.com>
 <CAHVvXxS+Veo2USS+CV+P6HrGh-wW3Ztfw4=zB81XzC0_BfhXRg@mail.gmail.com>
 <CAExdVN=nd7DQNvNH=1XFAf7yG=UZjZDSz4T8oUOib2B2wWR8ng@mail.gmail.com>
Message-ID: <CAHVvXxS7uiKqBNjThhT7mhR_XNbZHgwL9Q=TS-gO6KFOv0f0AQ@mail.gmail.com>

On 14 October 2013 01:15, Tim Peters <tim.peters at gmail.com> wrote:
> [Oscar Benjamin]
>> ...
>> Damn, you're right. I obviously didn't think that one through hard
>> enough. Okay how about this?
>> d = TransformDict(lambda x: (tuple(map(type, x)), tuple(x)))
>
> Oscar, please give this up - it's not going to work.  `x` can be any
> object whatsoever, with arbitrarily complex internal structure, and
> the user can have an arbitrarily convoluted idea of what "equal"
> means.  Did I mention that these lists don't *only* have ints and
> floats as elements, but also nested sublists?  Oh ya - they also want
> a float and a singleton list containing the same float to be
> considered equal ;-)  Etc.

That does seem contrived, but then I guess the whole problem is too...

> Besides, you're trying to solve a problem I didn't have to begin with
> ;-)  That is, I don't care much about the cost of building equivalence
> classes - it's a startup cost for the generator, not an "inner loop"
> cost.  Even if you could bash every case into a different convoluted
> hashable tuple, in general it's going to be - in this specific problem
> - far easier for the user to define an equal() function they like,
> working directly on the two objects.  That doesn't require an endless
> sequence of tricks.

Okay, I see what you mean.


Oscar

From tim.peters at gmail.com  Mon Oct 14 02:15:00 2013
From: tim.peters at gmail.com (Tim Peters)
Date: Sun, 13 Oct 2013 19:15:00 -0500
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAHVvXxS+Veo2USS+CV+P6HrGh-wW3Ztfw4=zB81XzC0_BfhXRg@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAEbHw4a-n3yAAMc1HROJF8iTogXxqnxfCJ3YPJT3KGW4VSusvQ@mail.gmail.com>
 <CAA68w_m_yvoJpROOfB651UcAhdSHiU5ovzann7tcEGkfy2k8ug@mail.gmail.com>
 <CAEbHw4YXk_+Xb2miDb4c46gy_kRMDRm8sQ+nuC7goRK-3ob_yA@mail.gmail.com>
 <52587976.1000901@mrabarnett.plus.com>
 <CAA68w_=6ZFzgEws1Z12Yh_=FzLwBjOn2nAZp8s+sPZyQrQ9AmQ@mail.gmail.com>
 <CAEbHw4ZASOwc0dnWkwOGPTr0j_HpNwwjT+7WuP+EZDBZAO_XLA@mail.gmail.com>
 <CADiSq7ewK9yRJJ6is58XcikfdWMAspSO0jv0TALV-4ggx2bDLg@mail.gmail.com>
 <5258AC0B.1090603@mrabarnett.plus.com> <5258B539.10307@mrabarnett.plus.com>
 <CAExdVNnSdbQ2JozhTUkJLxx=bc6=24iebb5hEuPKrXW_1072Mg@mail.gmail.com>
 <525AF4E2.6010301@mrabarnett.plus.com>
 <CAExdVNmPLHq-zXv4Rwhs72cSCC9cBQGC2UfRF99_ZQjhfUmH=w@mail.gmail.com>
 <CAHVvXxS-6EVuzB71Kr6ntCGZ-5LTkWwk9K0KNXtgXkoED-28tw@mail.gmail.com>
 <CAExdVNn2QwtVvDzSwgOwk9JjCTPyhYcS5TGRNmg3HQEBCMDAdA@mail.gmail.com>
 <CAHVvXxSBO7Zg0tCju6UXL3qAU671kKN+6jbmKHU=UU6xCjXw7A@mail.gmail.com>
 <CAExdVN=ThUZ=kG-R=+eeknmK-rKbsKfCYNKqNUKMvZYpbqr+oA@mail.gmail.com>
 <CAHVvXxS+Veo2USS+CV+P6HrGh-wW3Ztfw4=zB81XzC0_BfhXRg@mail.gmail.com>
Message-ID: <CAExdVN=nd7DQNvNH=1XFAf7yG=UZjZDSz4T8oUOib2B2wWR8ng@mail.gmail.com>

[Oscar Benjamin]
> ...
> Damn, you're right. I obviously didn't think that one through hard
> enough. Okay how about this?
> d = TransformDict(lambda x: (tuple(map(type, x)), tuple(x)))

Oscar, please give this up - it's not going to work.  `x` can be any
object whatsoever, with arbitrarily complex internal structure, and
the user can have an arbitrarily convoluted idea of what "equal"
means.  Did I mention that these lists don't *only* have ints and
floats as elements, but also nested sublists?  Oh ya - they also want
a float and a singleton list containing the same float to be
considered equal ;-)  Etc.

Besides, you're trying to solve a problem I didn't have to begin with
;-)  That is, I don't care much about the cost of building equivalence
classes - it's a startup cost for the generator, not an "inner loop"
cost.  Even if you could bash every case into a different convoluted
hashable tuple, in general it's going to be - in this specific problem
- far easier for the user to define an equal() function they like,
working directly on the two objects.  That doesn't require an endless
sequence of tricks.
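
As an illustration only, here is one way such a user-defined function
could look for the contrived rules above.  The names equal and unwrap
are hypothetical, and the rules are one reading of the example: ints and
floats never match, sublists are compared recursively, and a singleton
list holding a float matches that float.

```python
def unwrap(x):
    """Treat a singleton list containing a float as that float."""
    if isinstance(x, list) and len(x) == 1 and type(x[0]) is float:
        return x[0]
    return x


def equal(a, b):
    """Two-argument equality working directly on the two objects."""
    a, b = unwrap(a), unwrap(b)
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(equal(x, y) for x, y in zip(a, b))
    # Leaves must agree in both type and value, so 1 and 1.0 differ.
    return type(a) is type(b) and a == b


same = equal([1, [2.5], [3, 4]], [1, 2.5, [3, 4]])
different = equal([1, 2], [1.0, 2])
```

Writing this took a few lines; finding a canonical hashable key that
encodes the same rules would take rather more thought.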

From felix at groebert.org  Mon Oct 14 14:25:53 2013
From: felix at groebert.org (=?ISO-8859-1?Q?Felix_Gr=F6bert?=)
Date: Mon, 14 Oct 2013 14:25:53 +0200
Subject: [Python-ideas] pytaint: taint tracking in python
Message-ID: <CABY9dC0cMbLxGb6u3Ffw09rL4DF_yAsAHhoz4SHf-g9cu1RX+A@mail.gmail.com>

Hi,

I'd like to start a discussion on adding a security feature: taint tracking.

As part of his internship, Marcin (cc) has been working on a patch to
cpython-2.7.5 which is available online. We also published a design
document and slides.

https://github.com/felixgr/pytaint

The idea behind taint tracking (or taint checking) is that we mark
('taint') untrusted data and prevent the programmer from using it in
sensitive places (called sinks). A standard use case would be in a web
application, where data extracted from HTTP requests is tainted and a
database connection is a sensitive sink. In other words: objects returned
by an HTTP request have a property indicating taint, and when one of them
is passed to a database connection, a TaintException is raised.

The idea itself is not new (Ruby and Perl have it; there are also some
Python libraries floating around) and pretty much no one uses it. However,
with a few improvements, it can be made viable.

Firstly, we introduce different kinds of taint (motivation: a string may be
an attack vector for many classes of attacks - e.g. XSS, SQLi - and each
needs different escaping). Secondly, we make it easy to apply taint
tracking to existing software - a programmer can simply write a config file
specifying taint sources, sensitive sinks and taint cleaners, and enable
tracking by adding one line to their app.
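
To make the terms concrete, here is a toy model in plain Python 3.  It is
not pytaint's API: every name in it (TaintedStr, taints_of, clean, sink)
is hypothetical, and unlike an interpreter-level patch it does not
propagate taint through str operations such as concatenation.

```python
import functools
import html


class TaintException(Exception):
    """Raised when tainted data reaches a sensitive sink."""


class TaintedStr(str):
    """A str carrying a set of taint kinds, e.g. {"XSS", "SQLi"}."""

    def __new__(cls, value, taints=()):
        obj = super().__new__(cls, value)
        obj.taints = frozenset(taints)
        return obj


def taints_of(value):
    """Return the taint kinds on value; plain objects carry none."""
    return getattr(value, "taints", frozenset())


def clean(value, kind, escape):
    """Taint cleaner: escape the value and drop exactly one taint kind."""
    return TaintedStr(escape(value), taints_of(value) - {kind})


def sink(kind):
    """Mark a function as a sensitive sink for one kind of taint."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for arg in (*args, *kwargs.values()):
                if kind in taints_of(arg):
                    raise TaintException(
                        "%s-tainted value passed to %s" % (kind, func.__name__))
            return func(*args, **kwargs)
        return wrapper
    return decorator


@sink("XSS")
def render(fragment):
    return "<p>" + fragment + "</p>"


# A taint source would produce values like this one.
request_data = TaintedStr("<script>", taints={"XSS", "SQLi"})

try:
    render(request_data)
    blocked = False
except TaintException:
    blocked = True

cleaned = clean(request_data, "XSS", html.escape)
safe_page = render(cleaned)
```

After cleaning, the value is accepted by the XSS sink but still carries
the SQLi taint, which is the point of having separate kinds.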

We think it's a very useful feature for developing most web apps and
other security-sensitive applications in Python. Any thoughts on this?

Thanks,
Felix

From dickinsm at gmail.com  Mon Oct 14 14:29:04 2013
From: dickinsm at gmail.com (Mark Dickinson)
Date: Mon, 14 Oct 2013 13:29:04 +0100
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAA68w_mw_2+Uip9Ab2hSEyePisOM4=bm9ucu=CqUrUZ=4qVjVg@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com>
 <20131013014742.GR7989@ando>
 <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
 <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>
 <CADiSq7c=LcD_iikHt6GCvFthvSi08rwg_xJ71Etonhk1zvh5Jw@mail.gmail.com>
 <CAA68w_mw_2+Uip9Ab2hSEyePisOM4=bm9ucu=CqUrUZ=4qVjVg@mail.gmail.com>
Message-ID: <CAAu3qLXE4qsdv76_K_jpP48-gJ7r9Zmx9EsM5CveLhnzjcyitg@mail.gmail.com>

On Sun, Oct 13, 2013 at 9:56 PM, Neil Girdhar <mistersheik at gmail.com> wrote:

>
> By complete I meant that just as if you were to add the "error function,
> erf" to math, you would want to add an equivalent version to cmath.
>

An interesting choice of example.  *Why* would you want to do so?

Since you bring this up, I assume you're already aware that math.erf exists
but cmath.erf does not.  I believe there are good, practical reasons *not*
to add cmath.erf, in spite of the existence of math.erf.  Not least of
these is that cmath.erf would be significantly more complicated to
implement and of significantly less interest to users.  And perhaps there's
a parallel with itertools.permutations and the proposed
itertools.multiset_permutations here...
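
The asymmetry is easy to confirm from the standard library alone:

```python
import cmath
import math

# The real error function is in math (added in Python 2.7 and 3.2).
real_value = math.erf(1.0)

# cmath offers no complex counterpart.
has_complex_erf = hasattr(cmath, "erf")
```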

-- 
Mark

From mistersheik at gmail.com  Mon Oct 14 14:37:59 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Mon, 14 Oct 2013 08:37:59 -0400
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAAu3qLXE4qsdv76_K_jpP48-gJ7r9Zmx9EsM5CveLhnzjcyitg@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com> <20131013014742.GR7989@ando>
 <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
 <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>
 <CADiSq7c=LcD_iikHt6GCvFthvSi08rwg_xJ71Etonhk1zvh5Jw@mail.gmail.com>
 <CAA68w_mw_2+Uip9Ab2hSEyePisOM4=bm9ucu=CqUrUZ=4qVjVg@mail.gmail.com>
 <CAAu3qLXE4qsdv76_K_jpP48-gJ7r9Zmx9EsM5CveLhnzjcyitg@mail.gmail.com>
Message-ID: <CAA68w_mJ9cSU9QXQcMeb2sGth+Lv4gY6BwtmjA42kmDHFwt7kA@mail.gmail.com>

Actually I didn't notice that.  It seems weird to find erf in math, but erf
for complex numbers in scipy.special.  It's just about organization and
user discovery.  I realize that from the developer's point of view, erf for
complex numbers is complicated, but why does the user care?




On Mon, Oct 14, 2013 at 8:29 AM, Mark Dickinson <dickinsm at gmail.com> wrote:

> On Sun, Oct 13, 2013 at 9:56 PM, Neil Girdhar <mistersheik at gmail.com> wrote:
>
>>
>> By complete I meant that just as if you were to add the "error function,
>> erf" to math, you would want to add an equivalent version to cmath.
>>
>
> An interesting choice of example.  *Why* would you want to do so?
>
> Since you bring this up, I assume you're already aware that math.erf
> exists but cmath.erf does not.  I believe there are good, practical reasons
> *not* to add cmath.erf, in spite of the existence of math.erf.  Not least
> of these is that cmath.erf would be significantly more complicated to
> implement and of significantly less interest to users.  And perhaps there's
> a parallel with itertools.permutations and the proposed
> itertools.multiset_permutations here...
>
> --
> Mark
>
>

From oscar.j.benjamin at gmail.com  Mon Oct 14 15:11:42 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Mon, 14 Oct 2013 14:11:42 +0100
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAA68w_mJ9cSU9QXQcMeb2sGth+Lv4gY6BwtmjA42kmDHFwt7kA@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com> <20131013014742.GR7989@ando>
 <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
 <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>
 <CADiSq7c=LcD_iikHt6GCvFthvSi08rwg_xJ71Etonhk1zvh5Jw@mail.gmail.com>
 <CAA68w_mw_2+Uip9Ab2hSEyePisOM4=bm9ucu=CqUrUZ=4qVjVg@mail.gmail.com>
 <CAAu3qLXE4qsdv76_K_jpP48-gJ7r9Zmx9EsM5CveLhnzjcyitg@mail.gmail.com>
 <CAA68w_mJ9cSU9QXQcMeb2sGth+Lv4gY6BwtmjA42kmDHFwt7kA@mail.gmail.com>
Message-ID: <CAHVvXxQ=3_iFdky1roZDGskcW+aUx5Ok0nRr-AFVCY7Wb_8S8Q@mail.gmail.com>

On 14 October 2013 13:37, Neil Girdhar <mistersheik at gmail.com> wrote:
>
> Actually I didn't notice that.  It seems weird to find erf in math, but erf
> for complex numbers in scipy.special.  It's just about organization and user
> discovery.  I realize that from the developer's point of view, erf for
> complex numbers is complicated, but why does the user care?

This is the first time I've seen a suggestion that there should be
cmath.erf. So I would say that most users don't care about having a
complex error function. Whoever would take the time to implement the
complex error function might instead spend that time implementing and
maintaining something that users do care about.


Oscar

From ncoghlan at gmail.com  Mon Oct 14 15:15:06 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 14 Oct 2013 23:15:06 +1000
Subject: [Python-ideas] pytaint: taint tracking in python
In-Reply-To: <CABY9dC0cMbLxGb6u3Ffw09rL4DF_yAsAHhoz4SHf-g9cu1RX+A@mail.gmail.com>
References: <CABY9dC0cMbLxGb6u3Ffw09rL4DF_yAsAHhoz4SHf-g9cu1RX+A@mail.gmail.com>
Message-ID: <CADiSq7crWUhWHYbuHgWTxrM1mZbLAk9D6thBB5+Nx7cx-K=o+w@mail.gmail.com>

On 14 October 2013 22:25, Felix Gröbert <felix at groebert.org> wrote:
> We think it's a very useful feature for developing most of webapps and other
> security-sensitive application in Python, any thoughts on this?

It's definitely an interesting idea, and the idea of pursuing it
initially as a separate project to optionally harden Python 2
applications is a good one.

Longer term, before it can be considered for inclusion as a language feature:

1. It needs to work with Python 3 (which has a substantially different
text model), as Python 2 is no longer receiving new features.
2. The performance impact needs to be assessed when the feature is
disabled (the default) and when various sources and sinks are defined.

The performance numbers comparing http://hg.python.org/benchmarks/
between vanilla CPython 2.7.5 and pytaint may also be of interest to
potential users of the Python 2.7 version.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From mistersheik at gmail.com  Mon Oct 14 15:15:06 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Mon, 14 Oct 2013 09:15:06 -0400
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAHVvXxQ=3_iFdky1roZDGskcW+aUx5Ok0nRr-AFVCY7Wb_8S8Q@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com> <20131013014742.GR7989@ando>
 <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
 <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>
 <CADiSq7c=LcD_iikHt6GCvFthvSi08rwg_xJ71Etonhk1zvh5Jw@mail.gmail.com>
 <CAA68w_mw_2+Uip9Ab2hSEyePisOM4=bm9ucu=CqUrUZ=4qVjVg@mail.gmail.com>
 <CAAu3qLXE4qsdv76_K_jpP48-gJ7r9Zmx9EsM5CveLhnzjcyitg@mail.gmail.com>
 <CAA68w_mJ9cSU9QXQcMeb2sGth+Lv4gY6BwtmjA42kmDHFwt7kA@mail.gmail.com>
 <CAHVvXxQ=3_iFdky1roZDGskcW+aUx5Ok0nRr-AFVCY7Wb_8S8Q@mail.gmail.com>
Message-ID: <CAA68w_ksw7tAwxAJNYnuq7cMnQQ6M6R_iQzpRqkOm=VS_=WQ-w@mail.gmail.com>

Look, I don't want it, and anyway it's already in scipy.special.  I just
want organizational symmetry.  I expected to find complex versions of math
functions in cmath, not in scipy.special.


On Mon, Oct 14, 2013 at 9:11 AM, Oscar Benjamin
<oscar.j.benjamin at gmail.com> wrote:

> On 14 October 2013 13:37, Neil Girdhar <mistersheik at gmail.com> wrote:
> >
> > Actually I didn't notice that.  It seems weird to find erf in math, but
> erf
> > for complex numbers in scipy.special.  It's just about organization and
> user
> > discovery.  I realize that from the developer's point of view, erf for
> > complex numbers is complicated, but why does the user care?
>
> This is the first time I've seen a suggestion that there should be
> cmath.erf. So I would say that most users don't care about having a
> complex error function. Whoever would take the time to implement the
> complex error function might instead spend that time implementing and
> maintaining something that users do care about.
>
>
> Oscar
>

From breamoreboy at yahoo.co.uk  Mon Oct 14 15:26:59 2013
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Mon, 14 Oct 2013 14:26:59 +0100
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAA68w_ksw7tAwxAJNYnuq7cMnQQ6M6R_iQzpRqkOm=VS_=WQ-w@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com> <20131013014742.GR7989@ando>
 <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
 <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>
 <CADiSq7c=LcD_iikHt6GCvFthvSi08rwg_xJ71Etonhk1zvh5Jw@mail.gmail.com>
 <CAA68w_mw_2+Uip9Ab2hSEyePisOM4=bm9ucu=CqUrUZ=4qVjVg@mail.gmail.com>
 <CAAu3qLXE4qsdv76_K_jpP48-gJ7r9Zmx9EsM5CveLhnzjcyitg@mail.gmail.com>
 <CAA68w_mJ9cSU9QXQcMeb2sGth+Lv4gY6BwtmjA42kmDHFwt7kA@mail.gmail.com>
 <CAHVvXxQ=3_iFdky1roZDGskcW+aUx5Ok0nRr-AFVCY7Wb_8S8Q@mail.gmail.com>
 <CAA68w_ksw7tAwxAJNYnuq7cMnQQ6M6R_iQzpRqkOm=VS_=WQ-w@mail.gmail.com>
Message-ID: <l3grer$rkk$1@ger.gmane.org>

On 14/10/2013 14:15, Neil Girdhar wrote:
> Look, I don't want it, and anyway it's already in scipy.special.  I just
> want organizational symmetry.  I expected to find complex versions of math
> functions in cmath, not in scipy.special.
>

Why are you comparing core Python modules with third party ones?

-- 
Roses are red,
Violets are blue,
Most poems rhyme,
But this one doesn't.

Mark Lawrence


From abarnert at yahoo.com  Mon Oct 14 18:07:26 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 14 Oct 2013 09:07:26 -0700
Subject: [Python-ideas] pytaint: taint tracking in python
In-Reply-To: <CABY9dC0cMbLxGb6u3Ffw09rL4DF_yAsAHhoz4SHf-g9cu1RX+A@mail.gmail.com>
References: <CABY9dC0cMbLxGb6u3Ffw09rL4DF_yAsAHhoz4SHf-g9cu1RX+A@mail.gmail.com>
Message-ID: <BCABA0D9-A863-406F-A387-225ED50BB5D3@yahoo.com>

On Oct 14, 2013, at 5:25, Felix Gröbert <felix at groebert.org> wrote:

> The idea itself is not new (Ruby and Perl have it; there are also some Python libraries floating around) and pretty much no one uses it - however with a few improvements, it can be made viable.

A good part of the reason no one uses it is that SQL injection is always given as the motivation for the idea, but it's not a very good solution for that problem, and there's already a well-known better solution: parameterized queries.
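
For readers who haven't used them, here is a minimal sketch of a parameterized query using the standard-library sqlite3 module. The table, the find_roles helper and the sample data are made up purely for illustration; the point is only that the value travels separately from the SQL text, so a hostile string is treated as data rather than as SQL.

```python
import sqlite3

# In-memory database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "admin"), ("bob", "user")])

def find_roles(conn, name):
    # The ? placeholder binds `name` as a value; it is never spliced
    # into the SQL string, so quotes inside it cannot change the query.
    rows = conn.execute("SELECT role FROM users WHERE name = ?", (name,))
    return [role for (role,) in rows]

hostile = "bob' OR '1'='1"
print(find_roles(conn, "bob"))     # ['user']
print(find_roles(conn, hostile))   # [] (no user has that literal name)
```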

SQL isn't the only case where you build executable strings--a document formatter might build PostScript code; a forum might build HTML (maybe even with embedded JS); a game might even read Python code from an in-game console or untrusted mod that's allowed to run in a different globals environment but not the main one; etc. Has anyone successfully used Perl's long-standing taint mode for any such purposes? If not, can you demonstrate using it in Python?

I don't think that would be _necessary_ for a Python taint mode implementation to be considered useful, but it would certainly help get attention to the idea.

From raymond.hettinger at gmail.com  Mon Oct 14 19:56:23 2013
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Mon, 14 Oct 2013 10:56:23 -0700
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAA68w_mw_2+Uip9Ab2hSEyePisOM4=bm9ucu=CqUrUZ=4qVjVg@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com> <20131013014742.GR7989@ando>
 <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
 <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>
 <CADiSq7c=LcD_iikHt6GCvFthvSi08rwg_xJ71Etonhk1zvh5Jw@mail.gmail.com>
 <CAA68w_mw_2+Uip9Ab2hSEyePisOM4=bm9ucu=CqUrUZ=4qVjVg@mail.gmail.com>
Message-ID: <501BBA96-9DEF-4417-A18A-70FC65729329@gmail.com>


On Oct 13, 2013, at 1:56 PM, Neil Girdhar <mistersheik at gmail.com> wrote:

>  I'm now convinced that itertools.permutations is fine as it is.  I am not totally convinced that multiset_permutations doesn't belong in itertools,


Now that we have a good algorithm,  I'm open to adding this to itertools,
but it would need to have a name that didn't create any confusion
with respect to the existing tools, perhaps something like:

    anagrams(population, r)

    Return an iterator over all distinct r-length permutations
    of the population.

    Unlike permutations(), element uniqueness is determined
    by value rather than by position.  Also, anagrams() makes
    no guarantees about the order the tuples are generated.
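
To make the proposed behaviour concrete, here is a rough sketch of what such a function could look like. It is not the algorithm discussed earlier in the thread: it assumes the elements are hashable (it counts them with collections.Counter), and the default of r=None meaning "full length" is borrowed from permutations() rather than taken from the proposal.

```python
from collections import Counter

def anagrams(population, r=None):
    """Yield each distinct r-length permutation of population once.

    Sketch only: assumes hashable elements; r=None means full length.
    """
    counts = Counter(population)
    if r is None:
        r = sum(counts.values())

    def extend(prefix):
        if len(prefix) == r:
            yield tuple(prefix)
            return
        # Try each distinct value that still has copies left.  Only the
        # stored counts change here, never the set of keys, so iterating
        # over the Counter while recursing is safe.
        for item in counts:
            if counts[item] > 0:
                counts[item] -= 1
                prefix.append(item)
                yield from extend(prefix)
                prefix.pop()
                counts[item] += 1

    yield from extend([])

print(list(anagrams("aab", 2)))   # [('a', 'a'), ('a', 'b'), ('b', 'a')]
```

By contrast, list(permutations("aab", 2)) has six entries, because the two "a"s are distinguished by position.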



Raymond

From bruce at leapyear.org  Mon Oct 14 20:03:21 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Mon, 14 Oct 2013 11:03:21 -0700
Subject: [Python-ideas] pytaint: taint tracking in python
In-Reply-To: <BCABA0D9-A863-406F-A387-225ED50BB5D3@yahoo.com>
References: <CABY9dC0cMbLxGb6u3Ffw09rL4DF_yAsAHhoz4SHf-g9cu1RX+A@mail.gmail.com>
 <BCABA0D9-A863-406F-A387-225ED50BB5D3@yahoo.com>
Message-ID: <CAGu0AnugchWkOO+5vt6npOFYqp3wPVjc3E8o_765F2_FE+bzhA@mail.gmail.com>

There's another good use case for tainting: html injection (XSS). There's a
good solution for that too but XSS is still prevalent because it's easy to
build html by concatenating strings without escaping and template systems
make it too easy to inject strings without escaping (or put another way,
they make it equally easy to inject escaped strings as unescaped strings).
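
As a small illustration of the difference, the sketch below builds the same fragment with and without escaping, using html.escape from the Python 3 standard library. The two render_* helpers are hypothetical names used only for this example.

```python
from html import escape

def render_comment_unsafe(text):
    # Plain concatenation: whatever the user typed becomes markup.
    return "<p>" + text + "</p>"

def render_comment(text):
    # escape() replaces &, <, > (and, by default, quotes) with entities.
    return "<p>" + escape(text) + "</p>"

payload = "<script>alert(1)</script>"
print(render_comment_unsafe(payload))  # <p><script>alert(1)</script></p>
print(render_comment(payload))         # <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```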

However, the issue is not just tainting but typing as well. When I have a
string, I need to know if it's raw text or html text. If it's html text, I
need to know if it's safe (generated by the program or user input that's
been sanitized (carefully)) or unsafe (raw user input). I'm not sure it
isn't

--- Bruce
I'm hiring: http://www.cadencemd.com/info/jobs
Latest blog post: Alice's Puzzle Page http://www.vroospeak.com
Learn how hackers think: http://j.mp/gruyere-security


On Mon, Oct 14, 2013 at 9:07 AM, Andrew Barnert <abarnert at yahoo.com> wrote:

> On Oct 14, 2013, at 5:25, Felix Gröbert <felix at groebert.org> wrote:
>
> > The idea itself is not new (Ruby and Perl have it; there are also some
> Python libraries floating around) and pretty much no one uses it - however
> with a few improvements, it can be made viable.
>
> A good part of the reason no one uses it is that SQL injection is always
> given as the motivation for the idea, but it's not a very good solution for
> that problem, and there's already a well-known better solution:
> parameterized queries.
>
> SQL isn't the only case where you build executable strings--a document
> formatter might build Postscript code; a forum might build HTML (maybe even
> with embedded JS); a game might even read Python code from an in-game
> console or untrusted mod that's allowed to run in a different globals
> environment but not the main one; etc. Has anyone successfully used perl's
> long-standing taint mode for any such purposes? If not, can you demonstrate
> using it in python?
>
> I don't think that would be _necessary_ for a python taint mode
> implementation to be considered useful, but it would certainly help get
> attention to the idea.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
>

From mistersheik at gmail.com  Mon Oct 14 22:28:44 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Mon, 14 Oct 2013 16:28:44 -0400
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <501BBA96-9DEF-4417-A18A-70FC65729329@gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com> <20131013014742.GR7989@ando>
 <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
 <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>
 <CADiSq7c=LcD_iikHt6GCvFthvSi08rwg_xJ71Etonhk1zvh5Jw@mail.gmail.com>
 <CAA68w_mw_2+Uip9Ab2hSEyePisOM4=bm9ucu=CqUrUZ=4qVjVg@mail.gmail.com>
 <501BBA96-9DEF-4417-A18A-70FC65729329@gmail.com>
Message-ID: <CAA68w_mUPjbPeGGDUWLfmUQLMOC8Qoh=pOi82rGRBbRy-Ag--g@mail.gmail.com>

Excellent!

My top two names are
1. multiset_permutations (reflects the mathematical name)
2. anagrams

Note that we may also want to add multiset_combinations.  It hasn't been
part of this discussion, but it may be part of another discussion and I
wanted to point this out as I know many of you are future-conscious.

We seem to be all agreed that we want to accept "r", the length of the
permutation desired.

With permutations, the *set* is passed in as an iterable representing
distinct elements.  With multiset_permutations, there are three ways to
pass in the *multiset*:
- 1. an iterable whose elements (or the results of applying an optional
key function to them) are compared using __eq__
- 2. a dict (of which collections.Counter is a subclass)
- 3. an iterable whose elements are key-value pairs and whose values are
counts

Example uses:
1. multiset_permutations(word)
2. multiset_permutations(Counter(word))
3. multiset_permutations(Counter(word).items())

From a dictionary:
1. multiset_permutations(itertools.chain.from_iterable(itertools.repeat(k,
v) for k, v in d.items()))
2. multiset_permutations(d)
3. multiset_permutations(d.items())

From an iterable of key-value pairs:
1. multiset_permutations(itertools.chain.from_iterable(itertools.repeat(k,
v) for k, v in it))
2. multiset_permutations({k: v for k, v in it})
3. multiset_permutations(it)

The advantage of 2 is that no elements are compared by
multiset_permutations (so it is simpler and faster).
The advantage of 3 is that no elements are compared, and they need not be
comparable or hashable.  This version is truly a generalization of the
"permutations" function.  This way, for any input "it" you could pass to
permutations, you could equivalently pass zip(it, itertools.repeat(1)) to
multiset_permutations.
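
A rough sketch of option 3, to show what I mean. It takes (element, count) pairs and never compares or hashes the elements; the recursive generator is just the simplest thing that works, not a proposal for the actual implementation, and the optional r defaulting to the full length is my assumption.

```python
from itertools import permutations, repeat

def multiset_permutations(pairs, r=None):
    """Sketch of interface 3: pairs is an iterable of (element, count)."""
    elements, counts = [], []
    for element, count in pairs:
        elements.append(element)
        counts.append(count)
    if r is None:
        r = sum(counts)

    def extend(prefix):
        if len(prefix) == r:
            yield tuple(prefix)
            return
        # Elements are selected by index, so they are never compared
        # with each other and need not be hashable.
        for i, element in enumerate(elements):
            if counts[i] > 0:
                counts[i] -= 1
                prefix.append(element)
                yield from extend(prefix)
                prefix.pop()
                counts[i] += 1

    yield from extend([])

# With every count equal to 1 this reproduces permutations().
word = "abc"
assert (list(multiset_permutations(zip(word, repeat(1)), 2))
        == list(permutations(word, 2)))

# Unhashable elements are fine too.
print(list(multiset_permutations([([1], 2), ([2], 1)])))
# [([1], [1], [2]), ([1], [2], [1]), ([2], [1], [1])]
```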

Comments?

Neil


On Mon, Oct 14, 2013 at 1:56 PM, Raymond Hettinger <
raymond.hettinger at gmail.com> wrote:

>
> On Oct 13, 2013, at 1:56 PM, Neil Girdhar <mistersheik at gmail.com> wrote:
>
>  I'm now convinced that itertools.permutations is fine as it is.  I am not
> totally convinced that multiset_permutations doesn't belong in itertools,
>
>
> Now that we have a good algorithm,  I'm open to adding this to itertools,
> but it would need to have a name that didn't create any confusion
> with respect to the existing tools, perhaps something like:
>
>     anagrams(population, r)
>
>     Return an iterator over all distinct r-length permutations
>     of the population.
>
>     Unlike permutations(), element uniqueness is determined
>     by value rather than by position.  Also, anagrams() makes
>     no guarantees about the order the tuples are generated.
>
>
>
> Raymond
>

From bruce at leapyear.org  Mon Oct 14 22:52:40 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Mon, 14 Oct 2013 13:52:40 -0700
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <525AFCC5.4070309@mrabarnett.plus.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAA68w_nbtc5GQdvE8QG6-zTqG6e3LYryc5Va7A9kSJRrsoBx3A@mail.gmail.com>
 <CAHVvXxQ0n=a-LPM3LDUoQPsVpshMgRzxb1O0LmcBH=eGkebDmg@mail.gmail.com>
 <CAA68w_m5k7R=h+uHmH29f5MEjv=F28C+Jc9K=MmOmdrmzcRq3A@mail.gmail.com>
 <CAHVvXxRk4MKMMHKEMDo31vc9DDuULh+8s1w7QnXjxA30ZcKHWA@mail.gmail.com>
 <525AFCC5.4070309@mrabarnett.plus.com>
Message-ID: <CAGu0Ans6+qUOt0hCktBB9a0HZ8iKKgPn9_S8A9MqOFZaK1yPDg@mail.gmail.com>

On Sun, Oct 13, 2013 at 1:04 PM, MRAB <python at mrabarnett.plus.com> wrote:

> Here's a use case: anagrams.
>

For what it's worth, I've written anagram-finding code, and I didn't do it
with permutations. The faster approach is to create a dictionary mapping a
canonical form of each word to a list of words, e.g.,

{
  'ACT': ['ACT', 'CAT'],
  'AET': ['ATE', 'EAT', 'ETA', 'TEA']
}

This requires extra work to build the map but you do that just once when
you read the dictionary and then every lookup is O(1) not O(len(word)).
This canonical form approach is useful for other word transformations that
are used in puzzles, e.g., words that have the same consonants
(ignoring vowels).
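
A minimal sketch of that approach, using the sorted letters of a word as its canonical form. The function names and the tiny word list are only for illustration; a real program would read a full dictionary file. Note that canonicalizing the query word still costs a sort of its letters; only the dictionary lookup itself is (on average) constant time.

```python
from collections import defaultdict

def canonical(word):
    # Sorted upper-case letters: 'tea' and 'eat' both become 'AET'.
    return "".join(sorted(word.upper()))

def build_index(words):
    # Done once, when the word list is read.
    index = defaultdict(list)
    for word in words:
        index[canonical(word)].append(word.upper())
    return index

def find_anagrams(index, word):
    # .get avoids creating empty entries for unknown letter sets.
    return index.get(canonical(word), [])

index = build_index(["act", "cat", "ate", "eat", "eta", "tea", "dog"])
print(find_anagrams(index, "tea"))   # ['ATE', 'EAT', 'ETA', 'TEA']
print(find_anagrams(index, "xyz"))   # []
```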


--- Bruce
I'm hiring: http://www.cadencemd.com/info/jobs
Latest blog post: Alice's Puzzle Page http://www.vroospeak.com
Learn how hackers think: http://j.mp/gruyere-security

P.S. Yes, I know: if you play Scrabble, TAE is also a word.

From tjreedy at udel.edu  Tue Oct 15 00:59:54 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 14 Oct 2013 18:59:54 -0400
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAA68w_mUPjbPeGGDUWLfmUQLMOC8Qoh=pOi82rGRBbRy-Ag--g@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com> <20131013014742.GR7989@ando>
 <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
 <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>
 <CADiSq7c=LcD_iikHt6GCvFthvSi08rwg_xJ71Etonhk1zvh5Jw@mail.gmail.com>
 <CAA68w_mw_2+Uip9Ab2hSEyePisOM4=bm9ucu=CqUrUZ=4qVjVg@mail.gmail.com>
 <501BBA96-9DEF-4417-A18A-70FC65729329@gmail.com>
 <CAA68w_mUPjbPeGGDUWLfmUQLMOC8Qoh=pOi82rGRBbRy-Ag--g@mail.gmail.com>
Message-ID: <l3ht11$4cc$1@ger.gmane.org>

On 10/14/2013 4:28 PM, Neil Girdhar wrote:
> Excellent!
>
> My top two names are
> 1. multiset_permutations (reflects the mathematical name)
> 2. anagrams

I like anagrams. I did not completely get what this issue was about 
until someone finally mentioned anagrams as a use case.

-- 
Terry Jan Reedy


From tim.peters at gmail.com  Tue Oct 15 02:48:17 2013
From: tim.peters at gmail.com (Tim Peters)
Date: Mon, 14 Oct 2013 19:48:17 -0500
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <501BBA96-9DEF-4417-A18A-70FC65729329@gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com> <20131013014742.GR7989@ando>
 <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
 <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>
 <CADiSq7c=LcD_iikHt6GCvFthvSi08rwg_xJ71Etonhk1zvh5Jw@mail.gmail.com>
 <CAA68w_mw_2+Uip9Ab2hSEyePisOM4=bm9ucu=CqUrUZ=4qVjVg@mail.gmail.com>
 <501BBA96-9DEF-4417-A18A-70FC65729329@gmail.com>
Message-ID: <CAExdVNnzObNSqmhKwdQpG+H3FfHPQGWbeqfVBmdBW9VhTyL_Uw@mail.gmail.com>

[Raymond Hettinger]
> Now that we have a good algorithm,  I'm open to adding this to itertools,

I remain reluctant, because I still haven't seen a compelling use
case.  Yes, it generates all distinct r-letter anagrams - but so what?
 LOL ;-)  Seriously, I've written anagram programs several times in my
life, and generating "all possible" never occurred to me because it's
so crushingly inefficient.


> but it would need to have a name that didn't create any confusion
> with respect to the existing tools, perhaps something like:
>
>     anagrams(population, r)

"anagrams" is great!  Inspired :-)

What about an optional argument to define what the _user_ means by
"equality"?  The algorithm I posted had an optional
`equal=operator.__eq__` argument.  Else you're going to be pushed to
add a clumsy `TransformAnagrams` later <0.4 wink>.

>     Return an iterator over all distinct r-length permutations
>     of the population.
>
>     Unlike permutations(), element uniqueness is determined
>     by value rather than by position.  Also, anagrams() makes
>     no guarantees about the order the tuples are generated.

Well, MRAB's algorithm (and my rewrite) guarantees that _if_ the
elements support a total order, and appear in the iterable in
non-decreasing order, then the anagrams are generated in
non-decreasing lexicographic order.  And that may be a useful
guarantee (hard to tell without a real use case, though!).

There's another ambiguity I haven't seen addressed explicitly.  Consider this:

>>> from fractions import Fraction
>>> for a in anagrams([3, 3.0, Fraction(3)], 3):
...        print(a)

(3, 3.0, Fraction(3, 1))

All the algorithms posted here work to show all 3 elements in this
case.  But why?  If the elements are all equal, then other outputs "should
be" acceptable too.  Like

(3, 3, 3)

or

(3.0, Fraction(3, 1), 3.0)

etc.  All those outputs compare equal!

This isn't visible if, e.g., the iterable's elements are letters
(where a == b if and only if str(a) == str(b), so the output looks the
same no matter what).

At least "my" algorithm could be simplified materially if it only
saved (and iterated over) a (single) canonical representative for each
equivalence class, instead of saving entire equivalence classes and
then jumping through hoops to cycle through each equivalence class's
elements.

But, for some reason, output (3, 3, 3) just "looks wrong" above.  I'm
not sure why.
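
A quick check of the equalities involved, as a sketch. It also shows what a "single canonical representative" looks like in practice: set and Counter both keep whichever equal element they saw first, so an implementation keyed on values (an assumption here, not any of the posted algorithms) would naturally print (3, 3, 3).

```python
from collections import Counter
from fractions import Fraction

items = [3, 3.0, Fraction(3)]

# All three compare equal and hash alike.
assert items[0] == items[1] == items[2]
assert len({hash(x) for x in items}) == 1

# So the candidate outputs are indistinguishable under ==.
assert (3, 3.0, Fraction(3, 1)) == (3, 3, 3) == (3.0, Fraction(3, 1), 3.0)

# Hash-based containers keep only the first representative they saw.
print(set(items))       # {3}
print(Counter(items))   # Counter({3: 3})
```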

From mistersheik at gmail.com  Tue Oct 15 03:17:26 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Mon, 14 Oct 2013 21:17:26 -0400
Subject: [Python-ideas] Fwd: Extremely weird itertools.permutations
In-Reply-To: <CAGu0Ans6+qUOt0hCktBB9a0HZ8iKKgPn9_S8A9MqOFZaK1yPDg@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <CAEbHw4Yj3cEo=xen7uSsEtA+MNe9dteuNk3a0kct6aXbD70JBw@mail.gmail.com>
 <CAEbHw4aDZd-4qjYT2vaJ+pWO8k3jQAW1mpY7dHfZQ9mw8F5C8A@mail.gmail.com>
 <01670D03-157D-49A5-A611-420B05F67DD8@yahoo.com>
 <CAEbHw4Y6ndT9Pdi3mvGn8=uCgY-UAMCOA0yVd8Vjtzg6-k=0Dg@mail.gmail.com>
 <CAA68w_nwit-+kJ+Mx3QeeQRs2i9WEqEHjF_PBcpo=vyrjsiF4A@mail.gmail.com>
 <CAA68w_nbtc5GQdvE8QG6-zTqG6e3LYryc5Va7A9kSJRrsoBx3A@mail.gmail.com>
 <CAHVvXxQ0n=a-LPM3LDUoQPsVpshMgRzxb1O0LmcBH=eGkebDmg@mail.gmail.com>
 <CAA68w_m5k7R=h+uHmH29f5MEjv=F28C+Jc9K=MmOmdrmzcRq3A@mail.gmail.com>
 <CAHVvXxRk4MKMMHKEMDo31vc9DDuULh+8s1w7QnXjxA30ZcKHWA@mail.gmail.com>
 <525AFCC5.4070309@mrabarnett.plus.com>
 <CAGu0Ans6+qUOt0hCktBB9a0HZ8iKKgPn9_S8A9MqOFZaK1yPDg@mail.gmail.com>
Message-ID: <CAA68w_mm4r4+bnHh0igbkH0mA-j53mV_roTVun=9ZQkqNrCEfg@mail.gmail.com>

Here are a couple of people looking for the function that doesn't exist (yet?)

http://stackoverflow.com/questions/9660085/python-permutations-with-constraints/9660395#9660395
http://stackoverflow.com/questions/15592299/generating-unique-permutations-in-python


On Mon, Oct 14, 2013 at 4:52 PM, Bruce Leban <bruce at leapyear.org> wrote:

>
> On Sun, Oct 13, 2013 at 1:04 PM, MRAB <python at mrabarnett.plus.com> wrote:
>
>> Here's a use case: anagrams.
>>
>
> For what it's worth, I've written anagram-finding code, and I didn't do it
> with permutations. The faster approach is to create a dictionary mapping a
> canonical form of each word to a list of words, e.g.,
>
> {
>   'ACT': ['ACT', 'CAT'],
>   'AET': ['ATE', 'EAT', 'ETA', 'TEA']
> }
>
> This requires extra work to build the map but you do that just once when
> you read the dictionary and then every lookup is O(1) not O(len(word)).
> This canonical form approach is useful for other word transformations that
> are used in puzzles, e.g., words that have the same consonants
> (ignoring vowels).
>
>
> --- Bruce
> I'm hiring: http://www.cadencemd.com/info/jobs
> Latest blog post: Alice's Puzzle Page http://www.vroospeak.com
> Learn how hackers think: http://j.mp/gruyere-security
>
> P.S. Yes, I know: if you play Scrabble, TAE is also a word.
>

From python at mrabarnett.plus.com  Tue Oct 15 03:17:56 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Tue, 15 Oct 2013 02:17:56 +0100
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAExdVNnzObNSqmhKwdQpG+H3FfHPQGWbeqfVBmdBW9VhTyL_Uw@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com> <20131013014742.GR7989@ando>
 <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
 <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>
 <CADiSq7c=LcD_iikHt6GCvFthvSi08rwg_xJ71Etonhk1zvh5Jw@mail.gmail.com>
 <CAA68w_mw_2+Uip9Ab2hSEyePisOM4=bm9ucu=CqUrUZ=4qVjVg@mail.gmail.com>
 <501BBA96-9DEF-4417-A18A-70FC65729329@gmail.com>
 <CAExdVNnzObNSqmhKwdQpG+H3FfHPQGWbeqfVBmdBW9VhTyL_Uw@mail.gmail.com>
Message-ID: <525C97C4.2030601@mrabarnett.plus.com>

On 15/10/2013 01:48, Tim Peters wrote:
> [Raymond Hettinger]
>> Now that we have a good algorithm,  I'm open to adding this to itertools,
>
> I remain reluctant, because I still haven't seen a compelling use
> case.  Yes, it generates all distinct r-letter anagrams - but so what?
>   LOL ;-)  Seriously, I've written anagram programs several times in my
> life, and generating "all possible" never occurred to me because it's
> so crushingly inefficient.
>
>
>> but it would need to have a name that didn't create any confusion
>> with respect to the existing tools, perhaps something like:
>>
>>     anagrams(population, r)
>
> "anagrams" is great!  Inspired :-)
>
> What about an optional argument to define what the _user_ means by
> "equality"?  The algorithm I posted had an optional
> `equal=operator.__eq__` argument.  Else you're going to be pushed to
> add a clumsy `TransformAnagrams` later <0.4 wink>.
>
>>     Return an iterator over all distinct r-length permutations
>>     of the population.
>>
>>     Unlike permutations(), element uniqueness is determined
>>     by value rather than by position.  Also, anagrams() makes
>>     no guarantees about the order the tuples are generated.
>
> Well, MRAB's algorithm (and my rewrite) guarantees that _if_ the
> elements support a total order, and appear in the iterable in
> non-decreasing order, then the anagrams are generated in
> non-decreasing lexicographic order.  And that may be a useful
> guarantee (hard to tell without a real use case, though!).
>
[snip]
I can see that one disadvantage of my algorithm is that the worst-case
storage requirement is O(n^2) (I think). This is because the set of
first items could have n members, the set of second items could have
n-1 members, etc. On the other hand, IMHO, the sheer number of
permutations will become a problem long before the memory requirement
does! :-)
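
To put rough numbers on that, here is a small sketch that counts the distinct full-length permutations with the usual multinomial formula, n! divided by the product of the factorials of the multiplicities. The helper name is made up for this example, and it assumes hashable items.

```python
from collections import Counter
from math import factorial

def count_distinct_permutations(items):
    # n! / (k1! * k2! * ...), where the k's are the multiplicities.
    counts = Counter(items)
    total = factorial(sum(counts.values()))
    for k in counts.values():
        total //= factorial(k)   # exact at every step
    return total

print(count_distinct_permutations("aab"))           # 3
print(count_distinct_permutations("mississippi"))   # 34650
print(count_distinct_permutations(range(20)))       # 2432902008176640000
```

With 20 distinct items, n^2 is only 400, while the number of permutations is already about 2.4e18.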


From steve at pearwood.info  Tue Oct 15 03:27:18 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 15 Oct 2013 12:27:18 +1100
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAA68w_mJ9cSU9QXQcMeb2sGth+Lv4gY6BwtmjA42kmDHFwt7kA@mail.gmail.com>
References: <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com> <20131013014742.GR7989@ando>
 <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
 <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>
 <CADiSq7c=LcD_iikHt6GCvFthvSi08rwg_xJ71Etonhk1zvh5Jw@mail.gmail.com>
 <CAA68w_mw_2+Uip9Ab2hSEyePisOM4=bm9ucu=CqUrUZ=4qVjVg@mail.gmail.com>
 <CAAu3qLXE4qsdv76_K_jpP48-gJ7r9Zmx9EsM5CveLhnzjcyitg@mail.gmail.com>
 <CAA68w_mJ9cSU9QXQcMeb2sGth+Lv4gY6BwtmjA42kmDHFwt7kA@mail.gmail.com>
Message-ID: <20131015012718.GZ7989@ando>

On Mon, Oct 14, 2013 at 08:37:59AM -0400, Neil Girdhar wrote:
> Actually I didn't notice that.  It seems weird to find erf in math, but erf
> for complex numbers in scipy.special.  It's just about organization and
> user discovery.  I realize that from the developer's point of view, erf for
> complex numbers is complicated, but why does the user care?

99% of users don't care about math.erf at all. Of those who do, 99%
don't care about cmath.erf. I'd like to see cmath.erf because I'm a
maths junkie, but if I were responsible for *actually doing the work*
I'd make the same decision to leave cmath.erf out and leave it for a
larger, more complete library like scipy.

There are an infinitely large number of potential programs which could 
in principle be added to Python's std lib, and only a finite number of 
person-hours to do the work. And there are costs to adding software to 
the std lib, not just benefits.


-- 
Steven

From mistersheik at gmail.com  Tue Oct 15 03:29:24 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Mon, 14 Oct 2013 21:29:24 -0400
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <20131015012718.GZ7989@ando>
References: <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com> <20131013014742.GR7989@ando>
 <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
 <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>
 <CADiSq7c=LcD_iikHt6GCvFthvSi08rwg_xJ71Etonhk1zvh5Jw@mail.gmail.com>
 <CAA68w_mw_2+Uip9Ab2hSEyePisOM4=bm9ucu=CqUrUZ=4qVjVg@mail.gmail.com>
 <CAAu3qLXE4qsdv76_K_jpP48-gJ7r9Zmx9EsM5CveLhnzjcyitg@mail.gmail.com>
 <CAA68w_mJ9cSU9QXQcMeb2sGth+Lv4gY6BwtmjA42kmDHFwt7kA@mail.gmail.com>
 <20131015012718.GZ7989@ando>
Message-ID: <CAA68w_=5mvPOt6AcZpPbwTqV9_krQ29t_Hcnj4Fwvq7UL0euvA@mail.gmail.com>

You make a good point.  It was just a random example to illustrate that
desire for completeness.


On Mon, Oct 14, 2013 at 9:27 PM, Steven D'Aprano <steve at pearwood.info> wrote:

> On Mon, Oct 14, 2013 at 08:37:59AM -0400, Neil Girdhar wrote:
> > Actually I didn't notice that.  It seems weird to find erf in math, but
> erf
> > for complex numbers in scipy.special.  It's just about organization and
> > user discovery.  I realize that from the developer's point of view, erf
> for
> > complex numbers is complicated, but why does the user care?
>
> 99% of users don't care about math.erf at all. Of those who do, 99%
> don't care about cmath.erf. I'd like to see cmath.erf because I'm a
> maths junkie, but if I were responsible for *actually doing the work*
> I'd make the same decision to leave cmath.erf out and leave it for a
> larger, more complete library like scipy.
>
> There are an infinitely large number of potential programs which could
> in principle be added to Python's std lib, and only a finite number of
> person-hours to do the work. And there are costs to adding software to
> the std lib, not just benefits.
>
>
> --
> Steven

From ncoghlan at gmail.com  Tue Oct 15 03:39:30 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 15 Oct 2013 11:39:30 +1000
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAA68w_=5mvPOt6AcZpPbwTqV9_krQ29t_Hcnj4Fwvq7UL0euvA@mail.gmail.com>
References: <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com>
 <20131013014742.GR7989@ando>
 <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
 <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>
 <CADiSq7c=LcD_iikHt6GCvFthvSi08rwg_xJ71Etonhk1zvh5Jw@mail.gmail.com>
 <CAA68w_mw_2+Uip9Ab2hSEyePisOM4=bm9ucu=CqUrUZ=4qVjVg@mail.gmail.com>
 <CAAu3qLXE4qsdv76_K_jpP48-gJ7r9Zmx9EsM5CveLhnzjcyitg@mail.gmail.com>
 <CAA68w_mJ9cSU9QXQcMeb2sGth+Lv4gY6BwtmjA42kmDHFwt7kA@mail.gmail.com>
 <20131015012718.GZ7989@ando>
 <CAA68w_=5mvPOt6AcZpPbwTqV9_krQ29t_Hcnj4Fwvq7UL0euvA@mail.gmail.com>
Message-ID: <CADiSq7fykrP7+vM7P84Qdi-BOCkYRuO+qCaoTRT3fnQz3MHZGQ@mail.gmail.com>

On 15 October 2013 11:29, Neil Girdhar <mistersheik at gmail.com> wrote:
> You make a good point.  It was just a random example to illustrate that
> desire for completeness.

The desire for conceptual purity and consistency is a good one, it
just needs to be balanced against the practical constraints of
writing, maintaining, documenting, teaching and learning the standard
library.

"It isn't worth the hassle" is the answer to a whole lot of "Why not
X?" questions in software development :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From tim.peters at gmail.com  Tue Oct 15 03:40:00 2013
From: tim.peters at gmail.com (Tim Peters)
Date: Mon, 14 Oct 2013 20:40:00 -0500
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <525C97C4.2030601@mrabarnett.plus.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com> <20131013014742.GR7989@ando>
 <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
 <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>
 <CADiSq7c=LcD_iikHt6GCvFthvSi08rwg_xJ71Etonhk1zvh5Jw@mail.gmail.com>
 <CAA68w_mw_2+Uip9Ab2hSEyePisOM4=bm9ucu=CqUrUZ=4qVjVg@mail.gmail.com>
 <501BBA96-9DEF-4417-A18A-70FC65729329@gmail.com>
 <CAExdVNnzObNSqmhKwdQpG+H3FfHPQGWbeqfVBmdBW9VhTyL_Uw@mail.gmail.com>
 <525C97C4.2030601@mrabarnett.plus.com>
Message-ID: <CAExdVN=dxcP=4szfefSvBPKUcp9dpdcLdSvEUGe5YYgWKP8kvg@mail.gmail.com>

[MRAB]
> I can see that one disadvantage of my algorithm is that the worst-case
> storage requirement is O(n^2) (I think). This is because the set of
> first items could have N members, the set of second items could have
> N-1 members, etc. On the other hand, IMHO, the sheer number of
> permutations will become a problem long before the memory requirement
> does! :-)

My rewrite is O(N) space (best and worst cases).  I _think_ yours is
too, but I understand my rewrite better by now ;-)

Each element of the iterable appears in exactly one ENode:  the
`ehead` list is a partitioning of the input iterable.

From mistersheik at gmail.com  Tue Oct 15 03:40:52 2013
From: mistersheik at gmail.com (Neil Girdhar)
Date: Mon, 14 Oct 2013 21:40:52 -0400
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CADiSq7fykrP7+vM7P84Qdi-BOCkYRuO+qCaoTRT3fnQz3MHZGQ@mail.gmail.com>
References: <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com> <20131013014742.GR7989@ando>
 <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
 <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>
 <CADiSq7c=LcD_iikHt6GCvFthvSi08rwg_xJ71Etonhk1zvh5Jw@mail.gmail.com>
 <CAA68w_mw_2+Uip9Ab2hSEyePisOM4=bm9ucu=CqUrUZ=4qVjVg@mail.gmail.com>
 <CAAu3qLXE4qsdv76_K_jpP48-gJ7r9Zmx9EsM5CveLhnzjcyitg@mail.gmail.com>
 <CAA68w_mJ9cSU9QXQcMeb2sGth+Lv4gY6BwtmjA42kmDHFwt7kA@mail.gmail.com>
 <20131015012718.GZ7989@ando>
 <CAA68w_=5mvPOt6AcZpPbwTqV9_krQ29t_Hcnj4Fwvq7UL0euvA@mail.gmail.com>
 <CADiSq7fykrP7+vM7P84Qdi-BOCkYRuO+qCaoTRT3fnQz3MHZGQ@mail.gmail.com>
Message-ID: <CAA68w_m=Xdc+Yi14hVATRkYn_upph3osVCc6j6gmXJgMxkc9cw@mail.gmail.com>

On Mon, Oct 14, 2013 at 9:39 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

>
>
Totally agree.

From tim.peters at gmail.com  Tue Oct 15 04:45:33 2013
From: tim.peters at gmail.com (Tim Peters)
Date: Mon, 14 Oct 2013 21:45:33 -0500
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <CAA68w_km3WgriCO3YjUXv+HgRR+Mm5oiXs=m9KrMdFuRoRmiQw@mail.gmail.com>
 <20131012063445.GI7989@ando>
 <CAA68w_kZOSi5u_zWe9YtSkMcEDbmQ2g9tPOTqL6-wmfiJNGh8Q@mail.gmail.com>
 <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com> <20131013014742.GR7989@ando>
 <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
Message-ID: <CAExdVNkg5NxV1C-d7yS1Y3XW9SVA+CQ4uRMnfLMLXQywy40sDQ@mail.gmail.com>

One example of prior art:  Maxima, which I use in its wxMaxima incarnation.

"""
Function:  permutations(a)

Returns a set of all distinct permutations of the members of the list
or set a. Each permutation is a list, not a set.

When a is a list, duplicate members of a are included in the permutations
"""

Examples from a Maxima shell:

> permutations([1, 2, 3]);
{[1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]}

> permutations([[1, 2], [1, 2], [2, 3]])
{[[1,2],[1,2],[2,3]],
 [[1,2],[2,3],[1,2]],
 [[2,3],[1,2],[1,2]]}

> permutations({1, 1.0, 1, 1.0})
{[1,1.0],[1.0,1]}

That last one may be surprising at first, but note that it's the first
example where I passed a _set_ (instead of a list).  And:

> {1, 1.0, 1, 1.0}
{1,1.0}

Best I can tell, Maxima has no builtin function akin to our
permutations(it, r) when r < len(it).  But Maxima has a huge number of
builtin functions, and I often struggle to find ones I _want_ in its
docs ;-)
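
For comparison with the Maxima session above, here is a minimal Python
sketch of the same "distinct permutations" behaviour. The helper name
distinct_permutations is made up for illustration (it is not in
itertools). It compares by equality rather than by hash so that
unhashable members such as lists work, at the cost of a linear scan over
the permutations already produced.

```python
from itertools import permutations


def distinct_permutations(items):
    """Yield each distinct ordering of items once, as a list.

    Hypothetical helper, not part of itertools. Uses == comparisons
    (no hashing), so members may be unhashable, e.g. lists.
    """
    seen = []
    for perm in permutations(items):
        if perm not in seen:  # linear scan: simple, but slow for big inputs
            seen.append(perm)
            yield list(perm)


# Duplicate members of a list collapse to 3 distinct orderings,
# as in the second Maxima example.
nested = list(distinct_permutations([[1, 2], [1, 2], [2, 3]]))

# Unlike Maxima, Python treats 1 and 1.0 as the same set member,
# because 1 == 1.0 and hash(1) == hash(1.0).
python_set = {1, 1.0, 1, 1.0}
```

So the third Maxima example has no direct Python analogue: the Python
set holds a single element and therefore has a single permutation.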

From breamoreboy at yahoo.co.uk  Tue Oct 15 09:30:21 2013
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Tue, 15 Oct 2013 08:30:21 +0100
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CADiSq7fykrP7+vM7P84Qdi-BOCkYRuO+qCaoTRT3fnQz3MHZGQ@mail.gmail.com>
References: <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com> <20131013014742.GR7989@ando>
 <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
 <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>
 <CADiSq7c=LcD_iikHt6GCvFthvSi08rwg_xJ71Etonhk1zvh5Jw@mail.gmail.com>
 <CAA68w_mw_2+Uip9Ab2hSEyePisOM4=bm9ucu=CqUrUZ=4qVjVg@mail.gmail.com>
 <CAAu3qLXE4qsdv76_K_jpP48-gJ7r9Zmx9EsM5CveLhnzjcyitg@mail.gmail.com>
 <CAA68w_mJ9cSU9QXQcMeb2sGth+Lv4gY6BwtmjA42kmDHFwt7kA@mail.gmail.com>
 <20131015012718.GZ7989@ando>
 <CAA68w_=5mvPOt6AcZpPbwTqV9_krQ29t_Hcnj4Fwvq7UL0euvA@mail.gmail.com>
 <CADiSq7fykrP7+vM7P84Qdi-BOCkYRuO+qCaoTRT3fnQz3MHZGQ@mail.gmail.com>
Message-ID: <l3iqu6$spq$2@ger.gmane.org>

On 15/10/2013 02:39, Nick Coghlan wrote:
>
> The desire for conceptual purity and consistency is a good one, it
> just needs to be balanced against the practical constraints of
> writing, maintaining, documenting, teaching and learning the standard
> library.
>
> "It isn't worth the hassle" is the answer to a whole lot of "Why not
> X?" questions in software development :)
>
> Cheers,
> Nick.
>

Would our volunteers be more inclined to take on the hassle if they got 
double time on Saturdays and triple time on Sundays? :)

-- 
Roses are red,
Violets are blue,
Most poems rhyme,
But this one doesn't.

Mark Lawrence


From felix at groebert.org  Tue Oct 15 11:58:41 2013
From: felix at groebert.org (Felix Gröbert)
Date: Tue, 15 Oct 2013 11:58:41 +0200
Subject: [Python-ideas] pytaint: taint tracking in python
In-Reply-To: <CAGu0AnugchWkOO+5vt6npOFYqp3wPVjc3E8o_765F2_FE+bzhA@mail.gmail.com>
References: <CABY9dC0cMbLxGb6u3Ffw09rL4DF_yAsAHhoz4SHf-g9cu1RX+A@mail.gmail.com>
 <BCABA0D9-A863-406F-A387-225ED50BB5D3@yahoo.com>
 <CAGu0AnugchWkOO+5vt6npOFYqp3wPVjc3E8o_765F2_FE+bzhA@mail.gmail.com>
Message-ID: <CABY9dC1GskPuN8P-iSCrsuqn+RJPTgfCq2s69bCt_=+machxJQ@mail.gmail.com>

1. Please correct me if I misunderstand the Python project, but if the idea
is deemed 'good' by this list, a PEP can follow and the feature can be
included in Python 3? It is not necessary to have a Python 3 implementation
beforehand?
The existing Python 2.7.5 pytaint implementation is intended to be run by
users who need tainting in Python 2 but can also serve as a reference /
benchmark / proof-of-concept implementation for this discussion.

2. I haven't had the time to publish benchmarks yet but I plan to. Also, of
course, the cpython tests pass and we added additional taint tracking
tests. We also ran the internal tests of our python codebase with the
pytaint interpreter. This had negligible failures, mostly because some C
extensions had not been recompiled to work with the redefined string
objects.

Regarding taint tracking as a feature for python:

First of all, taint tracking is a general language feature and can be
considered for additional applications besides security. When it comes to
the security community, taint tracking is certainly controversial.
Nevertheless, my pytaint announcement received 50 retweets and 30 favs from
a part of the security community, if that counts for something ;)

As Andrew and Bruce mention, there are other solutions to XSS and SQLi:
template systems and parameterized queries. Another library solution exists
to shell injection: pipes.quote. However, all these solutions require the
developer to pick the correct library and method. We have empirical
indicators that this works, but maybe only in 70% of cases. The rest of the
developers are introducing new vulnerabilities. Thus, an additional
language-based feature can help to mitigate the remaining 30% of cases. A
web app framework (or a python-developing company) can maintain and ship a
pytaint configuration which will throw a TaintError exception in those 30%
of cases and prevent the vulnerability from being exploited.
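
As a reminder of what the library-level solutions look like, here is a
minimal sketch using only the standard library. The table, the hostile
strings and the variable names are made up for illustration; in Python
3.3+ the helper referred to above as pipes.quote is available as
shlex.quote.

```python
import shlex
import sqlite3

# Throwaway in-memory database with one (hypothetical) table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES (?)", ("alice",))

# Parameterized query: the hostile value is bound as data, never
# spliced into the SQL text, so the OR clause is not interpreted.
hostile_name = "alice' OR '1'='1"
rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile_name,)
).fetchall()

# Shell quoting: the whole value becomes a single shell word.
hostile_file = "notes.txt; rm -rf ~"
command = "ls " + shlex.quote(hostile_file)
```

Both calls are safe only if the developer remembers to use them at every
call site, which is the gap the paragraph above is pointing at.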

This argument follows along the principle of defense-in-depth: why just
have one security feature (e.g. pipes.quote) if we can offer several
security features to the developer? This has previously worked well for
system security: ASLR, DEP, etc.

Regarding the relation to typing:

We are using Merits on purpose to be able to distinguish between different
forms of string cleaning. Today, most HTML template systems don't even make
a distinction between different escaping contexts. However, with a pytaint
Merit configuration for raw HTML, URLs, HTML attribute contents, CSS
attributes and JS strings, you would be able to make sure that your string
is cleaned for the specific context you're using it in. This can be
implemented for each template system individually but it would be easier to
just write a pytaint config.
If you don't clean strings based on browser context, you will run into
problems: a string is cleaned with HTML-entity encoding but used in an
<iframe src> attribute. An attacker could trigger an XSS by supplying
javascript:alert(document.cookie).
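
The <iframe src> case can be checked with the standard library alone. In
this minimal sketch, is_safe_link is a hypothetical, deliberately strict
helper (it also rejects relative URLs) and is not a complete URL
sanitizer; it only shows that URL context needs a different check than
HTML-entity encoding.

```python
import html
from urllib.parse import urlsplit

payload = "javascript:alert(document.cookie)"

# HTML-entity encoding only touches &, <, >, " and ', none of which
# occur in the payload, so it passes through unchanged.
escaped = html.escape(payload)


def is_safe_link(url, allowed_schemes=("http", "https")):
    """Hypothetical URL-context check: allow only listed schemes."""
    return urlsplit(url).scheme.lower() in allowed_schemes
```

Here escaped is identical to payload, while is_safe_link(payload)
rejects it because the scheme is "javascript".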

From greg at krypto.org  Tue Oct 15 18:57:22 2013
From: greg at krypto.org (Gregory P. Smith)
Date: Tue, 15 Oct 2013 09:57:22 -0700
Subject: [Python-ideas] pytaint: taint tracking in python
In-Reply-To: <CABY9dC1GskPuN8P-iSCrsuqn+RJPTgfCq2s69bCt_=+machxJQ@mail.gmail.com>
References: <CABY9dC0cMbLxGb6u3Ffw09rL4DF_yAsAHhoz4SHf-g9cu1RX+A@mail.gmail.com>
 <BCABA0D9-A863-406F-A387-225ED50BB5D3@yahoo.com>
 <CAGu0AnugchWkOO+5vt6npOFYqp3wPVjc3E8o_765F2_FE+bzhA@mail.gmail.com>
 <CABY9dC1GskPuN8P-iSCrsuqn+RJPTgfCq2s69bCt_=+machxJQ@mail.gmail.com>
Message-ID: <CAGE7PNK1JyvteVTNEwzhF0wU5R2UELKriOJUhURC=BBjhZwT_Q@mail.gmail.com>

On Tue, Oct 15, 2013 at 2:58 AM, Felix Gröbert <felix at groebert.org> wrote:

> 1. Please correct me if I misunderstand the Python project, but if the
> idea is deemed 'good' by this list, a PEP can follow and the feature can be
> included in Python 3? It is not necessary to have a Python 3 implementation
> beforehand?
> The existing Python 2.7.5 pytaint implementation is intended to be run by
> users who need tainting in Python 2 but can also serve as a reference /
> benchmark / proof-of-concept implementation for this discussion.
>

FWIW having reviewed parts of this code as it was implemented by Marcin
I'll state up front that porting this to Python 3 will mostly be a matter
of mechanical work. Python 3's bytes (PyBytes) and str (PyUnicode) objects
are not _that_ different in implementation in comparison to Python 2's str
(PyString) and unicode (PyUnicode) objects for the purposes of adding and
tracking taint.

Besides, the code could use more eyeballs as would happen in any porting
process. :)


> 2. I haven't had the time to publish benchmarks yet but I plan to. Also,
> of course, the cpython tests pass and we added additional taint tracking
> tests. We also ran the internal tests of our python codebase with the
> pytaint interpreter. This had negligible failures, mostly because some C
> extensions had not been recompiled to work with the redefined string
> objects.
>
> Regarding taint tracking as a feature for python:
>
> First of all, taint tracking is a general language feature and can be
> considered for additional applications besides security. When it comes to
> the security community, taint tracking is certainly controversial.
> Nevertheless, my pytaint announcement received 50 retweets and 30 favs from
> a part of the security community, if that counts for something ;)
>
> As Andrew and Bruce mention, there are other solutions to XSS and SQLi:
> template systems and parameterized queries. Another library solution exists
> to shell injection: pipes.quote. However, all these solutions require the
> developer to pick the correct library and method. We have empirical
> indicators that this works, but maybe only in 70% of cases. The rest of the
> developers are introducing new vulnerabilities. Thus, an additional
> language-based feature can help to mitigate the remaining 30% of cases. A
> web app framework (or a python-developing company) can maintain and ship a
> pytaint configuration which will throw a TaintError exception in those 30%
> of cases and prevent the vulnerability from being exploited.
>
> This argument follows along the principle of defense-in-depth: why just
> have one security feature (e.g. pipes.quote) if we can offer several
> security features to the developer? This has previously worked well for
> system security: ASLR, DEP, etc.
>
> Regarding the relation to typing:
>
> We are using Merits on purpose to be able to distinguish between different
> forms of string cleaning. Today, most HTML template systems don't even make
> a distinction between different escaping contexts. However, with a pytaint
> Merit configuration for raw HTML, URLs, HTML attribute contents, CSS
> attributes and JS strings, you would be able to make sure that your string
> is cleaned for the specific context you're using it in. This can be
> implemented for each template system individually but it would be easier to
> just write a pytaint config.
>

Indeed. I like the taint merits system. It is much more powerful than what
Perl 5 ever had with a single taint bit.

The ability to configure taint properties "offline" via JSON files is also
neat. You can effectively create taint merit and sink metadata for existing
Python libraries without needing to modify them (similar to how Cython lets
you specify types via an external file for it to apply its magic better to
other libraries without needing to modify them).

-gps


> If you don't clean strings based on browser context, you will run into
> problems: a string is cleaned with HTML-entity encoding but used in an
> <iframe src> attribute. An attacker could trigger an XSS by supplying
> javascript:alert(document.cookie).
>

From tjreedy at udel.edu  Tue Oct 15 19:14:55 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 15 Oct 2013 13:14:55 -0400
Subject: [Python-ideas] pytaint: taint tracking in python
In-Reply-To: <CABY9dC1GskPuN8P-iSCrsuqn+RJPTgfCq2s69bCt_=+machxJQ@mail.gmail.com>
References: <CABY9dC0cMbLxGb6u3Ffw09rL4DF_yAsAHhoz4SHf-g9cu1RX+A@mail.gmail.com>
 <BCABA0D9-A863-406F-A387-225ED50BB5D3@yahoo.com>
 <CAGu0AnugchWkOO+5vt6npOFYqp3wPVjc3E8o_765F2_FE+bzhA@mail.gmail.com>
 <CABY9dC1GskPuN8P-iSCrsuqn+RJPTgfCq2s69bCt_=+machxJQ@mail.gmail.com>
Message-ID: <l3jt66$vfn$1@ger.gmane.org>

On 10/15/2013 5:58 AM, Felix Gröbert wrote:
> 1. Please correct me if I misunderstand the Python project, but if the
> idea is deemed 'good' by this list,

This list is a discussion forum, not a decision-making body.

An individual person can consider an idea 'good' in some sense without 
thinking that it should be included in the CPython distribution.

 > a PEP can follow and the feature can be included in Python 3?

A PEP must be discussed on the python-dev (core developer) list and approved
by GvR or a person delegated by him.

 > It is not necessary to have a Python 3 implementation beforehand?

A Python 3 implementation is necessary for inclusion. It may or may not
be needed for PEP approval, depending on the python-dev discussion and
ultimately the PEP decider.

> The existing Python 2.7.5 pytaint implementation is intended to be run
> by users who need tainting in Python 2 but can also serve as a reference
> / benchmark / proof-of-concept implementation for this discussion.
...
> Regarding taint tracking as a feature for python:
>
> First of all, taint tracking is a general language feature

Making objects instances of classes with attributes is a general 
feature, which Python already has. From what I have seen posted, taint 
tracking is a particular implementation of a specialized subjective 
concept 'untrusted code text'.  The concept is based on the unfortunate 
social-psychological fact that some people enjoy messing up other 
people's  lives.

 > As Andrew and Bruce mention, there are other solutions to XSS and
 > SQLi: template systems and parameterized queries. Another library
 > solution exists to shell injection: pipes.quote.

Right. Taints are not the only possible implementation that uses the 
same concept.

 > However, all these solutions require the developer to pick the
 > correct library and method.

The same would be true of a taint library. Note that web frameworks, 
etc, are not in the stdlib. I am not sure that taints should be either.

The idea of marking bytes (or strings) with their encoding (or source 
encoding) has been rejected. I don't think anything else should be added 
either.

-- 
Terry Jan Reedy



From abarnert at yahoo.com  Tue Oct 15 19:30:18 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 15 Oct 2013 10:30:18 -0700
Subject: [Python-ideas] pytaint: taint tracking in python
In-Reply-To: <l3jt66$vfn$1@ger.gmane.org>
References: <CABY9dC0cMbLxGb6u3Ffw09rL4DF_yAsAHhoz4SHf-g9cu1RX+A@mail.gmail.com>
 <BCABA0D9-A863-406F-A387-225ED50BB5D3@yahoo.com>
 <CAGu0AnugchWkOO+5vt6npOFYqp3wPVjc3E8o_765F2_FE+bzhA@mail.gmail.com>
 <CABY9dC1GskPuN8P-iSCrsuqn+RJPTgfCq2s69bCt_=+machxJQ@mail.gmail.com>
 <l3jt66$vfn$1@ger.gmane.org>
Message-ID: <E62EAF37-471D-4AB2-B038-33924526F0C9@yahoo.com>

On Oct 15, 2013, at 10:14, Terry Reedy <tjreedy at udel.edu> wrote:

> The same would be true of a taint library. Note that web frameworks, etc, are not in the stdlib. I am not sure that taints should be either.

Well, some of the things that could benefit from taint checking _are_ in the stdlib--sqlite3.Cursor.execute, eval, etc.

More importantly, it sounds like (at least this particular implementation of) tainted string tracking requires language support. So it seems to me that it has to be in the stdlib or not be at all. (I suppose you could add language support that allows for a variety of different taint libraries and not have any in the stdlib, but that seems even less likely to be acceptable than the larger suggestion.) So what you're suggesting really amounts to saying that this project should remain a fork of CPython.

That being said, with no investigation into the difficulties or costs of implementing taint tracking in PyPy, Jython, and IronPython, not to mention not-quite-implementations like Cython, there might be other arguments for that position.

From tjreedy at udel.edu  Tue Oct 15 20:10:23 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 15 Oct 2013 14:10:23 -0400
Subject: [Python-ideas] pytaint: taint tracking in python
In-Reply-To: <E62EAF37-471D-4AB2-B038-33924526F0C9@yahoo.com>
References: <CABY9dC0cMbLxGb6u3Ffw09rL4DF_yAsAHhoz4SHf-g9cu1RX+A@mail.gmail.com>
 <BCABA0D9-A863-406F-A387-225ED50BB5D3@yahoo.com>
 <CAGu0AnugchWkOO+5vt6npOFYqp3wPVjc3E8o_765F2_FE+bzhA@mail.gmail.com>
 <CABY9dC1GskPuN8P-iSCrsuqn+RJPTgfCq2s69bCt_=+machxJQ@mail.gmail.com>
 <l3jt66$vfn$1@ger.gmane.org> <E62EAF37-471D-4AB2-B038-33924526F0C9@yahoo.com>
Message-ID: <l3k0e6$7o3$1@ger.gmane.org>

On 10/15/2013 1:30 PM, Andrew Barnert wrote:
> On Oct 15, 2013, at 10:14, Terry Reedy
> <tjreedy at udel.edu> wrote:
>
>> The same would be true of a taint library. Note that web
>> frameworks, etc, are not in the stdlib. I am not sure that taints
>> should be either.
>
> Well, some of the things that could benefit from taint checking _are_
> in the stdlib--sqlite3.Cursor.execute, eval, etc.

Perhaps a security-oriented sql package could try to force use of 
parameterized queries, even though that would be less convenient for 
hard-coded queries. Or, the DB-API 2.0 interface standard (PEP 249) could
be augmented with a standard interface for tainted strings. (Or such an
interface/protocol might be defined in a pep.)

As for eval (and exec), a package module could easily provide a wrapper.

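import builtins
builtin_eval = builtins.eval  # keep a reference to the real eval

# safe(), text() and TaintError are assumed to come from the taint package
def eval(code, glob=None, loc=None):
    if safe(code):
        return builtin_eval(text(code), glob, loc)
    raise TaintError("only eval safe strings")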

It could even replace the binding in builtins.
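
A self-contained version of that wrapper idea might look like the
sketch below. Everything taint-related here is hypothetical: Tainted,
safe, TaintError and checked_eval are illustrative names, not pytaint's
API, and marking untrusted text with a str subclass is only one possible
way to do it.

```python
import builtins


class TaintError(Exception):
    """Raised when untrusted text reaches a sensitive sink (hypothetical)."""


class Tainted(str):
    """Hypothetical marker type: a str that came from an untrusted source."""


def safe(code):
    """Treat anything not explicitly marked as Tainted as safe."""
    return not isinstance(code, Tainted)


_builtin_eval = builtins.eval  # keep a reference to the real eval


def checked_eval(code, glob=None, loc=None):
    """Evaluate code only if it has not been marked as tainted."""
    if not safe(code):
        raise TaintError("only eval safe strings")
    return _builtin_eval(str(code), glob, loc)


# Weak spot of the pure-library approach: ordinary str operations
# return plain str, so the marker is silently lost.
laundered = Tainted("1 + ") + "2"
```

checked_eval("1 + 2") returns 3, and checked_eval(Tainted("1 + 2"))
raises TaintError, but laundered is a plain str and would be evaluated.
A library-only marker does not propagate through string operations on
its own.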

Note that in Python 3, exec is also a function, not a statement (and 
keyword), so that it too can be wrapped and masked.

> More importantly, it sounds like (at least this particular
> implementation of) tainted string tracking requires language support.

If 'language support' means changing str

> So what you're suggesting really amounts to saying that this project
> should remain a fork of CPython.

which 'fork' implies to me, then experience with an implementation for
3.3+, using the new FSR classes, is needed for any real discussion.

> That being said, with no investigation into the difficulties or costs
> of implementing taint tracking in PyPy, Jython, and IronPython, not
> to mention not-quite-implementations like Cython, there might be
> other arguments for that position [of remaining 3rd party].

Good catch. I presume Jython and IronPython simply use Java and C# 
strings respectively.

-- 
Terry Jan Reedy


From ron3200 at gmail.com  Tue Oct 15 22:03:59 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Tue, 15 Oct 2013 15:03:59 -0500
Subject: [Python-ideas] Another "little" combination, permutation, iterator,
 yield from, example.
Message-ID: <l3k736$mj7$1@ger.gmane.org>


On 10/13/2013 02:02 PM, Tim Peters wrote:
 > [MRAB, posts a beautiful solution]
 >
 > I don't really have a use for this, but it was a lovely programming
 > puzzle, so I'll include an elaborate elaboration of MRAB's algorithm
 > below.  And that's the end of my interest in this;-)

Ah, what the heck...  :-)


This is what I came up with which works like MRAB's by removing the first 
element and using recursion on the rest.  (It's how you do it in lisp or 
scheme.)

It's just two generators working together.

You can get items with specified length by using a list comprehension or 
filter.

(Disclaimer... I didn't do any more testing (performance or otherwise)
  than you see below.)

Cheers,
    Ron Adam


def combine_with_all(x, seq):
    """Combine item x to every item in seq."""
    for s in seq:
        if not isinstance(s, list):
            s = [s]
        yield s + [x]

def unique_sets(seq):
    """Return all unique combinations of elements in seq."""
    if len(seq) == 0:
        return
    first, rest = seq[0], seq[1:]
    yield [first]
    yield from unique_sets(rest)
    yield from combine_with_all(first, unique_sets(rest))


### EXAMPLES ###

for args in [('x', []),
              ('x', [1, 2]),
              ('x', [[1], [2]]),
              ('x', [[1, 2], 3]),
              ('x', [[1, 2], [3]]),
              ('x', 'abc'),
              ('x', ['abc']),
              ('x', ['ab', 'c'])]:
     print(list(combine_with_all(*args)))

#  []
#  [[1, 'x'], [2, 'x']]
#  [[1, 'x'], [2, 'x']]
#  [[1, 2, 'x'], [3, 'x']]
#  [[1, 2, 'x'], [3, 'x']]
#  [['a', 'x'], ['b', 'x'], ['c', 'x']]
#  [['abc', 'x']]
#  [['ab', 'x'], ['c', 'x']]


print(list(unique_sets('abc')))
#  [['a'], ['b'], ['c'], ['c', 'b'], ['b', 'a'], ['c', 'a'],
#   ['c', 'b', 'a']]

print(list(unique_sets(['abc'])))
#  [['abc']]

print(list(unique_sets(['a', 'b', 'c']) ))
#  [['a'], ['b'], ['c'], ['c', 'b'], ['b', 'a'], ['c', 'a'],
#   ['c', 'b', 'a']]

print(list (unique_sets( [(1,2), (3,4)] )))
#  [[(1, 2)], [(3, 4)], [(3, 4), (1, 2)]]

print(list (unique_sets( [[1,2], [3,4]] )))
#  [[[1, 2]], [[3, 4]], [[3, 4], [1, 2]]]

print([x for x in unique_sets(['a', 'b', 'c']) if len(x) == 2])
#  [['c', 'b'], ['b', 'a'], ['c', 'a']]

print([x for x in unique_sets(["a1", "b", "a2"]) if len(x) == 2])
#  [['a2', 'b'], ['b', 'a1'], ['a2', 'a1']]




From dreamingforward at gmail.com  Tue Oct 15 22:15:30 2013
From: dreamingforward at gmail.com (Mark Janssen)
Date: Tue, 15 Oct 2013 13:15:30 -0700
Subject: [Python-ideas] pytaint: taint tracking in python
In-Reply-To: <CABY9dC0cMbLxGb6u3Ffw09rL4DF_yAsAHhoz4SHf-g9cu1RX+A@mail.gmail.com>
References: <CABY9dC0cMbLxGb6u3Ffw09rL4DF_yAsAHhoz4SHf-g9cu1RX+A@mail.gmail.com>
Message-ID: <CAMjeLr8iJUrD-QkPr92km5Cn=nwxGiPVtj2Gz8+eJfLD+GOdbw@mail.gmail.com>

> I'd like to start a discussion on adding a security feature: taint tracking.

I don't know about anyone else, but I've got a little problem with using
the word "taint".

Mark

From bruce at leapyear.org  Tue Oct 15 22:34:46 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Tue, 15 Oct 2013 13:34:46 -0700
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CADiSq7fykrP7+vM7P84Qdi-BOCkYRuO+qCaoTRT3fnQz3MHZGQ@mail.gmail.com>
References: <CADiSq7eoqGL5q+D3id6HPsxzQNoQ=1tv060=kLMBJmBKBDOcTA@mail.gmail.com>
 <CAA68w_nrSmoy6si+ptgGDMu8x-Yf2+9kdKL7DcpYcH0HA20dkQ@mail.gmail.com>
 <EAE6954B-ECA5-462A-B9F9-81684CD78C7B@gmail.com> <20131013014742.GR7989@ando>
 <D6923731-7CC0-4038-B6BE-24FD4630A169@gmail.com>
 <CAA68w_mfc+7Z6U87DHavgO7Yh1AVN4uQaFSw6yc=L3AO5h15Xw@mail.gmail.com>
 <CADiSq7c=LcD_iikHt6GCvFthvSi08rwg_xJ71Etonhk1zvh5Jw@mail.gmail.com>
 <CAA68w_mw_2+Uip9Ab2hSEyePisOM4=bm9ucu=CqUrUZ=4qVjVg@mail.gmail.com>
 <CAAu3qLXE4qsdv76_K_jpP48-gJ7r9Zmx9EsM5CveLhnzjcyitg@mail.gmail.com>
 <CAA68w_mJ9cSU9QXQcMeb2sGth+Lv4gY6BwtmjA42kmDHFwt7kA@mail.gmail.com>
 <20131015012718.GZ7989@ando>
 <CAA68w_=5mvPOt6AcZpPbwTqV9_krQ29t_Hcnj4Fwvq7UL0euvA@mail.gmail.com>
 <CADiSq7fykrP7+vM7P84Qdi-BOCkYRuO+qCaoTRT3fnQz3MHZGQ@mail.gmail.com>
Message-ID: <CAGu0Anvm+c66MXQUx1wY3iNfyxT-nFtywEJSNrnrQsHyiaNk1w@mail.gmail.com>

On Mon, Oct 14, 2013 at 6:39 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> "It isn't worth the hassle" is the answer to a whole lot of "Why not
> X?" questions in software development :)
>

Sometimes it's not worth the hassle on either side. If this is added to the
standard library and I write code that uses it, my code won't be backwards
compatible with older versions of Python. So I'll either have to not
support older Python versions or use an alternative implementation. If this
is on pypi then that's not an issue. Not everything useful should be in the
standard library.

If I had come across a need for this, I'd have just used
unique_everseen(permutations(...)) until performance became an issue.
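
For readers who have not met it, unique_everseen is one of the recipes in
the itertools documentation rather than a function in the module itself.
A minimal sketch of the approach described above, assuming hashable items
and using a simplified version of the recipe (no key argument):

```python
from itertools import permutations


def unique_everseen(iterable):
    """Yield items in order, skipping any that compare equal to an earlier one."""
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item


# permutations() works by position, so 'aab' yields 6 tuples;
# the filter keeps only the ones that are distinct by value.
result = list(unique_everseen(permutations('aab')))
print(result)
```

The filter still has to look at every positional permutation and remember
every distinct one, which is why this only holds up until performance
becomes an issue.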

--- Bruce
I'm hiring: http://www.cadencemd.com/info/jobs
Latest blog post: Alice's Puzzle Page http://www.vroospeak.com
Learn how hackers think: http://j.mp/gruyere-security

From phd at phdru.name  Tue Oct 15 22:56:41 2013
From: phd at phdru.name (Oleg Broytman)
Date: Wed, 16 Oct 2013 00:56:41 +0400
Subject: [Python-ideas] pytaint: taint tracking in python
In-Reply-To: <CAMjeLr8iJUrD-QkPr92km5Cn=nwxGiPVtj2Gz8+eJfLD+GOdbw@mail.gmail.com>
References: <CABY9dC0cMbLxGb6u3Ffw09rL4DF_yAsAHhoz4SHf-g9cu1RX+A@mail.gmail.com>
 <CAMjeLr8iJUrD-QkPr92km5Cn=nwxGiPVtj2Gz8+eJfLD+GOdbw@mail.gmail.com>
Message-ID: <20131015205641.GA23332@iskra.aviel.ru>

On Tue, Oct 15, 2013 at 01:15:30PM -0700, Mark Janssen <dreamingforward at gmail.com> wrote:
> > I'd like to start a discussion on adding a security feature: taint tracking.
> 
> I don't know about anyone else, but I have a little problem with using the
> word "taint".

   Too late to discuss -- it's become a well-known term:
https://en.wikipedia.org/wiki/Taint_checking

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.

From ned at nedbatchelder.com  Tue Oct 15 23:14:25 2013
From: ned at nedbatchelder.com (Ned Batchelder)
Date: Tue, 15 Oct 2013 17:14:25 -0400
Subject: [Python-ideas] pytaint: taint tracking in python
In-Reply-To: <CABY9dC0cMbLxGb6u3Ffw09rL4DF_yAsAHhoz4SHf-g9cu1RX+A@mail.gmail.com>
References: <CABY9dC0cMbLxGb6u3Ffw09rL4DF_yAsAHhoz4SHf-g9cu1RX+A@mail.gmail.com>
Message-ID: <525DB031.2030409@nedbatchelder.com>

On 10/14/13 8:25 AM, Felix Gröbert wrote:
> The idea itself is not new (Ruby and Perl have it; there are also some
> python libraries floating around) and pretty much no one uses it -
> however with a few improvements, it can be made viable.

I'd be interested to hear why this feature isn't used in the languages 
that already have it.  That seems to be a strike against it.  Your 
proposed changes sound like they make it a more complex feature, and 
therefore less likely to be used.

--Ned.

From ncoghlan at gmail.com  Tue Oct 15 23:56:45 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 16 Oct 2013 07:56:45 +1000
Subject: [Python-ideas] pytaint: taint tracking in python
In-Reply-To: <CABY9dC1GskPuN8P-iSCrsuqn+RJPTgfCq2s69bCt_=+machxJQ@mail.gmail.com>
References: <CABY9dC0cMbLxGb6u3Ffw09rL4DF_yAsAHhoz4SHf-g9cu1RX+A@mail.gmail.com>
 <BCABA0D9-A863-406F-A387-225ED50BB5D3@yahoo.com>
 <CAGu0AnugchWkOO+5vt6npOFYqp3wPVjc3E8o_765F2_FE+bzhA@mail.gmail.com>
 <CABY9dC1GskPuN8P-iSCrsuqn+RJPTgfCq2s69bCt_=+machxJQ@mail.gmail.com>
Message-ID: <CADiSq7c4j32fBapNGXk+DhLZDSOdzTTZoLShtwdM0TCNzrFCXw@mail.gmail.com>

On 15 Oct 2013 19:59, "Felix Gröbert" <felix at groebert.org> wrote:
>
> 1. Please correct me if I misunderstand the Python project, but if the
> idea is deemed 'good' by this list, a PEP can follow and the feature can be
> included in Python 3? It is not necessary to have a Python 3 implementation
> beforehand?

Sure. I was just pointing out that the significantly different str and
bytes types and the removal of the implicit conversions between them in 3.x
could complicate the eventual forward porting process. (Although GPS has
indicated it shouldn't be a major problem in this case).

> The existing Python 2.7.5 pytaint implementation is intended to be run by
> users who need tainting in Python 2 but can also serve as a reference /
> benchmark / proof-of-concept implementation for this discussion.
>
> 2. I haven't had the time to publish benchmarks yet but I plan to. Also,
> of course, the cpython tests pass and we added additional taint tracking
> tests. We also ran the internal tests of our python codebase with the
> pytaint interpreter. This had negligible failures, mostly because some C
> extensions hadn't been recompiled to work with the redefined string
> objects.
>
> Regarding taint tracking as a feature for python:
>
> First of all, taint tracking is a general language feature and can be
> considered for additional applications besides security. When it comes to
> the security community, taint tracking is certainly controversial.
> Nevertheless, my pytaint announcement received 50 retweets and 30 favs from
> a part of the security community, if that counts for something ;)

If you can provide a way to taint strings with an encoding assumption such
that combining strings with conflicting encoding assumptions fails, that
would be a big point in favour of the system.

A way to track the origins of tainted objects would also be a big winner.
While I assume tracking that would be too expensive to do by default,
tracing the origin of bad data can be a genuinely hard debugging problem,
so being able to fire up failing unit tests or vulnerability scans in a
taint tracing mode could be very interesting.

>
> As Andrew and Bruce mention, there are other solutions to XSS and SQLi:
> template systems and parameterized queries. Another library solution exists
> to shell injection: pipes.quote. However, all these solutions require the
> developer to pick the correct library and method. We have empirical
> indicators that this works, but maybe only in 70% of cases. The rest of the
> developers are introducing new vulnerabilities. Thus, an additional
> language-based feature can help to mitigate the remaining 30% of cases. A
> web app framework (or a python-developing company) can maintain and ship a
> pytaint configuration which will throw a TaintError exception in those 30%
> of cases and prevent the vulnerability from being exploited.
>
> This argument follows along the principle of defense-in-depth: why just
> have one security feature (e.g. pipes.quote) if we can offer several
> security features to the developer? This has previously worked well for
> system security: ASLR, DEP, etc.

Yes, the idea sounds interesting to me in principle. If it can be adapted
to help with the "where did the bad string data come from?" problem more
generally, then it becomes genuinely compelling :)

> Regarding the relation to typing:
>
> We are using Merits on purpose to be able to distinguish between
> different forms of string cleaning. Today, most HTML template systems don't
> even make a distinction between different escaping contexts. However, with
> a pytaint Merit configuration for raw HTML, URLs, HTML attribute
> contents, CSS attributes and JS strings, you would be able to make sure
> that your string is cleaned for the specific context you're using it in.
> This can be implemented for each template system individually but it would
> be easier to just write a pytaint config.
> If you don't clean strings based on browser context, you will run into
> problems: a string is cleaned with HTML-entity encoding but used in an
> <iframe src> attribute. An attacker could trigger an XSS by supplying
> javascript:alert(document.cookie).

It seems to me that viewing this as a parallel typing system for data
strings is a potentially useful way of looking at things.
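
To make the "parallel typing system" reading concrete, here is a toy
sketch in pure Python. It is NOT pytaint's API or implementation (pytaint
patches the interpreter's own string objects); the names Tainted,
clean_html and require are invented for illustration, and only the
TaintError name and the idea of per-context merits come from the
proposal.

```python
import html


class TaintError(Exception):
    """Raised when a tainted string reaches a sink without the needed merit."""


class Tainted(str):
    """Hypothetical str subclass recording which cleaners ("merits") it passed."""

    def __new__(cls, value, merits=()):
        obj = super().__new__(cls, value)
        obj.merits = frozenset(merits)
        return obj

    def __add__(self, other):
        # A plain str is treated as trusted; two tainted strings keep only
        # the merits they have in common.
        merits = self.merits
        if isinstance(other, Tainted):
            merits &= other.merits
        return Tainted(str.__add__(self, other), merits)

    def __radd__(self, other):
        # Reached for `plain_str + tainted`; the result stays tainted.
        return Tainted(str.__add__(other, self), self.merits)


def clean_html(text):
    """Hypothetical cleaner: escape for HTML text and grant the 'html' merit."""
    merits = getattr(text, "merits", frozenset()) | {"html"}
    return Tainted(html.escape(text), merits)


def require(text, merit):
    """Hypothetical sink check: plain strings pass, tainted ones need `merit`."""
    if isinstance(text, Tainted) and merit not in text.merits:
        raise TaintError("string lacks merit %r" % merit)
    return text


user_input = Tainted("javascript:alert(document.cookie)")
escaped = clean_html(user_input)
require(escaped, "html")            # cleaned for HTML text, so this passes
try:
    # HTML escaping says nothing about URL safety, so a URL sink refuses it.
    require('<iframe src="' + escaped + '">', "url")
except TaintError as exc:
    print("blocked:", exc)
```

The merits behave like a set of type tags that travel with the data:
concatenation can only narrow them, and each sink states which tag it
needs.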

Cheers,
Nick.


From ncoghlan at gmail.com  Wed Oct 16 00:02:26 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 16 Oct 2013 08:02:26 +1000
Subject: [Python-ideas] pytaint: taint tracking in python
In-Reply-To: <525DB031.2030409@nedbatchelder.com>
References: <CABY9dC0cMbLxGb6u3Ffw09rL4DF_yAsAHhoz4SHf-g9cu1RX+A@mail.gmail.com>
 <525DB031.2030409@nedbatchelder.com>
Message-ID: <CADiSq7cjrGCiTNmgw8s39j=yPuCufuZ27ft9ac1bCutcAPnLBQ@mail.gmail.com>

On 16 Oct 2013 07:15, "Ned Batchelder" <ned at nedbatchelder.com> wrote:
>
> On 10/14/13 8:25 AM, Felix Gröbert wrote:
>>
>> The idea itself is not new (Ruby and Perl have it; there are also some
>> python libraries floating around) and pretty much no one uses it - however
>> with a few improvements, it can be made viable.
>
>
> I'd be interested to hear why this feature isn't used in the languages
> that already have it.  That seems to be a strike against it.  Your proposed
> changes sound like they make it a more complex feature, and therefore less
> likely to be used.

At least the Perl one is a bit too simplistic for sophisticated cases, as
it just divides the world into safe and unsafe strings.

That approach is closer to the safe/unsafe marking mechanisms that Python
web frameworks already tend to use for templating and other aspects of
response generation.

Cheers,
Nick.

>
> --Ned.

From tim.peters at gmail.com  Wed Oct 16 02:25:13 2013
From: tim.peters at gmail.com (Tim Peters)
Date: Tue, 15 Oct 2013 19:25:13 -0500
Subject: [Python-ideas] Another "little" combination, permutation,
 iterator, yield from, example.
In-Reply-To: <l3k736$mj7$1@ger.gmane.org>
References: <l3k736$mj7$1@ger.gmane.org>
Message-ID: <CAExdVNkcT+--aJTgf+0+uLsyrE1rzdROoQtQT_JTs31qC0K+LA@mail.gmail.com>

[Tim]
>> [MRAB, posts a beautiful solution]
>>
>> I don't really have a use for this, but it was a lovely programming
>> puzzle, so I'll include an elaborate elaboration of MRAB's algorithm
>> below.  And that's the end of my interest in this;-)

[Ron Adam]
> A what the heck...  :-)
>
>
> This is what I came up with which works like MRAB's by removing the first
> element and using recursion on the rest.  (It's how you do it in lisp or
> scheme.)

It's solving a different problem, though.  In your "Return all unique
combinations of elements in seq", by "unique" you really mean "by
position", not " by value".  For example:

>>>  for p in unique_sets('aab'):
...        print(p)
['a']
['a']
['b']
['b', 'a']
['a', 'a']
['b', 'a']
['b', 'a', 'a']

See?  ['a'] is produced twice, and so is ['b', 'a'].  The whole point
of the algorithms in the thread this spawned from was to avoid
generating *value*-duplicates to begin with.  Filtering them out later
is too inefficient.

>>> for p in unique_sets('aaaaaaaaaa'):
...        print(p)
['a']
['a']
['a']
['a']
['a']
['a']
['a']
['a']
['a']
['a']
['a', 'a']
['a', 'a']
['a', 'a']
['a', 'a', 'a']
['a', 'a']
['a', 'a']
['a', 'a']
['a', 'a', 'a']
... on and on and on ....

Only one comment on the code:


> def unique_sets(seq):
>     """Return all unique combinations of elements in seq."""
>     if len(seq) == 0:
>         return []
>     first, rest = seq[0], seq[1:]

Python's newer unpacking syntax makes that one prettier:

    first, *rest = seq

Much higher-level than crusty old Lisp or Scheme ;-)
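
One difference between the two spellings is worth keeping in mind when
swapping them inside a recursive function. A small illustration, with
variable names made up for the example:

```python
seq = "aab"

# Slicing keeps the sequence type: the remainder is still a str.
slice_first, slice_rest = seq[0], seq[1:]
print(repr(slice_first), repr(slice_rest))

# Starred unpacking always collects the remainder into a list ...
star_first, *star_rest = seq
print(repr(star_first), repr(star_rest))

# ... and it accepts any iterable, not only sliceable sequences.
gen_first, *gen_rest = (n * n for n in range(4))
print(gen_first, gen_rest)
```

Starred unpacking also raises ValueError on an empty input, so the
len(seq) == 0 guard still has to come first.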

> ....

From abarnert at yahoo.com  Wed Oct 16 03:53:51 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 15 Oct 2013 18:53:51 -0700
Subject: [Python-ideas] pytaint: taint tracking in python
In-Reply-To: <CADiSq7cjrGCiTNmgw8s39j=yPuCufuZ27ft9ac1bCutcAPnLBQ@mail.gmail.com>
References: <CABY9dC0cMbLxGb6u3Ffw09rL4DF_yAsAHhoz4SHf-g9cu1RX+A@mail.gmail.com>
 <525DB031.2030409@nedbatchelder.com>
 <CADiSq7cjrGCiTNmgw8s39j=yPuCufuZ27ft9ac1bCutcAPnLBQ@mail.gmail.com>
Message-ID: <BE4F99C2-121D-49FF-9C10-15A61ED7C4F2@yahoo.com>

On Oct 15, 2013, at 15:02, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 16 Oct 2013 07:15, "Ned Batchelder" <ned at nedbatchelder.com> wrote:
> >
> > On 10/14/13 8:25 AM, Felix Gröbert wrote:
> >>
> >> The idea itself is not new (Ruby and Perl have it; there are also some python libraries floating around) and pretty much no one uses it - however with a few improvements, it can be made viable.
> >
> >
> > I'd be interested to hear why this feature isn't used in the languages that already have it.  That seems to be a strike against it.  Your proposed changes sound like they make it a more complex feature, and therefore less likely to be used.
> 
> At least the Perl one is a bit too simplistic for sophisticated cases, as it just divides the world into safe and unsafe strings.
> 
> That approach is closer to the safe/unsafe marking mechanisms that Python web frameworks already tend to use for templating and other aspects of response generation.
> 
Also keep in mind that we're talking about a Perl 3 feature intended to solve SQL injection problems, and once parameterized SQL was invented it was no longer useful for that. (Yes, you can still embed strings directly into SQL statements and quote and escape them manually because you're sure you're too smart to ever make a mistake, or because you just haven't bothered to learn the language or domain--but the kind of person who does that also doesn't turn on taint mode.)

A more flexible feature designed for other problems that haven't proven as amenable to an easy fix might find more use. Which is exactly why I suggested that the OP give better use cases than SQL injection--and he obliged.

From ron3200 at gmail.com  Wed Oct 16 08:21:13 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Wed, 16 Oct 2013 01:21:13 -0500
Subject: [Python-ideas] Another "little" combination, permutation,
 iterator, yield from, example.
In-Reply-To: <CAExdVNkcT+--aJTgf+0+uLsyrE1rzdROoQtQT_JTs31qC0K+LA@mail.gmail.com>
References: <l3k736$mj7$1@ger.gmane.org>
 <CAExdVNkcT+--aJTgf+0+uLsyrE1rzdROoQtQT_JTs31qC0K+LA@mail.gmail.com>
Message-ID: <l3lb8h$2ul$1@ger.gmane.org>



On 10/15/2013 07:25 PM, Tim Peters wrote:
> [Tim]
>>> [MRAB, posts a beautiful solution]
>>>
>>> I don't really have a use for this, but it was a lovely programming
>>> puzzle, so I'll include an elaborate elaboration of MRAB's algorithm
>>> below.  And that's the end of my interest in this;-)
>
> [Ron Adam]
>> A what the heck...  :-)
>>
>>
>> This is what I came up with which works like MRAB's by removing the first
>> element and using recursion on the rest.  (It's how you do it in lisp or
>> scheme.)
>
> It's solving a different problem, though.  In your "Return all unique
> combinations of elements in seq", by "unique" you really mean "by
> position", not " by value".  For example:

Correct.

And sometimes an item's position is part of what makes it unique.  Multiple
roller combination locks and slot machines come to mind.  I'm sure there
are other things where the label or face value is only one aspect of its
identity.


>>>>   for p in unique_sets('aab'):
> ...        print(p)
> ['a']
> ['a']
> ['b']
> ['b', 'a']
> ['a', 'a']
> ['b', 'a']
> ['b', 'a', 'a']
>
> See?  ['a'] is produced twice, and so is ['b', 'a'].  The whole point
> of the algorithms in the thread this spawned from was to avoid
> generating *value*-duplicates to begin with.  Filtering them out later
> is too inefficient.


Well, we can filter them to begin with instead of later.

for p in unique_sets(list(set('aaaaaaaa'))):
     print(p)

['a']



And skip combination duplicates later as they are produced.

for p in unique_sets('aaaaaaaa'):
     if cached(tuple(p)):
        continue
     print(p)

['a']
['a', 'a']
['a', 'a', 'a']
['a', 'a', 'a', 'a']
['a', 'a', 'a', 'a', 'a']
['a', 'a', 'a', 'a', 'a', 'a']
['a', 'a', 'a', 'a', 'a', 'a', 'a']
['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a']


Not generating them to begin with requires some sort of known ordering or 
pattern in the data.

The difference between checking them inside the algorithm as they are 
produced, and checking them externally as they are *yielded* is only the 
yield overhead if we are talking about python code in both places.

In effect the inner loop control is yielded out with the value.  That's a 
very nice thing about "yield from", and it still works that way in the 
recursive example.  That makes things like this a lot more flexible.


> Only one comment on the code:
>
>
>> def unique_sets(seq):
>>      """Return all unique combinations of elements in seq."""
>>      if len(seq) == 0:
>>          return []
>>      first, rest = seq[0], seq[1:]
>
> Python's newer unpacking syntax makes that one prettier:
>
>      first, *rest = seq
>

LOL... yep.  That's why that part bugged me.  I knew there was something I
was missing.

The other part that bothers me is the isinstance list check.  And not
accepting sets directly.


 > Much higher-level than crusty old Lisp or Scheme ;-)

HAHA... yep, although they are both new to me and Clojure seems to be doing
well.


A little digression...

I've been thinking that Python's byte code could be a little higher level
and still be efficient.  So I wrote a little interpreter (in Python) to
test some ideas.

   * Use nested [sequences] in the bytecode rather than jumps.
   * Use tuples to contain the function followed by its arguments
        (very easy to parse and evaluate this way).
   * Keep bytecodes and keywords to a minimal set.
   * Use as few punctuation symbols as possible.

I didn't start learning Lisp and Scheme until after I got that much
working.  But it turned out (as I recently found out) to be very close
to a Scheme or Lisp interpreter.

This example works, with some syntax alterations from the Lisp version,
and after adding in the needed standard Lisp/Scheme functions: car, cdr,
etc.

(Looks much nicer with syntax highlighting.)

def "power-set set" [
     if (is-empty set) [return (list set)]
     let psets-without-car (power-set (cdr set))
     let psets-with-car
         (mapcar
             (cons
                 (lambda 'subset' [return (cons (car set) subset)])
                 psets-without-car))
     return (join-lists psets-with-car psets-without-car)
]

echo (power-set [[1 2] [3 4] [5 6]])

(power-set [[1 2] [3 4] [5 6]]) ==
    [[[1 2] [3 4] [5 6]] [[1 2] [3 4]] [[1 2] [5 6]] [[1 2]]
     [[3 4] [5 6]] [[3 4]] [[5 6]] []]
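
For comparison, the same recursion can be written directly in Python.
This is a sketch for readers following along, not part of the toy
interpreter; the name power_set is chosen to mirror the example above.

```python
def power_set(items):
    """Return all subsets of items as lists, subsets with the first item first."""
    if not items:
        return [[]]                      # the only subset of nothing
    first, *rest = items
    without_first = power_set(rest)
    with_first = [[first] + subset for subset in without_first]
    return with_first + without_first


subsets = power_set([[1, 2], [3, 4], [5, 6]])
for subset in subsets:
    print(subset)
```

It produces the eight subsets in the same order as the result shown
above, from the full set down to the empty list.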


When I first looked at the lisp version of this, I had no idea it had three 
different return points.  Explicit is better than implicit. ;-)


So how is that anything like byte code?  The structure is just nested
tokens and symbols.


(This will parse and evaluate fine in this form ... with the comments too.)

     def               # keyword
     "power-set set"   # string-literal
     [                 # block-start
     if                # keyword
     (                 # expression-start
     is-empty          # name
     set               # name
     )                 # expression-stop
     [                 # block-start
     return            # keyword
     (                 # expression-start
     list              # name
     set               # name
     )                 # expression-stop
     ]                 # block-stop
     .... etc.
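
As a rough illustration of why such a token stream is easy to work with,
here is a small reader that nests it into Python lists and tuples. It is
only a sketch under assumptions, not the interpreter described here: the
functions tokenize and parse are made up for the example, comments are
assumed to start at '#', and string literals are assumed never to
contain '#'.

```python
import re

# A token is a quoted string, a single bracket, or a run of other characters.
TOKEN_RE = re.compile(r'''"[^"]*"|'[^']*'|[\[\]()]|[^\s\[\]()]+''')


def tokenize(source):
    """Split source text into tokens, dropping '#' comments."""
    tokens = []
    for line in source.splitlines():
        line = line.split('#', 1)[0]     # assumes no '#' inside strings
        tokens.extend(TOKEN_RE.findall(line))
    return tokens


def parse(tokens):
    """Nest a flat token list: [...] becomes a list, (...) becomes a tuple."""
    closers = {'[': ']', '(': ')'}

    def read(pos, closer):
        items = []
        while pos < len(tokens):
            tok = tokens[pos]
            pos += 1
            if tok in closers:
                inner, pos = read(pos, closers[tok])
                items.append(inner if tok == '[' else tuple(inner))
            elif tok == closer:
                return items, pos
            elif tok in (']', ')'):
                raise SyntaxError('unexpected ' + tok)
            else:
                items.append(tok)
        if closer is not None:
            raise SyntaxError('missing ' + closer)
        return items, pos

    tree, _ = read(0, None)
    return tree


source = '''
def "power-set set" [          # block-start
    if (is-empty set) [return (list set)]
]
'''
tree = parse(tokenize(source))
print(tree)
```

Blocks come out as lists and expressions as tuples, so an evaluator only
has to walk the nesting; no jump targets are needed.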


There's a bit more... python wrappers for dictionaries, and sets, and lower 
level functions.

I was thinking of rewriting it in C and seeing how it performs, but 
Gambit-C may be a better choice.  It has nearly all the low level features 
python needs and compiles to C or Java.
So don't hold your breath.
It will be a while before I can get that far.  :-)

Cheers,
     Ron

From tim.peters at gmail.com  Wed Oct 16 22:56:38 2013
From: tim.peters at gmail.com (Tim Peters)
Date: Wed, 16 Oct 2013 15:56:38 -0500
Subject: [Python-ideas] Another "little" combination, permutation,
 iterator, yield from, example.
In-Reply-To: <l3lb8h$2ul$1@ger.gmane.org>
References: <l3k736$mj7$1@ger.gmane.org>
 <CAExdVNkcT+--aJTgf+0+uLsyrE1rzdROoQtQT_JTs31qC0K+LA@mail.gmail.com>
 <l3lb8h$2ul$1@ger.gmane.org>
Message-ID: <CAExdVNnybqFa5pTk9dRANogRGV=auZUPaTMKs83T4nOLdhu=uQ@mail.gmail.com>

...

[Tim]
>> It's solving a different problem, though.  In your "Return all unique
>> combinations of elements in seq", by "unique" you really mean "by
>> position", not " by value".  ...

[Ron Adam]
> Correct.
>
> And sometimes an item's position is part of what makes it unique.  Multiple
> roller combination locks and slot machines come to mind.  I'm sure there
> are other things where the label or face value is only one aspect of its
> identity.

I'm wondering whether you read any of the thread this one spawned
from?  Yes, it was long and tedious ;-)  But Python's itertools
already has `permutations()` and `combinations()` functions that work
by position.  Those problems are already solved in the standard
library.  The entire point of that thread was to investigate "by
value" functions instead.

>> >>>   for p in unique_sets('aab'):
>> ...        print(p)
>> ['a']
>> ['a']
>> ['b']
>> ['b', 'a']
>> ['a', 'a']
>> ['b', 'a']
>> ['b', 'a', 'a']
>>
>> See?  ['a'] is produced twice, and so is ['b', 'a'].  The whole point
>> of the algorithms in the thread this spawned from was to avoid
>> generating *value*-duplicates to begin with.  Filtering them out later
>> is too inefficient.

> Well, we can filter them to begin-with instead of later.
>
> for p in unique_sets(list(set('aaaaaaaa'))):
>     print(p)
>
> ['a']

Yes, and that approach was already discussed at great length.  It's
semantically incorrect; e.g.,

>>> for a in anagrams('aab', 2):
...    print(a)
('a', 'a')
('a', 'b')
('b', 'a')

('a', 'a') is _expected_ output.

> And skip combination duplicates later as they are produced.
>
> for p in unique_sets('aaaaaaaa'):
>     if cached(tuple(p)):
>        continue
>     print(p)
>
> ['a']
> ['a', 'a']
> ...

And that approach too was discussed at great length.  It's too slow
and too wasteful of space in general.  For a time example,

>>> for a in anagrams('A' * 1000 + 'B' * 1000, 2):
...     print(a)
('A', 'A')
('A', 'B')
('B', 'A')
('B', 'B')

How long do you think unique_sets would take to do that?  No, I can't
count that high either ;-)

For a space example, just pick one with _many_ distinct outputs.  The
cache must grow to hold all of them.  All discussed before.

> ...
> Not generating them to begin with requires some sort of known ordering or
> pattern in the data.

It requires comparing the iterable's elements for equality.  That's
all.  Where N = len(iterable), the algorithm I posted requires
worst-case O(N) space and worst-case N-choose-2 item comparisons,
regardless of how many "anagrams" are generated.
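
The algorithm referred to here was posted earlier in the thread and is
not reproduced in this message. As a stand-in, the following sketch shows
one way to generate by-value permutations using nothing but == on the
elements. It is an illustration written for this note, not the posted
code; only the name anagrams and its (iterable, r) call shape are taken
from the examples above.

```python
def anagrams(iterable, r=None):
    """Yield each distinct-by-value r-length permutation exactly once."""
    pool = list(iterable)
    if r is None:
        r = len(pool)

    # Group equal items by pairwise == only (no hashing, no ordering).
    # At most N*(N-1)/2 comparisons, and O(N) extra space.
    values, counts = [], []
    for item in pool:
        for i, value in enumerate(values):
            if value == item:
                counts[i] += 1
                break
        else:
            values.append(item)
            counts.append(1)

    def generate(prefix):
        if len(prefix) == r:
            yield tuple(prefix)
            return
        # Each distinct value is tried once per slot, so no duplicate
        # result is ever built, let alone filtered out afterwards.
        for i, value in enumerate(values):
            if counts[i]:
                counts[i] -= 1
                prefix.append(value)
                yield from generate(prefix)
                prefix.pop()
                counts[i] += 1

    yield from generate([])


print(list(anagrams('aab', 2)))
print(list(anagrams('A' * 1000 + 'B' * 1000, 2)))
```

With 1000 As and 1000 Bs the grouping pass does about 3000 comparisons
and the generator then emits its four results directly, where an external
filter would have to wade through nearly four million positional pairs.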


> The difference between checking them inside the algorithm as they are
> produced, and checking them externally as they are *yielded* is only the
> yield overhead if we are talking about python code in both places.

The algorithm I posted was Python code, and can be astronomically more
efficient than "checking them externally" (see the 'A' * 1000 + 'B' *
1000 example above).

> ...
> I've been thinking that pythons byte code could be a little higher level and
> still be efficient.  So I wrote a little interpreter (in python) to test
> some ideas.
>
> [much snipped]

That's a lot more interesting to me ;-)  No time for it now, though.
Maybe later.

From abarnert at yahoo.com  Thu Oct 17 03:13:59 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 16 Oct 2013 18:13:59 -0700
Subject: [Python-ideas] Another "little" combination, permutation,
	iterator, yield from, example.
In-Reply-To: <l3lb8h$2ul$1@ger.gmane.org>
References: <l3k736$mj7$1@ger.gmane.org>
 <CAExdVNkcT+--aJTgf+0+uLsyrE1rzdROoQtQT_JTs31qC0K+LA@mail.gmail.com>
 <l3lb8h$2ul$1@ger.gmane.org>
Message-ID: <0481416B-78F2-4320-8A40-975453759C3F@yahoo.com>

On Oct 15, 2013, at 23:21, Ron Adam <ron3200 at gmail.com> wrote:

> A little digression...
> 
> I've been thinking that pythons byte code could be a little higher level and still be efficient.  So I wrote a little interpreter (in python) to test some ideas.

Please post this on a separate thread. It's a potentially very cool idea, and it's in danger of being lost buried under yet another combinatorially-slow implementation of multiset/sequence permutations.

From felix at groebert.org  Thu Oct 17 11:36:56 2013
From: felix at groebert.org (=?ISO-8859-1?Q?Felix_Gr=F6bert?=)
Date: Thu, 17 Oct 2013 11:36:56 +0200
Subject: [Python-ideas] pytaint: taint tracking in python
In-Reply-To: <BE4F99C2-121D-49FF-9C10-15A61ED7C4F2@yahoo.com>
References: <CABY9dC0cMbLxGb6u3Ffw09rL4DF_yAsAHhoz4SHf-g9cu1RX+A@mail.gmail.com>
 <525DB031.2030409@nedbatchelder.com>
 <CADiSq7cjrGCiTNmgw8s39j=yPuCufuZ27ft9ac1bCutcAPnLBQ@mail.gmail.com>
 <BE4F99C2-121D-49FF-9C10-15A61ED7C4F2@yahoo.com>
Message-ID: <CABY9dC14jc3CDT2_05TsESSoYLYYxvq0sk3oGJZv9omaDV7eXw@mail.gmail.com>

Sorry for quoting indirectly.

> Note that web frameworks, etc, are not in the stdlib. I am not sure that
> taints should be either.

In pytaint we decided to modify the interpreter (and provide a helper
module) for several reasons, the major reason being performance. If you
just do wrapping/monkey patching of str/unicode, the performance impact is
much bigger since a lot of internals are using str/unicode. Thus the
overall slowdown is high for a wrapper-based implementation.

https://github.com/felixgr/pytaint/commit/07254534810341b3552a8c8452bbf749fe2f30c9#diff-2

Therefore, I think the feature should be a part of (1) the language and (2)
embedded in the core interpreter mechanics.

> That being said, with no investigation into the difficulties or costs of
> implementing taint tracking in PyPy, Jython, and IronPython, not to mention
> not-quite-implementations like Cython, there might be other arguments for
> that position.

I cannot speak for the projects but a colleague has previously implemented
a similar feature for Java and Ruby. This, at least, hints towards a
feasible implementation for Jython.

http://www.youtube.com/watch?v=WmZvnKYiNlE

http://repo.staticsafe.ca/presentations/hitbsecconf2012/D1T2%20-%20Meder%20Kydyraliev%20-%20Defibrilating%20Web%20Security.pdf

> I'd be interested to hear why this feature isn't used in the languages
> that already have it

As it was mentioned earlier, we suggest a different form of taint tracking.
Pytaint-style taint tracking is (a) company/framework-wide configurable,
(b) distinguishes between different forms of taint and cleaners and (c) is
more performant than previous python implementations.

To expand on point (a) again, I think it would be very beneficial to web
app frameworks to have pytaint. Web app frameworks could then continue to
provide APIs which are SQLi-safe (parameterized) and SQLi-unsafe (raw
strings). If a user without knowledge about the domain would then use one
of the unsafe APIs insecurely, pytaint would catch it. And a user who is
familiar with the problem domain could still continue to use the more
flexible but unsafe API securely.
SQLi is just an example here; there are many other possible security issues
which can be mitigated with pytaint (see the examples on github).
> A way to track the origins of tainted objects would also be a big winner

I agree it would be a cool additional optional feature of pytaint.
But let's focus this discussion on the currently proposed pytaint
design/implementation :)

From ron3200 at gmail.com  Thu Oct 17 18:10:22 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Thu, 17 Oct 2013 11:10:22 -0500
Subject: [Python-ideas] Extremely weird itertools.permutations
In-Reply-To: <CAEbHw4a_n0io+Diq5K5T-6PAVTDRZx3H6sgSHQRp4xc5GRxu4g@mail.gmail.com>
References: <9ae0d30b-1c32-4041-9282-19d00a9f8f9f@googlegroups.com>
 <20131012020647.GH7989@ando>
 <CAA68w_mu0_qjQd_BQrpQjqVj77Aarw+qjM2QEKS9YSFis-zfHw@mail.gmail.com>
 <CAEbHw4aLXROrzPvh9xE2-TwOWnVRDE4q9PNUW1eZtQncCOh=Hg@mail.gmail.com>
 <6123349B-FFCB-42FE-B973-B3C8251302C4@yahoo.com>
 <CAEbHw4Y3i7VRpwrSV47wxK5eau7H2oUJvc09FCu8ubSBefdYAQ@mail.gmail.com>
 <CAA68w_=gVSEOo=+WOm-D4YaF+EAA05gPa4V86gojEv=kC57seg@mail.gmail.com>
 <CAEbHw4a_n0io+Diq5K5T-6PAVTDRZx3H6sgSHQRp4xc5GRxu4g@mail.gmail.com>
Message-ID: <l3p255$1rj$1@ger.gmane.org>



On 10/12/2013 02:09 AM, David Mertz wrote:
> On Sat, Oct 12, 2013 at 12:02 AM, Neil Girdhar
> <mistersheik at gmail.com
> <mailto:mistersheik at gmail.com>> wrote:
>
>     Why not just use the standard python way to generalize this: "key"
>     rather than the nonstandard "filter_by".
>
>
> Yes, 'key' is a much better name than what I suggested.
>
> I'm not quite sure how best to implement this still.  I guess MRAB's
> recursive approach should work, even though I like the simplicity of my
> style that takes full advantage of the existing itertools.permutations()
> (and uses 1/3 as many lines of--I think clearer--code).  His has the
> advantage, however, that it doesn't require operator.lt
> <http://operator.lt>() to work... however, without benchmarking, I have a
> pretty strong feeling that my suggestion will be faster since it avoids all
> that recursive call overhead.  Maybe I'm wrong about that though.


I'd like to see some nice examples of how it's used.  It seems to me there
is some mixing of combination/permutation concepts.


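import itertools as it

def filter_repeats(itr):
     """Yield each item of itr once, in first-seen order (items must be hashable)."""
     seen = set()
     for i in itr:
         if i in seen:
             continue
         seen.add(i)
         yield i

def unique_combinations(itr, length=None):
     """Product of the distinct items of itr, taken `length` at a time."""
     if length is None:
         length = len(itr)
     return it.product(filter_repeats(itr), repeat=length)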


This one isn't the same as yours, but it's an example of filter_repeats. 
While that isn't the most efficient way in some cases, filtering repeat 
items out of things is a fairly common problem.

Cheers,
    Ron


From ron3200 at gmail.com  Thu Oct 17 19:58:46 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Thu, 17 Oct 2013 12:58:46 -0500
Subject: [Python-ideas] Another "little" combination, permutation,
 iterator, yield from, example.
In-Reply-To: <0481416B-78F2-4320-8A40-975453759C3F@yahoo.com>
References: <l3k736$mj7$1@ger.gmane.org>
 <CAExdVNkcT+--aJTgf+0+uLsyrE1rzdROoQtQT_JTs31qC0K+LA@mail.gmail.com>
 <l3lb8h$2ul$1@ger.gmane.org> <0481416B-78F2-4320-8A40-975453759C3F@yahoo.com>
Message-ID: <l3p8gd$kbr$1@ger.gmane.org>



On 10/16/2013 08:13 PM, Andrew Barnert wrote:
> On Oct 15, 2013, at 23:21, Ron Adam<ron3200 at gmail.com>  wrote:
>
>> >A little digression...
>> >
>> >I've been thinking that pythons byte code could be a little higher level and still be efficient.  So I wrote a little interpreter (in python) to test some ideas.
> Please post this on a separate thread. It's a potentially very cool idea, and it's in danger of being lost buried under yet another combinatorially-slow implementation of multiset/sequence permutations.

Potentially is the key word here.  It's in very very early stages.

I will put the files in the tracker once I get it cleaned up a bit more. 
(in a few days.)  And will post a link to it in a new thread and a summary.

The part I've been working on is just to design a simple mid/lower level
language and test the basic design.  It's about 10 times slower than Python
right now, because Python is used to simulate the evaluation loop and call
stack.  It's pretty much just a toy at this stage, but an interesting one.

Ultimately I'd like to have it run on Gambit-C, which is very fast, but
hides all the low-level memory and resource management things underneath
it and will compile the programs to C code.  So I think it will give many
new options.

The down side is it will require learning some scheme programming.  The 
test language I'm toying with can make that part much easier I hope.

Cheers,
    Ron


From tjreedy at udel.edu  Thu Oct 17 23:59:08 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 17 Oct 2013 17:59:08 -0400
Subject: [Python-ideas] Another "little" combination, permutation,
 iterator, yield from, example.
In-Reply-To: <l3p8gd$kbr$1@ger.gmane.org>
References: <l3k736$mj7$1@ger.gmane.org>
 <CAExdVNkcT+--aJTgf+0+uLsyrE1rzdROoQtQT_JTs31qC0K+LA@mail.gmail.com>
 <l3lb8h$2ul$1@ger.gmane.org> <0481416B-78F2-4320-8A40-975453759C3F@yahoo.com>
 <l3p8gd$kbr$1@ger.gmane.org>
Message-ID: <l3pmj2$lou$1@ger.gmane.org>

On 10/17/2013 1:58 PM, Ron Adam wrote:
>
>
> On 10/16/2013 08:13 PM, Andrew Barnert wrote:
>> On Oct 15, 2013, at 23:21, Ron
>> Adam<ron3200 at gmail.com>  wrote:
>>
>>> >A little digression...
>>> >
>>> >I've been thinking that pythons byte code could be a little higher
>>> level and still be efficient.  So I wrote a little interpreter (in
>>> python) to test some ideas.

If you are not familiar with http://code.google.com/p/wpython2/
"A wordcode-based Python implementation"
you might find it interesting.

-- 
Terry Jan Reedy


From ron3200 at gmail.com  Fri Oct 18 03:03:00 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Thu, 17 Oct 2013 20:03:00 -0500
Subject: [Python-ideas] Another "little" combination, permutation,
 iterator, yield from, example.
In-Reply-To: <l3pmj2$lou$1@ger.gmane.org>
References: <l3k736$mj7$1@ger.gmane.org>
 <CAExdVNkcT+--aJTgf+0+uLsyrE1rzdROoQtQT_JTs31qC0K+LA@mail.gmail.com>
 <l3lb8h$2ul$1@ger.gmane.org> <0481416B-78F2-4320-8A40-975453759C3F@yahoo.com>
 <l3p8gd$kbr$1@ger.gmane.org> <l3pmj2$lou$1@ger.gmane.org>
Message-ID: <l3q1br$ukm$1@ger.gmane.org>



On 10/17/2013 04:59 PM, Terry Reedy wrote:
> On 10/17/2013 1:58 PM, Ron Adam wrote:
>>
>>
>> On 10/16/2013 08:13 PM, Andrew Barnert wrote:
>>> On Oct 15, 2013, at 23:21, Ron
>>> Adam<ron3200 at gmail.com>  wrote:
>>>
>>>> >A little digression...
>>>> >
>>>> >I've been thinking that pythons byte code could be a little higher
>>>> level and still be efficient.  So I wrote a little interpreter (in
>>>> python) to test some ideas.
>
> If you are not familiar with http://code.google.com/p/wpython2/
> "A wordcode-based Python implementation"
> you might find it interesting.

Thanks, that looks interesting, I hadn't seen it before.  I glanced over 
the following...

http://wpython2.googlecode.com/files/Cleanup%20and%20new%20optimizations%20in%20WPython%201.1.pdf

I'm looking for a path that will also lead to creating standalone
executables easily.  This is a taste of that ...

https://groups.google.com/forum/#!topic/clojure/CsGhwc3oyUQ

I'm not sure how difficult it will be, but it looks interesting.  Note that
they talk about outputting Scheme code... Clojure is very near Scheme, so
that's not surprising.  Python outputs bytecode, which is a bit harder to
convert.  We might be able to improve that part of it.

Cheers,
    Ron


From techtonik at gmail.com  Mon Oct 21 06:27:25 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Mon, 21 Oct 2013 07:27:25 +0300
Subject: [Python-ideas] Python 3.4 should include docopt as-is
In-Reply-To: <l26753$pc1$1@ger.gmane.org>
References: <CAPkN8xJ_-jq-iji9gZ+9x2QmWax+5MEeb+DUT2J_iWhV7VzZpg@mail.gmail.com>
 <l26753$pc1$1@ger.gmane.org>
Message-ID: <CAPkN8xK0T3zM6f4Aku8FLezcg+AyCCcNjXecDixHBXYxGU8www@mail.gmail.com>

On Sat, Sep 28, 2013 at 12:22 PM, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:
> On 28/09/2013 05:44, anatoly techtonik wrote:
>>
>> This - http://docopt.org/ - should be included with Python 3.4
>> distribution.
>> --
>> anatoly t.
>>
>
> Have you had the courtesy to ask the maintainer of this library their
> opinions prior to placing this?

Courtesy - yes. Time - no.

The license permits the code to be included in the Python distribution.
Vladimir is also quite open to it. If the docopt name should not
end up referring to a stdlib module, it can be renamed.
--
anatoly t.

From techtonik at gmail.com  Mon Oct 21 06:39:09 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Mon, 21 Oct 2013 07:39:09 +0300
Subject: [Python-ideas] Python 3.4 should include docopt as-is
In-Reply-To: <5246B2A2.4080508@nedbatchelder.com>
References: <CAPkN8xJ_-jq-iji9gZ+9x2QmWax+5MEeb+DUT2J_iWhV7VzZpg@mail.gmail.com>
 <5246B2A2.4080508@nedbatchelder.com>
Message-ID: <CAPkN8xJZWGZ=XEEh0yoHeS9DcMwSm6jAGcR_z9_H2OvjZYGUPQ@mail.gmail.com>

On Sat, Sep 28, 2013 at 1:42 PM, Ned Batchelder <ned at nedbatchelder.com> wrote:
> On 9/28/13 12:44 AM, anatoly techtonik wrote:
>
> This - http://docopt.org/ - should be included with Python 3.4 distribution.
>
>
> In addition to the other questions already asked, you haven't answered the
> fundamental one: Why should docopt be included in the stdlib?

Because it is the easiest and most intuitive way to quickly build
a command line parser with less writing. It also provides
synced help, custom formatting, really short parser definition syntax
and subcommands out of the box.

But the main reason is that it is the 'fastest way ever to expose script
functions to a command line user interface'. Writing a parser with it is
about 10x faster on average than with the argparse, optparse and getopt
interfaces: 50 minutes on argparse with debugging versus 5 on docopt, and
those 5 minutes hold regardless of your experience. For a newbie, getting
what argparse does may take more than 50 minutes on average, and it is
still probably the same 5 minutes for docopt.

> It's right
> there in PyPI where any one can get it.  Why is it better in the stdlib than
> in PyPI?

Because you need only Python on your machine. A language with batteries
included. Not C or Java, where you probably need to download libraries
even to work with strings. Setting up docopt on every machine where you
need to quickly give some variations for execution flow to your
one-time command line script is akin to launching the C compiler with
appropriate include paths.
--
anatoly t.

From techtonik at gmail.com  Mon Oct 21 06:43:42 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Mon, 21 Oct 2013 07:43:42 +0300
Subject: [Python-ideas] Python 3.4 should include docopt as-is
In-Reply-To: <CAPkN8xJZWGZ=XEEh0yoHeS9DcMwSm6jAGcR_z9_H2OvjZYGUPQ@mail.gmail.com>
References: <CAPkN8xJ_-jq-iji9gZ+9x2QmWax+5MEeb+DUT2J_iWhV7VzZpg@mail.gmail.com>
 <5246B2A2.4080508@nedbatchelder.com>
 <CAPkN8xJZWGZ=XEEh0yoHeS9DcMwSm6jAGcR_z9_H2OvjZYGUPQ@mail.gmail.com>
Message-ID: <CAPkN8xK=y7EF-oF31HKMyEYsg40ggc60fw-tc9SqtydX+5j-CA@mail.gmail.com>

On Mon, Oct 21, 2013 at 7:39 AM, anatoly techtonik <techtonik at gmail.com> wrote:
> On Sat, Sep 28, 2013 at 1:42 PM, Ned Batchelder <ned at nedbatchelder.com> wrote:
>> On 9/28/13 12:44 AM, anatoly techtonik wrote:
>>
>> This - http://docopt.org/ - should be included with Python 3.4 distribution.
>>
>>
>> In addition to the other questions already asked, you haven't answered the
>> fundamental one: Why should docopt be included in the stdlib?
>
> Because it is the easiest and most intuitive way to quickly build
> command line parser with a less amount of writing. It also provides
> synced help, custom formatting, really short parser definition syntax
> and subcommands out of the box.
>
> But the main reason that it is a  'fastest way ever to expose script
> functions to command line user interface'. Writing it is as fast as
> 10x times on average compared to argparse, optparse and getopt
> interfaces. 50 minutes on argparse with debug and 5 on docopt. 5
> minutes regardless of your experience. For a newbie getting what
> argparse does may take more that 50 minutes on average, and it is
> still probably the same 5 minutes for docopt.
>
>> It's right
>> there in PyPI where any one can get it.  Why is it better in the stdlib than
>> in PyPI?
>
> Because you need a Python on your machine. Language with batteries
> included. Not a C or Java where probably need to download libraries
> even to work with strings. Setting docopt on every machine where you
> need to quickly give some variations for execution flow to your
> one-time command line script is akin to launching the C compiler with
> appropriate include paths.

Also, the command line interface definition (help on script abilities)
in docopt can be easily read by any human without the need to run the
script or decipher argparse/optparse function call parameters.
--
anatoly t.

From techtonik at gmail.com  Mon Oct 21 06:45:09 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Mon, 21 Oct 2013 07:45:09 +0300
Subject: [Python-ideas] PyPi per-file download counters
In-Reply-To: <CAFYqXL8o6FjwSxfOkuQE3k8A-imogimpaGoqV1ay=o_jTKBCGQ@mail.gmail.com>
References: <CAFYqXL82hHpZ5PaY3NK9_FZ0kxGmhmTAQ0j2erc6cwz15B==cg@mail.gmail.com>
 <CAFYqXL8o6FjwSxfOkuQE3k8A-imogimpaGoqV1ay=o_jTKBCGQ@mail.gmail.com>
Message-ID: <CAPkN8xLfqyathFTrKkp9DMKFpm8xeBuAX+9VOFiK2o6mN5rVAA@mail.gmail.com>

This probably should be forwarded to infrastructure@
--
anatoly t.


On Sat, Sep 28, 2013 at 2:57 PM, Giampaolo Rodola' <g.rodola at gmail.com> wrote:
> ...also, it seems the current counters are broken.
> I uploaded those files this morning and the page says there were over 5000
> downloads in the last month.
>
>
> --- Giampaolo
> http://code.google.com/p/pyftpdlib/
> http://code.google.com/p/psutil/
> http://code.google.com/p/pysendfile/
>
>
> On Sat, Sep 28, 2013 at 1:48 PM, Giampaolo Rodola' <g.rodola at gmail.com>
> wrote:
>>
>> I recently moved psutil .tar.gz and .exe files from Google Code to PyPi
>> and noticed it doesn't show total per-file download counters:
>> https://pypi.python.org/pypi?:action=display&name=psutil#downloads
>> Why don't we add them? Thoughts?
>>
>> --- Giampaolo
>> http://code.google.com/p/pyftpdlib/
>> http://code.google.com/p/psutil/
>> http://code.google.com/p/pysendfile/
>
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
>

From antony.lee at berkeley.edu  Mon Oct 21 07:51:40 2013
From: antony.lee at berkeley.edu (Antony Lee)
Date: Sun, 20 Oct 2013 22:51:40 -0700
Subject: [Python-ideas] Enum and random.choice
Message-ID: <CAGRr6BGFEUdeHGfZFtv9V7nNPs-jL_ZUvDupf3aHB1woc46ZUA@mail.gmail.com>

random.choice and Enum interact poorly, because indexing an enum class
doesn't work as choice expects:

>>> import enum, random
>>> class C(enum.Enum): a, b = 1, 2
...
>>> random.choice(C)
<traceback...>
KeyError: 1

Of course one can do `random.choice(list(C.__members__.values()))` but this
feels a bit awkward.

The source for random.choice is

def choice(self, seq):
    """Choose a random element from a non-empty sequence."""
    try:
        i = self._randbelow(len(seq))
    except ValueError:
        raise IndexError('Cannot choose from an empty sequence')
    return seq[i]

Adding seq = list(seq) should be enough to fix this.
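
As a minimal sketch of the workaround available today (assuming only that
iterating an Enum class yields its members in definition order), converting
the class to a list first gives random.choice a real sequence to index:

```python
import enum
import random


class C(enum.Enum):
    a = 1
    b = 2


# list(C) iterates the Enum class, which yields the members in
# definition order, so the result supports integer indexing.
members = list(C)
picked = random.choice(members)
print(picked)
```

This is shorter than going through C.__members__.values() and skips any
aliases, since iteration only yields the canonical members.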

Antony

From carlopires at gmail.com  Mon Oct 21 13:36:58 2013
From: carlopires at gmail.com (Carlo Pires)
Date: Mon, 21 Oct 2013 09:36:58 -0200
Subject: [Python-ideas] Python 3.4 should include docopt as-is
In-Reply-To: <CAPkN8xK=y7EF-oF31HKMyEYsg40ggc60fw-tc9SqtydX+5j-CA@mail.gmail.com>
References: <CAPkN8xJ_-jq-iji9gZ+9x2QmWax+5MEeb+DUT2J_iWhV7VzZpg@mail.gmail.com>
 <5246B2A2.4080508@nedbatchelder.com>
 <CAPkN8xJZWGZ=XEEh0yoHeS9DcMwSm6jAGcR_z9_H2OvjZYGUPQ@mail.gmail.com>
 <CAPkN8xK=y7EF-oF31HKMyEYsg40ggc60fw-tc9SqtydX+5j-CA@mail.gmail.com>
Message-ID: <CAO6hKotZvBhdDVBTR+jC3MRWwm0cgjYOnNfyaHj2YAbF5Mr0Og@mail.gmail.com>

+1
docopt is the killer module "de facto" for command line programs.


2013/10/21 anatoly techtonik <techtonik at gmail.com>

> On Mon, Oct 21, 2013 at 7:39 AM, anatoly techtonik <techtonik at gmail.com>
> wrote:
> > On Sat, Sep 28, 2013 at 1:42 PM, Ned Batchelder <ned at nedbatchelder.com>
> wrote:
> >> On 9/28/13 12:44 AM, anatoly techtonik wrote:
> >>
> >> This - http://docopt.org/ - should be included with Python 3.4
> distribution.
> >>
> >>
> >> In addition to the other questions already asked, you haven't answered
> the
> >> fundamental one: Why should docopt be included in the stdlib?
> >
> > Because it is the easiest and most intuitive way to quickly build
> > command line parser with a less amount of writing. It also provides
> > synced help, custom formatting, really short parser definition syntax
> > and subcommands out of the box.
> >
> > But the main reason that it is a  'fastest way ever to expose script
> > functions to command line user interface'. Writing it is as fast as
> > 10x times on average compared to argparse, optparse and getopt
> > interfaces. 50 minutes on argparse with debug and 5 on docopt. 5
> > minutes regardless of your experience. For a newbie getting what
> > argparse does may take more that 50 minutes on average, and it is
> > still probably the same 5 minutes for docopt.
> >
> >> It's right
> >> there in PyPI where any one can get it.  Why is it better in the stdlib
> than
> >> in PyPI?
> >
> > Because you need a Python on your machine. Language with batteries
> > included. Not a C or Java where probably need to download libraries
> > even to work with strings. Setting docopt on every machine where you
> > need to quickly give some variations for execution flow to your
> > one-time command line script is akin to launching the C compiler with
> > appropriate include paths.
>
> Also, the command line interface definition (help on script abilities)
> in docopt can be easily read by any human without the need to run the
> script or decipher argparse/optparge function call parameters.
> --
> anatoly t.



-- 
  Carlo Pires

From ncoghlan at gmail.com  Mon Oct 21 13:45:44 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 21 Oct 2013 21:45:44 +1000
Subject: [Python-ideas] Enum and random.choice
In-Reply-To: <CAGRr6BGFEUdeHGfZFtv9V7nNPs-jL_ZUvDupf3aHB1woc46ZUA@mail.gmail.com>
References: <CAGRr6BGFEUdeHGfZFtv9V7nNPs-jL_ZUvDupf3aHB1woc46ZUA@mail.gmail.com>
Message-ID: <CADiSq7efeVjHBn-xRvToPAebW6utg=fTF_napZxUBYuAa9aysg@mail.gmail.com>

On 21 Oct 2013 15:52, "Antony Lee" <antony.lee at berkeley.edu> wrote:
>
> random.choice and Enum interact poorly, because indexing an enum class
doesn't work as choice expects:
>
> >>> import enum, random
> >>> class C(enum.Enum): a, b = 1, 2
> ...
> >>> random.choice(C)
> <traceback...>
> KeyError: 1
>
> Of course one can do `random.choice(list(C.__members__.values()))` but
this feels a bit awkward.
>
> The source for random.choice is
>
> def choice(self, seq):
>     """Choose a random element from a non-empty sequence."""
>     try:
>         i = self._randbelow(len(seq))
>     except ValueError:
>         raise IndexError('Cannot choose from an empty sequence')
>     return seq[i]
>
> Adding seq = list(seq) should be enough to fix this.

How is this different from passing a dict to random.choice? Enum classes
are iterables, not sequences.
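
A small sketch of that comparison, using only the stdlib ABCs (the helper
name choice_fails is made up for illustration): both a dict and an Enum
class report as Iterable but not as Sequence, and both trip up
random.choice in the same way when it indexes by integer.

```python
import collections.abc
import enum
import random


class C(enum.Enum):
    a = 1
    b = 2


d = {"a": 1, "b": 2}


def choice_fails(container):
    """Return True if random.choice raises KeyError for this container."""
    try:
        random.choice(container)
    except KeyError:
        return True
    return False


# One row per container: (is Iterable, is Sequence, random.choice fails)
results = [
    (
        isinstance(container, collections.abc.Iterable),
        isinstance(container, collections.abc.Sequence),
        choice_fails(container),
    )
    for container in (C, d)
]
print(results)
```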

Cheers,
Nick.

>
> Antony
>

From kristjan at ccpgames.com  Mon Oct 21 15:55:28 2013
From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Mon, 21 Oct 2013 13:55:28 +0000
Subject: [Python-ideas] A different kind of context manager
Message-ID: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>

Hello there!
This is a rehash of something that I wrote on stackless-dev recently.  Perhaps this has been suggested in the past.  If so, please excuse my ignorance.

It irks me sometimes how inflexible context managers can be.  For example, wouldn't it be nice to be able to write
with as_subprocess():
    do_stuff()

or in Stackless Python:
with different_tasklet():
  do_stuff()

This is currently impossible because context managers are implemented as __enter__()/__exit__() methods running in the current scope.
There is no callable function to pass to a subprocess module, or a tasklet scheduler.


I have another favorite pet-peeve, which is that for some things, a pair of context managers is needed, since a single context manager cannot silence its own exception:
  with IgnoreError, LockResourceOrRaiseIfBusy(resource):
    do_stuff()

cannot be collapsed into:
  with LockResourceOrPassIfBusy(resource):
    do_stuff()


But another thing is also interesting:  Even though context managers are an __enter__() / __exit__() pair, the most common idiom these days is to write:
@contextlib.contextmanager
def mycontextmanager():
  setup()
  try:
    yield
  finally:
    teardown()

or similar.
There are a million reasons for this.  Mostly it is because this layout is easier to figure out and plays nicer in the head.  It also simplifies error handling, because regular try/except clauses can be used.  If you are writing a "raw" context manager, you have to explicitly maintain some state between the __enter__() and __exit__() methods to know what to clean up and how, depending on the error conditions.  This quickly becomes tedious.

And what does the above code look like?  Well, the place of the "yield" could just as well be a call site.  I mean, the decorated "contextmanager" function simply looks like a wrapper function around a function call.  You write it exactly as you would write a wrapper function, except where you would call the function, you use the "yield" statement (and you _have_ to yield; you can't skip it for whatever reason).
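
To make the "can't skip the yield" point concrete, here is a minimal sketch
(skip_body is a made-up name, not anything from the stdlib): a
@contextmanager generator that returns without yielding makes the with
statement fail in __enter__ with a RuntimeError, so the body never runs.

```python
import contextlib


@contextlib.contextmanager
def skip_body(should_skip):
    # Hypothetical manager that tries to skip the with-body by
    # returning before the yield.
    if should_skip:
        return
    yield


error = None
body_ran = False
try:
    with skip_body(True):
        body_ran = True
except RuntimeError as exc:
    # contextlib notices the generator finished without yielding.
    error = str(exc)

print(body_ran, error)
```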

So, if this is the way people like to think about context managers, like writing wrapper functions, why don't we turn them into proper wrapper functions?

What if a context manager were given a _callable_, representing the code?

like this:

class NewContextManager(object):
  # A context manager that locks a resource, then executes the code only if it is not recursing
  def __init__(self, lock):
    self.lock = lock
  def __contextcall__(self, code):
    with self.lock:
      if self.lock.active:
        return  # This is where @contextmanager will stop you, you can't skip the 'yield'
      self.lock.active = True
      try:
        return code(None) # optionally pass value to the code as in "with foo() as X"
      finally:
        self.lock.active = False

The cool thing here though, is that "code" could, for example, be run on a different tasklet.  Or a different thread.  Or a different universe:

class TaskletContextManager(object):
  def __contextcall__(self, code):
    return stacklesslib.run_on_tasklet(code)

class ThreadContextManager(object):
  def __contextcall__(self, code):
    result = []
    def helper():
      result.append(code(None))
    t = threading.Thread(target=helper)
    t.start()
    t.join()
    return result[0]

This sort of thing would need compiler and syntax support, of course.  The compiler would need to create an anonymous function object.   The return value out of "code" would be some token that could be special if the code returned....



To illustrate, let's see how this can be done manually.
This code here:
  with foo() as bar:
    if condition:
      return stuff
    do_stuff(bar)

can be re-written like this:

  def _code(_arg):
    bar = _arg
    if condition:
      return True, stuff  # early return
    do_stuff(bar)
    return False, None  # no return

  token, value = foo().__contextcall__(_code)
  if token is True:
    return value

where:
  class foo(object):
    def __init__(self, arg=None):
      self.arg = arg
    def __contextcall__(self, _code):
      set_up()
      try:
        return _code(None)  # pass it some value
      finally:
        tear_down()







Compiler support for this sort of thing would entail the automatic creation of the "_code" function as an anonymous function with special semantics for a "return" value.  This function is then passed to the __contextcall__() method of the "new" context manager, where the context manager treats it as any other callable, whose return value it must return.
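
A runnable sketch of that manual translation, under the assumption that
__contextcall__ is nothing more than an ordinary method (no interpreter
support exists for it; the names Logged and use are invented for this
example). The (returned, value) pair plays the role of the "token":

```python
class Logged(object):
    """Hypothetical manager using the proposed __contextcall__ protocol."""

    def __init__(self, log):
        self.log = log

    def __contextcall__(self, code):
        self.log.append("set_up")
        try:
            # The argument is what "as bar" would bind in the body.
            return code("resource")
        finally:
            self.log.append("tear_down")


def use(condition, log):
    # Hand-written stand-in for what the compiler would generate from:
    #   with Logged(log) as bar:
    #       if condition:
    #           return "early"
    #       log.append(bar)
    #   return "late"
    def _code(bar):
        if condition:
            return True, "early"  # the body executed a "return"
        log.append(bar)
        return False, None  # the body fell off the end

    returned, value = Logged(log).__contextcall__(_code)
    if returned:
        return value
    return "late"


log = []
first = use(True, log)
second = use(False, log)
print(first, second, log)
```

Note that the tear-down runs on both paths because the manager wraps the
call in try/finally, just as a plain wrapper function would.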



The "early return" can also be done as a special kind of exception, ContextManagerReturn(value).



So, anyway.  Context manager syntax is really nice for so many reasons, which is why we have it in the language, instead of wrapper functions.  But if it _were_ just syntactic sugar for actual wrapper functions, they would be even awesomer.



K






From rosuav at gmail.com  Mon Oct 21 16:41:16 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 22 Oct 2013 01:41:16 +1100
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
Message-ID: <CAPTjJmrVEixFKm=aDdS4zwfc00aug-ekD3b=24ZtY8OXdcSFVw@mail.gmail.com>

On Tue, Oct 22, 2013 at 12:55 AM, Kristján Valur Jónsson
<kristjan at ccpgames.com> wrote:
> This sort of thing would need compiler and syntax support, of course.  The
> compiler would need to create an anonymous function object.   The return
> value out of "code" would be some token that could be special if the code
> returned....

Possible problem: A function creates a new scope, a with block
doesn't. Imagine this:

with different_tasklet():
  foo = 1
print(foo)

In current Python, whatever different_tasklet does, those two foos are
the same foo. If the body becomes a callable, that could get messy. Do
you have to declare 'nonlocal foo'  when you enter the with block?
That'd be a nasty backward-compatibility break. Or is this callable
somehow part of the previous scope? Could work in theory, but would
need a fair amount of magic.
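
A small sketch of the difference (contextlib.nullcontext, available in
later Python versions, stands in for different_tasklet here; the function
names are made up): a name assigned inside a with block lands in the
enclosing function's scope, while the same assignment inside a nested
function stays local to that function.

```python
import contextlib


def with_block():
    with contextlib.nullcontext():
        foo = 1  # bound in with_block's own scope
    return foo


def callable_body():
    foo = 0

    def body():
        foo = 1  # a new local inside body; the outer foo is untouched

    body()
    return foo


print(with_block(), callable_body())
```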

ChrisA

From ncoghlan at gmail.com  Mon Oct 21 16:51:45 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 22 Oct 2013 00:51:45 +1000
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
Message-ID: <CADiSq7fRNO-GAG9ZhHPXRnUW2pHW_QLN-eScN0DzyvOdUgMeXg@mail.gmail.com>

No, it wouldn't. Read PEP 340 (the one Guido wrote *before* descoping it in
PEP  343).

The problem with offering true blocks is that they immediately create
multiple ways to do a lot of different things, and this callback based
variant also plays merry hell with the scoping rules.

That said, something I *have* been thinking might work better than the
status quo is permitting a variant of tuple unpacking that enters each
context manager as it is produced and provides a tuple of the results. So
this would work properly:

    with *(open(name) for name in names) as files:
       ...

And you could factor out with statement skipping as a function returning a
2-tuple (although unpacking the value would be a little annoying in that
case).
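
For comparison, a sketch of how the many-managers case can be written
without new syntax, using contextlib.ExitStack. The io.StringIO objects are
stand-ins for open(name) so the example does not touch the filesystem, and
the variable names are only illustrative:

```python
import contextlib
import io

texts = ["first", "second"]  # stand-ins for file names

with contextlib.ExitStack() as stack:
    # Each object is entered as it is produced; if a later one fails,
    # the earlier ones are still exited by the stack.
    files = tuple(stack.enter_context(io.StringIO(t)) for t in texts)
    contents = [f.read() for f in files]

closed = [f.closed for f in files]
print(contents, closed)
```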

Cheers,
Nick.

From kristjan at ccpgames.com  Mon Oct 21 16:50:39 2013
From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Mon, 21 Oct 2013 14:50:39 +0000
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <CAPTjJmrVEixFKm=aDdS4zwfc00aug-ekD3b=24ZtY8OXdcSFVw@mail.gmail.com>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
 <CAPTjJmrVEixFKm=aDdS4zwfc00aug-ekD3b=24ZtY8OXdcSFVw@mail.gmail.com>
Message-ID: <EFE3877620384242A686D52278B7CCD38136CC6D@RKV-IT-EXCH104.ccp.ad.local>



> -----Original Message-----
> Possible problem: A function creates a new scope, a with block doesn't.
> Imagine this:
> 
> with different_tasklet():
>   foo = 1
> print(foo)
> 
> In current Python, whatever different_tasklet does, those two foos are the
> same foo. If the body becomes a callable, that could get messy. Do you have
> to declare 'nonlocal foo'  when you enter the with block?
> That'd be a nasty backward-compatibility break. Or is this callable somehow
> part of the previous scope? Could work in theory, but would need a fair
> amount of magic.

Well, yes, like I said, it could be a new kind of callable if necessary.  But the scope problem is easily solved using "cell" variables, the same way as closures are implemented today.  The compiler, which is building the anonymous function, makes sure to bind local variables to the parent's scope, using cells.

K


From abarnert at yahoo.com  Mon Oct 21 18:14:52 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 21 Oct 2013 09:14:52 -0700
Subject: [Python-ideas] Python 3.4 should include docopt as-is
In-Reply-To: <CAPkN8xJZWGZ=XEEh0yoHeS9DcMwSm6jAGcR_z9_H2OvjZYGUPQ@mail.gmail.com>
References: <CAPkN8xJ_-jq-iji9gZ+9x2QmWax+5MEeb+DUT2J_iWhV7VzZpg@mail.gmail.com>
 <5246B2A2.4080508@nedbatchelder.com>
 <CAPkN8xJZWGZ=XEEh0yoHeS9DcMwSm6jAGcR_z9_H2OvjZYGUPQ@mail.gmail.com>
Message-ID: <C6B35637-AAA8-4B05-98C2-87E2967E96A2@yahoo.com>

On Oct 20, 2013, at 21:39, anatoly techtonik <techtonik at gmail.com> wrote:

> Because you need a Python on your machine. Language with batteries
> included. Not a C or Java where probably need to download libraries
> even to work with strings.

I take it you've never worked with C or Java? Java has multiple string and related classes that have, if anything, too much functionality for a novice to learn. C comes with an extensive set of string functions that are the inspiration (both positive and negative) for the string classes/modules/etc. of most languages that followed.

From abarnert at yahoo.com  Mon Oct 21 18:25:52 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 21 Oct 2013 09:25:52 -0700
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <EFE3877620384242A686D52278B7CCD38136CC6D@RKV-IT-EXCH104.ccp.ad.local>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
 <CAPTjJmrVEixFKm=aDdS4zwfc00aug-ekD3b=24ZtY8OXdcSFVw@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD38136CC6D@RKV-IT-EXCH104.ccp.ad.local>
Message-ID: <0C56C125-F3EB-4302-8B82-F75972EDB34C@yahoo.com>

On Oct 21, 2013, at 7:50, Kristján Valur Jónsson <kristjan at ccpgames.com> wrote:

> 
> 
>> -----Original Message-----
>> Possible problem: A function creates a new scope, a with block doesn't.
>> Imagine this:
>> 
>> with different_tasklet():
>>  foo = 1
>> print(foo)
>> 
>> In current Python, whatever different_tasklet does, those two foos are the
>> same foo. If the body becomes a callable, that could get messy. Do you have
>> to declare 'nonlocal foo'  when you enter the with block?
>> That'd be a nasty backward-compatibility break. Or is this callable somehow
>> part of the previous scope? Could work in theory, but would need a fair
>> amount of magic.
> 
> Well, yes, like I said, it could be a new kind of callable if necessary.  But the scope problem is easily solved using "cell" variables, the same way as closures are implemented today.  The compiler, which is building the anonymous function, makes sure to bind local variables to the parent's scope, using cells.

A paradigm case for with is creating a new variable in a context to use outside of it:

    with open(path) as f:
        rows = list(reader(f))
    for row in rows[1:]:

    with lock:
        connections = self.connections
    for connection in connections:

The compiler would not make rows or connections into a closure variable, but a local. So you'd have to write nonlocal in most with statements.
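
A minimal sketch of how the compiler classifies such a name today (the
function names are invented for the example): an assignment in a nested
function makes the name a local of that function unless it is declared
nonlocal, in which case it becomes a free variable backed by a cell.

```python
def outer():
    rows = None

    def plain():
        rows = [1]  # compiled as a local of plain()

    def declared():
        nonlocal rows
        rows = [2]  # rebinds outer()'s variable through a cell

    plain()
    declared()
    return rows, plain.__code__.co_varnames, declared.__code__.co_freevars


final_rows, plain_locals, declared_free = outer()
print(final_rows, plain_locals, declared_free)
```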

And if you changed the rules so everything was nonlocal by default in a context function, we'd need a new keyword to declare local variables, which would be (a) very different from the rest of python, and (b) hard to come up with a name for that didn't conflict with thousands of different programs.

> 
> K
> 

From bruce at leapyear.org  Mon Oct 21 19:21:31 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Mon, 21 Oct 2013 10:21:31 -0700
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
Message-ID: <CAGu0AntiRcr+0uM9QquYADK8qFBqE6kKH74y1NB9DFZi0cbTTQ@mail.gmail.com>

On Mon, Oct 21, 2013 at 6:55 AM, Kristján Valur Jónsson <
kristjan at ccpgames.com> wrote:

> So, if this is the way people like to think about context managers, like
> writing wrapper functions, why don't we turn them into proper wrapper
> functions?
>
> <...> The cool thing here though, is that "code" could, for example, be
> run on a different tasklet.  Or a different thread.  Or a different
> universe.
>
Cool, sure. But what are the use cases that need this and can't be done
easily with the existing design?

> class NewContextManager(object):
>   # A context manager that locks a resource, then executes the code only if
>   # it is not recursing
>   def __init__(self, lock):
>     self.lock = lock
>   def __contextcall__(self, code):
>     with lock:
>       if lock.active:
>         return  # This is where @contextmanager will stop you, you can't skip the 'yield'
>       lock.active = True
>       try:
>         return code(None) # optionally pass value to the code as in "with foo() as X"
>       finally:
>         lock.active = False


You can do that with current context managers:

with lock_once(x) as lock_acquired:
    if lock_acquired:  # was not already locked
        do_stuff_once()

@contextmanager
def lock_once(lock):
    if lock.active:
        yield False
    else:
        lock.active = True
        try:
            yield True
        finally:
            lock.active = False


Note that I'm mimicking your lock/unlock code which of course is not the
proper way to acquire/release a lock, but it gets the idea across. How
would making the code inside the with block a callable improve this? I
think this code is easier to read than yours as the logic of whether or not
the do_stuff_once block is executed is where it belongs -- not hidden in
the context manager. Note that my version also allows me to do this, which
I can't easily do with your context manager:

with lock_once(x) as lock_acquired:
    if lock_acquired:  # was not already locked
        do_stuff_once()
    else:
        log('Lock %r was already acquired', x)
    do_stuff_every_time()



--- Bruce
I'm hiring: http://www.cadencemd.com/info/jobs
Latest blog post: Alice's Puzzle Page http://www.vroospeak.com
Learn how hackers think: http://j.mp/gruyere-security

From antony.lee at berkeley.edu  Mon Oct 21 21:50:58 2013
From: antony.lee at berkeley.edu (Antony Lee)
Date: Mon, 21 Oct 2013 12:50:58 -0700
Subject: [Python-ideas] Enum and random.choice
In-Reply-To: <CADiSq7efeVjHBn-xRvToPAebW6utg=fTF_napZxUBYuAa9aysg@mail.gmail.com>
References: <CAGRr6BGFEUdeHGfZFtv9V7nNPs-jL_ZUvDupf3aHB1woc46ZUA@mail.gmail.com>
 <CADiSq7efeVjHBn-xRvToPAebW6utg=fTF_napZxUBYuAa9aysg@mail.gmail.com>
Message-ID: <CAGRr6BGe_vWaCgMspSkAM7cbXamsa=EmtENphr1Aa=eR=rFfJA@mail.gmail.com>

Comparing with dicts is a good argument against my suggested change; I
missed that point.
Antony


2013/10/21 Nick Coghlan <ncoghlan at gmail.com>

>
> On 21 Oct 2013 15:52, "Antony Lee" <antony.lee at berkeley.edu> wrote:
> >
> > random.choice and Enum interact poorly, because indexing an enum class
> > doesn't work as choice expects:
> >
> > >>> import enum, random
> > >>> class C(enum.Enum): a, b = 1, 2
> > ...
> > >>> random.choice(C)
> > <traceback...>
> > KeyError: 1
> >
> > Of course one can do `random.choice(list(C.__members__.values()))` but
> > this feels a bit awkward.
> >
> > The source for random.choice is
> >
> > def choice(self, seq):
> >     """Choose a random element from a non-empty sequence."""
> >     try:
> >         i = self._randbelow(len(seq))
> >     except ValueError:
> >         raise IndexError('Cannot choose from an empty sequence')
> >     return seq[i]
> >
> > Adding seq = list(seq) should be enough to fix this.
>
> How is this different from passing a dict to random.choice? Enum classes
> are iterables, not sequences.
>
> Cheers,
> Nick.
>
> >
> > Antony

From matt at whoosh.ca  Mon Oct 21 22:57:38 2013
From: matt at whoosh.ca (Matt Chaput)
Date: Mon, 21 Oct 2013 16:57:38 -0400
Subject: [Python-ideas] Enum and random.choice
In-Reply-To: <CAGRr6BGe_vWaCgMspSkAM7cbXamsa=EmtENphr1Aa=eR=rFfJA@mail.gmail.com>
References: <CAGRr6BGFEUdeHGfZFtv9V7nNPs-jL_ZUvDupf3aHB1woc46ZUA@mail.gmail.com>
 <CADiSq7efeVjHBn-xRvToPAebW6utg=fTF_napZxUBYuAa9aysg@mail.gmail.com>
 <CAGRr6BGe_vWaCgMspSkAM7cbXamsa=EmtENphr1Aa=eR=rFfJA@mail.gmail.com>
Message-ID: <52659542.3030304@whoosh.ca>

On 10/21/2013 3:50 PM, Antony Lee wrote:
> Comparing with dicts is a good argument against my suggested change, I
> missed that point.

Also AFAIK list(seq) would do a full copy of the list every time you 
called choice(), which could murder performance with big lists and tight 
loops.
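
To make the trade-off concrete, a small sketch (the Color enum and the pick_member helper are made up for the example): converting the Enum class to a list at the call site works because Enum classes are iterable, and building the list once outside a loop avoids the repeated copy.

```python
import enum
import random


class Color(enum.Enum):
    RED = 1
    GREEN = 2
    BLUE = 3


def pick_member(enum_cls, rng=random):
    """Pick a random member; copies the members into a list on every call."""
    return rng.choice(list(enum_cls))


# In a tight loop, build the list once and reuse it instead.
members = list(Color)
samples = [random.choice(members) for _ in range(5)]
print(pick_member(Color), samples)
```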

Matt


From kristjan at ccpgames.com  Mon Oct 21 23:05:32 2013
From: kristjan at ccpgames.com (Kristján Valur Jónsson)
Date: Mon, 21 Oct 2013 21:05:32 +0000
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <0C56C125-F3EB-4302-8B82-F75972EDB34C@yahoo.com>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
 <CAPTjJmrVEixFKm=aDdS4zwfc00aug-ekD3b=24ZtY8OXdcSFVw@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD38136CC6D@RKV-IT-EXCH104.ccp.ad.local>
 <0C56C125-F3EB-4302-8B82-F75972EDB34C@yahoo.com>
Message-ID: <EFE3877620384242A686D52278B7CCD38136D10E@RKV-IT-EXCH104.ccp.ad.local>

> And if you changed the rules so everything was nonlocal by default in a context function, we'd need a new keyword to declare local variables, which would be (a) very different from the rest of python, and (b) hard to come up with a name for that didn't conflict with thousands of different programs.

Well no.  You are not defining a function, so all variables are in the same scope as the existing code.  Binding does not change.  "nonlocal" has exactly the same meaning, meaning outside the function you are writing.  The "anonymous callable" exists only technically, not syntactically.  It has no local variables, no (visible) arguments.

Semantics stay exactly the same, only the method of invoking the executable code changes.



K





-----Original Message-----
From: Andrew Barnert [mailto:abarnert at yahoo.com]
Sent: 21. október 2013 16:26
To: Kristján Valur Jónsson
Cc: Chris Angelico; python-ideas
Subject: Re: [Python-ideas] A different kind of context manager

A paradigm case for with is creating a new variable in a context to use outside of it:

    with open(path) as f:
        rows = list(reader(f))
    for row in rows[1:]:

    with lock:
        connections = self.connections
    for connection in connections:

The compiler would not make rows or connections into a closure variable, but a local. So you'd have to write nonlocal in most with statements.

And if you changed the rules so everything was nonlocal by default in a context function, we'd need a new keyword to declare local variables, which would be (a) very different from the rest of python, and (b) hard to come up with a name for that didn't conflict with thousands of different programs.


From kristjan at ccpgames.com  Mon Oct 21 23:16:54 2013
From: kristjan at ccpgames.com (Kristján Valur Jónsson)
Date: Mon, 21 Oct 2013 21:16:54 +0000
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <CAGu0AntiRcr+0uM9QquYADK8qFBqE6kKH74y1NB9DFZi0cbTTQ@mail.gmail.com>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
 <CAGu0AntiRcr+0uM9QquYADK8qFBqE6kKH74y1NB9DFZi0cbTTQ@mail.gmail.com>
Message-ID: <EFE3877620384242A686D52278B7CCD38136D149@RKV-IT-EXCH104.ccp.ad.local>

> Cool, sure. But what are the use cases that need this and can't be done easily with the existing design?

Exactly those I listed.  Any form of execution that requires a "function" to be run.  This includes all existing threading/multiprocessing designs.


> How would making the code inside the with block a callable improve this?

It allows the "if" statement to be part of the context manager.
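
To illustrate the limitation in question: with today's contextlib.contextmanager, a generator that returns without yielding cannot silently skip the body; entering the with statement raises RuntimeError instead. A small sketch, where skip_body and ran are names invented for the example:

```python
from contextlib import contextmanager


@contextmanager
def skip_body(skip):
    if skip:
        return          # no yield: contextlib treats this as an error
    yield


ran = []
error = None
try:
    with skip_body(True):
        ran.append("body")
except RuntimeError as exc:
    error = str(exc)

print(ran, error)
```

The body never runs, but only because an exception escapes from the with statement, which is not the same as the manager quietly deciding to skip it.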


> I think this code is easier to read than yours as the logic of whether or not the do_stuff_once block is executed is where it belongs -- not hidden in the context manager.

That is a matter of taste.  Maybe it would make sense to pull other things out of the context manager too?  But taste should not, IMHO, limit our options.  If this is a common idiom, locking and executing once, why shouldn't we be able to write a clever macro/context manager/syntactic sugar to help us with that?  Why insist on this verbosity?

Also, consider my argument that most context managers are written using the @contextmanager paradigm.  We like this idiom so much because this is what we really want to do, call the code from a wrapper function.
If you look at the design of that context manager, it is not exactly straightforward.  This suggests to me that maybe we took a wrong turn deciding on a context manager design.  Maybe we should have selected one in which this sort of coding is its native, natural, form, rather than having this intermarriage kludge which turns an imperative-looking generator into the traditional context manager.


From: Bruce Leban [mailto:bruce at leapyear.org]
Sent: 21. október 2013 17:22
To: Kristján Valur Jónsson
Cc: python-ideas at python.org
Subject: Re: [Python-ideas] A different kind of context manager


On Mon, Oct 21, 2013 at 6:55 AM, Kristján Valur Jónsson <kristjan at ccpgames.com> wrote:
So, if this is the way people like to think about context managers, like writing wrapper functions, why don't we turn them into proper wrapper functions?

<...> The cool thing here though, is that "code" could, for example, be run on a different tasklet.  Or a different thread.  Or a different universe.
Cool, sure. But what are the use cases that need this and can't be done easily with the existing design?


class NewContextManager(object):
  # A context manager that locks a resource, then executes the code only if it is not recursing
  def __init__(self, lock):
    self.lock = lock
  def __contextcall__(self, code):
    with self.lock:
      if self.lock.active:
        return  # This is where @contextmanager will stop you, you can't skip the 'yield'
      self.lock.active = True
      try:
        return code(None) # optionally pass value to the code as in "with foo() as X"
      finally:
        self.lock.active = False

You can do that with current context managers:

with lock_once(x) as lock_acquired:
    if lock_acquired:  # was not already locked
        do_stuff_once()

@contextmanager
def lock_once(lock):
    if lock.active:
        yield False
    else:
        lock.active = True
        try:
            yield True
        finally:
            lock.active = False

Note that I'm mimicking your lock/unlock code which of course is not the proper way to acquire/release a lock, but it gets the idea across. How would making the code inside the with block a callable improve this? I think this code is easier to read than yours as the logic of whether or not the do_stuff_once block is executed is where it belongs -- not hidden in the context manager. Note that my version also allows me to do this, which I can't easily do with your context manager:

with lock_once(x) as lock_acquired:
    if lock_acquired:  # was not already locked
        do_stuff_once()
    else:
        log('Lock %r was already acquired', x)
    do_stuff_every_time()


--- Bruce
I'm hiring: http://www.cadencemd.com/info/jobs
Latest blog post: Alice's Puzzle Page http://www.vroospeak.com
Learn how hackers think: http://j.mp/gruyere-security

From haoyi.sg at gmail.com  Mon Oct 21 23:26:06 2013
From: haoyi.sg at gmail.com (Haoyi Li)
Date: Mon, 21 Oct 2013 14:26:06 -0700
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <EFE3877620384242A686D52278B7CCD38136D149@RKV-IT-EXCH104.ccp.ad.local>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
 <CAGu0AntiRcr+0uM9QquYADK8qFBqE6kKH74y1NB9DFZi0cbTTQ@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD38136D149@RKV-IT-EXCH104.ccp.ad.local>
Message-ID: <CALruUQK1JXLGBsw45+e9BBKi9OgGCCJXFAAHPW6Aog1OAjyh1Q@mail.gmail.com>

> Maybe we should have selected one in which this sort of coding is its
> native, natural, form, rather than having this intermarriage kludge which
> turns an imperative-looking generator into the traditional context manager.

I agree with this 100%. Unfortunately, Python picked the current style of
`with` statements a while ago, and this would be a pretty huge
change/additional feature.

FWIW, other languages with easy anonymous functions (e.g. ruby, scala) have
this, and it does provide all the benefits that you describe. For example,
spinning off parallel tasklets inline is just a matter of `async{ ... }`
where `async` is just a context manager.

As much as I'd like to see python follow suit, getting it into python
would be a pretty large upheaval and not something that I expect to happen
in the near future.

And the last option, if you're crazy, is to use MacroPy and write your own
tasklet/forking context managers! It's surprisingly easy (< 50 lines).



From ethan at stoneleaf.us  Mon Oct 21 23:26:37 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 21 Oct 2013 14:26:37 -0700
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
Message-ID: <52659C0D.1070308@stoneleaf.us>

Kristján,

Your replies would be much easier to read if you trimmed the previous email.

Thanks.

--
~Ethan~

From abarnert at yahoo.com  Tue Oct 22 01:36:42 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 21 Oct 2013 16:36:42 -0700
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <EFE3877620384242A686D52278B7CCD38136D10E@RKV-IT-EXCH104.ccp.ad.local>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
 <CAPTjJmrVEixFKm=aDdS4zwfc00aug-ekD3b=24ZtY8OXdcSFVw@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD38136CC6D@RKV-IT-EXCH104.ccp.ad.local>
 <0C56C125-F3EB-4302-8B82-F75972EDB34C@yahoo.com>
 <EFE3877620384242A686D52278B7CCD38136D10E@RKV-IT-EXCH104.ccp.ad.local>
Message-ID: <EE4BC259-BEB4-4FB7-A693-B289D8EC6183@yahoo.com>

On Oct 21, 2013, at 14:05, Kristján Valur Jónsson <kristjan at ccpgames.com> wrote:

> > And if you changed the rules so everything was nonlocal by default in a context function, we'd need a new keyword to declare local variables, which would be (a) very different from the rest of python, and (b) hard to come up with a name for that didn't conflict with thousands of different programs.
>
> Well no.  You are not defining a function, so all variables are in the same scope as the existing code.  Binding does not change.  "nonlocal" has exactly the same meaning, meaning outside the function you are writing.  The "anonymous callable" exists only technically, not syntactically.  It has no local variables, no (visible) arguments.
>
> Semantics stay exactly the same, only the method of invoking the executable code changes.
>
Obviously semantics don't stay exactly the same or there would be no benefits to the change. The whole point is that you're creating a function with closure and visibly passing it to a method of the context manager.

It's either one or the other: either every variable is implicitly nonlocal whether you want it to be or not, or every variable is implicitly local and you have to nonlocal them to perform common context manager idioms.
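
A sketch of the second horn of that dilemma, using ordinary nested functions as a stand-in for the proposed block-as-callable (the function names are illustrative only): a plain assignment inside the callable binds a new local, so the enclosing scope only sees it with an explicit nonlocal.

```python
def without_nonlocal():
    rows = None

    def body():
        rows = [1, 2, 3]    # binds a new local inside body()
        return rows

    body()
    return rows             # the outer binding is untouched


def with_nonlocal():
    rows = None

    def body():
        nonlocal rows
        rows = [1, 2, 3]    # rebinds the enclosing function's variable

    body()
    return rows


print(without_nonlocal(), with_nonlocal())
```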

Compare with comprehensions. Changing them to use functions under the covers had no effect (other than breaking a rare use case with StopIteration passing, which I believe has been fixed), but that's only because the comprehension variable(s) were already explicitly prevented from replacing existing bindings. (The fact that there's no way to explicitly bind a variable in a comprehension helps too--no potential surprises about what "x=2" might do when statements aren't allowed in the first place.) That's obviously not true for with statements.

If you think that every variable being implicitly nonlocal is a good thing, that's certainly arguable (maybe no existing code would ever notice the difference, and new code that did wouldn't be surprised by it?), but only if you make that case instead of trying to argue that there isn't an issue in the first place.

From anntzer.lee at gmail.com  Tue Oct 22 02:18:22 2013
From: anntzer.lee at gmail.com (Antony Lee)
Date: Mon, 21 Oct 2013 17:18:22 -0700 (PDT)
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
Message-ID: <3402edcf-1e4e-41f4-be25-f097f89cad07@googlegroups.com>

You can get the desired behavior by (ab)using function decorators, by 
rewriting

with as_subprocess():
    do_stuff()

as

@as_subprocess
def _():
   <do stuff>

Yes, it's not very elegant syntactically but gets the work done (and this 
technique is generalizable to most uses of Ruby-style blocks, I believe).

Antony

From ncoghlan at gmail.com  Tue Oct 22 05:31:41 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 22 Oct 2013 13:31:41 +1000
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <CALruUQK1JXLGBsw45+e9BBKi9OgGCCJXFAAHPW6Aog1OAjyh1Q@mail.gmail.com>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
 <CAGu0AntiRcr+0uM9QquYADK8qFBqE6kKH74y1NB9DFZi0cbTTQ@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD38136D149@RKV-IT-EXCH104.ccp.ad.local>
 <CALruUQK1JXLGBsw45+e9BBKi9OgGCCJXFAAHPW6Aog1OAjyh1Q@mail.gmail.com>
Message-ID: <CADiSq7dzVW-FP=L7Qc5fzFjqFTWoRJFCnTMOd_iKVXuYsXSvmQ@mail.gmail.com>

On 22 Oct 2013 07:27, "Haoyi Li" <haoyi.sg at gmail.com> wrote:
>
> > Maybe we should have selected one in which this sort of coding is its
> > native, natural, form, rather than having this intermarriage kludge which
> > turns an imperative-looking generator into the traditional context manager.
>
> I agree with this 100%. Unfortunately, Python picked the current style of
> `with` statements a while ago, and this would be a pretty huge
> change/additional feature.
>
> FWIW, other languages with easy anonymous functions (e.g. ruby, scala)
> have this, and it does provide all the benefits that you describe. For
> example, spinning off parallel tasklets inline is just a matter of `async{
> ... }` where `async` is just a context manager.

It's not a coincidence that Ruby (at least - I don't know scala) just
treats for loops and context management as special cases of anonymous
callbacks - the latter is a powerful, more general construct.

By contrast, Python chose the path of providing dedicated syntax for both
iteration *and* context management, and hence requires that callbacks that
don't fit in a single expression be defined prior to use.

I think this makes those constructs easier to understand in many ways, but
it *also* means that we *don't* currently have a clean syntax for single
use callbacks.

Hence the time I've put into PEP 403 and 3150 over the years - a key
objective for both of them is providing a cleaner solution for the problem
of single use callbacks (including those that modify local variables of the
containing function).

In addition to scoping, the other problem single use callbacks need to
handle sensibly is the behaviour of the simple flow control statements:
return, yield, break, continue, and raise.

Building single use callbacks on top of the "def" statement has the
advantage of *not* needing to define any new scoping or control flow
semantics (as they're just ordinary nested scopes).
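
As a concrete illustration of that point, with names made up for the example: inside a nested def, return leaves only the callback, and reaching a local of the containing function takes nonlocal, exactly as for any other nested scope.

```python
def count_until_negative(values):
    seen = 0

    def visit(value):
        nonlocal seen           # modify a local of the containing function
        if value < 0:
            return False        # returns from visit(), not from the outer function
        seen += 1
        return True

    for value in values:
        if not visit(value):
            break               # flow control stays with the caller
    return seen


print(count_until_negative([3, 1, -2, 5]))
```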

Defining them any other way makes things far more complicated. It would
certainly be close to impossible to repurpose any of the other existing
compound statements without breaking backwards compatibility.

A completely new keyword is also a possibility, but then it's necessary to
find a good one, and explain its use cases in a fashion similar to PEP 403.

Cheers,
Nick.


From antony.lee at berkeley.edu  Tue Oct 22 12:10:29 2013
From: antony.lee at berkeley.edu (Antony Lee)
Date: Tue, 22 Oct 2013 03:10:29 -0700
Subject: [Python-ideas] YAML (yet-another-multiline-lambda)
Message-ID: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>

As the issue has been (indirectly) raised again in the recent context
manager thread, I would like to propose yet another syntax for multiline
lambdas...  I know there have been many but I (foolishly?) hope this one is
simpler.  I do not wish to focus on the usual arguments for and against the
usefulness of such a construct (from the archives of the list, I feel that
the lack of an attractive syntax is the bigger barrier).

Specifically, I suggest that the "def" (and possibly the "class")
keyword(s) may be used in an expression context, if immediately surrounded
by parentheses.  The indentation of the body of the function is given by
the indentation of the first line after the def.

target = (def f(arg):
    <...>)

add_callback(
    def callback(arg):
        <...>)

callbacks = {
    "foo":
        (def foo():
            <...>), # note how the parentheses disambiguate where the last
                    # comma belongs
    "bar":
        (def bar():
            <...>)
}

As an extra suggestion, one may want to allow not specifying a function
name in these contexts:

callbacks = {
    "foo":
        (def ():
            <...>)
}

(in which case the function would be named "<lambda>", of course) but I do
not think that this is a critical feature (and it could always be added
later).
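
For comparison, the closest spelling available today defines the functions as statements first and then refers to them by name. The handler bodies here are placeholders made up for the example.

```python
def foo():
    return "foo called"


def bar():
    return "bar called"


# The dict can only reference the functions after they have been defined.
callbacks = {
    "foo": foo,
    "bar": bar,
}

print(callbacks["foo"](), callbacks["bar"]())
```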

Feel free to critique,

Antony
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20131022/a6a8df9a/attachment.html>

From pyideas at rebertia.com  Tue Oct 22 12:33:58 2013
From: pyideas at rebertia.com (Chris Rebert)
Date: Tue, 22 Oct 2013 03:33:58 -0700
Subject: [Python-ideas] YAML (yet-another-multiline-lambda)
In-Reply-To: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
References: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
Message-ID: <CAMZYqRSnSwCynFrVaMHqcvRk0F385oz=uoUXg=aFnzV+iUvZ0w@mail.gmail.com>

The obvious first step in assessing this particular proposal is to compare
it against
JSON (just-some-ordinary-notation),
CSV (callables-sequestered-as-values),
and XML (eXtended-multiline-lambdas).

*dodges rotten tomato*
--
This is (Monty) Python-ideas, right?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20131022/dd1d518d/attachment.html>

From rosuav at gmail.com  Tue Oct 22 12:43:58 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 22 Oct 2013 21:43:58 +1100
Subject: [Python-ideas] YAML (yet-another-multiline-lambda)
In-Reply-To: <CAMZYqRSnSwCynFrVaMHqcvRk0F385oz=uoUXg=aFnzV+iUvZ0w@mail.gmail.com>
References: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
 <CAMZYqRSnSwCynFrVaMHqcvRk0F385oz=uoUXg=aFnzV+iUvZ0w@mail.gmail.com>
Message-ID: <CAPTjJmoTtgcajGsc9wqS4AbQkZZAQqxv8GXsPEU8W2ZWLSB7hA@mail.gmail.com>

On Tue, Oct 22, 2013 at 9:33 PM, Chris Rebert <pyideas at rebertia.com> wrote:
> The obvious first step in assessing this particular proposal is to compare
> it against
> JSON (just-some-ordinary-notation),
> CSV (callables-sequestered-as-values),
> and XML (eXtended-multiline-lambdas).

The proposal doesn't matter, as long as it has the right TLA... or, as
in this case, ETLA (extended...). After all, there are only two
fundamentally difficult problems in computing: cache invalidation, and
naming things, and off-by-one errors. And nice red uniforms...

ChrisA

From steve at pearwood.info  Tue Oct 22 14:39:03 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 22 Oct 2013 23:39:03 +1100
Subject: [Python-ideas] YAML (yet-another-multiline-lambda)
In-Reply-To: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
References: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
Message-ID: <20131022123903.GL7989@ando>

On Tue, Oct 22, 2013 at 03:10:29AM -0700, Antony Lee wrote:

> Specifically, I suggest that the "def" (and possibly the "class")
> keyword(s) may be used in an expression context, if immediately surrounded
> by parentheses.  

I don't think there is any need to allow class in an expression, since 
we already have type().

> The indentation of the body of the function is given by
> the indentation of the first line after the def.

I presume that means you can't do this:

value = [1, 2, 3, (def spam(arg): do_this()
                        do_that()
                        ), 4, 5, 6]

I think I would be okay with that, since you can't do that with the def 
statement either:

py> def spam(): do_this()
...     do_that()
  File "<stdin>", line 2
    do_that()
    ^
IndentationError: unexpected indent
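
For reference, the same failure can be reproduced without an interactive
session. This is only a minimal sketch, assuming nothing beyond the
built-in compile(); the source string and the "<example>" filename are
made up for illustration:

```python
# Hypothetical source string mirroring the interactive session above:
# a one-line def followed by an indented continuation line.
src = "def spam(): do_this()\n    do_that()\n"

try:
    # compile() only parses the source; do_this/do_that are never called.
    compile(src, "<example>", "exec")
except IndentationError as err:
    raised = True
    message = err.msg
else:
    raised = False
    message = None

print(raised, message)
```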


I suppose, like the statement version, a single-line function body could 
be inline:

value = [1, 2, 3, (def spam(arg): do_this()), 4, 5, 6]

which would then make lambda redundant.

In each of your examples, you have the body indented further to the 
right than the def. Is that mandatory, or would you allow something like 
this?

value = [1, 2, 3, (def spam(arg):
    do_this()
    do_that()
    if condition():
        do_something_else()
    ),
    4, 5, 6,
    ]

That is, relative to the def itself, the body is outdented.


I can't see any reason to justify prohibiting the above in the language, 
although I'd probably frown upon it in style-guides. I think that should 
be written as:

value = [1, 2, 3, 
         (def spam(arg):
              do_this()
              do_that()
              if condition():
                  do_something_else()
              ),
         4, 5, 6,
         ]

sort of thing. But I don't think that having the first indent be to the 
left of the def should be prohibited. However, that can lead to some 
pretty ugly, and misleading, constructions:

def spam(arg, func=(def ham(a,
                            x, 
                            y, 
                            z):
    fe(a)
    fi(x)
    fo(y)
    fum(z)
    ),
        ):
    ...


I'm not sure if there is a question buried in this, or just an 
observation. I hardly ever miss having multi-line lambdas, and I fear 
that they will make code harder to read and understand.


-- 
Steven

From flying-sheep at web.de  Tue Oct 22 15:33:38 2013
From: flying-sheep at web.de (Philipp A.)
Date: Tue, 22 Oct 2013 15:33:38 +0200
Subject: [Python-ideas] YAML (yet-another-multiline-lambda)
In-Reply-To: <20131022123903.GL7989@ando>
References: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
 <20131022123903.GL7989@ando>
Message-ID: <CAN8d9gn2vqj-8+7Stpzhx2CETTMPAaMWyo5GkS-gNv0s7FVFsA@mail.gmail.com>

2013/10/22 Steven D'Aprano <steve at pearwood.info>

> which would then make lambda redundant.

no, lambda still doesn't need a return statement, returning the result of
its only expression.

so we could do

x = (def (a): a)
assert x('b') == None

and

x = lambda a: a
assert x('b') == 'b'
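
The difference in return behaviour can be checked in current Python, with
an ordinary def statement standing in for the proposed def expression.
This is just a sketch; the names x and y are only illustrative:

```python
# A def body needs an explicit return; a bare expression is evaluated
# and its value is discarded, so the call returns None.
def x(a):
    a

# A lambda returns the value of its single expression.
y = lambda a: a

assert x('b') is None
assert y('b') == 'b'
```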

and about your "misleading constructions": the ability to put arguments of
function definitions on their own lines is already ugly.

we basically have no switch statement because it would be indented twice or
half, but we indent twice or half when defining functions all the time. and
if your default argument is a function, you'll definitely want it to be a
simple one anyway or would use the tried

def a(f=None):
    if f is None:
        def f():
            ...

From abarnert at yahoo.com  Tue Oct 22 18:27:59 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 22 Oct 2013 09:27:59 -0700
Subject: [Python-ideas] YAML (yet-another-multiline-lambda)
In-Reply-To: <20131022123903.GL7989@ando>
References: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
 <20131022123903.GL7989@ando>
Message-ID: <4357C0CF-FBF3-40F1-BE6E-7308677894CD@yahoo.com>

On Oct 22, 2013, at 5:39, Steven D'Aprano <steve at pearwood.info> wrote:

> On Tue, Oct 22, 2013 at 03:10:29AM -0700, Antony Lee wrote:
> 
>> Specifically, I suggest that the "def" (and possibly the "class")
>> keyword(s) may be used in an expression context, if immediately surrounded
>> by parentheses.  
> 
> I don't think there is any need to allow class in an expression, since 
> we already have type().

But type doesn't allow you to do most of what you can do in a class definition. This is like arguing that we don't need expression def because we already have types.FunctionType.

>> The indentation of the body of the function is given by
>> the indentation of the first line after the def.
> 
> I presume that means you can't do this:
> 
> value = [1, 2, 3, (def spam(arg): do_this()
>                        do_that()
>                        ), 4, 5, 6]
> 
> I think I would be okay with that, since you can't do that with the def 
> statement either:
> 
> py> def spam(): do_this()
> ...     do_that()
>  File "<stdin>", line 2
>    do_that()
>    ^
> IndentationError: unexpected indent
> 
> 
> I suppose, like the statement version, a single-line function body could 
> be inline:
> 
> value = [1, 2, 3, (def spam(arg): do_this()), 4, 5, 6]
> 
> which would then make lambda redundant.
> 
> In each of your examples, you have the body indented further to the 
> right than the def. Is that mandatory, or would you allow something like 
> this?
> 
> value = [1, 2, 3, (def spam(arg):
>    do_this()
>    do_that()
>    if condition():
>        do_something_else()
>    ),
>    4, 5, 6,
>    ]
> 
> That is, relative to the def itself, the body is outdented.
> 
> 
> I can't see any reason to justify prohibiting the above in the language, 
> although I'd probably frown upon it in style-guides. I think that should 
> be written as:
> 
> value = [1, 2, 3, 
>         (def spam(arg):
>              do_this()
>              do_that()
>              if condition():
>                  do_something_else()
>              ),
>         4, 5, 6,
>         ]
> 
> sort of thing. But I don't think that having the first indent be to the 
> left of the def should be prohibited. However, that can lead to some 
> pretty ugly, and misleading, constructions:
> 
> def spam(arg, func=(def ham(a,
>                            x, 
>                            y, 
>                            z):
>    fe(a)
>    fi(x)
>    fo(y)
>    fum(z)
>    ),
>        ):
>    ...
> 
> 
> I'm not sure if there is a question buried in this, or just an 
> observation. I hardly ever miss having multi-line lambdas, and I fear 
> that they will make code harder to read and understand.
> 
> 
> -- 
> Steven
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas

From abarnert at yahoo.com  Tue Oct 22 18:52:31 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 22 Oct 2013 09:52:31 -0700
Subject: [Python-ideas] YAML (yet-another-multiline-lambda)
In-Reply-To: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
References: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
Message-ID: <C58D04B1-E1B3-43BE-9013-828566809F84@yahoo.com>

On Oct 22, 2013, at 3:10, Antony Lee <antony.lee at berkeley.edu> wrote:

> Specifically, I suggest that the "def" (and possibly the "class") keyword(s) may be used in an expression context, if immediately surrounded by parentheses.  The indentation of the body of the function is given by the indentation of the first line after the def.

I don't think this idea will fit well with Python's OO.

Partly this is based on JS experience, where you have to write .bind(this) all over the place because your callback--or, worse, some callback later in the chain--needs access to this. 

In Python, f.bind(this) is spelled types.MethodType(f, self, type(self)), and classes are used more than in JS rather than less. Especially since the paradigm cases for callbacks--network transports, GUI events, etc.--are also paradigm cases for objects--protocols, controllers, etc. At best this will lead to two completely different styles for doing things that are hard to tie together. At worst people will tie them together the same way they usually do in JS, leading to ugly circumlocutions where people bind self to other variables in closures (like the common "that = this" in JS), wrap functions in extra lambdas just to turn them into methods, or use partial to do the same, etc., to avoid the need to bind(this) at every step on the chain. And the fact that you can often rely on capturing self directly in a closure doesn't make things better, but worse, because it's not nearly often enough to count on.

In fact, I'd go so far as to say that not having inline def is part of the reason python server code is usually easier to read than node server code, despite node having an API not that far from Python frameworks like Twisted.

As a secondary concern, this would make it harder to partially parse Python the way IDEs and other code managing tools often do.

From rymg19 at gmail.com  Tue Oct 22 19:57:54 2013
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Tue, 22 Oct 2013 12:57:54 -0500
Subject: [Python-ideas] YAML (yet-another-multiline-lambda)
In-Reply-To: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
References: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
Message-ID: <CAO41-mMcgPHGo8NQp=B9UVg+xtnxAd9SM5riNTZCLVPtCBy90A@mail.gmail.com>

Looks like Lisp (Lots of Irritating Stupid Parentheses). Hehehe.


On Tue, Oct 22, 2013 at 5:10 AM, Antony Lee <antony.lee at berkeley.edu> wrote:

> As the issue has been (indirectly) raised again in the recent context
> manager thread, I would like to propose yet another syntax for multiline
> lambdas...  I know there has been many but I (foolishly?) hope this one is
> simpler.  I do not wish to focus on the usual arguments for and against the
> usefulness of such a construct (from the archives of the list, I feel that
> the lack of an attractive syntax is the bigger barrier).
>
> Specifically, I suggest that the "def" (and possibly the "class")
> keyword(s) may be used in an expression context, if immediately surrounded
> by parentheses.  The indentation of the body of the function is given by
> the indentation of the first line after the def.
>
> target = (def f(arg):
>     <...>)
>
> add_callback(
>     def callback(arg):
>         <...>)
>
> callbacks = {
>     "foo":
>         (def foo():
>             <...>), # note how the parentheses disambiguate where the last
>                     # comma belongs
>     "bar":
>         (def bar():
>             <...>)
> }
>
> As an extra suggestion, one may want to allow not specifying a function
> name in these contexts:
>
> callbacks = {
>     "foo":
>         (def ():
>             <...>)
> }
>
> (in which case the function would be named "<lambda>", of course) but I do
> not think that this is a critical feature (and it could always be added
> later).
>
> Feel free to critique,
>
> Antony
>


-- 
Ryan

From steve at pearwood.info  Tue Oct 22 20:07:29 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 23 Oct 2013 05:07:29 +1100
Subject: [Python-ideas] YAML (yet-another-multiline-lambda)
In-Reply-To: <4357C0CF-FBF3-40F1-BE6E-7308677894CD@yahoo.com>
References: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
 <20131022123903.GL7989@ando> <4357C0CF-FBF3-40F1-BE6E-7308677894CD@yahoo.com>
Message-ID: <20131022180729.GN7989@ando>

On Tue, Oct 22, 2013 at 09:27:59AM -0700, Andrew Barnert wrote:
> On Oct 22, 2013, at 5:39, Steven D'Aprano <steve at pearwood.info> wrote:
> 
> > On Tue, Oct 22, 2013 at 03:10:29AM -0700, Antony Lee wrote:
> > 
> >> Specifically, I suggest that the "def" (and possibly the "class")
> >> keyword(s) may be used in an expression context, if immediately surrounded
> >> by parentheses.  
> > 
> > I don't think there is any need to allow class in an expression, since 
> > we already have type().
> 
> But type doesn't allow you to do most of what you can do in a class definition. This is like arguing that we don't need expression def because we already have types.FunctionType.

Given multi-line lambda, what could you do in a class definition that 
you couldn't do with type?

class Spam(SpamBase, HamBase):
    x = 1
    def eggs(self, arg):
        return arg+self.x

classes = [int, str, Spam, float]


would become:


classes = [int, str, 
           type('Spam', (SpamBase, HamBase), 
                {'x': 1, 
                 'eggs': (def eggs(self, arg):
                              return arg+self.x
                              ),
                }
               ),
           float,
           ]


which I personally don't consider an improvement, but some people might. 
You could even handle metaclasses and extra arguments:

class Spam(SpamBase, metaclass=MetaSpam, extrakw="extra"):
    ...

becomes:

MetaSpam('Spam', (SpamBase,), {...}, extrakw="extra")

so given a def expression, I don't think we also need a class 
expression. What's missing?
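
For comparison, here is how the type() spelling looks in today's Python,
with the method defined by a normal def statement beforehand instead of a
def expression. This is a sketch under assumptions: SpamBase, HamBase and
MetaSpam are placeholder definitions invented here so that the example
runs, and the metaclass simply records whatever extra keywords it gets.

```python
class SpamBase: pass
class HamBase: pass

def eggs(self, arg):
    return arg + self.x

# Three-argument type(): name, bases, namespace dict.
Spam = type('Spam', (SpamBase, HamBase), {'x': 1, 'eggs': eggs})
classes = [int, str, Spam, float]

# Placeholder metaclass that accepts extra keyword arguments.
class MetaSpam(type):
    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, namespace)
        cls.extra = kwargs  # remember the extra keywords
        return cls

    def __init__(cls, name, bases, namespace, **kwargs):
        super().__init__(name, bases, namespace)

# Calling the metaclass directly ...
MetaSpam2 = MetaSpam('Spam', (SpamBase,), {}, extrakw="extra")

# ... mirrors the class-statement form.
class MetaSpam3(SpamBase, metaclass=MetaSpam, extrakw="extra"):
    pass

assert Spam().eggs(2) == 3
assert MetaSpam2.extra == MetaSpam3.extra == {'extrakw': 'extra'}
```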



-- 
Steven

From storchaka at gmail.com  Tue Oct 22 20:23:40 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Tue, 22 Oct 2013 21:23:40 +0300
Subject: [Python-ideas] YAML (yet-another-multiline-lambda)
In-Reply-To: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
References: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
Message-ID: <l46fr3$e3i$1@ger.gmane.org>

22.10.13 13:10, Antony Lee wrote:
> As the issue has been (indirectly) raised again in the recent context
> manager thread, I would like to propose yet another syntax for multiline
> lambdas...

Most use cases for multiline lambdas don't actually need multiline
lambdas.

> target = (def f(arg):
>      <...>)

def target(arg):
     ...

> callbacks = {
>      "foo":
>          (def foo():
>              <...>), # note how the parentheses disambiguate where the
>                      # last comma belongs
>      "bar":
>          (def bar():
>              <...>)
> }

def class_to_map(cls):
     return {n: f for n, f in cls.__dict__.items() if n[0] != '_'}

@class_to_map
class callbacks:
     @staticmethod
     def foo():
         ...
     @staticmethod
     def bar():
         ...
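
A runnable variant of the decorator recipe, as a sketch only. One
assumption worth flagging: the entries in a class __dict__ are raw
staticmethod objects, which to my knowledge only became directly callable
in Python 3.10, so this version looks each name up with getattr() to get
the plain function back. The return values are invented for the demo.

```python
def class_to_map(cls):
    # getattr() goes through the descriptor protocol, so staticmethod
    # wrappers are unwrapped into ordinary functions.
    return {name: getattr(cls, name)
            for name in vars(cls) if not name.startswith('_')}

@class_to_map
class callbacks:
    @staticmethod
    def foo():
        return 'foo called'

    @staticmethod
    def bar():
        return 'bar called'

# After decoration, "callbacks" is a plain dict of functions.
assert sorted(callbacks) == ['bar', 'foo']
assert callbacks['foo']() == 'foo called'
```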



From anntzer.lee at gmail.com  Tue Oct 22 21:00:16 2013
From: anntzer.lee at gmail.com (Antony Lee)
Date: Tue, 22 Oct 2013 12:00:16 -0700 (PDT)
Subject: [Python-ideas] YAML (yet-another-multiline-lambda)
In-Reply-To: <20131022123903.GL7989@ando>
References: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
 <20131022123903.GL7989@ando>
Message-ID: <f7f7a40f-e4a2-467e-bb0a-1dc306a3e1da@googlegroups.com>


On Tuesday, October 22, 2013 5:39:03 AM UTC-7, Steven D'Aprano wrote:
>
> On Tue, Oct 22, 2013 at 03:10:29AM -0700, Antony Lee wrote: 
>
> > Specifically, I suggest that the "def" (and possibly the "class") 
> > keyword(s) may be used in an expression context, if immediately 
> surrounded 
> > by parentheses.   
>
> I don't think there is any need to allow class in an expression, since 
> we already have type(). 
>

I don't actually think that extending the "class" statement is that useful, 
but felt that given the similarity between class and def, I may as well 
mention it.  The main point here is really the "def" statement.

>
> > The indentation of the body of the function is given by 
> > the indentation of the first line after the def. 
>
> I presume that means you can't do this: 
>
> value = [1, 2, 3, (def spam(arg): do_this() 
>                         do_that() 
>                         ), 4, 5, 6] 
>
> I think I would be okay with that, since you can't do that with the def 
> statement either: 
>
> py> def spam(): do_this() 
> ...     do_that() 
>   File "<stdin>", line 2 
>     do_that() 
>     ^ 
> IndentationError: unexpected indent 
>

Yes, that is the intent. 

>
>
> I suppose, like the statement version, a single-line function body could 
> be inline: 
>
> value = [1, 2, 3, (def spam(arg): do_this()), 4, 5, 6] 
>
> which would then make lambda redundant. 
>
 
While TOOWTDI, I think it is somewhat agreed that lambda's syntax is a bit of
a kludge, so perhaps we should try to find a replacement.

>
> In each of your examples, you have the body indented further to the 
> right than the def. Is that mandatory, or would you allow something like 
> this? 
>
> value = [1, 2, 3, (def spam(arg): 
>     do_this() 
>     do_that() 
>     if condition(): 
>         do_something_else() 
>     ), 
>     4, 5, 6, 
>     ] 
>
> That is, relative to the def itself, the body is outdented. 
>

I would allow this on grounds of grammar simplicity (but perhaps strongly 
discourage it in a style guide). 

>
>
> I can't see any reason to justify prohibiting the above in the language, 
> although I'd probably frown upon it in style-guides. I think that should 
> be written as: 
>
> value = [1, 2, 3, 
>          (def spam(arg): 
>               do_this() 
>               do_that() 
>               if condition(): 
>                   do_something_else() 
>               ), 
>          4, 5, 6, 
>          ] 
>
> sort of thing. But I don't think that having the first indent be to the 
> left of the def should be prohibited. However, that can lead to some 
> pretty ugly, and misleading, constructions: 
>
> def spam(arg, func=(def ham(a, 
>                             x, 
>                             y, 
>                             z): 
>     fe(a) 
>     fi(x) 
>     fo(y) 
>     fum(z) 
>     ), 
>         ): 
>     ... 
>
>
> I'm not sure if there is a question buried in this, or just an 
> observation. I hardly ever miss having multi-line lambdas, and I fear 
> that they will make code harder to read and understand. 
>
>
> -- 
> Steven 


From techtonik at gmail.com  Tue Oct 22 21:12:18 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Tue, 22 Oct 2013 22:12:18 +0300
Subject: [Python-ideas] 'from os.path import FILE,
 DIR' or internal structure of filenames
In-Reply-To: <CADiSq7eS9AYqUJ=z=T4ypU4Xys=hCbAnbv0UR9vc0NMFGartpw@mail.gmail.com>
References: <CAPkN8xL4AofRrB-LE9HnZ1wnmRxCRU278pcBhqoT=T_z-98kdA@mail.gmail.com>
 <20130928232641.GA15985@cskk.homeip.net>
 <CADiSq7fyxBEAm48KMqf6=v_Ud565Z1rQdztucObzui5RKb3_mw@mail.gmail.com>
 <CAEfz+TxNd1YxZbC6BdPU4c9_91xjL=V-5Q96CGW8NJhe-mMJjw@mail.gmail.com>
 <CADiSq7fJa5J5SFm-QYZ=ZH-bvs=iR0UTPJXyJbDfT3pXSRg98g@mail.gmail.com>
 <CAPkN8xKiigOomcenMeOfHvT7bNE1KfxSmNEeT2-Qq-x_XZ8USA@mail.gmail.com>
 <l29u5h$k0j$1@ger.gmane.org>
 <CADiSq7eS9AYqUJ=z=T4ypU4Xys=hCbAnbv0UR9vc0NMFGartpw@mail.gmail.com>
Message-ID: <CAPkN8xLyv2gMowboijDckw30ehPh-42PeFQCCFzo5QUmMJSYcA@mail.gmail.com>

On Mon, Sep 30, 2013 at 1:18 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
> On 30 Sep 2013 05:14, "Terry Reedy" <tjreedy at udel.edu> wrote:
>>
>> On 9/29/2013 1:36 PM, anatoly techtonik wrote:
>>>
>>> On Sun, Sep 29, 2013 at 9:15 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>>>
>>>> On 29 September 2013 13:28, INADA Naoki <songofacandy at gmail.com> wrote:
>>>>>
>>>>> os.path.abspath(__file__) returns wrong path after chdir.
>>>>> So I don't think abspath of module can be trivially and reliably
>>>>> derived
>>>>> from existing values.
>>>>
>>>>
>>>> Hence the part about any remaining instances of non-absolute __file__
>>>> values being considered a bug in the import system.
>>>
>>>
>>> Bug that will not be fixed, i.e. a wart.
>>
>>
>> Nick said "we tend not to fix them in maintenance releases", which I take
>> to mean we can fix in new versions.
>
> Correct, it's the kind of arguably backwards incompatible bug fix that users
> will generally tolerate in a feature release but would be justifiably upset
> about in a maintenance release.
>
> Cheers,
> Nick.
>
>>
>>
>>> And as a result we don't have a way to reliably reference filename
>>> of the current script and its directory. Hence the proposal.
>>
>>
>> The proposed addition would not happen in maintenance releases either.

Python 3.4?
--
anatoly t.

From anntzer.lee at gmail.com  Tue Oct 22 21:13:54 2013
From: anntzer.lee at gmail.com (Antony Lee)
Date: Tue, 22 Oct 2013 12:13:54 -0700 (PDT)
Subject: [Python-ideas] YAML (yet-another-multiline-lambda)
In-Reply-To: <C58D04B1-E1B3-43BE-9013-828566809F84@yahoo.com>
References: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
 <C58D04B1-E1B3-43BE-9013-828566809F84@yahoo.com>
Message-ID: <dfa439d4-94c1-4812-b28b-44caf002c5c6@googlegroups.com>

I don't have any experience with JS, so perhaps I don't understand the 
issues well... but if you define a callback in a method, you don't need to 
call "bind" as "self" will be captured as a closure variable anyways (in 
particular, can you explain what you mean by "this would make things 
worse"?).  Also, if you really need to bind the first variable and don't 
want to use partial, you can also use __get__, which exactly creates bound 
methods: (lambda x: x).__get__(1)() ==> 1.
Also, the syntax (in particular the extra parentheses) was specifically 
chosen to make parsing relatively easy (avoiding awkward issues of "when 
does the 'def' end?").
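
To make the binding options concrete, here is a small sketch in current
Python 3. The Controller class, the handler function and the 'click'
event are all invented for illustration. Note that in Python 3
types.MethodType takes just the function and the instance.

```python
import functools
import types

class Controller:
    def __init__(self, name):
        self.name = name

    def make_callback(self):
        # "self" is captured as a closure variable; no binding needed.
        def callback(event):
            return (self.name, event)
        return callback

# A free function whose first parameter is meant to be an instance.
def handler(self, event):
    return (self.name, event)

c = Controller('ui')

candidates = [
    c.make_callback(),              # closure over self
    handler.__get__(c),             # descriptor protocol: bound method
    types.MethodType(handler, c),   # explicit bound method
    functools.partial(handler, c),  # partial application
]

results = [f('click') for f in candidates]
assert results == [('ui', 'click')] * 4

# The one-liner from the message above.
assert (lambda x: x).__get__(1)() == 1
```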

PS: I'm using the Google Groups interface and feel like I'm making a mess 
of the email layouts.  Any hints on whether this can be avoided -- or do I 
have to go through mailman instead?

Antony

On Tuesday, October 22, 2013 9:52:31 AM UTC-7, Andrew Barnert wrote:
>
> On Oct 22, 2013, at 3:10, Antony Lee <anton... at berkeley.edu> wrote:
>
> > Specifically, I suggest that the "def" (and possibly the "class") 
> keyword(s) may be used in an expression context, if immediately surrounded 
> by parentheses.  The indentation of the body of the function is given by 
> the indentation of the first line after the def. 
>
> I don't think this idea will fit well with Python's OO. 
>
> Partly this is based on JS experience, where you have to write .bind(this) 
> all over the place because your callback--or, worse, some callback later in 
> the chain--needs access to this. 
>
> In Python, f.bind(this) is spelled types.MethodType(f, self, type(self)), 
> and classes are used more than in JS rather than less. Especially since the 
> paradigm cases for callbacks--network transports, GUI events, etc.--are 
> also paradigm cases for objects--protocols, controllers, etc. At best this 
> will lead to two completely different styles for doing things that are hard 
> to tie together. At worst people will tie them together the same way they 
> usually do in JS, leading to ugly circumlocutions where people bind self to 
> other variables in closures (like the common "that = this" in JS), wrap 
> functions in extra lambdas just to turn them into methods, or use partial 
> to do the same, etc., to avoid the need to bind(this) at every step on the 
> chain. And the fact that you can often rely on capturing self directly in a 
> closure doesn't make things better, but worse, because it's not nearly 
> often enough to count on. 
>
> In fact, I'd go so far as to say that not having inline def is part of the 
> reason python server code is usually easier to read than node server code, 
> despite node having an API not that far from Python frameworks like 
> Twisted. 
>
> As a secondary concern, this would make it harder to partially parse 
> Python the way IDEs and other code managing tools often do. 

From abarnert at yahoo.com  Wed Oct 23 09:53:13 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 23 Oct 2013 00:53:13 -0700
Subject: [Python-ideas] YAML (yet-another-multiline-lambda)
In-Reply-To: <20131022180729.GN7989@ando>
References: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
 <20131022123903.GL7989@ando> <4357C0CF-FBF3-40F1-BE6E-7308677894CD@yahoo.com>
 <20131022180729.GN7989@ando>
Message-ID: <96FA203F-E350-4B4A-B50B-089C92ECCD2B@yahoo.com>

On Oct 22, 2013, at 11:07, Steven D'Aprano <steve at pearwood.info> wrote:

> On Tue, Oct 22, 2013 at 09:27:59AM -0700, Andrew Barnert wrote:
>> On Oct 22, 2013, at 5:39, Steven D'Aprano <steve at pearwood.info> wrote:
>> 
>>> On Tue, Oct 22, 2013 at 03:10:29AM -0700, Antony Lee wrote:
>>> 
>>>> Specifically, I suggest that the "def" (and possibly the "class")
>>>> keyword(s) may be used in an expression context, if immediately surrounded
>>>> by parentheses.  
>>> 
>>> I don't think there is any need to allow class in an expression, since 
>>> we already have type().
>> 
>> But type doesn't allow you to do most of what you can do in a class definition. This is like arguing that we don't need expression def because we already have types.FunctionType.
> 
> Given multi-line lambda, what could you do in a class definition that 
> you couldn't do with type?

Write code that's somewhat readable and looks somewhat like Python?

> class Spam(SpamBase, HamBase):
>    x = 1
>    def eggs(self, arg):
>        return arg+self.x
> 
> classes = [int, str, Spam, float]
> 
> 
> would become:
> 
> 
> classes = [int, str, 
>           type('Spam', (SpamBase, HamBase), 
>                {'x': 1, 
>                 'eggs': (def eggs(self, arg):
>                              return arg+self.x
>                              ),
>                }
>               ),
>           float,
>           ]
> 
> 
> which I personally don't consider an improvement, but some people might. 

Anyone who finds that an improvement clearly would rather be using a different language than python in the first place.

> You could even handle metaclasses and extra arguments:
> 
> class Spam(SpamBase, metaclass=MetaSpam, extrakw="extra"):
>    ...
> 
> becomes:
> 
> MetaSpam('Spam', (SpamBase,), {...}, extrakw="extra")
> 
> so given a def expression, I don't think we also need a class 
> expression. What's missing?
> 
> 
> 
> -- 
> Steven

From abarnert at yahoo.com  Wed Oct 23 09:57:46 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 23 Oct 2013 00:57:46 -0700
Subject: [Python-ideas] YAML (yet-another-multiline-lambda)
In-Reply-To: <dfa439d4-94c1-4812-b28b-44caf002c5c6@googlegroups.com>
References: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
 <C58D04B1-E1B3-43BE-9013-828566809F84@yahoo.com>
 <dfa439d4-94c1-4812-b28b-44caf002c5c6@googlegroups.com>
Message-ID: <914D3110-F0FA-49E5-80C5-666F263574F1@yahoo.com>

On Oct 22, 2013, at 12:13, Antony Lee <anntzer.lee at gmail.com> wrote:

> Also, if you really need to bind the first variable and don't want to use partial, you can also use __get__, which exactly creates bound methods: (lambda x: x).__get__(1)() ==> 1.

Do you think this is something that most Python users would understand today, or do you think Python would be a better language if it were a recognized idiom?

I'll answer the other part when I'm in front of a computer.

From steve at pearwood.info  Wed Oct 23 14:50:22 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 23 Oct 2013 23:50:22 +1100
Subject: [Python-ideas] YAML (yet-another-multiline-lambda)
In-Reply-To: <96FA203F-E350-4B4A-B50B-089C92ECCD2B@yahoo.com>
References: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
 <20131022123903.GL7989@ando> <4357C0CF-FBF3-40F1-BE6E-7308677894CD@yahoo.com>
 <20131022180729.GN7989@ando> <96FA203F-E350-4B4A-B50B-089C92ECCD2B@yahoo.com>
Message-ID: <20131023125021.GQ7989@ando>

On Wed, Oct 23, 2013 at 12:53:13AM -0700, Andrew Barnert wrote:

> >> But type doesn't allow you to do most of what you can do in a class 
> >> definition. This is like arguing that we don't need expression def 
> >> because we already have types.FunctionType.
> > 
> > Given multi-line lambda, what could you do in a class definition that 
> > you couldn't do with type?
> 
> Write code that's somewhat readable and looks somewhat like Python?

No, that's not it. You can certainly write somewhat readable code that 
looks like Python using type. By definition, calls to type() look like 
Python code, because they *are* Python code.

So I'm still curious as to what you can do in a class statement that 
couldn't be done in a call to type. The only thing I can think of is 
that class statement introduces a new scope, while type does not, so you 
can compute class attributes without affecting names in the enclosing 
scope:

x = 42
y = 23

class Spam:
    x = "something"
    y = x.upper() + "!"

assert x == 42 and y == 23


In the most general case, this would be rather tricky using type, 
without introducing extraneous variables. But I don't see this as a real 
problem for the suggestion.
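
One way to get the same effect with type() today is to compute the
namespace inside a helper function, at the cost of exactly the kind of
extraneous name mentioned above. A rough sketch; _spam_namespace is a
made-up helper name:

```python
x = 42
y = 23

def _spam_namespace():
    # Local names here shadow, but do not rebind, the module-level x and y.
    x = "something"
    y = x.upper() + "!"
    return {"x": x, "y": y}

Spam = type("Spam", (), _spam_namespace())

assert x == 42 and y == 23
assert Spam.x == "something"
assert Spam.y == "SOMETHING!"
```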

Nobody has come up with any major problems with this suggestion other 
than (1) it isn't strictly necessary, and (2) non-trivial 
function-expressions don't look very nice. I'm afraid I'm no different: 
although I can't find anything obviously wrong with the suggestion, nor 
can I develop much enthusiasm for it. I haven't used Ruby enough to 
really see the big deal for anonymous multi-line code blocks.

I will say one thing: I do like the fact that this suggestion gives the 
functions a name, rather than making them anonymous. One disadvantage of 
lambda is that the lack of name makes it harder to debug them. That's 
not a problem in this case.

So...

+0 on named def expressions; 

-0 on anonymous def expressions;

-1 on class expressions.



-- 
Steven

From kristjan at ccpgames.com  Wed Oct 23 16:55:19 2013
From: kristjan at ccpgames.com (Kristján Valur Jónsson)
Date: Wed, 23 Oct 2013 14:55:19 +0000
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <CADiSq7dzVW-FP=L7Qc5fzFjqFTWoRJFCnTMOd_iKVXuYsXSvmQ@mail.gmail.com>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
 <CAGu0AntiRcr+0uM9QquYADK8qFBqE6kKH74y1NB9DFZi0cbTTQ@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD38136D149@RKV-IT-EXCH104.ccp.ad.local>
 <CALruUQK1JXLGBsw45+e9BBKi9OgGCCJXFAAHPW6Aog1OAjyh1Q@mail.gmail.com>
 <CADiSq7dzVW-FP=L7Qc5fzFjqFTWoRJFCnTMOd_iKVXuYsXSvmQ@mail.gmail.com>
Message-ID: <EFE3877620384242A686D52278B7CCD38137298A@RKV-IT-EXCH104.ccp.ad.local>

Thanks for your detailed reply, Nick.
Good to see that I am not completely bonkers and not swimming entirely alone against the flow.
I realise of course that we are Python and not Ruby (I got to learn Ruby before Python, btw) and that this is not particularly likely to come to anything.

But remember how Python retro-fitted Ruby's object model into its "new-style" classes?  Ruby was written that way all along.  (CS buffs out there will likely point out to me that this was not an original Matz invention).
Perhaps, with persistent dripping, we can slowly hollow the proverbial stone.

Cheers,

K

From: Nick Coghlan [mailto:ncoghlan at gmail.com]
Sent: 22. október 2013 03:32
To: Haoyi Li
Cc: python-ideas at python.org; Kristján Valur Jónsson
Subject: Re: [Python-ideas] A different kind of context manager


On 22 Oct 2013 07:27, "Haoyi Li" <haoyi.sg at gmail.com> wrote:
>
> > Maybe we should have selected one in which this sort of coding is its native, natural, form, rather than having this intermarriage kludge which turns an imperative-looking generator into the traditional context manager.
>
> I agree with this 100%. Unfortunately, Python picked the current style of `with` statements a while ago, and this would be a pretty huge change/additional feature.
>
> FWIW, other languages with easy anonymous functions (e.g. ruby, scala) have this, and it does provide all the benefits that you describe. For example, spinning off parallel tasklets inline is just a matter of `async{ ... }` where `async` is just a context manager.

It's not a coincidence that Ruby (at least - I don't know scala) just treats for loops and context management as special cases of anonymous callbacks - the latter is a powerful, more general construct.

By contrast, Python chose the path of providing dedicated syntax for both iteration *and* context management, and hence requires that callbacks that don't fit in a single expression be defined prior to use.

I think this makes those constructs easier to understand in many ways, but it *also* means that we *don't* currently have a clean syntax for single use callbacks.

Hence the time I've put into PEP 403 and 3150 over the years - a key objective for both of them is providing a cleaner solution for the problem of single use callbacks (including those that modify local variables of the containing function).

In addition to scoping, the other problem single use callbacks need to handle sensibly is the behaviour of the simple flow control statements: return, yield, break, continue, and raise.

Building single use callbacks on top of the "def" statement has the advantage of *not* needing to define any new scoping or control flow semantics (as they're just ordinary nested scopes).
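
As a rough illustration of what this looks like today (the function and variable names here are invented for the example), a single use callback that modifies a local variable of the containing function has to be defined with "def" before the call that uses it, and needs "nonlocal" for the rebinding:

```python
def double_all(items):
    calls = 0

    # The callback must exist before the expression that uses it.
    def on_item(item):
        nonlocal calls      # rebinding a local of the containing function
        calls += 1
        return item * 2

    doubled = list(map(on_item, items))
    return doubled, calls


print(double_all([1, 2, 3]))
```

Scoping and "return" inside on_item follow the ordinary nested-function rules, which is the advantage described above.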

Defining them any other way makes things far more complicated. It would certainly be close to impossible to repurpose any of the other existing compound statements without breaking backwards compatibility.

A completely new keyword is also a possibility, but then it's necessary to find a good one, and explain its use cases in a fashion similar to PEP 403.

Cheers,



From kristjan at ccpgames.com  Wed Oct 23 16:57:52 2013
From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Wed, 23 Oct 2013 14:57:52 +0000
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <52659C0D.1070308@stoneleaf.us>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
 <52659C0D.1070308@stoneleaf.us>
Message-ID: <EFE3877620384242A686D52278B7CCD3813729A8@RKV-IT-EXCH104.ccp.ad.local>



> -----Original Message-----
> Kristj?n,
> 
> Your replies would be much easier to read if you trimmed the previous email.
> 
> Thanks.

Yes.  I realise that I am sometimes not adhering to form with my replies.  That's when I'm working from a bad email client.  I apologise.
K


From kristjan at ccpgames.com  Wed Oct 23 16:47:45 2013
From: kristjan at ccpgames.com (=?utf-8?B?S3Jpc3Rqw6FuIFZhbHVyIErDs25zc29u?=)
Date: Wed, 23 Oct 2013 14:47:45 +0000
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <3402edcf-1e4e-41f4-be25-f097f89cad07@googlegroups.com>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
 <3402edcf-1e4e-41f4-be25-f097f89cad07@googlegroups.com>
Message-ID: <EFE3877620384242A686D52278B7CCD38137293A@RKV-IT-EXCH104.ccp.ad.local>

Nice suggestion.  And I can also do:
  def _():
    do_stuff()
  result = execute_subprocess(_)

i.e. just do it manually, rather than with the decorator.

Your suggestion, though, is actually not such a bad pattern, but it has a few drawbacks:

1)      You need the "nonlocal" qualifier to pass values out of it

2)      It has the side effect of setting _

3)      It is a bit non-intuitive, particularly when decorators start taking arguments.  When is the decorator run?  This is not always immediately clear.  Well, it is simpler than a regular decorator, since it will invoke the target function itself...

4)      The syntax is not nice.
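
A small sketch of drawbacks 1 and 2, assuming a hypothetical decorator run_now that simply calls the decorated function once (it stands in for something like as_subprocess and is not an existing library function):

```python
def run_now(func):
    """Hypothetical decorator: invoke the block immediately, keep its result."""
    return func()


def compute():
    total = 0

    @run_now
    def _():
        nonlocal total          # drawback 1: needed to pass a value out
        total = sum(range(5))
        return "done"

    # drawback 2: the name _ is now bound, here to the block's return value
    return total, _


print(compute())
```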

Decorators themselves were invented as syntactic sugar to get rid of the
def foo():
  ...
foo = bar(foo)

pattern.  Maybe I should revise my suggestion then?  A new syntax that does the above, i.e.:

new_with cm as bar:  # or whatever keyword is deemed appropriate.
   do_stuff()

compiles to:

@cm
def _(_bar):
  pragma("nonlocal", 1) # moves binding one step upwards
  bar = _bar
  do_stuff()

But with _ and _bar magically hidden.

The only thing really needed, then, is support for pragma("nonlocal", 1) or an equivalent way of changing the default binding of variables, plus compiler magic for the syntax.

K



From: Antony Lee [mailto:anntzer.lee at gmail.com]
Sent: 22. október 2013 00:18
To: python-ideas at googlegroups.com
Cc: python-ideas at python.org; Kristján Valur Jónsson
Subject: Re: [Python-ideas] A different kind of context manager

You can get the desired behavior by (ab)using function decorators, by rewriting
with as_subprocess():
    do_stuff()
as

@as_subprocess
def _():
   <do stuff>

Yes, it's not very elegant syntactically but gets the work done (and this technique is generalizable to most uses of Ruby-style blocks, I believe).

Antony

From kristjan at ccpgames.com  Wed Oct 23 16:51:05 2013
From: kristjan at ccpgames.com (=?utf-8?B?S3Jpc3Rqw6FuIFZhbHVyIErDs25zc29u?=)
Date: Wed, 23 Oct 2013 14:51:05 +0000
Subject: [Python-ideas] A different kind of context manager
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
 <3402edcf-1e4e-41f4-be25-f097f89cad07@googlegroups.com> 
Message-ID: <EFE3877620384242A686D52278B7CCD381372961@RKV-IT-EXCH104.ccp.ad.local>

Ah, I forgot about the changed flow control with "return", "break", "continue".
The decorator way does not allow that.  A special "block" callable would be needed, with special return opcodes.
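
To illustrate the flow control point with a sketch (run_now is again a hypothetical decorator that calls the function once, and the other names are made up for the example): a "return" inside the decorated function only leaves that function, so the caller has to propagate the result by hand, and a "break" or "continue" there would be a SyntaxError.

```python
def run_now(func):
    """Hypothetical decorator: invoke the block immediately, keep its result."""
    return func()


def first_negative_plain(values):
    for v in values:
        if v < 0:
            return v            # leaves first_negative_plain directly
    return None


def first_negative_block(values):
    for v in values:
        @run_now
        def _():
            if v < 0:
                return v        # only leaves _, not the loop or the caller
        if _ is not None:       # result must be forwarded manually
            return _
    return None


print(first_negative_plain([3, -2, 5]), first_negative_block([3, -2, 5]))
```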
K


From kristjan at ccpgames.com  Wed Oct 23 16:24:54 2013
From: kristjan at ccpgames.com (=?utf-8?B?S3Jpc3Rqw6FuIFZhbHVyIErDs25zc29u?=)
Date: Wed, 23 Oct 2013 14:24:54 +0000
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <EE4BC259-BEB4-4FB7-A693-B289D8EC6183@yahoo.com>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
 <CAPTjJmrVEixFKm=aDdS4zwfc00aug-ekD3b=24ZtY8OXdcSFVw@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD38136CC6D@RKV-IT-EXCH104.ccp.ad.local>
 <0C56C125-F3EB-4302-8B82-F75972EDB34C@yahoo.com>
 <EFE3877620384242A686D52278B7CCD38136D10E@RKV-IT-EXCH104.ccp.ad.local>
 <EE4BC259-BEB4-4FB7-A693-B289D8EC6183@yahoo.com>
Message-ID: <EFE3877620384242A686D52278B7CCD3813728EB@RKV-IT-EXCH104.ccp.ad.local>

Syntax semantics stay the same.

> It's either one or the other: either every variable is implicitly nonlocal whether you want it to be or not, or every variable is implicitly local and you have to nonlocal them to perform common context manager idioms.

"Local" or "nonlocal" only has meaning within a function definition, i.e. one that starts with "def function():".
I'm not suggesting new syntax.  The code remains just a regular block, inside the "with" keyword, and the variables there have the same binding as if the "with" statement were removed.  You are not defining a function, but the compiler _is_ producing a new kind of callable.  A "block" object, perhaps.  It might not _need_ to be technically a new kind of callable; perhaps such a block is implementable within the existing "function" type.  But that is merely an implementation detail.

My proposal thus makes no changes to syntax, merely to how the block is invoked.  I suggest the code block be invoked explicitly by a "new-style context manager" rather than implicitly by the interpreter, inside the frame of __enter__/__exit__.

K


From: Andrew Barnert [mailto:abarnert at yahoo.com]
Sent: 21. október 2013 23:37
To: Kristján Valur Jónsson
Cc: Chris Angelico; python-ideas
Subject: Re: [Python-ideas] A different kind of context manager

Obviously semantics don't stay exactly the same or there would be no benefits to the change. The whole point is that you're creating a function with closure and visibly passing it to a method of the context manager.

It's either one or the other: either every variable is implicitly nonlocal whether you want it to be or not, or every variable is implicitly local and you have to nonlocal them to perform common context manager idioms.
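
A sketch of the two behaviours with today's Python, using made-up function names and a do-nothing context manager: inside a "with" block an assignment rebinds the enclosing function's variable, inside a nested function it creates a new local unless "nonlocal" is declared.

```python
from contextlib import contextmanager


@contextmanager
def managed():
    yield


def with_statement():
    status = "before"
    with managed():
        status = "after"        # same scope as the enclosing function
    return status


def plain_callback():
    status = "before"

    def block():
        status = "after"        # a new local of block(); outer name untouched

    block()
    return status


def nonlocal_callback():
    status = "before"

    def block():
        nonlocal status         # explicit opt-in to rebinding the outer name
        status = "after"

    block()
    return status


print(with_statement(), plain_callback(), nonlocal_callback())
```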

Compare with comprehensions. Changing them to use functions under the covers had no effect (other than breaking a rare use case with StopIteration passing, which I believe has been fixed), but that's only because the comprehension variable(s) were already explicitly prevented from replacing existing bindings. (The fact that there's no way to explicitly bind a variable in a comprehension helps too--no potential surprises about what "x=2" might do when statements aren't allowed in the first place.) That's obviously not true for with statements.
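
For comparison, a short sketch of both behaviours in Python 3 (contextlib.nullcontext needs Python 3.7 or later and is used here only as a stand-in context manager):

```python
from contextlib import nullcontext

x = "outer"

# The comprehension variable lives in its own scope and does not leak.
squares = [x * x for x in range(4)]
after_comprehension = x

# A with statement binds names in the enclosing scope.
with nullcontext("inner") as x:
    y = 2
after_with = x

print(squares, after_comprehension, after_with, y)
```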

If you think that every variable being implicitly nonlocal is a good thing, that's certainly arguable (maybe no existing code would ever notice the difference, and new code that did wouldn't be surprised by it?), but only if you make that case instead of trying to argue that there isn't an issue in the first place.

From masklinn at masklinn.net  Wed Oct 23 17:17:14 2013
From: masklinn at masklinn.net (Masklinn)
Date: Wed, 23 Oct 2013 17:17:14 +0200
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <EFE3877620384242A686D52278B7CCD38137298A@RKV-IT-EXCH104.ccp.ad.local>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
 <CAGu0AntiRcr+0uM9QquYADK8qFBqE6kKH74y1NB9DFZi0cbTTQ@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD38136D149@RKV-IT-EXCH104.ccp.ad.local>
 <CALruUQK1JXLGBsw45+e9BBKi9OgGCCJXFAAHPW6Aog1OAjyh1Q@mail.gmail.com>
 <CADiSq7dzVW-FP=L7Qc5fzFjqFTWoRJFCnTMOd_iKVXuYsXSvmQ@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD38137298A@RKV-IT-EXCH104.ccp.ad.local>
Message-ID: <75DBE254-8764-4F34-A28D-AF6D7384517E@masklinn.net>

On 2013-10-23, at 16:55 , Kristján Valur Jónsson wrote:

> (CS buffs out there will likely point out to me that this was not an original Matz invention).

You don't need CS buffs to point it out, it was an implementation detail
leaking into semantic incompatibility between types implemented in C and
classes implemented in Python:
http://python-history.blogspot.be/2010/06/new-style-classes.html

Python was fairly unique in having this dichotomy between built-in and
user-defined types[0].

[0] but not anymore, Go has repeated this mistake, amongst others.

From ron3200 at gmail.com  Wed Oct 23 21:10:54 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Wed, 23 Oct 2013 14:10:54 -0500
Subject: [Python-ideas] Mini language prototype... possibilities?
Message-ID: <l496vl$ikr$1@ger.gmane.org>


There was some interest here in the little language experiment I mentioned 
in another thread, so I'm posting a google project link here for those who 
might be interested in taking a look at it.

      http://code.google.com/p/metap/

It's just a toy at this stage written in python, but some people here might 
be interested in playing around with it.

I don't think we should take up too much space on this board until there is 
at least a proof of concept for anything derived from it.  It's way too 
general and preliminary an idea to be a serious suggestion at this stage. 
It's ok to email me directly.


Here's what it's about in general...

The idea that prompted this is my feeling that Python's byte code could be 
a bit higher level, and that there may be some advantages to doing that.

The experiment was to see what it might look like if instead of a stack 
based byte code, it was a list based object code.

What I ended up with is very near scheme with better support for 
programming in a more explicit imperative style (more like python), while 
still keeping both the parser and eval loop very small and simple.


Example python code ...

 >>> def fact(n):
...     r = n
...     while n > 1:
...         n -= 1
...         r *= n
...     return r
...
 >>> fact(3)
6


The byte code ...

 >>> dis.dis(fact)
   2           0 LOAD_FAST                0 (n)
               3 STORE_FAST               1 (r)

   3           6 SETUP_LOOP              36 (to 45)
         >>    9 LOAD_FAST                0 (n)
              12 LOAD_CONST               1 (1)
              15 COMPARE_OP               4 (>)
              18 POP_JUMP_IF_FALSE       44

   4          21 LOAD_FAST                0 (n)
              24 LOAD_CONST               1 (1)
              27 INPLACE_SUBTRACT
              28 STORE_FAST               0 (n)

   5          31 LOAD_FAST                1 (r)
              34 LOAD_FAST                0 (n)
              37 INPLACE_MULTIPLY
              38 STORE_FAST               1 (r)
              41 JUMP_ABSOLUTE            9
         >>   44 POP_BLOCK

   6     >>   45 LOAD_FAST                1 (r)
              48 RETURN_VALUE


That's pretty low level.  If we go to a list based model where instructions 
are nested lists of symbols and names, and give it better access to the 
frames name space, it may look like the following.


Example list based code...

        1           DEF "fact n"
        3           [
        3.1           LET r n
        3.4           LOOP
        3.5           [
        3.5.1           IF (<= n 1)
        3.5.3           [
        3.5.3.1            BREAK
                        ]
        3.5.4           DEC n
        3.5.6           LET r (* r n)
                      ]
        4             RETURN r
                    ]


That's almost python, but not quite.  (Seems like a good thing to me.)

It's still very simple and easy to parse by an eval loop. It's just nested 
sequences of symbols and names.
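
To give a feel for how small such an eval loop can be, here is a rough Python sketch. It is not the metap implementation: the instruction names are borrowed from the listing, everything else (the OPS table, run, value, the exception classes) is an assumption made up for this example, and the loop body is written so that it computes the same factorial as the Python version.

```python
import operator

# Prefix operators understood by this toy evaluator (an assumed subset).
OPS = {">": operator.gt, "<=": operator.le, "*": operator.mul}


class Break(Exception):
    """Raised by BREAK, caught by the nearest LOOP."""


class Return(Exception):
    """Raised by RETURN, carries the result out of the block."""

    def __init__(self, result):
        self.result = result


def value(expr, env):
    if isinstance(expr, list):              # e.g. ["*", "r", "n"]
        op, left, right = expr
        return OPS[op](value(left, env), value(right, env))
    if isinstance(expr, str):               # a name looked up in the frame
        return env[expr]
    return expr                             # a literal


def run(block, env):
    for name, *args in block:
        if name == "LET":
            env[args[0]] = value(args[1], env)
        elif name == "DEC":
            env[args[0]] -= 1
        elif name == "IF":
            if value(args[0], env):
                run(args[1], env)
        elif name == "LOOP":
            try:
                while True:
                    run(args[0], env)
            except Break:
                pass
        elif name == "BREAK":
            raise Break
        elif name == "RETURN":
            raise Return(value(args[0], env))
        else:
            raise ValueError("unknown instruction: %r" % name)


FACT_BODY = [
    ["LET", "r", "n"],
    ["LOOP", [
        ["IF", ["<=", "n", 1], [["BREAK"]]],
        ["DEC", "n"],
        ["LET", "r", ["*", "r", "n"]],
    ]],
    ["RETURN", "r"],
]


def fact(n):
    try:
        run(FACT_BODY, {"n": n})
    except Return as ret:
        return ret.result


print(fact(3))
```

Using exceptions for BREAK and RETURN keeps the sketch short; a real eval loop would more likely track control flow explicitly.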

As it turns out, this simple language would be right between python and 
scheme.  Scheme code can be compiled to efficient C code on a wide variety 
of platforms including Android and iPhone.  I think that may offer some 
interesting possibilities.

I just don't want to write applications in scheme.  ;-)

Cheers,
    Ron




From abarnert at yahoo.com  Thu Oct 24 04:19:55 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 23 Oct 2013 19:19:55 -0700
Subject: [Python-ideas] YAML (yet-another-multiline-lambda)
In-Reply-To: <20131023125021.GQ7989@ando>
References: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
 <20131022123903.GL7989@ando> <4357C0CF-FBF3-40F1-BE6E-7308677894CD@yahoo.com>
 <20131022180729.GN7989@ando> <96FA203F-E350-4B4A-B50B-089C92ECCD2B@yahoo.com>
 <20131023125021.GQ7989@ando>
Message-ID: <98C09CD8-4E47-4F79-8271-C8D0EB7489F6@yahoo.com>

On Oct 23, 2013, at 5:50, Steven D'Aprano <steve at pearwood.info> wrote:

>> 
>> Write code that's somewhat readable and looks somewhat like Python?
> 
> No, that's not it. You can certainly write somewhat readable code that 
> looks like Python using type. By definition, calls to type() look like 
> Python code, because they *are* Python code.

Not everything that is executable as Python looks like Python. That's why we have the word "pythonic". Defining a class by building a dict to pass to the type function is not the way you define classes in Python, except in uncommon cases where you need to create classes based on dynamic information. Suggesting that it should be the idiomatic way of creating "local" or "inline" classes is like suggesting that exec('='.join(map(chr,range(120,40,-70)))) or locals().update(dict([map(chr,range(120,40,-70))])) or something should be the idiomatic way to assign 2 to x in some context. (After all, the only semantic difference is a minor scope-related issue--if this is the only assignment to x in a scope the compiler won't be able to tell that it's a local variable.)
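
For anyone who wants to unpick the exec joke, a small sketch, using range(120, 40, -70) so that the generated source spells "x=2" (the variable names source and namespace are made up for the example):

```python
# chr(120) is "x" and chr(50) is "2", so the joined string is an assignment.
source = "=".join(map(chr, range(120, 40, -70)))

namespace = {}
exec(source, namespace)     # runs the assignment in an explicit namespace

print(source, namespace["x"])
```

It is perfectly valid Python, and nobody would call it pythonic.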

Unless this is a roundabout way of arguing that there shouldn't be an idiomatic way to define inline classes in the first place?

From anntzer.lee at gmail.com  Thu Oct 24 07:02:25 2013
From: anntzer.lee at gmail.com (Antony Lee)
Date: Wed, 23 Oct 2013 22:02:25 -0700 (PDT)
Subject: [Python-ideas] YAML (yet-another-multiline-lambda)
In-Reply-To: <914D3110-F0FA-49E5-80C5-666F263574F1@yahoo.com>
References: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
 <C58D04B1-E1B3-43BE-9013-828566809F84@yahoo.com>
 <dfa439d4-94c1-4812-b28b-44caf002c5c6@googlegroups.com>
 <914D3110-F0FA-49E5-80C5-666F263574F1@yahoo.com>
Message-ID: <82fd9861-88c2-4223-9409-c63334327295@googlegroups.com>

I'm sorry I raised the inline class idea, which is distracting from the main 
issue here -- inline functions (again, all I wanted to do was to point out 
the syntactic similarity).

I certainly don't think that creating bound methods through __get__ is a 
particularly recognized idiom (although one may note that descriptors were 
(I believe) introduced exactly for that purpose); on the other hand as I 
mentioned earlier I don't really see the need for binding lambdas to 
objects in the first place.
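
For reference, a minimal sketch of the __get__ trick next to functools.partial (the name identity is just for the example):

```python
from functools import partial

identity = lambda x: x

# Functions are descriptors: __get__ returns a bound method whose
# first argument is fixed to the given object.
bound = identity.__get__(1)

# functools.partial fixes the first argument without the descriptor protocol.
frozen = partial(identity, 1)

print(bound(), frozen(), bound.__self__)
```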

Antony

On Wednesday, October 23, 2013 12:57:46 AM UTC-7, Andrew Barnert wrote:
>
> On Oct 22, 2013, at 12:13, Antony Lee <anntz... at gmail.com> wrote: 
>
> > Also, if you really need to bind the first variable and don't want to 
> use partial, you can also use __get__, which exactly creates bound methods: 
> (lambda x: x).__get__(1)() ==> 1. 
>
> Do you think this is something that most Python users would understand 
> today, or do you think Python would be a better language if it were a 
> recognized idiom? 
>
> I'll answer the other part when I'm in front of a computer. 

From stephen at xemacs.org  Thu Oct 24 12:10:04 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 24 Oct 2013 19:10:04 +0900
Subject: [Python-ideas] YAML (yet-another-multiline-lambda)
In-Reply-To: <82fd9861-88c2-4223-9409-c63334327295@googlegroups.com>
References: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
 <C58D04B1-E1B3-43BE-9013-828566809F84@yahoo.com>
 <dfa439d4-94c1-4812-b28b-44caf002c5c6@googlegroups.com>
 <914D3110-F0FA-49E5-80C5-666F263574F1@yahoo.com>
 <82fd9861-88c2-4223-9409-c63334327295@googlegroups.com>
Message-ID: <871u3bq8sj.fsf@xemacs.org>

Antony Lee writes:

 > as I mentioned earlier I don't really see the need

It's not a *need*, it's a *style*.  I.e., it's just the way Pythonistas
do things.

Many fledgling Pythonistas, especially those who come from other
languages, hanker after anonymous functions or Ruby blocks (a somewhat
different concept AFAICT), and some experienced Pythonistas sympathize
with them.  But there's no real reason why some functions shouldn't
have a name, and Guido (the ultimate authority on "Pythonic") isn't a
fan of lambda, so (unless a pressing need or a really good syntax
appears) there seems to be little chance of the existing feature
(expressions as lambdas) being extended.

 > for binding lambdas to objects in the first place.

I think you've misspoken here.  Lambdas *are* objects, and that's why
names can be bound to them (and then they're called "functions").
What people complain about is the fact that the normal way to create a
lambda (callable function object or something like that) is "def",
which also binds a name to the object.  They think that is wasteful or
something.
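
A quick sketch of the point (the names square and cube are arbitrary): both spellings produce the same kind of object, and the only built-in difference is the name recorded on it, which is what shows up in tracebacks and reprs.

```python
square = lambda n: n * n        # function object, then bound to a name


def cube(n):                    # def creates the object and binds the name
    return n ** 3


print(type(square) is type(cube))   # both are plain function objects
print(square.__name__, cube.__name__)
```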

Steve

From kristjan at ccpgames.com  Thu Oct 24 12:36:02 2013
From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Thu, 24 Oct 2013 10:36:02 +0000
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <75DBE254-8764-4F34-A28D-AF6D7384517E@masklinn.net>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
 <CAGu0AntiRcr+0uM9QquYADK8qFBqE6kKH74y1NB9DFZi0cbTTQ@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD38136D149@RKV-IT-EXCH104.ccp.ad.local>
 <CALruUQK1JXLGBsw45+e9BBKi9OgGCCJXFAAHPW6Aog1OAjyh1Q@mail.gmail.com>
 <CADiSq7dzVW-FP=L7Qc5fzFjqFTWoRJFCnTMOd_iKVXuYsXSvmQ@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD38137298A@RKV-IT-EXCH104.ccp.ad.local>
 <75DBE254-8764-4F34-A28D-AF6D7384517E@masklinn.net>
Message-ID: <EFE3877620384242A686D52278B7CCD38137348D@RKV-IT-EXCH104.ccp.ad.local>



> -----Original Message-----
> From: Python-ideas [mailto:python-ideas-
> bounces+kristjan=ccpgames.com at python.org] On Behalf Of Masklinn
> Sent: 23. október 2013 15:17
> To: python-ideas ideas
> Subject: Re: [Python-ideas] A different kind of context manager
> > (CS buffs out there will likely point out to me that this was not an original
> Matz invention).
> 
> You don't need CS buffs to point it out, it was an implementation detail
> leaking into semantic incompatibility between types implemented in C and
> classes implemented in Python:
> http://python-history.blogspot.be/2010/06/new-style-classes.html
> 
> Python was fairly unique in having this dichotomy between built-in and user-
> defined types[0].
> 
> [0] but not anymore, Go has repeated this mistake, amongst others.

That's not what I was referring to, rather the class model that blew my mind when learning Ruby back in 2000.  I'm not a CS, so this was new to me (knowing OOP only from C++):
- Classes, metaclasses, method resolution order.
- All classes are subclasses of "object"
- object itself is an instance of "type",  type being a "metaclass".
- type, too, is a subclass of object.  Type is its own metaclass, so "type" is an instance of "type".
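
In Python 3 the relationships in that list can be checked directly; a short sketch (the class name Example is made up):

```python
class Example:
    pass


checks = [
    issubclass(Example, object),        # all classes are subclasses of object
    Example.__mro__ == (Example, object),   # method resolution order
    isinstance(object, type),           # object is an instance of type
    issubclass(type, object),           # type is a subclass of object
    type(type) is type,                 # type is its own metaclass
]
print(all(checks))
```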

Ruby was designed with this model in mind, it only arrived later into Python.

K


From guido at python.org  Thu Oct 24 17:26:15 2013
From: guido at python.org (Guido van Rossum)
Date: Thu, 24 Oct 2013 08:26:15 -0700
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <EFE3877620384242A686D52278B7CCD38137348D@RKV-IT-EXCH104.ccp.ad.local>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
 <CAGu0AntiRcr+0uM9QquYADK8qFBqE6kKH74y1NB9DFZi0cbTTQ@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD38136D149@RKV-IT-EXCH104.ccp.ad.local>
 <CALruUQK1JXLGBsw45+e9BBKi9OgGCCJXFAAHPW6Aog1OAjyh1Q@mail.gmail.com>
 <CADiSq7dzVW-FP=L7Qc5fzFjqFTWoRJFCnTMOd_iKVXuYsXSvmQ@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD38137298A@RKV-IT-EXCH104.ccp.ad.local>
 <75DBE254-8764-4F34-A28D-AF6D7384517E@masklinn.net>
 <EFE3877620384242A686D52278B7CCD38137348D@RKV-IT-EXCH104.ccp.ad.local>
Message-ID: <CAP7+vJJ7fpWtZ6Ze91qeeSD0rEHFoaDwz19+pt1T68jLVUXsEQ@mail.gmail.com>

On Thu, Oct 24, 2013 at 3:36 AM, Kristján Valur Jónsson <
kristjan at ccpgames.com> wrote:

>
>
> > -----Original Message-----
> > From: Python-ideas [mailto:python-ideas-
> > bounces+kristjan=ccpgames.com at python.org] On Behalf Of Masklinn
> > Sent: 23. október 2013 15:17
> > To: python-ideas ideas
> > Subject: Re: [Python-ideas] A different kind of context manager
> > > (CS buffs out there will likely point out to me that this was not an
> original
> > Matz invention).
> >
> > You don't need CS buffs to point it out, it was an implementation detail
> > leaking into semantic incompatibility between types implemented in C and
> > classes implemented in Python:
> > http://python-history.blogspot.be/2010/06/new-style-classes.html
> >
> > Python was fairly unique in having this dichotomy between built-in and
> user-
> > defined types[0].
> >
> > [0] but not anymore, Go has repeated this mistake, amongst others.
>
> That's not what I was referring to, rather the class model that blew my
> mind when learning Ruby back in 2000.  I'm not a CS, so this was new to me
> (knowing OOP only from C++):
> - Classes, metaclasses, method resolution order.
> - All classes are subclasses of "object"
> - object itself is an instance of "type",  type being a "metaclass".
> - type, too, is a subclass of object.  Type is its own metaclass, so
> "type" is an instance of "type".
>
> Ruby was designed with this model in mind, it only arrived later into
> Python.
>

 Are you sure? I wrote about metaclasses in Python in 1998:
http://www.python.org/doc/essays/metaclasses/

New-style classes were just the second or third iteration of the idea.

-- 
--Guido van Rossum (python.org/~guido)

From kristjan at ccpgames.com  Thu Oct 24 17:59:10 2013
From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Thu, 24 Oct 2013 15:59:10 +0000
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <CAP7+vJJ7fpWtZ6Ze91qeeSD0rEHFoaDwz19+pt1T68jLVUXsEQ@mail.gmail.com>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
 <CAGu0AntiRcr+0uM9QquYADK8qFBqE6kKH74y1NB9DFZi0cbTTQ@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD38136D149@RKV-IT-EXCH104.ccp.ad.local>
 <CALruUQK1JXLGBsw45+e9BBKi9OgGCCJXFAAHPW6Aog1OAjyh1Q@mail.gmail.com>
 <CADiSq7dzVW-FP=L7Qc5fzFjqFTWoRJFCnTMOd_iKVXuYsXSvmQ@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD38137298A@RKV-IT-EXCH104.ccp.ad.local>
 <75DBE254-8764-4F34-A28D-AF6D7384517E@masklinn.net>
 <EFE3877620384242A686D52278B7CCD38137348D@RKV-IT-EXCH104.ccp.ad.local>
 <CAP7+vJJ7fpWtZ6Ze91qeeSD0rEHFoaDwz19+pt1T68jLVUXsEQ@mail.gmail.com>
Message-ID: <EFE3877620384242A686D52278B7CCD381373C58@RKV-IT-EXCH104.ccp.ad.local>

I'm not sure about anything :).  In particular, I don't know where Ruby's object model originates.
And Ruby 1.0 came out in 1996.
I'm sure that the model of "object" and "type" (or other equivalent names) is older, though.  Could be a simplification of
Smalltalk's object model, for example.   Well, looking this up, this is what Wikipedia says, in fact.
But I recall someone, somewhere, mentioning that this system is based on a proper Paper by someone :)
But Python and Ruby's models are quite similar in structure.
I don't know if Python's new-style classes were inspired by Ruby or not, perhaps it is a case of
convergent evolution.

Cheers,
K

From: gvanrossum at gmail.com [mailto:gvanrossum at gmail.com] On Behalf Of Guido van Rossum
Sent: 24. október 2013 15:26
To: Kristján Valur Jónsson
Cc: python-ideas ideas
Subject: Re: [Python-ideas] A different kind of context manager

On Thu, Oct 24, 2013 at 3:36 AM, Kristján Valur Jónsson <kristjan at ccpgames.com> wrote:



Ruby was designed with this model in mind, it only arrived later into Python.

 Are you sure? I wrote about metaclasses in Python in 1998: http://www.python.org/doc/essays/metaclasses/
New-style classes were just the second or third iteration of the idea.


From anntzer.lee at gmail.com  Thu Oct 24 18:51:44 2013
From: anntzer.lee at gmail.com (Antony Lee)
Date: Thu, 24 Oct 2013 09:51:44 -0700
Subject: [Python-ideas] YAML (yet-another-multiline-lambda)
In-Reply-To: <871u3bq8sj.fsf@xemacs.org>
References: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
 <C58D04B1-E1B3-43BE-9013-828566809F84@yahoo.com>
 <dfa439d4-94c1-4812-b28b-44caf002c5c6@googlegroups.com>
 <914D3110-F0FA-49E5-80C5-666F263574F1@yahoo.com>
 <82fd9861-88c2-4223-9409-c63334327295@googlegroups.com>
 <871u3bq8sj.fsf@xemacs.org>
Message-ID: <CAGRr6BEXhoDzbeA-Edxv9roHJtrCjThmSTsvmqEZ05TB5SG_OQ@mail.gmail.com>

2013/10/24 Stephen J. Turnbull <stephen at xemacs.org>

> Antony Lee writes:
>
>  > as I mentioned earlier I don't really see the need
>
> It's not a *need*, it's a *style*.  Ie, it's just the way Pythonistas
> do things.
>
> Many fledgling Pythonistas, especially those who come from other
> languages, hanker after anonymous functions or Ruby blocks (a somewhat
> different concept AFAICT), and some experienced Pythonistas sympathize
> with them.  But there's no real reason why some functions shouldn't
> have a name, and Guido (the ultimate authority on "Pythonic") isn't a
> fan of lambda, so (unless a pressing need or a really good syntax
> appears) there seems to be little chance of the existing feature
> (expressions as lambdas) being extended.
>

I think you've misunderstood the aim of my proposal.  While my proposal
does contain the possibility, as a syntax extension, to define anonymous
lambdas (or functions, if you prefer), in the middle of an expression (as
opposed to "in a statement"), the core idea is about *named* lambdas (or
functions, etc.).  From the very beginning I said that allowing the
anonymous form ("(def (...): ...)" instead of "(def name(...): ...)") was
not a critical part of the proposal.


>
>  > for binding lambdas to objects in the first place.
>
> I think you've misspoken here.  Lambdas *are* objects, and that's why
> names can be bound to them (and then they're called "functions").
>

I was specifically referring to Andrew Barnert's reply, which I quote here:

> Partly this is based on JS experience, where you have to write .bind(this)
> all over the place because your callback--or, worse, some callback later in
> the chain--needs access to this.
>

Specifically, I was asking him to clarify the relevance of such an issue as
a locally defined lambda (etc.) already captures "self" without the need
for a call to a JS-like bind (which is effectively spelled __get__ in
Python).
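
To make the comparison concrete, here is a minimal sketch (the Widget class and the describe function are hypothetical names, not from the thread). It shows that a lambda defined inside a method already closes over "self", while a function defined elsewhere needs __get__ to get roughly what .bind(this) gives you in JS:

```python
class Widget:
    def __init__(self, label):
        self.label = label

    def make_callback(self):
        # The lambda closes over `self`; no explicit binding is needed.
        return lambda: "clicked " + self.label


def describe(self):
    # A plain function defined outside the class.
    return "widget " + self.label


w = Widget("ok")
callback = w.make_callback()

# Functions are descriptors: __get__ returns a bound method, which is
# roughly what .bind(this) does in JavaScript.
bound = describe.__get__(w, Widget)

print(callback())
print(bound())
```

The closure case needs no extra step at the definition site, which is the situation an inline def inside a method would be in.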


> What people complain about is the fact that the normal way to create a
> lambda (callable function object or something like that) is "def",
> which also binds a name to the object.  They think that is wasteful or
> something.


No, that is not what I complain about.  Just look at the other currently
active thread on context manager semantics, where Nick Coghlan said

> it *also* means that we *don't* currently have a clean syntax for single
> use callbacks.
>
> Hence the time I've put into PEP 403 and 3150 over the years - a key
> objective for both of them is providing a cleaner solution for the problem
> of single use callbacks (including those that modify local variables of the
> containing function).
>

Another example is when you want to provide a dictionary of callbacks (e.g.
to be triggered by various command line options), say at global scope,
without putting the callbacks themselves in that scope.   See Serhiy
Storchaka's suggestion on why multiline lambdas are not essential here:

> > callbacks = {
> >      "foo":
> >          (def foo():
> >              <...>), # note how the parentheses disambiguate where
> >                      # the last comma belongs
> >      "bar":
> >          (def bar():
> >              <...>)
> > }
>
> def class_to_map(cls):
>      return {n: f for n, f in cls.__dict__items() if n[0] != '_'}
>
> @class_to_map
> class callbacks:
>      @staticmethod
>      def foo():
>          ...
>      @staticmethod
>      def bar():
>          ...
>

I don't think abusing the class statement is particularly elegant either
(and good luck if you want to preserve order... what, I have to provide a
metaclass that overrides __prepare__ for doing this?), and moreover this
solution doesn't even work as it is written (it needs a small fix... quick,
can you spot it?).
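
One possible reading of the fix hinted at above (an assumption, since the thread does not spell it out): besides the missing dot in __dict__items, the values stored in a class __dict__ are staticmethod wrappers rather than plain functions, so looking the names up with getattr is a safer way to get callables. The callback bodies below are placeholders.

```python
def class_to_map(cls):
    # vars(cls) holds the raw staticmethod wrappers; getattr() goes through
    # the descriptor protocol and returns the underlying plain functions.
    return {n: getattr(cls, n) for n in vars(cls) if not n.startswith('_')}


@class_to_map
class callbacks:
    @staticmethod
    def foo():
        return "foo called"

    @staticmethod
    def bar():
        return "bar called"


# `callbacks` is now a plain dict, not a class.
print(sorted(callbacks))
print(callbacks["foo"]())
```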


> Steve
>

Antony

From guido at python.org  Thu Oct 24 18:54:38 2013
From: guido at python.org (Guido van Rossum)
Date: Thu, 24 Oct 2013 09:54:38 -0700
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <EFE3877620384242A686D52278B7CCD381373C58@RKV-IT-EXCH104.ccp.ad.local>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
 <CAGu0AntiRcr+0uM9QquYADK8qFBqE6kKH74y1NB9DFZi0cbTTQ@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD38136D149@RKV-IT-EXCH104.ccp.ad.local>
 <CALruUQK1JXLGBsw45+e9BBKi9OgGCCJXFAAHPW6Aog1OAjyh1Q@mail.gmail.com>
 <CADiSq7dzVW-FP=L7Qc5fzFjqFTWoRJFCnTMOd_iKVXuYsXSvmQ@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD38137298A@RKV-IT-EXCH104.ccp.ad.local>
 <75DBE254-8764-4F34-A28D-AF6D7384517E@masklinn.net>
 <EFE3877620384242A686D52278B7CCD38137348D@RKV-IT-EXCH104.ccp.ad.local>
 <CAP7+vJJ7fpWtZ6Ze91qeeSD0rEHFoaDwz19+pt1T68jLVUXsEQ@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD381373C58@RKV-IT-EXCH104.ccp.ad.local>
Message-ID: <CAP7+vJ+f6=GeX9uO0cL3V2AwFZPvi0EHobBb+smh4EMzjbeJdA@mail.gmail.com>

As long as we are speculating about the origins of language features, I feel
the need to set the record straight.

I was not inspired by Ruby at that point (or ever :-). Ruby was in fact
inspired by Python. Matz once told me that his inspiration was 20% Python,
80% Perl, and that Larry Wall is his hero.

I was inspired to implement new-style classes by a very specific book,
"Putting Metaclasses to Work" by Ira Forman and Scott Danforth (
http://www.amazon.com/Putting-Metaclasses-Work-Ira-Forman/dp/0201433052).

But even Python's original design (in 1990, published in 1991) had the
notion that 'type' was itself an object. The type pointer in any object has
always been a pointer to a special object, whose "data" was a bunch of C
function pointers implementing the behavior of other objects, similar to a
C++ vtable. The type of a type was always a special type object, which you
could call a meta-type, to be recognized because it was its own type.
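
This property is still observable in current Python. A quick check, offered only as a sketch of the idea rather than of the original 1990 implementation:

```python
# Every object carries a reference to its type, and types are objects too.
assert type(42) is int
assert type(int) is type

# The meta-type is recognizable because it is its own type.
assert type(type) is type

# Since types are objects, they are also instances of `object`.
assert isinstance(int, object)
assert isinstance(type, object)

print(type(type))
```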

I was only vaguely aware of Smalltalk at the time; I remember being
surprised by its use of metaclasses when I read about them much later.
Smalltalk's bytecode was a bigger influence on Python's bytecode though.
I'd read about it in a book by Adele Goldberg and others, I believe
"Smalltalk-80: The Language and its Implementation" (
http://www.amazon.com/Smalltalk-80-The-Language-its-Implementation/dp/0201113716
).


On Thu, Oct 24, 2013 at 8:59 AM, Kristján Valur Jónsson <
kristjan at ccpgames.com> wrote:

> I'm not sure about anything :).  In particular, I don't know where Ruby's
> object model originates.
> And Ruby 1.0 came out in 1996.
> I'm sure that the model of "object" and "type" (or other equivalent names)
> is older, though.  Could be a simplification of
> Smalltalk's object model, for example.   Well, looking this up, this is
> what Wikipedia says, in fact.
> But I recall someone, somewhere, mentioning that this system is based on a
> proper Paper by someone :)
> But Python and Ruby's models are quite similar in structure.
> I don't know if Python's new-style classes were inspired by Ruby or not,
> perhaps it is a case of convergent evolution.
>
> Cheers,
> K
>
> From: gvanrossum at gmail.com [mailto:gvanrossum at gmail.com] On Behalf Of Guido
> van Rossum
> Sent: 24. október 2013 15:26
> To: Kristján Valur Jónsson
> Cc: python-ideas ideas
> Subject: Re: [Python-ideas] A different kind of context manager
>
> On Thu, Oct 24, 2013 at 3:36 AM, Kristján Valur Jónsson <
> kristjan at ccpgames.com> wrote:
>
> Ruby was designed with this model in mind, it only arrived later into
> Python.
>
> Are you sure? I wrote about metaclasses in Python in 1998:
> http://www.python.org/doc/essays/metaclasses/
>
> New-style classes were just the second or third iteration of the idea.
>



-- 
--Guido van Rossum (python.org/~guido)

From abarnert at yahoo.com  Thu Oct 24 19:06:08 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 24 Oct 2013 10:06:08 -0700
Subject: [Python-ideas] YAML (yet-another-multiline-lambda)
In-Reply-To: <871u3bq8sj.fsf@xemacs.org>
References: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
 <C58D04B1-E1B3-43BE-9013-828566809F84@yahoo.com>
 <dfa439d4-94c1-4812-b28b-44caf002c5c6@googlegroups.com>
 <914D3110-F0FA-49E5-80C5-666F263574F1@yahoo.com>
 <82fd9861-88c2-4223-9409-c63334327295@googlegroups.com>
 <871u3bq8sj.fsf@xemacs.org>
Message-ID: <1AAC5C33-5BD1-4470-9A65-28007FB40676@yahoo.com>

On Oct 24, 2013, at 3:10, "Stephen J. Turnbull" <stephen at xemacs.org> wrote:

> Antony Lee writes:
> 
>> as I mentioned earlier I don't really see the need
> 
> It's not a *need*, it's a *style*.  Ie, it's just the way Pythonistas
> do things.
> 
> Many fledgling Pythonistas, especially those who come from other
> languages, hanker after anonymous functions or Ruby blocks (a somewhat
> different concept AFAICT), and some experienced Pythonistas sympathize
> with them.  But there's no real reason why some functions shouldn't
> have a name, and Guido (the ultimate authority on "Pythonic") isn't a
> fan of lambda, so (unless a pressing need or a really good syntax
> appears) there seems to be little chance of the existing feature
> (expressions as lambdas) being extended.
> 
>> for binding lambdas to objects in the first place.
> 
> I think you've misspoken here.  Lambdas *are* objects, and that's why
> names can be bound to them (and then they're called "functions").
> What people complain about is the fact that the normal way to create a
> lambda (callable function object or something like that) is "def",
> which also binds a name to the object.  They think that is wasteful or
> something.

I think the desire for this feature is less about the name issue, and more about two other issues.

The big issue is that you can't put a statement in an expression. Python has a much stricter statement/expression barrier than most languages. Python also has a nice separation between its declarative subset and the rest of the language. And it ties these two things together: composable expressions don't have side effects. All of this contributes a lot to Python's readability. But it can be limiting. If you want to throw a side effect like updating a variable into the middle of an expression--whether it's a listcomp or a lambda callback--you can't. In Ruby, JavaScript, the main .NET languages, and even some of the impure traditional functional languages, doing that kind of thing is not just allowed, but idiomatic. That's why all of these proposals are about a way to embed a statement context (a "multiline lambda" or "inline def") into the middle of an expression.

There's also an order-of-definition problem. You want to put code where it belongs, and to avoid highlighting code that isn't the important part of your logic. When you write a call to async_read() or Button() with a trivial callback, it's easier to understand if the callback is inside the call (which is why we have lambdas in the first place). Nick Coghlan's two @in PEPs are about providing this feature without the previous one (and I think it's significant that they've attracted much less dislike, but also less positive excitement).
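
A small sketch of both issues, using a hypothetical press() function as a stand-in for something like Button() or async_read(): the side-effecting callback has to be a def placed before the call, because the same statement inside a lambda is rejected at compile time.

```python
clicks = {"count": 0}


def on_click():
    # An augmented assignment is a statement, so it needs a def,
    # and the def has to appear before the call that uses it.
    clicks["count"] += 1


def press(callback):
    # Hypothetical stand-in for Button() or async_read(): it just calls back.
    callback()


press(on_click)

# The same statement embedded in a lambda does not compile.
try:
    compile('press(lambda: (clicks["count"] += 1))', "<example>", "eval")
except SyntaxError:
    rejected = True
else:
    rejected = False

print(clicks["count"], rejected)
```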



From ncoghlan at gmail.com  Fri Oct 25 00:10:59 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 25 Oct 2013 08:10:59 +1000
Subject: [Python-ideas] YAML (yet-another-multiline-lambda)
In-Reply-To: <1AAC5C33-5BD1-4470-9A65-28007FB40676@yahoo.com>
References: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
 <C58D04B1-E1B3-43BE-9013-828566809F84@yahoo.com>
 <dfa439d4-94c1-4812-b28b-44caf002c5c6@googlegroups.com>
 <914D3110-F0FA-49E5-80C5-666F263574F1@yahoo.com>
 <82fd9861-88c2-4223-9409-c63334327295@googlegroups.com>
 <871u3bq8sj.fsf@xemacs.org>
 <1AAC5C33-5BD1-4470-9A65-28007FB40676@yahoo.com>
Message-ID: <CADiSq7eTMFrYWYs=9Lz1NhKE0_EABewDHGE-6Nexs5vACwW6Sg@mail.gmail.com>

On 25 Oct 2013 03:06, "Andrew Barnert" <abarnert at yahoo.com> wrote:
>
> On Oct 24, 2013, at 3:10, "Stephen J. Turnbull" <stephen at xemacs.org> wrote:
>
> > Antony Lee writes:
> >
> >> as I mentioned earlier I don't really see the need
> >
> > It's not a *need*, it's a *style*.  Ie, it's just the way Pythonistas
> > do things.
> >
> > Many fledgling Pythonistas, especially those who come from other
> > languages, hanker after anonymous functions or Ruby blocks (a somewhat
> > different concept AFAICT), and some experienced Pythonistas sympathize
> > with them.  But there's no real reason why some functions shouldn't
> > have a name, and Guido (the ultimate authority on "Pythonic") isn't a
> > fan of lambda, so (unless a pressing need or a really good syntax
> > appears) there seems to be little chance of the existing feature
> > (expressions as lambdas) being extended.
> >
> >> for binding lambdas to objects in the first place.
> >
> > I think you've misspoken here.  Lambdas *are* objects, and that's why
> > names can be bound to them (and then they're called "functions").
> > What people complain about is the fact that the normal way to create a
> > lambda (callable function object or something like that) is "def",
> > which also binds a name to the object.  They think that is wasteful or
> > something.
>
> I think the desire for this feature is less about the name issue, and
> more about two other issues.
>
> The big issue is that you can't put a statement in an expression. Python
> has a much stricter statement/expression barrier than most languages.
> Python also has a nice separation between its declarative subset and the
> rest of the language. And it ties these two things together: composable
> expressions don't have side effects. All of this contributes a lot to
> Python's readability. But it can be limiting. If you want to throw a side
> effect like updating a variable into the middle of an expression--whether
> it's a listcomp or a lambda callback--you can't. In Ruby, JavaScript, the
> main .NET languages, and even some of the impure traditional functional
> languages, doing that kind of thing is not just allowed, but idiomatic.
> That's why all of these proposals are about a way to embed a statement
> context (a "multiline lambda" or "inline def") into the middle of an
> expression.
>
> There's also an order-of-definition problem. You want to put code where
> it belongs, and to avoid highlighting code that isn't the important part of
> your logic. When you write a call to async_read() or Button() with a
> trivial callback, it's easier to understand if the callback is inside the
> call (which is why we have lambdas in the first place). Nick Coghlan's two
> @in PEPs are about providing this feature without the previous one (and I
> think it's significant that they've attracted much less dislike, but also
> less positive excitement).

Right, when people get excited about Ruby's "blocks" and say Python should
have something equivalent, they can actually be talking about two subtly
different things:
- the block statement (do/end) which handles single use callbacks nicely
- inline blocks which let you embed callbacks with side effects on name
bindings in the local namespace inside expressions

The functional programming influence on Python at the expression level is
strong enough for me to say "no, we almost certainly don't want to go
there" (but see the postscript below).

The first would definitely be a nice problem to resolve, though.

Cheers,
Nick.

P.S.
http://python-notes.curiousefficiency.org/en/latest/pep_ideas/suite_expr.html


From stephen at xemacs.org  Fri Oct 25 02:11:11 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 25 Oct 2013 09:11:11 +0900
Subject: [Python-ideas] YAML (yet-another-multiline-lambda)
In-Reply-To: <CAGRr6BEXhoDzbeA-Edxv9roHJtrCjThmSTsvmqEZ05TB5SG_OQ@mail.gmail.com>
References: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
 <C58D04B1-E1B3-43BE-9013-828566809F84@yahoo.com>
 <dfa439d4-94c1-4812-b28b-44caf002c5c6@googlegroups.com>
 <914D3110-F0FA-49E5-80C5-666F263574F1@yahoo.com>
 <82fd9861-88c2-4223-9409-c63334327295@googlegroups.com>
 <871u3bq8sj.fsf@xemacs.org>
 <CAGRr6BEXhoDzbeA-Edxv9roHJtrCjThmSTsvmqEZ05TB5SG_OQ@mail.gmail.com>
Message-ID: <87wql2xl9c.fsf@uwakimon.sk.tsukuba.ac.jp>

Antony Lee <anntzer.lee at gmail.com> writes:
>>>>> 2013/10/24 Stephen J. Turnbull <stephen at xemacs.org>

 > I think you've misunderstood the aim of my proposal.

No, I misunderstood the language in your most recent post, not to
mention the consistent mis-specification of the whole idea in *every*
subject line.  Python doesn't currently have a notion of "binding"
lambdas to objects, and you here don't specify how a lambda (object)
is being "bound" to what (other kind of) object.  Python does have a
notion of binding names to objects.  Confusion is natural.

If you're *primarily* talking about defining functions in the "right
place" in general, what's wrong with using the well-defined Python
terms for those objects?

 > Specifically, I was asking him to clarify the relevance of such an
 > issue as a locally defined lambda (etc.) already captures "self"
 > without the need for a call to a JS-like bind (which is effectively
 > spelled __get__ in Python).

Sure.  So there is no "need" in the first place.  I don't understand
what you're talking about.  IMO Nick's use case, described next, is
compelling.  Why not just say "this syntax does the trick for that use
case"?  Are you saying anything else?  That's what I can't figure out.

 > Just look at the other currently active thread on context manager
 > semantics, where Nick Coghlan said

 >> it *also* means that we *don't* currently have a clean syntax for
 >> single use callbacks.

 > Another example is when you want to provide a dictionary of
 > callbacks (e.g.  to be triggered by various command line options),
 > say at global scope, without putting the callbacks themselves in
 > that scope).

Eh, "there you go again."  That's the same example, isn't it?  It's a
good one and stands repeating, but it's not different.  In both cases
you have a callback and a desire to put the def in an appropriate
"place" (namespace, and often lexical position in the source).  The
dictionary of callbacks idiom is familiar (at least to those of us who
have the misfortune to program with Xt), and I certainly understand
the desire to define callbacks in an appropriate scope.  Again, do you
have *more* to say than "I think my syntax does this nicely,
concisely, and precisely"?

If you have other compelling use cases, that would be useful.  But I
think the "defining callbacks in the right place" use case would be
enough to get this proposal in Python 3.5 (if it stands up to issues
like the LL constraint on the languages, and the objection to
YAAP[1]).  Mixing this up with "lambda" and "binding" is unhelpful
AFAICS.

 > I don't think abusing the class statement is particularly elegant
 > either (and good luck if you want to preserve order... what, I have
 > to provide a metaclass that overrides __prepare__ for doing this?),

I doubt you need to use a metaclass (storing the callbacks in an
OrderedDict and then defining an appropriate __index__ can probably be
done with a decorator or two), but the alternatives are hardly
prettier I guess.
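
A rough sketch of the decorator route, under the assumption that "a decorator or two" means a registering decorator; registry(), callback and the callback bodies are made-up names for illustration. Note that the functions still end up bound in the enclosing scope unless they are deleted afterwards, which is part of the objection being discussed.

```python
from collections import OrderedDict


def registry():
    """Return an ordered mapping plus a decorator that fills it."""
    table = OrderedDict()

    def register(func):
        # Record the function under its own name, in definition order.
        table[func.__name__] = func
        return func

    return table, register


callbacks, callback = registry()


@callback
def foo():
    return "foo called"


@callback
def bar():
    return "bar called"


print(list(callbacks))
```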


Footnotes: 
[1]  Yet Another Abuse of Parentheses, also spelled "YAAPMILL"
("... Making It Like LISP").


From anntzer.lee at gmail.com  Fri Oct 25 08:28:21 2013
From: anntzer.lee at gmail.com (Antony Lee)
Date: Thu, 24 Oct 2013 23:28:21 -0700
Subject: [Python-ideas] Fwd:  YAML (yet-another-multiline-lambda)
In-Reply-To: <CAGRr6BENEmDi-y-fBTGzH33SaEn1h92G6eJUXdg5V_NO3=C9fw@mail.gmail.com>
References: <CAGRr6BFcH=x8CF-kkVfRUy8=VtHAitjU-Aor15jcf9xyJs9fjA@mail.gmail.com>
 <C58D04B1-E1B3-43BE-9013-828566809F84@yahoo.com>
 <dfa439d4-94c1-4812-b28b-44caf002c5c6@googlegroups.com>
 <914D3110-F0FA-49E5-80C5-666F263574F1@yahoo.com>
 <82fd9861-88c2-4223-9409-c63334327295@googlegroups.com>
 <871u3bq8sj.fsf@xemacs.org>
 <CAGRr6BEXhoDzbeA-Edxv9roHJtrCjThmSTsvmqEZ05TB5SG_OQ@mail.gmail.com>
 <87wql2xl9c.fsf@uwakimon.sk.tsukuba.ac.jp>
 <CAGRr6BENEmDi-y-fBTGzH33SaEn1h92G6eJUXdg5V_NO3=C9fw@mail.gmail.com>
Message-ID: <CAGRr6BG52E0+1z7MnP6RQUj3OzjLPue_emnQxPjzxUYm=2Nw4A@mail.gmail.com>

Sorry Stephen, forgot to reply-to-all the first time.
Antony

2013/10/24 Stephen J. Turnbull <stephen at xemacs.org>

> Antony Lee <anntzer.lee at gmail.com> writes:
> >>>>> 2013/10/24 Stephen J. Turnbull <stephen at xemacs.org>
>
>  > I think you've misunderstood the aim of my proposal.
>
> No, I misunderstood the language in your most recent post, not to
> mention the consistent mis-specification of the whole idea in *every*
> subject line.  Python doesn't currently have a notion of "binding"
> lambdas to objects, and you here don't specify how a lambda (object)
> is being "bound" to what (other kind of) object.  Python does have a
> notion of binding names to objects.  Confusion is natural.
>
> If you're *primarily* talking about defining functions in the "right
> place" in general, what's wrong with using the well-defined Python
> terms for those objects?
>

To me, in the context of Python, the words lambda and function have the
same meaning at the object representation level (an object of type
types.FunctionType (== types.LambdaType)), the only difference coming at
the syntactic level: a lambda is defined in an expression, a function is
defined in a statement.  Thus the name "multi-line lambda" (and it made a
nice acronym in the title, sure).  But if you prefer, you can replace
"multi-line lambda" by "function defined by multi-line expression"
everywhere.
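
A quick check of that equivalence in current Python (square and cube are just illustrative names):

```python
import types

square = lambda x: x * x


def cube(x):
    return x ** 3


# Both names in the types module refer to the same class.
assert types.LambdaType is types.FunctionType
assert type(square) is type(cube) is types.FunctionType

# The only visible trace of the syntax used is the recorded name.
print(square.__name__, cube.__name__)
```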

>
>  > Specifically, I was asking him to clarify the relevance of such an
>  > issue as a locally defined lambda (etc.) already captures "self"
>  > without the need for a call to a JS-like bind (which is effectively
>  > spelled __get__ in Python).
>
> Sure.  So there is no "need" in the first place.  I don't understand
> what you're talking about.  IMO Nick's use case, described next, is
> compelling.  Why not just say "this syntax does the trick for that use
> case"?  Are you saying anything else?  That's what I can't figure out.
>

I am trying to answer the issue Andrew raised:

> I don't think this idea will fit well with Python's OO.
>
> Partly this is based on JS experience, where you have to write .bind(this)
> all over the place because your callback--or, worse, some callback later in
> the chain--needs access to this.
>

Like it or not, to discuss this argument, I need to discuss the need to
write ".bind(this)", or, in Python-speak, create bound(!) methods out of
"functions defined by multi-line expressions".  In fact, if I understood
you well, you seem to agree that it doesn't actually apply to this proposal.

>
>  > Just look at the other currently active thread on context manager
>  > semantics, where Nick Coghlan said
>
>  >> it *also* means that we *don't* currently have a clean syntax for
>  >> single use callbacks.
>
>  > Another example is when you want to provide a dictionary of
>  > callbacks (e.g.  to be triggered by various command line options),
>  > say at global scope, without putting the callbacks themselves in
>  > that scope).
>
> Eh, "there you go again."  That's the same example, isn't it?  It's a
> good one and stands repeating, but it's not different.  In both cases
> you have a callback and a desire to put the def in an appropriate
> "place" (namespace, and often lexical position in the source).  The
> dictionary of callbacks idiom is familiar (at least to those of us who
> have the misfortune to program with Xt), and I certainly understand
> the desire to define callbacks in an appropriate scope.  Again, do you
> have *more* to say than "I think my syntax does this nicely,
> concisely, and precisely"?
>

If I put my function definition in an expression, the point is of course to
use it "in the appropriate place", so this wording more or less covers
every use I can think of.  But perhaps you can consider the following use
case as "different enough".  If we had this feature, we could probably have
done without the "with" statement:

with open(name) as f:
    ...
===>
opening(name, (def func(f): ...)) # define a new "opening" function
or
open(name, do_and_close=(def func(f): ...)) # add a kwarg to "open"

with lock:
    ...
===>
lock.acquiring(def func(): ...) # define a new "acquiring" method

While I certainly consider the "with" statement to be extremely useful, and
nicer for such use cases, I simply doubt that new syntax would have been
added at all if this construct was available.  Of course one may consider
this as a non-argument as we have the "with" statement now and it's not
going away (which is a good thing).
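
For comparison, here is a sketch of what the hypothetical "opening" function could look like in today's Python, where the callback has to be a separate def rather than an inline one. The temporary file and the first_line callback exist only for the demonstration.

```python
import os
import tempfile


def opening(name, func):
    """Call func with the open file and close the file afterwards."""
    f = open(name)
    try:
        return func(f)
    finally:
        f.close()


# Create a small file to read back.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello\n")

seen = []


def first_line(f):
    seen.append(f)  # keep a reference so we can check it was closed
    return f.readline().strip()


result = opening(tmp.name, first_line)
os.remove(tmp.name)

print(result, seen[0].closed)
```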


> If you have other compelling use cases, that would be useful.  But I
> think the "defining callbacks in the right place" use case would be
> enough to get this proposal in Python 3.5 (if it stands up to issues
> like the LL constraint on the languages, and the objection to
> YAAP[1]).  Mixing this up with "lambda" and "binding" is unhelpful
> AFAICS.
>

I'm certainly no expert on parsing but this proposal was certainly designed
to be (relatively) simple to parse: instead of having just one stack
of indentation levels, keep a stack of such stacks.  When encountering a
"def" token, check that the last token was a parenthesis (or perhaps
brackets and curly braces too... although I don't think using lambdas as
dict keys is a good idea at all :-)), and if so, start parsing the inline
def as if it was at toplevel, using a new stack of indent levels.

As for the abuse of parentheses [1], note that in the most common case (I
would guess) of passing a single inline function (or whatever you want to
call it) to another function, the function call itself already provides
these parentheses so there is no need of adding another level of
parentheses.

>
>  > I don't think abusing the class statement is particularly elegant
>  > either (and good luck if you want to preserve order... what, I have
>  > to provide a metaclass that overrides __prepare__ for doing this?),
>
> I doubt you need to use a metaclass (storing the callbacks in an
> OrderedDict and then defining an appropriate __index__ can probably be
> done with a decorator or two), but the alternatives are hardly
> prettier I guess.
>
>
> Footnotes:
> [1]  Yet Another Abuse of Parentheses, also spelled "YAAPMILL"
> ("... Making It Like LISP").
>
>

From kristjan at ccpgames.com  Fri Oct 25 13:28:26 2013
From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Fri, 25 Oct 2013 11:28:26 +0000
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <CAP7+vJ+f6=GeX9uO0cL3V2AwFZPvi0EHobBb+smh4EMzjbeJdA@mail.gmail.com>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
 <CAGu0AntiRcr+0uM9QquYADK8qFBqE6kKH74y1NB9DFZi0cbTTQ@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD38136D149@RKV-IT-EXCH104.ccp.ad.local>
 <CALruUQK1JXLGBsw45+e9BBKi9OgGCCJXFAAHPW6Aog1OAjyh1Q@mail.gmail.com>
 <CADiSq7dzVW-FP=L7Qc5fzFjqFTWoRJFCnTMOd_iKVXuYsXSvmQ@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD38137298A@RKV-IT-EXCH104.ccp.ad.local>
 <75DBE254-8764-4F34-A28D-AF6D7384517E@masklinn.net>
 <EFE3877620384242A686D52278B7CCD38137348D@RKV-IT-EXCH104.ccp.ad.local>
 <CAP7+vJJ7fpWtZ6Ze91qeeSD0rEHFoaDwz19+pt1T68jLVUXsEQ@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD381373C58@RKV-IT-EXCH104.ccp.ad.local>
 <CAP7+vJ+f6=GeX9uO0cL3V2AwFZPvi0EHobBb+smh4EMzjbeJdA@mail.gmail.com>
Message-ID: <EFE3877620384242A686D52278B7CCD381374EA3@RKV-IT-EXCH104.ccp.ad.local>

Thanks, Guido. This is in fact very interesting.
I'll be sure to not wildly speculate out of my posterior on these matters again, but refer to the facts :)
Another data point indicating that convergent evolution does, in fact, exist.

K


From: gvanrossum at gmail.com [mailto:gvanrossum at gmail.com] On Behalf Of Guido van Rossum
Sent: 24. október 2013 16:55
To: Kristján Valur Jónsson
Cc: python-ideas ideas
Subject: Re: [Python-ideas] A different kind of context manager

As long as we are speculating about the origins of language features, I feel the need to set the record straight.

I was not inspired by Ruby at that point (or ever :-). Ruby was in fact inspired by Python. Matz once told me that his inspiration was 20% Python, 80% Perl, and that Larry Wall is his hero.
I was inspired to implement new-style classes by a very specific book, "Putting Metaclasses to Work" by Ira Forman and Scott Danforth (http://www.amazon.com/Putting-Metaclasses-Work-Ira-Forman/dp/0201433052).
But even Python's original design (in 1990, published in 1991) had the notion that 'type' was itself an object. The type pointer in any object has always been a pointer to a special object, whose "data" was a bunch of C function pointers implementing the behavior of other objects, similar to a C++ vtable. The type of a type was always a special type object, which you could call a meta-type, to be recognized because it was its own type.
I was only vaguely aware of Smalltalk at the time; I remember being surprised by its use of metaclasses when I read about them much later. Smalltalk's bytecode was a bigger influence on Python's bytecode though. I'd read about it in a book by Adele Goldberg and others, I believe "Smalltalk-80: The Language and its Implementation" (http://www.amazon.com/Smalltalk-80-The-Language-its-Implementation/dp/0201113716).
--Guido van Rossum (python.org/~guido)

From flying-sheep at web.de  Fri Oct 25 14:38:14 2013
From: flying-sheep at web.de (Philipp A.)
Date: Fri, 25 Oct 2013 14:38:14 +0200
Subject: [Python-ideas] A different kind of context manager
In-Reply-To: <EFE3877620384242A686D52278B7CCD381374EA3@RKV-IT-EXCH104.ccp.ad.local>
References: <EFE3877620384242A686D52278B7CCD38136CAD4@RKV-IT-EXCH104.ccp.ad.local>
 <CAGu0AntiRcr+0uM9QquYADK8qFBqE6kKH74y1NB9DFZi0cbTTQ@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD38136D149@RKV-IT-EXCH104.ccp.ad.local>
 <CALruUQK1JXLGBsw45+e9BBKi9OgGCCJXFAAHPW6Aog1OAjyh1Q@mail.gmail.com>
 <CADiSq7dzVW-FP=L7Qc5fzFjqFTWoRJFCnTMOd_iKVXuYsXSvmQ@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD38137298A@RKV-IT-EXCH104.ccp.ad.local>
 <75DBE254-8764-4F34-A28D-AF6D7384517E@masklinn.net>
 <EFE3877620384242A686D52278B7CCD38137348D@RKV-IT-EXCH104.ccp.ad.local>
 <CAP7+vJJ7fpWtZ6Ze91qeeSD0rEHFoaDwz19+pt1T68jLVUXsEQ@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD381373C58@RKV-IT-EXCH104.ccp.ad.local>
 <CAP7+vJ+f6=GeX9uO0cL3V2AwFZPvi0EHobBb+smh4EMzjbeJdA@mail.gmail.com>
 <EFE3877620384242A686D52278B7CCD381374EA3@RKV-IT-EXCH104.ccp.ad.local>
Message-ID: <CAN8d9g=6g-Q3EnRjdHN0nVpqv42iUUjywMHg22-s+jrULOBcAw@mail.gmail.com>

2013/10/25 Kristján Valur Jónsson <kristjan at ccpgames.com>

> wildly speculate out of my posterior


just curious: is that "pull things out of my ass" in polite?

From neurofag at gmail.com  Fri Oct 25 16:12:55 2013
From: neurofag at gmail.com (neuro)
Date: Fri, 25 Oct 2013 18:12:55 +0400
Subject: [Python-ideas] Can you explain me why that is a bad idea?
Message-ID: <CAJEQnsuprGttESViPOfcT+TbhZS43C27xZWLv+JkGNX3uUAmNA@mail.gmail.com>

I asked Guido this question on Twitter:

Can you add this construction: for c in list if condition: suite, for
when the suite block is too big for a list comprehension?

This was Guido's answer:
Someone on python-ideas at python.org can probably explain why that is a bad
idea.

Can you explain to me why that is a bad idea?
with best regards,
Abu Sultanov

From p.f.moore at gmail.com  Fri Oct 25 16:26:27 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 25 Oct 2013 15:26:27 +0100
Subject: [Python-ideas] Can you explain me why that is a bad idea?
In-Reply-To: <CAJEQnsuprGttESViPOfcT+TbhZS43C27xZWLv+JkGNX3uUAmNA@mail.gmail.com>
References: <CAJEQnsuprGttESViPOfcT+TbhZS43C27xZWLv+JkGNX3uUAmNA@mail.gmail.com>
Message-ID: <CACac1F_DH8aUARnkW3UqNL-KMYO5-cUtqCO0znqV_c7G_sO3Uw@mail.gmail.com>

On 25 October 2013 15:12, neuro <neurofag at gmail.com> wrote:
> Can you add this construction for c in list if condition : suite when suite
> block very big for list comprehension
>
> This is Guido answer:
> Someone on python-ideas at python.org can probably explain why that is a bad
> idea.
>
> Can you explain me why that is a bad idea?

This has been discussed before on this list. I don't recall the
details, but if you search the archives, you should be able to find it.
Paul

From rosuav at gmail.com  Fri Oct 25 16:44:59 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Sat, 26 Oct 2013 01:44:59 +1100
Subject: [Python-ideas] Can you explain me why that is a bad idea?
In-Reply-To: <CAJEQnsuprGttESViPOfcT+TbhZS43C27xZWLv+JkGNX3uUAmNA@mail.gmail.com>
References: <CAJEQnsuprGttESViPOfcT+TbhZS43C27xZWLv+JkGNX3uUAmNA@mail.gmail.com>
Message-ID: <CAPTjJmpJo5rJXBi2DqyOqPL5RKd08XWMUoMKti-Ftodp5B=ajA@mail.gmail.com>

On Sat, Oct 26, 2013 at 1:12 AM, neuro <neurofag at gmail.com> wrote:
> This is Guido answer:
> Someone on python-ideas at python.org can probably explain why that is a bad
> idea.
>

Well, firstly, it's generally better to take things to a mailing list
than to personally email the BDFL :) You're fortunate - seems you got
a reply (and a courteous and helpful one at that); most busy people
would just delete your email and move on.

As to the "for... in... if" sequence - it's been discussed quite a few
times. It might seem nice and clean in simple cases, but invariably
you come up against some messy edge cases before long. The first one
to deal with is the parser ambiguity with the ternary operator.

ChrisA

From tjreedy at udel.edu  Fri Oct 25 21:54:03 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 25 Oct 2013 15:54:03 -0400
Subject: [Python-ideas] Can you explain me why that is a bad idea?
In-Reply-To: <CAJEQnsuprGttESViPOfcT+TbhZS43C27xZWLv+JkGNX3uUAmNA@mail.gmail.com>
References: <CAJEQnsuprGttESViPOfcT+TbhZS43C27xZWLv+JkGNX3uUAmNA@mail.gmail.com>
Message-ID: <l4ei8n$fmo$1@ger.gmane.org>

On 10/25/2013 10:12 AM, neuro wrote:
> I wrote to Guido on twitter this question
>
> Can you add this construction
 > for c in list if condition :
 >   suite
 > when suite block very big for list comprehension
>
> This is Guido answer:
> Someone on python-ideas at python.org
> <mailto:python-ideas at python.org> can
> probably explain why that is a bad idea.
>
> Can you explain me why that is a bad idea?

To start with, it is completely unnecessary as it would be the same as
for c in list:
   if condition:
     suite
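A minimal sketch of that equivalence, for illustration only: `items`, `is_even` and the two result lists are placeholder names, not part of the proposal. The second loop shows one existing way to keep the filter in the loop header, by iterating over a generator expression.

```python
# Placeholder data and condition, chosen only for illustration.
items = [1, 2, 3, 4, 5, 6]

def is_even(c):
    return c % 2 == 0

# Today's spelling: a plain nested `if` inside the loop.
nested = []
for c in items:
    if is_even(c):
        nested.append(c)

# Filter kept in the loop header via a generator expression.
filtered = []
for c in (x for x in items if is_even(x)):
    filtered.append(c)

print(nested == filtered)
```

Both loops visit the same elements in the same order, so the proposed header would not express anything that cannot already be written.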

Next, add in 'while condition' blocks and the possibility of more than
two blocks on a line. Things quickly get confusing.

Now consider adding an else: clause to your example. Would it belong to 
the for clause or to the if clause?

If we start down this road, why not allow 'try' also?

One compound statement header to a line is a good rule.

-- 
Terry Jan Reedy


From guido at python.org  Sun Oct 27 18:04:02 2013
From: guido at python.org (Guido van Rossum)
Date: Sun, 27 Oct 2013 10:04:02 -0700
Subject: [Python-ideas] Where did we go wrong with negative stride?
Message-ID: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>

In the comments of
http://python-history.blogspot.com/2013/10/why-python-uses-0-based-indexing.html
there were some complaints about the interpretation of the bounds for
negative strides, and I have to admit it feels wrong. Where did we go
wrong? For example,

"abcde"[::-1] == "edcba"

as you'd expect, but there is no number you can put as the second bound to
get the same result:

"abcde"[:1:-1] == "edc"
"abcde"[:0:-1] == "edcb"

but

"abcde"[:-1:-1] == ""

I'm guessing it all comes from the semantics I assigned to negative stride
for range() long ago, unthinkingly combined with the rules for negative
indices.
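One way to see where the empty result comes from is `slice.indices()`, which reports the concrete start, stop and step that a slice resolves to for a given length. This is only a small sketch of current behaviour; the variable names are illustrative.

```python
s = "abcde"

# Omitted bounds with a negative step resolve to start=len-1, stop=-1,
# where this -1 means "one before index 0", not "the last item".
full = slice(None, None, -1).indices(len(s))

# An explicit -1 is treated as a negative index first, so it becomes
# len(s) - 1 == 4 and the slice starts and stops at the same place.
explicit = slice(None, -1, -1).indices(len(s))

print(full)       # (4, -1, -1)
print(explicit)   # (4, 4, -1)
print(repr(s[:None:-1]), repr(s[:-1:-1]))
```

The resolved stop of -1 for the omitted bound cannot be written directly, because any -1 typed by the user goes through the negative-index rule first.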

Are we stuck with this forever? If we want to fix this in Python 4 we'd
have to start deprecating negative stride with non-empty lower/upper bounds
now. And we'd have to start deprecating negative step for range()
altogether, recommending reversed(range(lower, upper)) instead.

Thoughts? Is NumPy also affected?

-- 
--Guido van Rossum (python.org/~guido)

From shibturn at gmail.com  Sun Oct 27 18:28:39 2013
From: shibturn at gmail.com (Richard Oudkerk)
Date: Sun, 27 Oct 2013 17:28:39 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
Message-ID: <526D4D47.5090106@gmail.com>

On 27/10/2013 5:04pm, Guido van Rossum wrote:
> Are we stuck with this forever? If we want to fix this in Python 4 we'd
> have to start deprecating negative stride with non-empty lower/upper
> bounds now. And we'd have to start deprecating negative step for range()
> altogether, recommending reversed(range(lower, upper)) instead.

Or recommend using None?

 >>> "abcde"[None:None:-1]
'edcba'

-- 
Richard


From elazarg at gmail.com  Sun Oct 27 18:23:39 2013
From: elazarg at gmail.com (אלעזר)
Date: Sun, 27 Oct 2013 19:23:39 +0200
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
Message-ID: <CAPw6O2QU1ZSfd2o=Ujx4oEnWLb=1v8hkPC+8SL_YJKc2ou4jYQ@mail.gmail.com>

I believe the problem is not about negative strides but about negative
bounds. There should be a notion of "minus zero", something like

"abcde"[:-0:-1] =="edcba".

Here ":-" serves as a special syntax for negative stride; of course it is
not a real proposal.
The same awkwardness results when you take a negative upper bound to the
limit of 0:

"abcde"[:-2] == "abc"
"abcde"[:-1] == "abcd"
"abcde"[:-0] == ""

(I once filed a bug for it, which was of course correctly rejected:
http://bugs.python.org/issue17287).

Elazar


2013/10/27 Guido van Rossum <guido at python.org>

> In the comments of
> http://python-history.blogspot.com/2013/10/why-python-uses-0-based-indexing.html
> there were some complaints about the interpretation of the bounds for
> negative strides, and I have to admit it feels wrong. Where did we go
> wrong? For example,
>
> "abcde"[::-1] == "edcba"
>
> as you'd expect, but there is no number you can put as the second bound to
> get the same result:
>
> "abcde"[:1:-1] == "edc"
> "abcde"[:0:-1] == "edcb"
>
> but
>
> "abcde"[:-1:-1] == ""
>
> I'm guessing it all comes from the semantics I assigned to negative stride
> for range() long ago, unthinkingly combined with the rules for negative
> indices.
>
> Are we stuck with this forever? If we want to fix this in Python 4 we'd
> have to start deprecating negative stride with non-empty lower/upper bounds
> now. And we'd have to start deprecating negative step for range()
> altogether, recommending reversed(range(lower, upper)) instead.
>
> Thoughts? Is NumPy also affected?
>
> --
> --Guido van Rossum (python.org/~guido)
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
>
>

From python at mrabarnett.plus.com  Sun Oct 27 18:40:07 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Sun, 27 Oct 2013 17:40:07 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
Message-ID: <526D4FF7.6010106@mrabarnett.plus.com>

On 27/10/2013 17:04, Guido van Rossum wrote:
> In the comments of
> http://python-history.blogspot.com/2013/10/why-python-uses-0-based-indexing.html
> there were some complaints about the interpretation of the bounds for
> negative strides, and I have to admit it feels wrong. Where did we go
> wrong? For example,
>
> "abcde"[::-1] == "edcba"
>
> as you'd expect, but there is no number you can put as the second bound
> to get the same result:
>
> "abcde"[:1:-1] == "edc"
> "abcde"[:0:-1] == "edcb"
>
> but
>
> "abcde"[:-1:-1] == ""
>
> I'm guessing it all comes from the semantics I assigned to negative
> stride for range() long ago, unthinkingly combined with the rules for
> negative indices.
>
For a positive stride, omitting the second bound is equivalent to
length + 1:

 >>> "abcde"[:6:1]
'abcde'

For a negative stride, omitting the second bound is equivalent to
-(length + 1):

 >>> "abcde"[:-6:-1]
'edcba'

> Are we stuck with this forever? If we want to fix this in Python 4 we'd
> have to start deprecating negative stride with non-empty lower/upper
> bounds now. And we'd have to start deprecating negative step for range()
> altogether, recommending reversed(range(lower, upper)) instead.
>
> Thoughts? Is NumPy also affected?
>


From storchaka at gmail.com  Sun Oct 27 19:07:36 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sun, 27 Oct 2013 20:07:36 +0200
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
Message-ID: <l4jkol$lok$1@ger.gmane.org>

27.10.13 19:04, Guido van Rossum wrote:
> In the comments of
> http://python-history.blogspot.com/2013/10/why-python-uses-0-based-indexing.html
> there were some complaints about the interpretation of the bounds for
> negative strides, and I have to admit it feels wrong. Where did we go
> wrong? For example,
>
> "abcde"[::-1] == "edcba"
>
> as you'd expect, but there is no number you can put as the second bound
> to get the same result:

But you can put None.

 >>> "abcde"[:None:-1]
'edcba'



From guido at python.org  Sun Oct 27 19:32:29 2013
From: guido at python.org (Guido van Rossum)
Date: Sun, 27 Oct 2013 11:32:29 -0700
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <526D4FF7.6010106@mrabarnett.plus.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
Message-ID: <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>

On Sun, Oct 27, 2013 at 10:40 AM, MRAB <python at mrabarnett.plus.com> wrote:

> On 27/10/2013 17:04, Guido van Rossum wrote:
>
>> In the comments of
>> http://python-history.blogspot.com/2013/10/why-python-uses-0-based-indexing.html
>> there were some complaints about the interpretation of the bounds for
>> negative strides, and I have to admit it feels wrong. Where did we go
>> wrong? For example,
>>
>> "abcde"[::-1] == "edcba"
>>
>> as you'd expect, but there is no number you can put as the second bound
>> to get the same result:
>>
>> "abcde"[:1:-1] == "edc"
>> "abcde"[:0:-1] == "edcb"
>>
>> but
>>
>> "abcde"[:-1:-1] == ""
>>
>> I'm guessing it all comes from the semantics I assigned to negative
>> stride for range() long ago, unthinkingly combined with the rules for
>> negative indices.
>>
> For a positive stride, omitting the second bound is equivalent to
> length + 1:
>
> >>> "abcde"[:6:1]
> 'abcde'
>

Actually, it is equivalent to length; "abcde"[:5:1] == "abcde" too.


> For a negative stride, omitting the second bound is equivalent to
> -(length + 1):
>
> >>> "abcde"[:-6:-1]
> 'edcba'
>

Hm, so the idea is that with a negative stride you should use negative
indices. Then at least you get a somewhat useful invariant:

if -len(a)-1 <= j <= i <= -1:
    len(a[i:j:-1]) == i-j

which at least somewhat resembles the invariant for positive indexes and
stride:

if 0 <= i <= j <= len(a):
    len(a[i:j:1]) == j-i

For negative indices and stride, we now also get back this nice theorem
about adjacent slices:

if -len(a)-1 <= i <= -1:
    a[:i:-1] + a[i::-1] == a[::-1]

Using negative indices also restores the observation that a[i:j:k] produces
exactly the items corresponding to the values produced by range(i, j, k).
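The invariants above can be checked by brute force on a small string. The following is just a sanity-check sketch of current slicing behaviour, with `a` fixed to "abcde" as in the earlier examples.

```python
a = "abcde"
n = len(a)

# Length invariant for negative indices and stride -1.
for i in range(-n - 1, 0):
    for j in range(-n - 1, i + 1):
        assert len(a[i:j:-1]) == i - j
        # Same items as indexing with each value from range(i, j, -1).
        assert a[i:j:-1] == "".join(a[x] for x in range(i, j, -1))

# Adjacent-slice identity for negative indices and stride -1.
for i in range(-n - 1, 0):
    assert a[:i:-1] + a[i::-1] == a[::-1]

# Positive counterpart, for comparison.
for i in range(0, n + 1):
    for j in range(i, n + 1):
        assert len(a[i:j:1]) == j - i

print("all invariants hold for", repr(a))
```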

Still, the invariant for negative stride looks less attractive, and the
need to use negative indices confuses the matter. Also we end up with -1
corresponding to the position at one end and -len(a)-1 corresponding to the
position at the other end. The -1 offset feels really wrong here.

I wonder if it would have been simpler if we had defined a[i:j:-1] as the
reverse of a[i:j]?

What are real use cases for negative strides?

-- 
--Guido van Rossum (python.org/~guido)

From tim.peters at gmail.com  Sun Oct 27 19:38:05 2013
From: tim.peters at gmail.com (Tim Peters)
Date: Sun, 27 Oct 2013 13:38:05 -0500
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
Message-ID: <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>

I may have a different slant on this.  I've found that - by far - the
most successful way to "teach slices" to newcomers is to invite them
to view indices as being _between_ sequence elements.

<position 0> <element> <position 1> <element> <position 2> <element>
<position 3> ...

Then i:j selects the elements between position i and position j.

>>> "abcde"[2:4]
'cd'

But for negative strides this is all screwed up ;-)

>>> "abcde"[4:2:-1]
'ed'

They're not getting the elements between "positions" 2 and 4 then,
they're getting the elements between positions 3 and 5.  Why?
"Because that's how it works" - they have to switch from thinking
about positions to thinking about array indexing.

So I would prefer that the i:j in s[i:j:k] _always_ specify the
positions in play:

if i < 0:
    i += len(s)  # same as now
if j < 0:
    j += len(s)  # same as now
if i >= j:
    the slice is empty!  # this is different - the sign of k is irrelevant
else:
    the slice indices selected will be
        i, i + abs(k), i + 2*abs(k), ...
    up to but not including j
    if k is negative, this index sequence will be taken in reverse order

Then "abcde"[4:2:-1] would be "", while "abcde"[2:4:-1] would be "dc",
the reverse of "abcde"[2:4].  And s[0:len(s):-1] would be the same as
reversed(s).

So it's always a semi-open range, inclusive "at the left" and
exclusive "at the right".  But that's more a detail:  the _point_ is
to preserve the mental model of selecting the elements "between
positions".  Of course I'd change range() similarly.
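The proposed rule can be sketched as an ordinary function to experiment with. `proposed_slice` is a hypothetical name; the handling of omitted bounds and the clamping to the sequence length are assumptions added here so the sketch is runnable, and the rest follows the pseudocode literally.

```python
def proposed_slice(s, i=None, j=None, k=1):
    """Sketch of the proposed semantics: i:j always names positions,
    and the sign of k only decides the order of the result."""
    n = len(s)
    if k == 0:
        raise ValueError("slice step cannot be zero")
    # Assumption: omitted bounds mean the two ends of the sequence.
    i = 0 if i is None else i
    j = n if j is None else j
    if i < 0:
        i += n  # same as now
    if j < 0:
        j += n  # same as now
    # Assumption: out-of-range bounds are clamped, as slices do today.
    i = min(max(i, 0), n)
    j = min(max(j, 0), n)
    if i >= j:
        return s[:0]  # empty, regardless of the sign of k
    picked = [s[x] for x in range(i, j, abs(k))]
    if k < 0:
        picked.reverse()
    return "".join(picked) if isinstance(s, str) else type(s)(picked)

print(repr(proposed_slice("abcde", 4, 2, -1)))  # ''
print(proposed_slice("abcde", 2, 4, -1))        # dc
print(proposed_slice("abcde", 0, 5, -1))        # edcba
```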


[Guido]
> For example,
>
> "abcde"[::-1] == "edcba"
>
> as you'd expect, but there is no number you can put as the second bound to
> get the same result:

Actually, any integer <= -1-len("abcde") = -6 works.  But, yes,
that's bizarre ;-)
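A quick experiment confirms which integer stop values reproduce the full reversal under current semantics; the window of candidates tried (-10 to 9) is arbitrary.

```python
s = "abcde"

# Try a window of integer stop values and keep those that reproduce s[::-1].
working = [j for j in range(-10, 10) if s[:j:-1] == s[::-1]]

print(working)   # [-10, -9, -8, -7, -6]
```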

>...
> Are we stuck with this forever?

Probably :-(

> ...

From oscar.j.benjamin at gmail.com  Sun Oct 27 20:05:54 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Sun, 27 Oct 2013 19:05:54 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
Message-ID: <CAHVvXxQo0VNgY1vPeKYnxF7uxuLVFPZVxgmWN7ALL=DUybpCNA@mail.gmail.com>

On 27 October 2013 18:32, Guido van Rossum <guido at python.org> wrote:
>
> Hm, so the idea is that with a negative stride you should use negative
> indices.

The same problem arises when using negative indices and a positive
stride, e.g.:

# Chop off last n elements
x_chopped = x[:-n]  # Fails when n == 0

The solution is to use a positive end condition:

x_chopped = x[:len(x)-n]
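A minimal sketch of the "chop off the last n elements" case, written with a non-negative stop so that n == 0 behaves as expected. `chop_last` is a hypothetical helper name used only for illustration.

```python
def chop_last(x, n):
    """Return x without its last n items (hypothetical helper); n >= 0."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # A non-negative stop avoids the x[:-0] == x[:0] trap.
    return x[:max(len(x) - n, 0)]

data = [1, 2, 3, 4, 5]
print(data[:-0])           # [] because -0 is just 0
print(chop_last(data, 0))  # [1, 2, 3, 4, 5]
print(chop_last(data, 2))  # [1, 2, 3]
```

Another spelling sometimes used is x[:-n or None], which turns a zero stop into None.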


Oscar

From solipsis at pitrou.net  Sun Oct 27 20:11:24 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 27 Oct 2013 20:11:24 +0100
Subject: [Python-ideas] Where did we go wrong with negative stride?
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
Message-ID: <20131027201124.6fb33557@fsol>

On Sun, 27 Oct 2013 11:32:29 -0700
Guido van Rossum <guido at python.org> wrote:
> 
> What are real use cases for negative strides?

Reverse a sequence that doesn't have a reverse() method.

Regards

Antoine.



From python at mrabarnett.plus.com  Sun Oct 27 20:23:47 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Sun, 27 Oct 2013 19:23:47 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
Message-ID: <526D6843.7010307@mrabarnett.plus.com>

On 27/10/2013 18:32, Guido van Rossum wrote:
 > On Sun, Oct 27, 2013 at 10:40 AM, MRAB <python at mrabarnett.plus.com> wrote:
 >> On 27/10/2013 17:04, Guido van Rossum wrote:
 >>> In the comments of
 >>> http://python-history.blogspot.com/2013/10/why-python-uses-0-based-indexing.html
 >>> there were some complaints about the interpretation of the bounds
 >>> for negative strides, and I have to admit it feels wrong. Where did
 >>> we go wrong? For example,
 >>>
 >>> "abcde"[::-1] == "edcba"
 >>>
 >>> as you'd expect, but there is no number you can put as the second
 >>> bound to get the same result:
 >>>
 >>> "abcde"[:1:-1] == "edc"
 >>> "abcde"[:0:-1] == "edcb"
 >>>
 >>> but
 >>>
 >>> "abcde"[:-1:-1] == ""
 >>>
 >>> I'm guessing it all comes from the semantics I assigned to negative
 >>> stride for range() long ago, unthinkingly combined with the rules
 >>> for negative indices.
 >>
 >> For a positive stride, omitting the second bound is equivalent to
 >> length + 1:
 >>
 >> >>> "abcde"[:6:1]
 >> 'abcde'
 >>
 > Actually, it is equivalent to length; "abcde"[:5:1] == "abcde" too.
 >
 >> For a negative stride, omitting the second bound is equivalent to
 >> -(length + 1):
 >>
 >> >>> "abcde"[:-6:-1]
 >> 'edcba'
 >>
 > Hm, so the idea is that with a negative stride you should use
 > negative indices. Then at least you get a somewhat useful invariant:
 >
 > if -len(a)-1 <= j <= i <= -1:
 >      len(a[i:j:-1]) == i-j
 >
 > which at least somewhat resembles the invariant for positive indexes
 > and stride:
 >
 > if 0 <= i <= j <= len(a):
 >      len(a[i:j:1]) == j-i
 >
 > For negative indices and stride, we now also get back this nice
 > theorem about adjacent slices:
 >
 > if -len(a)-1 <= i <= -1:
 >      a[:i:-1] + a[i::-1] == a[::-1]
 >
 > Using negative indices also restores the observation that a[i:j:k]
 > produces exactly the items corresponding to the values produced by
 > range(i, j, k).
 >
 > Still, the invariant for negative stride looks less attractive, and
 > the need to use negative indices confuses the matter. Also we end up
 > with -1 corresponding to the position at one end and -len(a)-1
 > corresponding to the position at the other end. The -1 offset feels
 > really wrong here.
 >
The difference might be because the left end is at offset 0 but the
right end is at offset -1.

 > I wonder if it would have been simpler if we had defined a[i:j:-1] as
 > the reverse of a[i:j]?
 >
'range' is defined as range(start, stop, stride).

Some examples from other languages:

BASIC:

     for i = start to stop step stride

Pascal:

     for i := start to stop do
     for i := start downto stop do

The order of start and stop is the same.

If you're slicing in reverse order, then the current order of the start
and stop positions seems reasonable to me.

 > What are real use cases for negative strides?
 >

From ron3200 at gmail.com  Sun Oct 27 21:02:25 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Sun, 27 Oct 2013 15:02:25 -0500
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
Message-ID: <l4jrg8$sfp$1@ger.gmane.org>



On 10/27/2013 01:32 PM, Guido van Rossum wrote:


> Still, the invariant for negative stride looks less attractive, and the
> need to use negative indices confuses the matter. Also we end up with -1
> corresponding to the position at one end and -len(a)-1 corresponding to the
> position at the other end. The -1 offset feels really wrong here.

And I've never liked the property where when counting down, and you pass 0, 
it wraps around.   (And the other case of counting up when passing 0.)


> I wonder if it would have been simpler if we had defined a[i:j:-1] as the
> reverse of a[i:j]?

I think that would have been simpler.


Could adding an __rgetitem__() improve things?

      seq[i:j:k]   -->  __getitem__(slice(i, j, k))

      seq-[i:j:k]   -->  __rgetitem__(slice(i, j, k))

Or the sign of k could determine whether __getitem__ or __rgetitem__ is used?


Ron


From bruce at leapyear.org  Sun Oct 27 21:38:31 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Sun, 27 Oct 2013 13:38:31 -0700
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <526D6843.7010307@mrabarnett.plus.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <526D6843.7010307@mrabarnett.plus.com>
Message-ID: <CAGu0Antv-ik3g6PSRifMahxEY21Ow56hmpZ5eHg3cofxuqZN-w@mail.gmail.com>

On Sun, Oct 27, 2013 at 10:04 AM, Guido van Rossum <guido at python.org> wrote:

> "abcde"[::-1] == "edcba"
>
> as you'd expect, but there is no number you can put as the second bound to
> get the same result:
>
> "abcde"[:1:-1] == "edc"
> "abcde"[:0:-1] == "edcb"
>

This isn't really a negative stride issue. [x:y] is a half-open range ==
[x, y) in mathematical notation and therefore you need a value for y that
is one more. As others have pointed out there is a number you can put in
the second bound but it's not a valid index:

'abcde'[:-6:-1] == 'edcba'

But the same thing applies to positive strides:

'abcde'[::1] == 'abcde'[:5:1] == 'abcde'

And the only values you can replace 5 with that work are out of bounds as
well or the special value None. None represents both the left edge and the
right edge and if we deem that confusing slices could be modified to accept
-inf as representing the left edge and inf as representing the right edge.
Thus we'd have:

'abcde'[-inf:inf] == 'abcde'
'abcde'[inf:-inf] == ''


On Sun, Oct 27, 2013 at 12:23 PM, MRAB <python at mrabarnett.plus.com> wrote:

> The difference might be because the left end is at offset 0 but the
> right end is at offset -1.
>

If the left end was offset 1 and the right end was offset -1 then some of
the asymmetry goes away.

On Sun, Oct 27, 2013 at 1:02 PM, Ron Adam <ron3200 at gmail.com> wrote:

> And I've never liked the property where when counting down, and you pass
> 0, it wraps around.   (And the other case of counting up when passing 0.)
>
>
And then when you count down and it passes 0, you'd get an index error.

I'm *not* proposing we change how strings are indexed. I think that might
break a few programs. I'm just pointing out that you can't count from zero
in both directions and that introduces some weirdness.

--- Bruce
I'm hiring: http://www.cadencemd.com/info/jobs
Latest blog post: Alice's Puzzle Page http://www.vroospeak.com
Learn how hackers think: http://j.mp/gruyere-security

From guido at python.org  Sun Oct 27 21:44:50 2013
From: guido at python.org (Guido van Rossum)
Date: Sun, 27 Oct 2013 13:44:50 -0700
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
Message-ID: <CAP7+vJJA5pyLQoTO_nBYoLz2Ku58pTOUMkabnMrobuB5HS0RNw@mail.gmail.com>

On Sun, Oct 27, 2013 at 11:38 AM, Tim Peters <tim.peters at gmail.com> wrote:

> I may have a different slant on this.


Hardly -- I agree with everything you say here. :-)


> I've found that - by far - the
> most successful way to "teach slices" to newcomers is to invite them
> to view indices as being _between_ sequence elements.
>

Yup.

>
> <position 0> <element> <position 1> <element> <position 2> <element>
> <position 3> ...
>
> Then i:j selects the elements between position i and position j.
>
> >>> "abcde"[2:4]
> 'cd'
>
> But for negative strides this is all screwed up ;-)
>
> >>> "abcde"[4:2:-1]
> 'ed'
>

Right, that's the point of my post.

>
> They're not getting the elements between "positions" 2 and 4 then,
> they're getting the elements between positions 3 and 5.  Why?
> "Because that's how it works" - they have to switch from thinking
> about positions to thinking about array indexing.
>
> So I would prefer that the i:j in s[i:j:k] _always_ specify the
> positions in play:
>
> If i < 0:
>     i += len(s)  # same as now
> if j < 0:
>     j += len(s)  # same as now
> if i >= j:
>     the slice is empty!  # this is different - the sign of k is irrelevant
> else:
>     the slice indices selected will be
>         i, i + abs(k), i + 2*abs(k), ...
>     up to but not including j
>     if k is negative, this index sequence will be taken in reverse order
>
> Then "abcde"[4:2:-1] would be "", while "abcde"[2:4:-1] would be "dc",
> the reverse of "abcde"[2:4].  And s[0:len(s):-1] would be the same as
> reversed(s).
>

Except reversed() returns an iterator. But yes.

This would also make a[i:j:k] == a[i:j][::k].

If I could do it over I would do it this way.

>
> So it's always a semi-open range, inclusive "at the left" and
> exclusive "at the right".  But that's more a detail:  the _point_ is
> to preserve the mental model of selecting the elements "between
> position".  Of course I'd change range() similarly.
>

Which probably would cause more backward incompatibility bugs, since by now
many people have figured out that if you want [4, 3, 2, 1, 0] you have to
write range(4, -1, -1). :-(
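For comparison, the spellings mentioned in this thread produce the same sequence today. This is a small sketch assuming Python 3, where range objects support both reversed() and slicing.

```python
down = list(range(4, -1, -1))            # the spelling people have learned
via_reversed = list(reversed(range(5)))  # the suggested replacement
via_slice = list(range(5)[::-1])         # slicing the range itself

print(down, via_reversed, via_slice)
```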

>
>
> [Guido]
> > For example,
> >
> > "abcde"[::-1] == "edcba"
> >
> > as you'd expect, but there is no number you can put as the second bound
> to
> > get the same result:
>
> Actually, any integer <= -1-len("abcde") = -6 works.  But, yes,
> that's bizarre ;-)
>

Yup, MRAB pointed this out too.

>
> >...
> > Are we stuck with this forever?
>
> Probably :-(
>

Sadly, I agree. If we wanted to change this in Python 4, we'd probably have
to start deprecating range() with negative stride today to force people to
replace their uses of that with reversed(range(...)).

-- 
--Guido van Rossum (python.org/~guido)

From ndbecker2 at gmail.com  Sun Oct 27 21:56:35 2013
From: ndbecker2 at gmail.com (Neal Becker)
Date: Sun, 27 Oct 2013 16:56:35 -0400
Subject: [Python-ideas] Where did we go wrong with negative stride?
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
Message-ID: <l4julq$tnd$1@ger.gmane.org>

One thing I find unfortunate and does trip me up in practice, is that
if you want to do a whole sequence up to k from the end:

u[:-k]

hits a singularity if k=0

Sorry, not exactly related to negative stride


From storchaka at gmail.com  Sun Oct 27 22:19:42 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sun, 27 Oct 2013 23:19:42 +0200
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7+vJJA5pyLQoTO_nBYoLz2Ku58pTOUMkabnMrobuB5HS0RNw@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <CAP7+vJJA5pyLQoTO_nBYoLz2Ku58pTOUMkabnMrobuB5HS0RNw@mail.gmail.com>
Message-ID: <l4k00r$bds$1@ger.gmane.org>

27.10.13 22:44, Guido van Rossum wrote:
> Sadly, I agree. If we wanted to change this in Python 4, we'd probably
> have to start deprecating range() with negative stride today to force
> people to replace their uses of that with reversed(range(...)).

Or with range(...)[::-1] ;-)



From rymg19 at gmail.com  Sun Oct 27 22:27:01 2013
From: rymg19 at gmail.com (Ryan)
Date: Sun, 27 Oct 2013 16:27:01 -0500
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
Message-ID: <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>

Reversing strings and tuples easily. For a string, calling ''.join(reversed(mystr)) is just overkill. And tuples aren't quite as bad (tuple(reversed(mytuple))), but nonetheless odd.

If you were to take out negative strides, a reverse method should be added to strings and tuples:

'abcde'.reverse() => 'edcba'
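For reference, a short sketch of the spellings available today; `mystr` and `mytuple` are the names used above, with sample values chosen for illustration. Neither str nor tuple currently has a reverse() method, so the line above is a proposal rather than working code.

```python
mystr = "abcde"
mytuple = (1, 2, 3)

# Negative-stride slices.
print(mystr[::-1])               # edcba
print(mytuple[::-1])             # (3, 2, 1)

# The reversed() equivalents, which need an extra conversion step.
print("".join(reversed(mystr)))  # edcba
print(tuple(reversed(mytuple)))  # (3, 2, 1)
```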


Guido van Rossum <guido at python.org> wrote:
>On Sun, Oct 27, 2013 at 10:40 AM, MRAB <python at mrabarnett.plus.com>
>wrote:
>
>> On 27/10/2013 17:04, Guido van Rossum wrote:
>>
>>> In the comments of
>>> http://python-history.blogspot.com/2013/10/why-python-uses-0-based-indexing.html
>>> there were some complaints about the interpretation of the bounds for
>>> negative strides, and I have to admit it feels wrong. Where did we go
>>> wrong? For example,
>>>
>>> "abcde"[::-1] == "edcba"
>>>
>>> as you'd expect, but there is no number you can put as the second bound
>>> to get the same result:
>>>
>>> "abcde"[:1:-1] == "edc"
>>> "abcde"[:0:-1] == "edcb"
>>>
>>> but
>>>
>>> "abcde"[:-1:-1] == ""
>>>
>>> I'm guessing it all comes from the semantics I assigned to negative
>>> stride for range() long ago, unthinkingly combined with the rules for
>>> negative indices.
>>>
>>>  For a positive stride, omitting the second bound is equivalent to
>> length + 1:
>>
>> >>> "abcde"[:6:1]
>> 'abcde'
>>
>
>Actually, it is equivalent to length; "abcde"[:5:1] == "abcde" too.
>
>
>> For a negative stride, omitting the second bound is equivalent to
>> -(length + 1):
>>
>> >>> "abcde"[:-6:-1]
>> 'edcba'
>>
>
>Hm, so the idea is that with a negative stride you should use negative
>indices. Then at least you get a somewhat useful invariant:
>
>if -len(a)-1 <= j <= i <= -1:
>    len(a[i:j:-1]) == i-j
>
>which at least somewhat resembles the invariant for positive indexes and
>stride:
>
>if 0 <= i <= j <= len(a):
>    len(a[i:j:1]) == j-i
>
>For negative indices and stride, we now also get back this nice theorem
>about adjacent slices:
>
>if -len(a)-1 <= i <= -1:
>    a[:i:-1] + a[i::-1] == a[::-1]
>
>Using negative indices also restores the observation that a[i:j:k]
>produces exactly the items corresponding to the values produced by
>range(i, j, k).
>
>Still, the invariant for negative stride looks less attractive, and the
>need to use negative indices confuses the matter. Also we end up with -1
>corresponding to the position at one end and -len(a)-1 corresponding to
>the position at the other end. The -1 offset feels really wrong here.
>
>I wonder if it would have been simpler if we had defined a[i:j:-1] as the
>reverse of a[i:j]?
>
>What are real use cases for negative strides?
>
>-- 
>--Guido van Rossum (python.org/~guido)
>
>
>------------------------------------------------------------------------
>
>_______________________________________________
>Python-ideas mailing list
>Python-ideas at python.org
>https://mail.python.org/mailman/listinfo/python-ideas

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

From guido at python.org  Sun Oct 27 22:31:49 2013
From: guido at python.org (Guido van Rossum)
Date: Sun, 27 Oct 2013 14:31:49 -0700
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
Message-ID: <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>

I wouldn't take out negative strides completely, but I might consider
deprecating lower and upper bounds other than None (== missing). So a[::-1]
would still work, and a[None:None:-1] would be a verbose way of spelling
the same, but a[-1:-6:-1] would be deprecated. Then we could triumphantly
(re-)introduce upper and lower bounds in Python 4, with the meaning
a[i:j:-1] == a[i:j][::-1].


On Sun, Oct 27, 2013 at 2:27 PM, Ryan <rymg19 at gmail.com> wrote:

> Reversing strings and tuples easily. For a string, calling
> ''.join(reversed(mystr)) is just overkill. And tuples aren't quite as
> bad (tuple(reversed(mytuple))), but nonetheless odd.
>
> If you were to take out negative strides, a reverse method should be added
> to strings and tuples:
>
> 'abcde'.reverse() => 'edcba'
>
>
> Guido van Rossum <guido at python.org> wrote:
>
>> On Sun, Oct 27, 2013 at 10:40 AM, MRAB <python at mrabarnett.plus.com>wrote:
>>
>>> On 27/10/2013 17:04, Guido van Rossum wrote:
>>>
>>>> In the comments of
>>>> http://python-history.blogspot.com/2013/10/why-python-uses-0-based-indexing.html
>>>> there were some complaints about the interpretation of the bounds for
>>>> negative strides, and I have to admit it feels wrong. Where did we go
>>>> wrong? For example,
>>>>
>>>> "abcde"[::-1] == "edcba"
>>>>
>>>> as you'd expect, but there is no number you can put as the second bound
>>>> to get the same result:
>>>>
>>>> "abcde"[:1:-1] == "edc"
>>>> "abcde"[:0:-1] == "edcb"
>>>>
>>>> but
>>>>
>>>> "abcde"[:-1:-1] == ""
>>>>
>>>> I'm guessing it all comes from the semantics I assigned to negative
>>>> stride for range() long ago, unthinkingly combined with the rules for
>>>> negative indices.
>>>>
>>>>  For a positive stride, omitting the second bound is equivalent to
>>> length + 1:
>>>
>>> >>> "abcde"[:6:1]
>>> 'abcde'
>>>
>>
>> Actually, it is equivalent to length; "abcde"[:5:1] == "abcde" too.
>>
>>
>>> For a negative stride, omitting the second bound is equivalent to
>>> -(length + 1):
>>>
>>> >>> "abcde"[:-6:-1]
>>> 'edcba'
>>>
>>
>> Hm, so the idea is that with a negative stride you should use
>> negative indices. Then at least you get a somewhat useful invariant:
>>
>> if -len(a)-1 <= j <= i <= -1:
>>     len(a[i:j:-1]) == i-j
>>
>> which at least somewhat resembles the invariant for positive indexes and
>> stride:
>>
>> if 0 <= i <= j <= len(a):
>>     len(a[i:j:1]) == j-i
>>
>> For negative indices and stride, we now also get back this nice theorem
>> about adjacent slices:
>>
>> if -len(a)-1 <= i <= -1:
>>     a[:i:-1] + a[i::-1] == a[::-1]
>>
>> Using negative indices also restores the observation that a[i:j:k]
>> produces exactly the items corresponding to the values produced by range(i,
>> j, k).
>>
>> Still, the invariant for negative stride looks less attractive, and the
>> need to use negative indices confuses the matter. Also we end up with -1
>> corresponding to the position at one end and -len(a)-1 corresponding to the
>> position at the other end. The -1 offset feels really wrong here.
>>
>> I wonder if it would have been simpler if we had defined a[i:j:-1] as the
>> reverse of a[i:j]?
>>
>> What are real use cases for negative strides?
>>
>
> --
> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
>



-- 
--Guido van Rossum (python.org/~guido)

From tim.peters at gmail.com  Sun Oct 27 22:56:34 2013
From: tim.peters at gmail.com (Tim Peters)
Date: Sun, 27 Oct 2013 16:56:34 -0500
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
Message-ID: <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>

[Guido]
> I wouldn't take out negative strides completely, but I might consider
> deprecating lower and upper bounds other than None (== missing). So a[::-1]
> would still work, and a[None:None:-1] would be a verbose way of spelling the
> same,

Happy idea.

> but a[-1:-6:-1] would be deprecated.

Not sure I've _ever_ seen that in real life.  Where it comes up is on
places like stackoverflow, when somebody (mistakenly) suggests using
seq[:-1:-1] to do a reverse slice.  Then it's pointed out that this
doesn't work like range(len(seq), -1, -1).  Then some wiseass with too
much obscure knowledge of implementation details ;-) points out that
seq[:-len(seq)-1:-1] does work (well, in CPython - I don't know
whether all implementations follow this quirk - although the docs
imply that they should).
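
A small illustration of the three spellings mentioned here, as they
behave in CPython today (the string is just an example value):

```python
seq = "abcde"

# A stop of -1 is read as len(seq) - 1, which is also where the slice
# starts, so nothing is selected.
naive = seq[:-1:-1]

# A stop of -len(seq) - 1 falls off the left end and is clamped to
# "just before index 0", so the whole sequence is reversed.
quirk = seq[:-len(seq) - 1:-1]

# Omitting the bound is the usual spelling.
plain = seq[::-1]

print(repr(naive), repr(quirk), repr(plain))
```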

> Then we could triumphantly (re-)introduce upper and lower bounds in Python 4, with
> the meaning a[i:j:-1] == a[i:j][::-1].

+1.

From solipsis at pitrou.net  Mon Oct 28 00:20:09 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 28 Oct 2013 00:20:09 +0100
Subject: [Python-ideas] Where did we go wrong with negative stride?
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
Message-ID: <20131028002009.6e88487b@fsol>

On Sun, 27 Oct 2013 16:56:34 -0500
Tim Peters <tim.peters at gmail.com> wrote:
> [Guido]
> > I wouldn't take out negative strides completely, but I might consider
> > deprecating lower and upper bounds other than None (== missing). So a[::-1]
> > would still work, and a[None:None:-1] would be a verbose way of spelling the
> > same,
> 
> Happy idea.
> 
> > but a[-1:-6:-1] would be deprecated.
> 
> Not sure I've _ever_ seen that in real life.

If it's never seen in real life, then there's probably no urge to
deprecate it and later replace it with a new thing, IMHO.

Also, I get the feeling it's a bit early to start talking about
Python 4 (is that supposed to happen at all?).

Regards

Antoine.



From rob.cliffe at btinternet.com  Mon Oct 28 00:34:23 2013
From: rob.cliffe at btinternet.com (Rob Cliffe)
Date: Sun, 27 Oct 2013 23:34:23 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
Message-ID: <526DA2FF.6070905@btinternet.com>


> I wonder if it would have been simpler if we had defined a[i:j:-1] as 
> the reverse of a[i:j]?
>
Maybe, I'm not venturing an opinion.
But if so:  What about negative strides other than -1?  Should a[i:j:-2] 
always be the reverse of a[i:j:2]?
My feeling is not, i.e.
"abcdefghij"[3:8:2] == "dfh"
but I feel that
"abcdefghij"[3:8:-2] under this suggestion should be what 
"abcdefghij"[8:3:-2] is now, i.e. "ige", not "hfd".
I.e. any non-empty start:stop:stride slice where 0 <= slice < length 
should always start with the character indexed by "start".
Rob Cliffe
> What are real use cases for negative strides?
>
> -- 
> --Guido van Rossum (python.org/~guido <http://python.org/%7Eguido>)
>
>

From rob.cliffe at btinternet.com  Mon Oct 28 01:03:22 2013
From: rob.cliffe at btinternet.com (Rob Cliffe)
Date: Mon, 28 Oct 2013 00:03:22 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
Message-ID: <526DA9CA.1040709@btinternet.com>

I'd like to throw in an idea.  Not sure how serious it is (prepared to 
be shot down in flames :-) ), just want to be sure that all 
possibilities are examined.
With positive strides, "start" is inclusive, "end" is exclusive.
Suppose that with negative strides, "start" were exclusive and "end" was 
inclusive.
(I.e. the "lower" bound was always inclusive and the "upper" bound was 
always exclusive.)
Then "abcde"[:2:-1] would be "edc", not "ed".
Then "abcde"[:1:-1] would be "edcb", not "edc".
Then "abcde"[:0:-1] would be "edcba".
I think this fits in with Tim Peters' concept of characters between 
positions, e.g. "abcde"[3:0:-1] would be "cba" (not "dcb" as at 
present), i.e. the characters between positions 0 and 3.
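
A minimal sketch of this idea, limited to a stride of -1 and non-negative
bounds. The function name and signature are invented for illustration and
this is not how Python behaves today:

```python
def between_positions_reversed(a, start=None, stop=None):
    """Hypothetical a[start:stop:-1] where the lower bound (stop) is
    inclusive and the upper bound (start) is exclusive."""
    upper = len(a) if start is None else start
    lower = 0 if stop is None else stop
    # Select the characters between the two positions, then reverse them.
    return a[lower:upper][::-1]


print(between_positions_reversed("abcde", stop=2))
print(between_positions_reversed("abcde", stop=1))
print(between_positions_reversed("abcde", stop=0))
print(between_positions_reversed("abcde", 3, 0))
```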
Rob Cliffe

On 27/10/2013 17:04, Guido van Rossum wrote:
> In the comments of 
> http://python-history.blogspot.com/2013/10/why-python-uses-0-based-indexing.html 
> there were some complaints about the interpretation of the bounds for 
> negative strides, and I have to admit it feels wrong. Where did we go
> wrong? For example,
>
> "abcde"[::-1] == "edcba"
>
> as you'd expect, but there is no number you can put as the second 
> bound to get the same result:
>
> "abcde"[:1:-1] == "edc"
> "abcde"[:0:-1] == "edcb"
>
> but
>
> "abcde"[:-1:-1] == ""
>
> I'm guessing it all comes from the semantics I assigned to negative 
> stride for range() long ago, unthinkingly combined with the rules for 
> negative indices.
>
> Are we stuck with this forever? If we want to fix this in Python 4 
> we'd have to start deprecating negative stride with non-empty 
> lower/upper bounds now. And we'd have to start deprecating negative 
> step for range() altogether, recommending reversed(range(lower, 
> upper)) instead.
>
> Thoughts? Is NumPy also affected?
>
> -- 
> --Guido van Rossum (python.org/~guido <http://python.org/%7Eguido>)
>
>

From rob.cliffe at btinternet.com  Mon Oct 28 01:08:14 2013
From: rob.cliffe at btinternet.com (Rob Cliffe)
Date: Mon, 28 Oct 2013 00:08:14 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <526DA2FF.6070905@btinternet.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <526DA2FF.6070905@btinternet.com>
Message-ID: <526DAAEE.3030604@btinternet.com>


On 27/10/2013 23:34, Rob Cliffe wrote:
>
>> I wonder if it would have been simpler if we had defined a[i:j:-1] as 
>> the reverse of a[i:j]?
>>
> Maybe, I'm not venturing an opinion.
> But if so:  What about negative strides other than -1?  Should 
> a[i:j:-2] always be the reverse of a[i:j:2]?
> My feeling is not, i.e.
> "abcdefghij"[3:8:2] == "dfh"
> but I feel that
> "abcdefghij"[3:8:-2] under this suggestion should be what 
> "abcdefghij"[8:3:-2] is now, i.e. "ige", not "hfd".
> I.e. any non-empty start:stop:stride slice where 0 <= slice < length 
> should always start with the character indexed by "start".
Correction: I meant "where 0 <= start < length".  On reconsideration, 
please ignore this condition.
> Rob Cliffe
>> What are real use cases for negative strides?
>>
>> -- 
>> --Guido van Rossum (python.org/~guido <http://python.org/%7Eguido>)
>>
>>

From charleshixsn at earthlink.net  Mon Oct 28 01:42:06 2013
From: charleshixsn at earthlink.net (Charles Hixson)
Date: Sun, 27 Oct 2013 17:42:06 -0700
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <20131028002009.6e88487b@fsol>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
Message-ID: <526DB2DE.1080109@earthlink.net>

On 10/27/2013 04:20 PM, Antoine Pitrou wrote:
> On Sun, 27 Oct 2013 16:56:34 -0500
> Tim Peters <tim.peters at gmail.com> wrote:
>> [Guido]
>>> I wouldn't take out negative strides completely, but I might consider
>>> deprecating lower and upper bounds other than None (== missing). So a[::-1]
>>> would still work, and a[None:None:-1] would be a verbose way of spelling the
>>> same,
>> Happy idea.
>>
>>> but a[-1:-6:-1] would be deprecated.
>> Not sure I've _ever_ seen that in real life.
> If it's never seen in real life, then there's probably no urge to
> deprecate it and later replace it with a new thing, IMHO.
>
> Also, I get the feeling it's a bit early to start talking about
> Python 4 (is that supposed to happen at all?).
>
> Regards
>
> Antoine.
>
>
If it's a misfeature and it's not currently being used, then this is the 
perfect time to deprecate it.

-- 
Charles Hixson


From ron3200 at gmail.com  Mon Oct 28 01:50:17 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Sun, 27 Oct 2013 19:50:17 -0500
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <526DA9CA.1040709@btinternet.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526DA9CA.1040709@btinternet.com>
Message-ID: <l4kcbv$m3$1@ger.gmane.org>



On 10/27/2013 07:03 PM, Rob Cliffe wrote:
> I'd like to throw in an idea.  Not sure how serious it is (prepared to be
> shot down in flames :-) ), just want to be sure that all possibilities are
> examined.

I think you're safe.  ;-)

> With positive strides, "start" is inclusive, "end" is exclusive.
> Suppose that with negative strides, "start" were exclusive and "end" was
> inclusive.

With the proposed behaviour (that you and Tim described), it will be
easier to think in terms of [left:right:step].


And it fits with Guido's ...

    s[left:right:-step] == s[left:right][::-step]


One of the nice properties is that you can switch directions by just
changing the step.  With the current slices, you need to change the start
and stop as well, and also recalculate those if you want the same range.

BTW.  A negative step will only be the exact reversed sequence if the last 
item is also (j-1) for that step value.

Cheers,
   Ron

> (I.e. the "lower" bound was always inclusive and the "upper" bound was
> always exclusive.)
> Then "abcde"[:2:-1] would be "edc", not "ed".
> Then "abcde"[:1:-1] would be "edcb", not "edc".
> Then "abcde"[:0:-1] would be "edcba".
> I think this fits in with Tim Peters' concept of characters between
> positions, e.g. "abcde"[3:0:-1] would be "cba" (not "dcb" as at present),
> i.e. the characters between positions 0 and 3.
> Rob Cliffe


From zuo at chopin.edu.pl  Mon Oct 28 02:59:41 2013
From: zuo at chopin.edu.pl (Jan Kaliszewski)
Date: Mon, 28 Oct 2013 02:59:41 +0100
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7+vJJA5pyLQoTO_nBYoLz2Ku58pTOUMkabnMrobuB5HS0RNw@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <CAP7+vJJA5pyLQoTO_nBYoLz2Ku58pTOUMkabnMrobuB5HS0RNw@mail.gmail.com>
Message-ID: <7b702c80922799b5371d7c37003dae90@chopin.edu.pl>

2013-10-27, 21:44, Guido van Rossum wrote:

> On Sun, Oct 27, 2013 at 11:38 AM, Tim Peters <tim.peters at gmail.com> 
> wrote:
[...]
> If I could do it over I would do it this way.
>
>> So it's always a semi-open range, inclusive "at the left" and
>> exclusive "at the right".  But that's more a detail:  the _point_ is
>> to preserve the mental model of selecting the elements "between
>> position".  Of course I'd change range() similarly.
>
> Which probably would cause more backward incompatibility bugs,
> since by now many people have figured out that if you want
> [4, 3, 2, 1, 0] you have to write range(4, -1, -1). :-(

Maybe introducing a new builtin and deprecating range()
could be the remedy?  The new builtin, named e.g. "scope", could
even be a combination of today's range + slice?

     >>> list(scope(0, 5, -1))        # Py 3.5+
     [4, 3, 2, 1, 0]
     >>> 'abcdef'[scope(0, 5, -1)]    # Py 3.5+
     'edcba'
     >>> 'abcdef'[0:5:-1]             # Py 4.0+
     'edcba'

Just thinking out loud...

Cheers.
*j


From greg.ewing at canterbury.ac.nz  Mon Oct 28 00:45:15 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 28 Oct 2013 12:45:15 +1300
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <l4julq$tnd$1@ger.gmane.org>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org>
Message-ID: <526DA58B.7080504@canterbury.ac.nz>

Neal Becker wrote:
> One thing I find unfortunate and does trip me up in practice, is that
> if you want to do a whole sequence up to k from the end:
> 
> u[:-k]
> 
> hits a singularity if k=0

I think the only way to really fix this cleanly is to have
a different *syntax* for counting from the end, rather than
trying to guess from the value of the argument. I can't
remember ever needing to write code that switches dynamically
between from-start and from-end indexing, or between
forward and reverse iteration direction -- and if I ever
did, I'd be happy to write two code branches.
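
For reference, a small demonstration of the k=0 case, together with one
common workaround that computes the bound from the front instead. The
helper name is made up for illustration and is not something proposed in
this thread:

```python
def drop_last(u, k):
    """Hypothetical helper: everything except the last k items,
    for 0 <= k <= len(u)."""
    return u[:len(u) - k]


u = [10, 20, 30, 40]
surprising = u[:-0]        # -0 == 0, so this is u[:0], i.e. empty
expected = drop_last(u, 0) # the whole list, as intended
print(surprising, expected, drop_last(u, 1))
```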

-- 
Greg

From steve at pearwood.info  Mon Oct 28 03:20:47 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 28 Oct 2013 13:20:47 +1100
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
Message-ID: <20131028022046.GU7989@ando>

On Sun, Oct 27, 2013 at 01:38:05PM -0500, Tim Peters wrote:
> I may have a different slant on this.  I've found that - by far - the
> most successful way to "teach slices" to newcomers is to invite them
> to view indices as being _between_ sequence elements.
> 
> <position 0> <element> <position 1> <element> <position 2> <element>
> <position 3> ...
> 
> Then i:j selects the elements between position i and position j.
> 
> >>> "abcde"[2:4]
> 'cd'


I really like that view point, but it has a major problem. As 
beautifully elegant as the "cut between positions" model is for 
stride=1, it doesn't extend to non-unit strides. You cannot think about 
non-contiguous slices in terms of a pair of cuts at position <start> and 
<end>. I believe that the cleanest way to understand non-contiguous 
slices with stride > 1 is to think of array indices. That same model 
works for the negative stride case too.

Further details below.

 
> But for negative strides this is all screwed up ;-)
> 
> >>> "abcde"[4:2:-1]
> 'ed'
> 
> They're not getting the elements between "positions" 2 and 4 then,
> they're getting the elements between positions 3 and 5.

As I suggested above, the "between elements" model doesn't work for 
non-unit strides. Consider this example:

py> s = "abcdefghi"
py> s[1:8:2]
'bdfh'

Here are the labelled between-element positions, best viewed with a 
monospaced font:

|a|b|c|d|e|f|g|h|i|
0 1 2 3 4 5 6 7 8 9


Since the slice here is non-contiguous, we don't have a single pair of 
cuts, but a series of them:

    s[1:8:2] => s[1:2:1] + s[3:4:1] + s[5:6:1] + s[7:8:1]
    => 'bdfh'

that is, start at the <start> position and make a thin (one element)
slice, advance forward by step and repeat until you reach the <end>
position. But that's just a longer way of writing this:

    s[1:8:2] => s[1] + s[3] + s[5] + s[7]
    => 'bdfh'

which I maintain is a cleaner way to think about non-unit step-sizes. 
It's certainly *shorter* to think of indexing rather than repeated thin 
slices, and it avoids the mistake (which I originally made) of thinking 
that each subslice has to be <stride> wide.

    # Not this!
    s[1:8:2] => s[1:3] + s[3:5] + s[5:7] + s[7:9]
    => 'bcdefghi'

    # Or this!
    s[1:8:2] => s[1:3] + s[4:6] + s[7:9]
    'bcefhi'


So I think that the cleanest way of thinking about *positive* non-unit 
strides is in terms of array indexing. *Negative* non-unit strides, 
including -1, are no different. First, here's an example with negative 
positions:

py> s[-1:-8:-2]
'igec'

which is just like

    s[-1:-8:-2] => s[-1] + s[-3] + s[-5] + s[-7]
    => 'igec'

which is precisely the same as the positive step case: start at the
<start>, continue until the end, stepping by <step> each time. If you
insist on the "cut between positions" way of thinking, we can do that
too:

    s[-1:-8:-2] => s[-1:-2:-1] + s[-3:-4:-1] + s[-5:-6:-1] + s[-7:-8:-1]
    => 'igec'


10 9 8 7 6 5 4 3 2 1  # all negative positions
 |a|b|c|d|e|f|g|h|i|

The slice from -1 to -2 is "i", from -3 to -4 is "g", and so forth, 
exactly as for positive positions, except that negative positions are 
one-based instead of zero-based.

(If ints could distinguish -0 from 0, we could fix that.)


Here's an example like the one that Tim described as "screwed up", with 
positive start and end positions and -1 stride:

py> s[6:2:-1]
'gfed'


This is not "take the slice from [2:6] and reverse it", which would give 
'fedc'. That doesn't work, because slices are always closed at <start> 
and open at <end>, no matter which direction you go:

* [2:6:1] is closed at 2 (the start), open at 6 (the end)

* [6:2:-1] is closed at 6 (the start), open at 2 (the end)

If you are expecting differently, then (I believe) you are expecting 
that slices are closed on the *left* (lowest number), open on the 
*right* (highest number). But that's not what slices do. (Whether they 
*should* do it is another story.)

However, the array index viewpoint works just fine here too:

    s[6:2:-1] => s[6] + s[5] + s[4] + s[3]

Start at index 6, step down by -1 each time, stop at 2.
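
The index view can be checked directly with slice.indices() and range().
The helper name below is invented for illustration:

```python
def indices_for(seq, start, stop, step):
    """Hypothetical helper: the indices that seq[start:stop:step]
    visits under today's rules."""
    return list(range(*slice(start, stop, step).indices(len(seq))))


s = "abcdefghi"
idx_pos = indices_for(s, 1, 8, 2)
idx_rev = indices_for(s, 6, 2, -1)
idx_neg = indices_for(s, -1, -8, -2)   # negative bounds are normalised first

# Rebuilding the slice one index at a time gives the same result.
rebuilt = "".join(s[i] for i in idx_rev)
print(idx_pos, idx_rev, idx_neg, rebuilt)
```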

As elegant as "cut between elements" is for the common case where the 
stride is 1, it doesn't work for stride != 1. I'm not concerned about 
the model breaking down for negative strides when it breaks down for 
positive non-unit strides too :-)



>  Why?
> "Because that's how it works" - they have to switch from thinking
> about positions to thinking about array indexing.

But you already have to do this as soon as you allow non-unit strides! 
You even do it yourself, below:

    "the slice indices selected will be..."

so you can't get away from indexes even if you try.


 
> So I would prefer that the i:j in s[i:j:k] _always_ specify the
> positions in play:
>
> If i < 0:
>     i += len(s)  # same as now
> if j < 0:
>     j += len(s)  # same as now
> if i >= j:
>     the slice is empty!  # this is different - the sign of k is irrelevant
> else:
>     the slice indices selected will be
>         i, i + abs(k), i + 2*abs(k), ...
>     up to but not including j
>     if k is negative, this index sequence will be taken in reverse order


In other words, you want negative strides to just mean "reverse the 
slice". Perhaps that would have been a good design. But we already have 
two good idioms for reversing slices:

reversed(seq[start:stop:step])
seq[start:stop:step][::-1]



> Then "abcde"[4:2:-1] would be "", while "abcde"[2:4:-1] would be "dc",
> the reverse of "abcde"[2:4].  And s[0:len(s):-1] would be the same as
> reversed(s).
> 
> So it's always a semi-open range, inclusive "at the left" and
> exclusive "at the right".  But that's more a detail:

It isn't a mere detail, it is the core of the change: changing from 
inclusive at the start to inclusive on the left, which are not the same 
thing. This is a significant semantic change.

(Of course it is. You don't like the current semantics, since they trick 
you into off-by-one errors for negative strides. If the change was 
insignificant, it wouldn't help.)

One consequence of this proposed change is that the <start> parameter is 
no longer always the first element returned. Sometimes <start> will be 
last rather than first. That disturbs me.


> the _point_ is
> to preserve the mental model of selecting the elements "between
> position".  Of course I'd change range() similarly.

Currently, this is how you use range to count down from 10 to 1:

    range(10, 0, -1)  # 0 is excluded

To me, this makes perfect sense: I want to start counting at 10, so the 
first argument I give is 10 no matter whether I'm counting up or 
counting down.

With your suggestion, we'd have:

    range(1, 11, -1)  # 11 is excluded

So here I have to put one more than the number I want to start with as 
the *second* argument, and the last number first, just because I'm 
counting down. I don't consider that an improvement. Certainly not an 
improvement worth breaking backwards compatibility for.
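
For comparison, the countdown that the proposed range(1, 11, -1) would
produce can already be written with reversed() today. A small check,
using only existing builtins:

```python
today = list(range(10, 0, -1))               # start at 10, stop before 0
via_reversed = list(reversed(range(1, 11)))  # forward range, then reversed
print(today == via_reversed)
```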



> > Are we stuck with this forever?
> 
> Probably :-(

Assuming we want to change -- and I'm not convinced we should -- there's 
always Python 4000, or if necessary 

from __future__ import negative_slices_reverse



-- 
Steven

From bruce at leapyear.org  Mon Oct 28 04:00:23 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Sun, 27 Oct 2013 20:00:23 -0700
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <526DA58B.7080504@canterbury.ac.nz>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
Message-ID: <CAGu0AnuQtaTn4XFtykGv=Ptr3SiVTMFjvgnmNRcjT4jOyvYR5w@mail.gmail.com>

On Sun, Oct 27, 2013 at 4:45 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:

> I think the only way to really fix this cleanly is to have
> a different *syntax* for counting from the end, rather than
> trying to guess from the value of the argument.
>

I was thinking the exact same thing today. Suppose the slice syntax was
changed to:

[start:stop:stride:reverse]

where 0 or None or False for reverse leaves the slice in order while any
True value reverses it. This would replace

'abcde'[2:5] == 'cde'
'abcde'[2:5::True] == 'edc'
'abcde'[::-2] == 'abcde'[::2:True] == 'eca'
'abcdef'[::-2] == 'fdb'
'abcdef'[::2:True] == 'eca'

As the last three examples illustrate, sometimes the reverse is equivalent
to a negative stride and sometimes it's not.
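
Since the four-part slice syntax does not exist, a plain function can
stand in for it. This is only a sketch; the name slice4 and its keyword
arguments are invented for illustration:

```python
def slice4(seq, start=None, stop=None, stride=None, reverse=False):
    """Hypothetical stand-in for seq[start:stop:stride:reverse]:
    take the ordinary forward slice, then optionally reverse it."""
    picked = seq[start:stop:stride]
    return picked[::-1] if reverse else picked


# Odd length: reversing the stride-2 slice matches a stride of -2.
odd = slice4("abcde", stride=2, reverse=True)
# Even length: the two differ, because [::-2] starts from the last item.
even = slice4("abcdef", stride=2, reverse=True)
print(odd, "abcde"[::-2], even, "abcdef"[::-2])
```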

--- Bruce
I'm hiring: http://www.cadencemd.com/info/jobs
Latest blog post: Alice's Puzzle Page http://www.vroospeak.com
Learn how hackers think: http://j.mp/gruyere-security

From tim.peters at gmail.com  Mon Oct 28 04:05:11 2013
From: tim.peters at gmail.com (Tim Peters)
Date: Sun, 27 Oct 2013 22:05:11 -0500
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <20131028022046.GU7989@ando>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <20131028022046.GU7989@ando>
Message-ID: <CAExdVNmPsGUKSaz_qW=sDA20w6tjt3Z+Dr8XYU_+GVsXBwBqsA@mail.gmail.com>

[Steven D'Aprano <steve at pearwood.info>]
> ...
> I really like that view point, but it has a major problem. As
> beautifully elegant as the "cut between positions" model is for
> stride=1, it doesn't extend to non-unit strides. You cannot think about
> non-contiguous slices in terms of a pair of cuts at position <start> and
> <end>. I believe that the cleanest way to understand non-contiguous
> slices with stride > 1 is to think of array indices. That same model
> works for the negative stride case too.

"Cut between positions" in my view has nothing to do with the stride.
It's used to determine the portion of the sequence _to which_ the
stride applies.  From that portion, we take the first element, the
first + |stride|'th element, the first + |stride|*2'th element, and so
on.  Finally, if stride < 0, we reverse that sequence.  That's the
proposal.  It's simple and uniform.
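
A minimal sketch of that model in current Python, assuming the portion is
selected with today's forward-slice rules for i and j. The function name
proposed_slice is hypothetical and is not an existing API.

```python
def proposed_slice(s, i=None, j=None, k=1):
    """Sketch of the proposed s[i:j:k]: i:j picks the portion, k < 0 reverses."""
    if k == 0:
        raise ValueError("stride cannot be zero")
    portion = s[i:j]             # ordinary forward slice: the "cut between positions"
    picked = portion[::abs(k)]   # first, first + |k|, first + 2*|k|, ...
    return picked[::-1] if k < 0 else picked

s = 'abcdefgh'
forward = proposed_slice(s, 1, 8, 2)
backward = proposed_slice(s, 1, 8, -2)
empty = proposed_slice(s, 5, 2, -1)   # i >= j is empty whatever the sign of k
```

Here backward is the same string that s[7:0:-2] produces today; the bounds
are simply written in left-to-right order.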

> Further details below.

Not really needed - I already know exactly how slicing works in Python today ;-)

> ...
> But that's just a longer way of writing this:
>
>     s[1:8:2] => s[1] + s[3] + s[5] + s[7]
>     => 'bdfh'
>
> which I maintain is a cleaner way to think about non-unit step-sizes.

As above, so do I.  start:stop just delineates the subsequence to
which the stride applies.

> It's certainly *shorter* to think of indexing rather than repeated thin
> slices,

And I don't have "repeated thin slices" in mind at all.

> ....
> If you are expecting differently, then (I believe) you are expecting
> that slices are closed on the *left* (lowest number), open on the
> *right* (highest number). But that's not what slices do. (Whether they
> *should* do it is another story.)

Guido started this thread precisely to ask what they should do.  We
already know what they _do_ do ;-)


>>
>> So I would prefer that the i:j in s[i:j:k] _always_ specify the
>> positions in play:
>>
>>
>> if i < 0:
>>     i += len(s)  # same as now
>> if j < 0:
>>     j += len(s)  # same as now
>> if i >= j:
>>     the slice is empty!  # this is different - the sign of k is irrelevant
>> else:
>>     the slice indices selected will be
>>         i, i + abs(k), i + 2*abs(k), ...
>>     up to but not including j
>>     if k is negative, this index sequence will be taken in reverse order

> In other words, you want negative strides to just mean "reverse the
> slice"

If they're given a meaning at all.

>. Perhaps that would have been a good design. But we already have
> two good idioms for reversing slices:
>
> reversed(seq[start:stop:step])

I'm often annoyed by `reversed()`, since it returns an iterator and
doesn't preserve the type of its argument.

>>> reversed('abc')
<reversed object at 0x00C722D0>

Oops!  OK, let's turn it back into a string:

>>> str(_)
'<reversed object at 0x00C722D0>'

LOL!  It's enough to make a guy give up ;-)  Yes, I know ''.join(_)
would have worked.
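
For reference, a short sketch of the ways to get a concrete reversed value
back; the variable names are illustrative only.

```python
s = 'abc'
it = reversed(s)                      # an iterator, not a str
as_text = ''.join(it)                 # consuming the iterator rebuilds a string
as_list = list(reversed([1, 2, 3]))   # for lists, wrap the iterator in list()
by_slice = s[::-1]                    # slicing keeps the original type directly
```

Only the slice form preserves the type without an extra, type-specific step.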

> seq[start:stop:step][::-1]

That's an improvement over seq[start:stop:-step]?  Na.

>> ...
>> So it's always a semi-open range, inclusive "at the left" and
>> exclusive "at the right".  But that's more a detail:

> It isn't a mere detail,

Not "mere", "more".

> it is the core of the change: changing from inclusive at the start
> to inclusive on the left,

No, the proposal says a[i:j:anything] is _empty_ if (after normalizing
negative i and/or negative j) i >= j.  "The start" and "the left" are
always the same thing under the proposal (where "the start" applies to
the input subsequence - which may be "the end" of the output
subsequence).

> which are not the same thing. This is a significant semantic change.

Yes, it is.

> (Of course it is. You don't like the current semantics, since they trick
> you into off-by-one errors for negative strides.

No, I dislike the current semantics for the same reason it appears
Guido dislikes them:  they're hard to teach, and hard for people to
get right in practice.

> If the change was insignificant, it wouldn't help.)

Bingo ;-)

> One consequence of this proposed change is that the <start> parameter is
> no longer always the first element returned. Sometimes <start> will be
> last rather than first. That disturbs me.

?  <start> is always the first element of the subsequence to which the
stride is applied.  If the stride is negative, then yes, of course the
first element of the source subsequence would be the last element of
the returned subsequence.

>> ...
>> Of course I'd change range() similarly.

> Currently, this is how you use range to count down from 10 to 1:
>
>     range(10, 0, -1)  # 0 is excluded
>
> To me, this makes perfect sense: I want to start counting at 10, so the
> first argument I give is 10 no matter whether I'm counting up or
> counting down.
>
> With your suggestion, we'd have:
>
>     range(1, 11, -1)  # 11 is excluded
>
> So here I have to put one more than the number I want to start with as
> the *second* argument, and the last number first, just because I'm
> counting down. I don't consider that an improvement. Certainly not an
> improvement worth breaking backwards compatibility for.

I agree this one is annoying.  Not _more_ annoying than the current
range(10, -1, -1) to count down from 10 through 0 - which I've seen
people get wrong more often than I can recall - but _as_ annoying.
reversed(range(1, 11)) would work for your case, and
reversed(range(11)) for mine.
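
A small check of those spellings as they behave in current Python 3; nothing
here depends on the proposed change.

```python
# Counting down with an explicit negative step: the stop value is excluded,
# so it has to be one *below* the last number wanted.
down_to_1 = list(range(10, 0, -1))
down_to_0 = list(range(10, -1, -1))

# The same sequences written as "forward range, then reversed".
down_to_1_rev = list(reversed(range(1, 11)))
down_to_0_rev = list(reversed(range(11)))
```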

From g.brandl at gmx.net  Mon Oct 28 06:33:33 2013
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 28 Oct 2013 06:33:33 +0100
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
Message-ID: <l4kssv$omn$1@ger.gmane.org>

On 27.10.2013 22:56, Tim Peters wrote:
> [Guido]
>> I wouldn't take out negative strides completely, but I might consider
>> deprecating lower and upper bounds other than None (== missing). So a[::-1]
>> would still work, and a[None:None:-1] would be a verbose way of spelling the
>> same,
> 
> Happy idea.
> 
>> but a[-1:-6:-1] would be deprecated.
> 
> Not sure I've _ever_ seen that in real life.  [...]

Before such a change is considered, I'd like to see the numpy community
consulted; numpy users probably use slicing more than anyone else.  At
least it should be possible to integrate the new slicing without breaking
too much other numpy behavior.

As a datapoint, Matlab negative-stride slicing is similar to Python, but
it is less confusing (IMO) since the slices are inclusive on both ends.
Let "a" be a range from 1 to 10:

> a(2:7)        % slicing is done with parens; 1-based indexing
[2 3 4 5 6 7]
> a(2:2:7)      % the stride is the middle value
[2 4 6]
> a(2:-1:7)     % same as in Python
[]
> a(7:-1:2)     % a(i:-1:j) == reverse of a(j:1:i) due to end-inclusive
[7 6 5 4 3 2]
> a(7:-2:2)     % but obviously not so for non-unity stride
[7 5 3]
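
For comparison, a rough Python emulation of the Matlab behaviour shown above.
The helper matlab_slice is hypothetical, written only to make the 1-based,
end-inclusive rule concrete.

```python
def matlab_slice(a, start, step, stop):
    """Emulate Matlab's a(start:step:stop): 1-based, inclusive at both ends.

    Hypothetical helper for illustration only.
    """
    if step == 0:
        raise ValueError("step cannot be zero")
    # Python's range() excludes its bound, so push it one past `stop`
    # in the direction of travel to make `stop` itself reachable.
    bound = stop + 1 if step > 0 else stop - 1
    return [a[n - 1] for n in range(start, bound, step)]

a = list(range(1, 11))   # the values 1..10, as in the examples above
```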

cheers,
Georg




From rosuav at gmail.com  Mon Oct 28 07:33:17 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 28 Oct 2013 17:33:17 +1100
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <526DA58B.7080504@canterbury.ac.nz>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
Message-ID: <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>

On Mon, Oct 28, 2013 at 10:45 AM, Greg Ewing
<greg.ewing at canterbury.ac.nz> wrote:
> Neal Becker wrote:
>>
>> One thing I find unfortunate and does trip me up in practice, is that
>> if you want to do a whole sequence up to k from the end:
>>
>> u[:-k]
>>
>> hits a singularity if k=0
>
>
> I think the only way to really fix this cleanly is to have
> a different *syntax* for counting from the end, rather than
> trying to guess from the value of the argument. I can't
> remember ever needing to write code that switches dynamically
> between from-start and from-end indexing, or between
> forward and reverse iteration direction -- and if I ever
> did, I'd be happy to write two code branches.

If it'd help, you could borrow Pike's syntax for counting-from-end
ranges: <2 means 2 from the end, <0 means 0 from the end. So
"abcdefg"[:<2] would be "abcde", and "abcdefg"[:<0] would be
"abcdefg". Currently that's invalid syntax (putting a binary operator
with no preceding operand), so it'd be safe and unambiguous.
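
The k=0 trap in today's Python, with the usual workaround of computing the
stop position from the length. The helper name drop_last is hypothetical.

```python
u = 'abcdefg'

def drop_last(seq, k):
    """Drop the last k items; unlike seq[:-k], this also works for k == 0."""
    return seq[:len(seq) - k]

trap = u[:-0]              # -0 is just 0, so this is u[:0]: the empty string
fixed = drop_last(u, 0)    # keeps the whole sequence
```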

ChrisA

From steve at pearwood.info  Mon Oct 28 13:09:21 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Mon, 28 Oct 2013 23:09:21 +1100
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAExdVNmPsGUKSaz_qW=sDA20w6tjt3Z+Dr8XYU_+GVsXBwBqsA@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <20131028022046.GU7989@ando>
 <CAExdVNmPsGUKSaz_qW=sDA20w6tjt3Z+Dr8XYU_+GVsXBwBqsA@mail.gmail.com>
Message-ID: <20131028120921.GW7989@ando>

On Sun, Oct 27, 2013 at 10:05:11PM -0500, Tim Peters wrote:
> [Steven D'Aprano <steve at pearwood.info>]
> > ...
> > I really like that view point, but it has a major problem. As
> > beautifully elegant as the "cut between positions" model is for
> > stride=1, it doesn't extend to non-unit strides. You cannot think about
> > non-contiguous slices in terms of a pair of cuts at position <start> and
> > <end>. I believe that the cleanest way to understand non-contiguous
> > slices with stride > 1 is to think of array indices. That same model
> > works for the negative stride case too.
> 
> "Cut between positions" in my view has nothing to do with the stride.
> It's used to determine the portion of the sequence _to which_ the
> stride applies.  From that portion, we take the first element, the
> first + |stride|'th element, the first + |stride|*2'th element, and so
> on.  Finally, if stride < 0, we reverse that sequence.  That's the
> proposal.  It's simple and uniform.

That's quite a nice model, and I can't really say I dislike it. But it 
fails to address your stated complaint about the current behaviour, 
namely that it forces the user to think about array indexing instead of 
cutting between elements. No matter what we do, we still have to think 
about array indexes.


> > But that's just a longer way of writing this:
> >
> >     s[1:8:2] => s[1] + s[3] + s[5] + s[7]
> >     => 'bdfh'
> >
> > which I maintain is a cleaner way to think about non-unit step-sizes.
> 
> As above, so do I.  start:stop just delineates the subsequence to
> which the stride applies.

That's the part that was unclear to me from your earlier post.


[...]
> > In other words, you want negative strides to just mean "reverse the
> > slice"
> 
> If they're given a meaning at all.

Is this a serious proposal to prohibit negative slices?


[...]
> > One consequence of this proposed change is that the <start> parameter is
> > no longer always the first element returned. Sometimes <start> will be
> > last rather than first. That disturbs me.
> 
> ?  <start> is always the first element of the subsequence to which the
> stride is applied.  If the stride is negative, then yes, of course the
> first element of the source subsequence would be the last element of
> the returned subsequence.

Right. And that's exactly what I dislike about the proposal.


I have a couple of range-like or slice-like functions which take 
start/stop/stride arguments. I think I'll modify them to have your 
suggested semantics and see how well they work in practice. But in the 
meantime, here are my tentative votes:

-1 on prohibiting negative strides altogether. They're useful.

-1 on deprecating negative strides for temporary removal, followed by 
reintroducing them in the future with different semantics. If I'm 
going to be forced to change my code to deal with this, I want to only 
do it once, not twice.

+0 on introducing a __future__ directive to change the semantics of 
negative strides (presumably in Python 3.5, since 3.4 feature-freeze is 
so close), with the expectation that it will probably become the default 
in 3.6 or 3.7.

+0.5 on the status quo.



-- 
Steven

From robert.kern at gmail.com  Mon Oct 28 13:12:28 2013
From: robert.kern at gmail.com (Robert Kern)
Date: Mon, 28 Oct 2013 12:12:28 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
Message-ID: <l4lkb3$kai$1@ger.gmane.org>

On 2013-10-27 18:32, Guido van Rossum wrote:

> What are real use cases for negative strides?

The main use case is numpy, I would wager. Slicing a numpy array returns a view 
on the original array; negative-stride views work just as well as positive 
strides in numpy's memory model. Most other sequences copy when sliced, so 
reversed() tends to work fine for them.

In my experience, the most common use of negative strides is a simple reversal 
of the whole array by leaving out the bounds:

   a[::-stride]

I think I have done the following once (to clip the first `i` and last `j` 
elements and reverse, cleanly handling reasonable values of `i` and `j`):

   a[-j+len(a)-1:i-len(a)-1:-stride]

But I think I tend to do this more often:

   a[i:-j][::-stride]

(Though really, this needs to start with `a[i:len(a)-j]`, to handle `j==0`, as 
others have pointed out. I run into that problem more commonly.)
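
A sketch comparing a single-slice spelling with the two-step one. It assumes
the stop bound `i - len(a) - 1` for the one-step form and `j < len(a)` (when
`j == len(a)` the start index becomes -1 and wraps around). Both function
names are hypothetical.

```python
def clip_reverse_two_step(a, i, j, stride=1):
    """Clip the first i and last j items, then reverse with the given stride."""
    return a[i:len(a) - j][::-stride]

def clip_reverse_one_step(a, i, j, stride=1):
    """Same result in one slice; assumes j < len(a) so the start stays >= 0."""
    n = len(a)
    # start: index of the last kept item; stop: negative index just before item i
    return a[n - j - 1:i - n - 1:-stride]

a = list(range(10))
two_step = clip_reverse_two_step(a, 2, 3)
one_step = clip_reverse_one_step(a, 2, 3)
```

Both give [6, 5, 4, 3, 2] here, but the one-step form needs the reader to
work through two index computations to see why.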

Implementation issues aside, the intention is just easier to read and reason 
about with the last option. It doesn't take much experience to get a good 
feeling for what each of those simple operations do and how they would compose 
together. Combining them into one operation, no matter what syntax you pick, is 
just going to be harder to learn.

I don't think the language needs to change. The latter uses are pretty rare in 
my experience, and the last option is a good one. The amount of documentation 
you would need for any new syntax would be about the same as just pointing to 
the last option.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco


From ncoghlan at gmail.com  Mon Oct 28 14:00:18 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 28 Oct 2013 23:00:18 +1000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
Message-ID: <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>

On 28 Oct 2013 16:34, "Chris Angelico" <rosuav at gmail.com> wrote:
>
> On Mon, Oct 28, 2013 at 10:45 AM, Greg Ewing
> <greg.ewing at canterbury.ac.nz> wrote:
> > Neal Becker wrote:
> >>
> >> One thing I find unfortunate and does trip me up in practice, is that
> >> if you want to do a whole sequence up to k from the end:
> >>
> >> u[:-k]
> >>
> >> hits a singularity if k=0
> >
> >
> > I think the only way to really fix this cleanly is to have
> > a different *syntax* for counting from the end, rather than
> > trying to guess from the value of the argument. I can't
> > remember ever needing to write code that switches dynamically
> > between from-start and from-end indexing, or between
> > forward and reverse iteration direction -- and if I ever
> > did, I'd be happy to write two code branches.
>
> If it'd help, you could borrow Pike's syntax for counting-from-end
> ranges: <2 means 2 from the end, <0 means 0 from the end. So
> "abcdefg"[:<2] would be "abcde", and "abcdefg"[:<0] would be
> "abcdefg". Currently that's invalid syntax (putting a binary operator
> with no preceding operand), so it'd be safe and unambiguous.

In this vein, I started wondering if it might be worth trying to come up
with a syntax to control whether the ends of a slice were open or closed.

Since mismatched paren types would be too confusing, perhaps abusing some
binary operators as Chris suggested could help:

"[<i:" closed start of slice (default)
"[i<:" open start of slice
":>j]" open end of slice (default)
":j>]" closed end of slice
":>j:k]" open end of slice with step
":j>:k]" closed end of slice with step

Default slice: "[<0:-1>:1]"
Reversed slice: "[<-1:0>:-1]"

This makes it possible to cleanly include the final element as a closed
range, rather than needing to add or subtract 1 (and avoids the zero trap
when indexing from the end).

Cheers,
Nick.

>
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas

From brett at python.org  Mon Oct 28 14:57:43 2013
From: brett at python.org (Brett Cannon)
Date: Mon, 28 Oct 2013 09:57:43 -0400
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <20131028002009.6e88487b@fsol>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
Message-ID: <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>

On Sun, Oct 27, 2013 at 7:20 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> On Sun, 27 Oct 2013 16:56:34 -0500
> Tim Peters <tim.peters at gmail.com> wrote:
> > [Guido]
> > > I wouldn't take out negative strides completely, but I might consider
> > > deprecating lower and upper bounds other than None (== missing). So
> a[::-1]
> > > would still work, and a[None:None:-1] would be a verbose way of
> spelling the
> > > same,
> >
> > Happy idea.
> >
> > > but a[-1:-6:-1] would be deprecated.
> >
> > Not sure I've _ever_ seen that in real life.
>
> If it's never seen in real life, then there's probably no urge to
> deprecate it and later replace it with a new thing, IMHO.
>

I think there is to minimize even the chance someone has done something
like this since it's so wonky. We all know someone has somewhere in code
out in the world.

+1 on doing a deprecation in 3.4.


>
> Also, I get the feeling it's a bit early to start talking about
> Python 4 (is that supposed to happen at all?).
>

Well, I'm sure there will be one after 3.9, but probably more along the
lines of "everything previously deprecated has now been removed" rather
than the 2->3 shift.

-Brett


>
> Regards
>
> Antoine.
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
>

From storchaka at gmail.com  Mon Oct 28 15:11:13 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Mon, 28 Oct 2013 16:11:13 +0200
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
Message-ID: <l4lr9e$9gj$1@ger.gmane.org>

28.10.13 08:33, Chris Angelico wrote:
> If it'd help, you could borrow Pike's syntax for counting-from-end
> ranges: <2 means 2 from the end, <0 means 0 from the end. So
> "abcdefg"[:<2] would be "abcde", and "abcdefg"[:<0] would be
> "abcdefg". Currently that's invalid syntax (putting a binary operator
> with no preceding operand), so it'd be safe and unambiguous.

There are parallels with alignment. C-style formatting uses positive 
width for right-aligned formatting and negative width for left-aligned 
formatting. New-style formatting uses '>' for right-aligned 
formatting and '<' for left-aligned formatting.

So '>' should indicate counting from the beginning (as a positive index 
does now) and '<' should indicate counting from the end (as a negative 
index does now). And '^' should indicate counting from the center.
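
The formatting behaviour being compared, as it works in current Python; only
the alignment part is real, the slicing analogy is the suggestion above.

```python
# C-style (%-formatting): the sign of the width selects the alignment.
c_right = '%5s' % 'abc'     # positive width: right-aligned
c_left = '%-5s' % 'abc'     # '-' flag (negative width): left-aligned

# New-style format specs: an explicit alignment character.
new_right = format('abc', '>5')
new_left = format('abc', '<5')
new_center = format('abc', '^5')
```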



From rosuav at gmail.com  Mon Oct 28 15:14:02 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 29 Oct 2013 01:14:02 +1100
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <l4lr9e$9gj$1@ger.gmane.org>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <l4lr9e$9gj$1@ger.gmane.org>
Message-ID: <CAPTjJmqMvPFAtP_Hs-Kn+JX__tRpLGNkQQ2Bh_+hkm9JKUtacQ@mail.gmail.com>

On Tue, Oct 29, 2013 at 1:11 AM, Serhiy Storchaka <storchaka at gmail.com> wrote:
> So '>' should indicate counting from begin (as positive index now) and '<'
> should indicate counting from end (as negative index now). And '^' should
> indicate counting from center.

Sounds good to me! Anywhere else we can index from? ||||||||| to
indicate the Dewey Decimal System (index using floats rather than
ints)?

ChrisA

From solipsis at pitrou.net  Mon Oct 28 15:42:05 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 28 Oct 2013 15:42:05 +0100
Subject: [Python-ideas] Where did we go wrong with negative stride?
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
 <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
Message-ID: <20131028154205.1019bb53@pitrou.net>

On Mon, 28 Oct 2013 09:57:43 -0400,
Brett Cannon <brett at python.org> wrote:
> > >
> > > > but a[-1:-6:-1] would be deprecated.
> > >
> > > Not sure I've _ever_ seen that in real life.
> >
> > If it's never seen in real life, then there's probably no urge to
> > deprecate it and later replace it with a new thing, IMHO.
> >
> 
> I think there is to minimize even the chance someone has done
> something like this since it's so wonky. We all know someone has
> somewhere in code out in the world.

But that code probably works anyway. It's slightly unintuitive to write
but it works afterwards. I think there's a tension here between
discouraging new uses of the feature, and breaking existing uses.

Regards

Antoine.



From oscar.j.benjamin at gmail.com  Mon Oct 28 15:49:11 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Mon, 28 Oct 2013 14:49:11 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
 <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
Message-ID: <CAHVvXxSu0MAdX1CCMjPeLpyTt7eTTc4XuRTLQeAg-LHCgZ6g-g@mail.gmail.com>

On 28 October 2013 13:57, Brett Cannon <brett at python.org> wrote:
>
> On Sun, Oct 27, 2013 at 7:20 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>
>> On Sun, 27 Oct 2013 16:56:34 -0500
>> Tim Peters <tim.peters at gmail.com> wrote:
>> > [Guido]
>> > > I wouldn't take out negative strides completely, but I might consider
>> > > deprecating lower and upper bounds other than None (== missing). So
>> > > a[::-1]
>> > > would still work, and a[None:None:-1] would be a verbose way of
>> > > spelling the
>> > > same,
>> >
>> > Happy idea.
>> >
>> > > but a[-1:-6:-1] would be deprecated.
>> >
>> > Not sure I've _ever_ seen that in real life.
>>
>> If it's never seen in real life, then there's probably no urge to
>> deprecate it and later replace it with a new thing, IMHO.
>
> I think there is to minimize even the chance someone has done something like
> this since it's so wonky. We all know someone has somewhere in code out in
> the world.

Are you saying that any code that depends on the current behaviour is
"wonky" and therefore doesn't properly deserve continued support?

I know I have private (numpy-based) code that depends on this
behaviour. There's nothing wonky about me choosing the limits that I
currently need to in order to get the correct slice.

I think that the numpy mailing lists should be consulted before any
decisions are made. As Antoine says: if you've never noticed this
behaviour before then it obviously doesn't matter to you that much so
why the rush to deprecate it?

> +1 on doing a deprecation in 3.4.

-1 on any deprecation without a clear plan for a better syntax. Simply
changing the semantics of the current syntax would bring in who knows
how many off-by-one errors for virtually no benefit.

Personally I think that negative slicing and indexing are both bad
ideas. I've had many bugs from the wraparound behaviour of both and
I've never had a situation where the wraparound was useful in itself
(if it worked using modulo arithmetic then there would at least be
some uses - but it does not).
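
A small illustration of the wraparound in current Python, assuming a
three-item list; the helper lookup is hypothetical and only turns the
exception into a value so the cases can be compared side by side.

```python
a = [10, 20, 30]

def lookup(seq, index):
    """Return seq[index], or the string 'IndexError' if the index is out of range."""
    try:
        return seq[index]
    except IndexError:
        return 'IndexError'

i = 0
previous = a[i - 1]          # off-by-one bug: silently wraps to the last item
past_front = lookup(a, -4)   # one step further does not wrap around
past_back = lookup(a, 3)     # and neither does running off the far end
modulo = a[-4 % len(a)]      # true modulo arithmetic would have given a[2]
```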

Matlab has a much better way of handling this with the end keyword:

% chop last n elements off:
a_chopped = a(1:end-n)

This works even when n is zero because it's not conflating integer
arithmetic with indexing relative to the end.


Oscar

From brett at python.org  Mon Oct 28 16:04:40 2013
From: brett at python.org (Brett Cannon)
Date: Mon, 28 Oct 2013 11:04:40 -0400
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAHVvXxSu0MAdX1CCMjPeLpyTt7eTTc4XuRTLQeAg-LHCgZ6g-g@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
 <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
 <CAHVvXxSu0MAdX1CCMjPeLpyTt7eTTc4XuRTLQeAg-LHCgZ6g-g@mail.gmail.com>
Message-ID: <CAP1=2W5MUHWwRT21_oeKr+co6=nnPq2AmGWiVw7Pw0-VqjQ=Rw@mail.gmail.com>

On Mon, Oct 28, 2013 at 10:49 AM, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:

> On 28 October 2013 13:57, Brett Cannon <brett at python.org> wrote:
> >
> > On Sun, Oct 27, 2013 at 7:20 PM, Antoine Pitrou <solipsis at pitrou.net>
> wrote:
> >>
> >> On Sun, 27 Oct 2013 16:56:34 -0500
> >> Tim Peters <tim.peters at gmail.com> wrote:
> >> > [Guido]
> >> > > I wouldn't take out negative strides completely, but I might
> consider
> >> > > deprecating lower and upper bounds other than None (== missing). So
> >> > > a[::-1]
> >> > > would still work, and a[None:None:-1] would be a verbose way of
> >> > > spelling the
> >> > > same,
> >> >
> >> > Happy idea.
> >> >
> >> > > but a[-1:-6:-1] would be deprecated.
> >> >
> >> > Not sure I've _ever_ seen that in real life.
> >>
> >> If it's never seen in real life, then there's probably no urge to
> >> deprecate it and later replace it with a new thing, IMHO.
> >
> > I think there is to minimize even the chance someone has done something
> like
> > this since it's so wonky. We all know someone has somewhere in code out
> in
> > the world.
>
> Are you saying that any code that depends on the current behaviour is
> "wonky" and therefore doesn't properly deserve continued support?
>

I'm saying the current semantics of how the strides work are wonky and we
should fix it so valid code has to jump through fewer hoops to get the
semantics they would expect/want.


>
> I know I have private (numpy-based) code that depends on this
> behaviour. There's nothing wonky about me choosing the limits that I
> currently need to in order to get the correct slice.
>

Sure, all I'm saying is that you probably had to mentally work through more
to get the right semantics you wanted compared to if this was changed the
way Guido and Tim are suggesting.


>
> I think that the numpy mailing lists should be consulted before any
> decisions are made. As Antoine says: if you've never noticed this
> behaviour before then it obviously doesn't matter to you that much so
> why the rush to deprecate it?
>

I'm not saying not to talk to them, but I also don't think we should
necessarily not change it because no one uses it either. If it's
widespread then sure, we just live with it. It's always a balancing act of
fixing for future code vs. pain of current code. I'm just saying we
shouldn't dismiss changing this out of hand because you are so far the only
person who has relied on this.

As for the rush, it's because 3.4b1 is approaching and if this slips to 3.5
that's 1.5 years of deprecation time lost for something that doesn't have a
syntactic break to help you discover the change in semantics.


>
> > +1 on doing a deprecation in 3.4.
>
> -1 on any deprecation without a clear plan for a better syntax. Simply
> changing the semantics of the current syntax would bring in who knows
> how many off-by-one errors for virtually no benefit.
>

The deprecation would be in there from now until Python 4 so it wouldn't be
sudden (remember that we are on a roughly 18 month release cycle, so if
this went into 3.4 that's 7.5 years until this changes in Python 4). And
there's already a future-compatible way to change your code to get the same
results in the end that just require more explicit steps/code.

-Brett

From joshua at landau.ws  Mon Oct 28 18:10:14 2013
From: joshua at landau.ws (Joshua Landau)
Date: Mon, 28 Oct 2013 17:10:14 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
Message-ID: <CAN1F8qW0mqAvOA_i=tX7gnzUjGO7KxtMefWAxZ2y8vdEO=k6uA@mail.gmail.com>

On 28 October 2013 06:33, Chris Angelico <rosuav at gmail.com> wrote:
> On Mon, Oct 28, 2013 at 10:45 AM, Greg Ewing
> <greg.ewing at canterbury.ac.nz> wrote:
>> Neal Becker wrote:
>>>
>>> One thing I find unfortunate and does trip me up in practice, is that
>>> if you want to do a whole sequence up to k from the end:
>>>
>>> u[:-k]
>>>
>>> hits a singularity if k=0
>>
>>
>> I think the only way to really fix this cleanly is to have
>> a different *syntax* for counting from the end, rather than
>> trying to guess from the value of the argument. I can't
>> remember ever needing to write code that switches dynamically
>> between from-start and from-end indexing, or between
>> forward and reverse iteration direction -- and if I ever
>> did, I'd be happy to write two code branches.
>
> If it'd help, you could borrow Pike's syntax for counting-from-end
> ranges: <2 means 2 from the end, <0 means 0 from the end. So
> "abcdefg"[:<2] would be "abcde", and "abcdefg"[:<0] would be
> "abcdefg". Currently that's invalid syntax (putting a binary operator
> with no preceding operand), so it'd be safe and unambiguous.

Agreed in entirety. I'm not sure that this is the best method, but
it's way better than the status quo. "<" or ">" with negative strides
should raise an error and should be the recommended method until
negatives are phased out.

*BUT* there is another solution. It's harder to formulate but I think
it's more deeply intuitive.

The simple problem is this mapping:

list:  [x,  x,  x,  x,  x]
index:  0   1   2   3   4
       -5  -4  -3  -2  -1

Which is just odd, 'cause those sequences are off by one. But you can
stop thinking about them as *negative* indexes and start thinking
about NOT'd indexes:

       ~4  ~3  ~2  ~1  ~0

which you have to say looks OK.

Then you design slices around that.

To take the first N elements:

#>>>
"0123456789"[:4]
#>>> '0123'

To take the last four:

#>>>
"0123456789"[~4:] # Currently returns '56789'
#>>> '6789'

For slicing with a mixture:

"0123456789"[1:~1] # Currently returns '1234567'
#>>> '12345678'

"0123456789"[~5:5] # Currently returns '4'
#>>> ''

So the basic idea is that, for X:Y, X is closed iff positive and Y is
open iff positive. If you go over this in your head, it's quite
simple.

For ~6:7;
START: Count 6 from the back, looking at the *signposts* between
items, not the items.
END: Count 7 forward, looking at the *signposts* between items, not the items.

Thus you get, for "0123456789":

"|0|1|2|3|4|5|6|7|8|9|"
             S     E

    and thus, obviously, you get "456".

    And look, it matches our current negative form!

"0123456789"[-6:7]
#>>> '456'

Woah! *BUT* it works without silly coherence problems if you have -N,
because ~0 is -1!

????? said the problem was with negative indexes, not strides, so it's
good that this solves it. So, how does this help with negative
*strides*? Well, Guido's

#>>>
"abcde"[::-1]
#>>> 'edcba'

From joshua at landau.ws  Mon Oct 28 18:15:19 2013
From: joshua at landau.ws (Joshua Landau)
Date: Mon, 28 Oct 2013 17:15:19 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAN1F8qW0mqAvOA_i=tX7gnzUjGO7KxtMefWAxZ2y8vdEO=k6uA@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CAN1F8qW0mqAvOA_i=tX7gnzUjGO7KxtMefWAxZ2y8vdEO=k6uA@mail.gmail.com>
Message-ID: <CAN1F8qV1GHwwwwC6Z+3D+nJFHxTEVpn=eaEuMrF9Go0+bGb0BA@mail.gmail.com>

Apologies for the terrible post above; here it is in full and not
riddled with as many editing errors:

On 28 October 2013 06:33, Chris Angelico <rosuav at gmail.com> wrote:
> On Mon, Oct 28, 2013 at 10:45 AM, Greg Ewing
> <greg.ewing at canterbury.ac.nz> wrote:
>> Neal Becker wrote:
>>>
>>> One thing I find unfortunate and does trip me up in practice, is that
>>> if you want to do a whole sequence up to k from the end:
>>>
>>> u[:-k]
>>>
>>> hits a singularity if k=0
>>
>>
>> I think the only way to really fix this cleanly is to have
>> a different *syntax* for counting from the end, rather than
>> trying to guess from the value of the argument. I can't
>> remember ever needing to write code that switches dynamically
>> between from-start and from-end indexing, or between
>> forward and reverse iteration direction -- and if I ever
>> did, I'd be happy to write two code branches.
>
> If it'd help, you could borrow Pike's syntax for counting-from-end
> ranges: <2 means 2 from the end, <0 means 0 from the end. So
> "abcdefg"[:<2] would be "abcde", and "abcdefg"[:<0] would be
> "abcdefg". Currently that's invalid syntax (putting a binary operator
> with no preceding operand), so it'd be safe and unambiguous.

Agreed in entirety. I'm not sure that this is the best method, but
it's way better than the status quo. "<" or ">" with negative strides
should raise an error and should be the recommended method until
negatives are phased out.

*BUT* there is another solution. It's harder to formulate but I think
it's more deeply intuitive.

The simple problem is this mapping:

list:  [x,  x,  x,  x,  x]
index:  0   1   2   3   4
       -5  -4  -3  -2  -1

Which is just odd. But you can stop thinking about them as *negative*
indexes and start thinking about NOT'd indexes:

       ~4  ~3  ~2  ~1  ~0

which you have to say looks OK.

Then you design slices around that.

To take the first four elements:

#>>>
"0123456789"[:4]
#>>> '0123'

To take the last four:

#>>>
"0123456789"[~4:] # Currently returns '56789'
#>>> '6789'

For slicing with a mixture:

"0123456789"[1:~1] # Currently returns '1234567'
#>>> '12345678'

"0123456789"[~5:5] # Currently returns '4'
#>>> ''

So the basic idea is that, for X:Y, X is closed iff positive and Y is
open iff positive. If you go over this in your head, it's quite
simple.

For ~6:7;
START: Count 6 from the back, looking at the *signposts* between
items, not the items.
END: Count 7 forward, looking at the *signposts* between items, not the items.

Thus you get, for "0123456789":

"|0|1|2|3|4|5|6|7|8|9|"
             S     E

    and thus, obviously, you get "456".

    And look, it matches our current negative form!

"0123456789"[-6:7]
#>>> '456'

Woah! *BUT* it works without silly coherence problems if you have -N,
because ~0 is -1!

????? said the problem was with negative indexes, not strides, so it's
good that this solves it. So, how does this help with negative
*strides*? Well, Guido's

#>>>
"abcde"[::-1]
#>>> 'edcba'

would be hopefully solved by

"abcde"[:0:-1] # Currently returns 'edcb'
#>>> 'edcba'

because you can just *inverse* the "X is closed iff positive and Y is
open iff positive" rule.

Does this pan out nicely?

Really, we want

"abcde"[2:4][::-1] == "abcde"[4:2:-1]

which is exactly what happens.

I'm thinking I'll make a string subclass and try and "intuit" the
answers, but I think this is the right choice.

Anyone with me, even partially?

From ethan at stoneleaf.us  Mon Oct 28 17:58:11 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 28 Oct 2013 09:58:11 -0700
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
Message-ID: <526E97A3.2050608@stoneleaf.us>

On 10/27/2013 10:04 AM, Guido van Rossum wrote:
>
> Thoughts? Is NumPy also affected?

It seems to me that the issue is not with negative strides, but with negative indexing:

   - off by one errors because the end starts at -1 and not -0
   - calculation errors because the end is -1 and not -0

--
~Ethan~

From barry at python.org  Mon Oct 28 19:03:22 2013
From: barry at python.org (Barry Warsaw)
Date: Mon, 28 Oct 2013 14:03:22 -0400
Subject: [Python-ideas] Where did we go wrong with negative stride?
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
Message-ID: <20131028140322.59e344c0@anarchist>

On Oct 28, 2013, at 11:00 PM, Nick Coghlan wrote:

>In this vein, I started wondering if it might be worth trying to come up
>with a syntax to control whether the ends of a slice were open or closed.
>
>Since mismatched paren types would be too confusing, perhaps abusing some
>binary operators as Chris suggested could help:
>
>"[<i:" closed start of slice (default)
>"[i<:" open start of slice
>":>j]" open end of slice (default)
>":j>]" closed end of slice
>":>j:k]" open end of slice with step
>":j>:k]" closed end of slice with step
>
>Default slice: "[<0:-1>:1]"
>Reversed slice: "[<-1:0>:-1]"
>
>This makes it possible to cleanly include the final element as a closed
>range, rather than needing to add or subtract 1 (and avoids the zero trap
>when indexing from the end).

Sorry, I'm -1 here.  I think it's already difficult enough to teach, read, and
comprehend what's going on with slice notation when there are strides
(especially negative ones).  I don't think this syntax will make it easier to
understand at a glance (or even upon some deeper inspection).

-Barry

From tim.peters at gmail.com  Mon Oct 28 19:49:12 2013
From: tim.peters at gmail.com (Tim Peters)
Date: Mon, 28 Oct 2013 13:49:12 -0500
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <20131028120921.GW7989@ando>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <20131028022046.GU7989@ando>
 <CAExdVNmPsGUKSaz_qW=sDA20w6tjt3Z+Dr8XYU_+GVsXBwBqsA@mail.gmail.com>
 <20131028120921.GW7989@ando>
Message-ID: <CAExdVNnerCDk634MD6AiJvpgpRn5xRN6VdRyGi9zFhVDyhQ=Ug@mail.gmail.com>

...

[Tim]
>> "Cut between positions" in my view has nothing to do with the stride.
>> It's used to determine the portion of the sequence _to which_ the
>> stride applies.  From that portion, we take the first element, the
>> first + |stride|'th element, the first + |stride|*2'th element, and so
>> on.  Finally, if stride < 0, we reverse that sequence.  That's the
>> proposal.  It's simple and uniform.

[Steven D'Aprano]
> That's quite a nice model, and I can't really say I dislike it. But it
> fails to address your stated complaint about the current behaviour,
> namely that it forces the user to think about array indexing instead of
> cutting between elements. No matter what we do, we still have to think
> about array indexes.

As a Python implementer, _I_ do, but not as a user.  As Guido noted,
under the proposal we have:

    s[i:j:k] == s[i:j][::k]

That should (finally?) make it crystal clear that applying the stride
has nothing directly to do with the indices of the selected elements
in the original sequence (`s`).  In the RHS's "[::k]", all knowledge
of s's indices has been lost.  If _you_ want to think of it in terms
of indices into `s`, that's fine, and the implementation obviously
needs to index into `s` to produce the result, but a user can just
think "OK!  start at the start and take every k'th element
thereafter".  As in, e.g., the sieve of Eratosthenes:  "search right
until you find the first integer not yet crossed out.  Call it p.
Then cross out every p'th integer following.  Repeat."  There's no
reference made to "indices" there, and none needed in the mental
model.  An implementation using an array or list will need to know the
index of p, but that's it.  If that index is `i`, then, e.g.,

    array_or_list[i+p: :p] = [False] * len(range(i+p, len(array_or_list), p))

faithfully translates the rest.
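
A minimal sketch of that translation as a complete sieve, assuming the
list is indexed by the integers themselves (so the index `i` of p is p
itself). The name `primes_below` is made up for illustration:

```python
def primes_below(n):
    """Sieve of Eratosthenes driven by extended-slice assignment."""
    not_crossed_out = [True] * n
    # 0 and 1 are not prime; min() keeps this valid for n < 2.
    not_crossed_out[:2] = [False] * min(n, 2)
    for i in range(2, n):
        if not_crossed_out[i]:
            p = i  # assumption: index and value coincide in this layout
            # "Cross out every p'th integer following": the right-hand
            # side must have exactly as many items as the slice selects.
            not_crossed_out[i + p::p] = [False] * len(range(i + p, n, p))
    return [i for i, keep in enumerate(not_crossed_out) if keep]


print(primes_below(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Only the starting index `i + p` appears in the slice; the stride itself
never needs to know where in the list it is operating.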



[...]
>>> In other words, you want negative strides to just mean "reverse the
>>> slice"

>> If they're given a meaning at all.

> Is this a serious proposal to prohibit negative slices?

No.  I do wonder whether negative strides have been "an attractive
nuisance" overall, but guess they'd be more "attractive" than
"nuisance" under the proposal.


[...]
>>> One consequence of this proposed change is that the <start> parameter is
>>> no longer always the first element returned. Sometimes <start> will be
>>> last rather than first. That disturbs me.

>> ?  <start> is always the first element of the subsequence to which the
>> stride is applied.  If the stride is negative, then yes, of course the
>> first element of the source subsequence would be the last element of
>> the returned subsequence.

> Right. And that's exactly what I dislike about the proposal.

OK.  Then use a positive stride ;-)

> I have a couple of range-like or slice-like functions which take
> start/stop/stride arguments. I think I'll modify them to have your
> suggested semantics and see how well they work in practice.

Good idea!  I made only one real use of non-trivial negative strides,
that I can recall, in the last year:

    # nw index is n-1+i-j
    #     in row i, that's n-1+i thru n-1+i-(n-1) = i
    #     the leftmost is irrelevant, so n-1+i-1 = n-2+i thru i
    # ne index is i+j
    #     in row i, that's i thru i+n-1
    #     the rightmost is irrelevant, so i thru i+n-2
    assert nw[n-1+i] == 0
    assert ne[i+n-1] == 0
    codes = [0] * (3*n - 2)
    codes[0::3] = up
    codes[1::3] = ne[i: n-1+i]
    codes[2::3] = nw[n-2+i: i-1: -1] # here

That was excruciating to get right.  Curiously, the required `ne` and
`nw` index sets turned out to be exactly the same, but _end up_
looking different in that code because of the extra fiddling needed to
deal with the fact that the required `nw` index _sequence_ (as opposed
to the index set) is the reverse of the required `ne` index sequence.
Under the proposal, the last line would be written:

    codes[2::3] = nw[i: n-1+i: -1]

instead, making it obvious at a glance that the `nw` index
sequence is the reverse of the `ne` index sequence.

And that's all the empirical proof I need - LOL ;-)

> ...

From bruce at leapyear.org  Mon Oct 28 20:26:43 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Mon, 28 Oct 2013 12:26:43 -0700
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAExdVNnerCDk634MD6AiJvpgpRn5xRN6VdRyGi9zFhVDyhQ=Ug@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <20131028022046.GU7989@ando>
 <CAExdVNmPsGUKSaz_qW=sDA20w6tjt3Z+Dr8XYU_+GVsXBwBqsA@mail.gmail.com>
 <20131028120921.GW7989@ando>
 <CAExdVNnerCDk634MD6AiJvpgpRn5xRN6VdRyGi9zFhVDyhQ=Ug@mail.gmail.com>
Message-ID: <CAGu0AnuB7FhAhboynCwZycv0UAfdeJT_uYjCGFPQO+CSpRzHcQ@mail.gmail.com>

On Mon, Oct 28, 2013 at 11:49 AM, Tim Peters <tim.peters at gmail.com> wrote:

> As a Python implementer, _I_ do, but not as a user.  As Guido noted,
> under the proposal we have:
>
>     s[i:j:k] == s[i:j][::k]
>
> That should (finally?) make it crystal clear that applying the stride
> has nothing directly to do with the indices of the selected elements
> in the original sequence (`s`).


It's definitely not "finally clear" as it's a change in semantics. What
about negative strides other than -1? Which of these is expected?

(A) '012345678'[::-2] == '86420'
    '0123456789'[::-2] == '97531'

or:

(B) '012345678'[::-2] == '86420'
    '0123456789'[::-2] == '86420'

If (A) I can get the (B) result by writing [::2][::-1] but if (B), I'm
forced to write:

s[0 if len(s) % 2 == 1 else 1::2]

or something equally ugly.

--- Bruce

(Also, (A) is the current behavior and switching to (B) would break any
existing use of strides < -1.)

From oscar.j.benjamin at gmail.com  Mon Oct 28 21:20:27 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Mon, 28 Oct 2013 20:20:27 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP1=2W5MUHWwRT21_oeKr+co6=nnPq2AmGWiVw7Pw0-VqjQ=Rw@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
 <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
 <CAHVvXxSu0MAdX1CCMjPeLpyTt7eTTc4XuRTLQeAg-LHCgZ6g-g@mail.gmail.com>
 <CAP1=2W5MUHWwRT21_oeKr+co6=nnPq2AmGWiVw7Pw0-VqjQ=Rw@mail.gmail.com>
Message-ID: <CAHVvXxSBKDXx_NgzdrhGPH4U7neyJ88sBPQHgCoSBN8C5mWmOg@mail.gmail.com>

On 28 October 2013 15:04, Brett Cannon <brett at python.org> wrote:
>
> On Mon, Oct 28, 2013 at 10:49 AM, Oscar Benjamin
> <oscar.j.benjamin at gmail.com> wrote:
>>
>> On 28 October 2013 13:57, Brett Cannon <brett at python.org> wrote:
>>
>> I think that the numpy mailing lists should be consulted before any
>> decisions are made. As Antoine says: if you've never noticed this
>> behaviour before then it obviously doesn't matter to you that much so
>> why the rush to deprecate it?
>
> I'm not saying not to talk to them, but I also don't think we should
> necessarily not change it because no one uses it either. If it's wide spread
> then sure, we just live with it. It's always a balancing act of fixing for
> future code vs. pain of current code. I'm just saying we shouldn't dismiss
> changing this out of hand because you are so far the only person who has
> relied on this.

Anyone who has used negative strides and non-default start/stop is
relying on it. It has been a core language feature since long before I
started using Python.

Also I'm not the only person to point out that a more common problem
is with wraparound when doing something like a[:-n]. That is a more
significant problem with slicing and I don't understand why the
emphasis here is all on the negative strides rather than negative
indexing. Using negative indices to mean "from the end" is a mistake
and it leads to things like this:

>>> a = 'abcdefg'
>>> for n in reversed(range(7)):
...     print(n, a[:-n])
...
6 a
5 ab
4 abc
3 abcd
2 abcde
1 abcdef
0

You can do something like a[:-n or None] to correctly handle zero but
that still does the wrong thing when n is negative.
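
To make the zero trap concrete, here is a small sketch. `drop_last` is
a hypothetical helper, not anything in the standard library; it
computes the stop index explicitly so that neither n == 0 nor an
out-of-range n wraps around:

```python
def drop_last(seq, n):
    """Return seq without its last n items (n >= 0), with no wraparound."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Clamp at 0 so that n > len(seq) yields an empty result instead of
    # a negative stop that would count from the end a second time.
    return seq[:max(len(seq) - n, 0)]


a = 'abcdefg'
print(repr(a[:-0]))             # '' because -0 == 0
print(repr(a[:-0 or None]))     # 'abcdefg'
print(repr(a[:-(-2) or None]))  # 'ab': a negative n silently becomes a[:2]
print(repr(drop_last(a, 0)))    # 'abcdefg'
print(repr(drop_last(a, 9)))    # ''
```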

Also why do you get an error when your index goes off one end of the
array but not when it goes off the other?

>>> a[len(a)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
>>> a[-1]
'g'

I have never been in a situation where I was writing code and didn't
know whether I wanted to slice/index from the end or the beginning at
coding time. I would much rather that a[-1] be an error and have an
explicit syntax for indexing from the end.

I have never found this wraparound to be useful and I think that if
there is a proposal to change slicing in a backward incompatible way
then it should be to something that is significantly better by solving
these real problems. I have often had bugs or been forced to write
awkward code because of these. The negative slicing indices may be a
bit harder to reason about but it has never actually caused me any
problems.

Something like the matlab/pike syntax would fix these problems as well
as making it possible to use the same indices for negative stride
slices. That would be worth a deprecation process in my opinion. The
specific suggestion so far does not have enough of an advantage to
justify breaking anyone's code IMO.

> As for the rush, it's because 3.4b1 is approaching and if this slips to 3.5
> that's 1.5 years of deprecation time lost for something that doesn't have a
> syntactic break to help you discover the change in semantics.
>>
>> > +1 on doing a deprecation in 3.4.
>>
>> -1 on any deprecation without a clear plan for a better syntax. Simply
>> changing the semantics of the current syntax would bring in who knows
>> how many off-by-one errors for virtually no benefit.
>
> The deprecation would be in there from now until Python 4 so it wouldn't be
> sudden (remember that we are on a roughly 18 month release cycle, so if this
> went into 3.4 that's 7.5 years until this changes in Python 4). And there's
> already a future-compatible way to change your code to get the same results
> in the end that just require more explicit steps/code.

This argument would be more persuasive if you said: "the new better
syntax that solves many of the slicing problems will be introduced
*now*, and the old style/syntax will be deprecated later".


Oscar

From tim.peters at gmail.com  Mon Oct 28 21:56:59 2013
From: tim.peters at gmail.com (Tim Peters)
Date: Mon, 28 Oct 2013 15:56:59 -0500
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAGu0AnuB7FhAhboynCwZycv0UAfdeJT_uYjCGFPQO+CSpRzHcQ@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <20131028022046.GU7989@ando>
 <CAExdVNmPsGUKSaz_qW=sDA20w6tjt3Z+Dr8XYU_+GVsXBwBqsA@mail.gmail.com>
 <20131028120921.GW7989@ando>
 <CAExdVNnerCDk634MD6AiJvpgpRn5xRN6VdRyGi9zFhVDyhQ=Ug@mail.gmail.com>
 <CAGu0AnuB7FhAhboynCwZycv0UAfdeJT_uYjCGFPQO+CSpRzHcQ@mail.gmail.com>
Message-ID: <CAExdVNkP8SBVdKvdN0R1WwMzPypfUWmkibOCCY89GtSy6oiV3A@mail.gmail.com>

[Tim]
>> As Guido noted, under the proposal we have:
>>
>>     s[i:j:k] == s[i:j][::k]
>>
>> That should (finally?) make it crystal clear that applying the stride
>> has nothing directly to do with the indices of the selected elements
>> in the original sequence (`s`).

[Bruce Leban]
> It's definitely not "finally clear" as it's a change in semantics.

Of course it is.

> What about negative strides other than -1?

All strides start the same way, by (conceptually) selecting s[i:j]
first.  Then the stride is applied to that contiguous slice, starting
with the first element and taking every abs(k) element thereafter.
Finally, if k is negative, that sequence is reversed.  k=1, k=-1, k=2,
k=-2, ..., all the same.
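
Read literally, that description can be sketched in a few lines.
`proposed_slice` is a name invented here for illustration; it models
the proposal on top of today's slicing and is not an existing API:

```python
def proposed_slice(s, i=None, j=None, k=1):
    """Model of the proposed s[i:j:k]: cut first, stride second, reverse last."""
    if k == 0:
        raise ValueError("slice step cannot be zero")
    portion = s[i:j]              # 1. the contiguous slice, exactly as today
    picked = portion[::abs(k)]    # 2. first element, then every |k|'th
    return picked[::-1] if k < 0 else picked   # 3. reverse if k is negative


print(proposed_slice('012345678', k=-2))    # 86420
print(proposed_slice('0123456789', k=-2))   # 86420
print('0123456789'[::-2])                   # 97531 (current behaviour)
```

By construction, proposed_slice(s, i, j, k) equals
proposed_slice(proposed_slice(s, i, j), k=k), which is the
s[i:j:k] == s[i:j][::k] identity quoted above.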

> Which of these is expected?
>
> (A) '012345678'[::-2] == '86420'
>     '0123456789'[::-2] == '97531'
> or:
>
> (B) '012345678'[::-2] == '86420'
>     '0123456789'[::-2] == '86420'

B.


> If (A) I can get the (B) result by writing [::2][::-1] but if (B), I'm
> forced to write:
>
> s[0 if len(s) % 2 == 1 else 1::2]
>
> or something equally ugly.

You're assuming something here you haven't said.  The easiest way to
get '97531' is to type '97531' ;-)  If your agenda is the general
"return every 2nd element starting with the last element", then the
obvious way to do that under the proposal is to write [::-1][::2].
You can't seriously claim that's harder than the "[::2][::-1]" you
presented as the obvious way to "get the (B) result" given (A).


> (Also, (A) is the current behavior and switching to (B) would break any
> existing use of strides < -1.)

Did you notice that Guido titled this thread "Where did we go wrong
with negative stride?" ;-)

BTW, do you have use cases for negative strides other than -1?  Not
examples, use cases.  There haven't been any in this thread yet.

From tjreedy at udel.edu  Mon Oct 28 22:06:09 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 28 Oct 2013 17:06:09 -0400
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAExdVNnerCDk634MD6AiJvpgpRn5xRN6VdRyGi9zFhVDyhQ=Ug@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <20131028022046.GU7989@ando>
 <CAExdVNmPsGUKSaz_qW=sDA20w6tjt3Z+Dr8XYU_+GVsXBwBqsA@mail.gmail.com>
 <20131028120921.GW7989@ando>
 <CAExdVNnerCDk634MD6AiJvpgpRn5xRN6VdRyGi9zFhVDyhQ=Ug@mail.gmail.com>
Message-ID: <l4mjjo$sb$1@ger.gmane.org>

On 10/28/2013 2:49 PM, Tim Peters wrote:

> under the proposal we have:
>
>      s[i:j:k] == s[i:j][::k]

I think where we went wrong with strides was to have the sign of the 
stride affect the interpretation of i and j (in analogy with ranges). 
The change is to correct this by decoupling steps 1. and 2. below. The 
result is that i and j would mean left and right ends of the slice, 
rather than 'start' and 'stop' ends of the slice.

I presume s[::-k], k a count, would continue to mean 'reverse and take 
every kth' (ie, take every kth item from the right instead of the left):

s[i:j:-k] == s[i:j:-1][::k]

(And one would continue to write the alternative, 'take every kth and 
reverse' explicitly as s[i:j:k][::-1].)

Whether selecting or replacing, this proposal makes the rule for 
indicating an arithmetic subsequence to be:

1. indicate the contiguous slice to work on with left and right 
endpoints (left end i, right end j, i <= j after normalization with same 
rules as at present);

2. indicate the starting end and direction of movement, left to right 
(default) or right to left (negate k);

3. indicate whether to pick every member of the slice (k=1, default) or 
every kth (k > 1), starting with the first item at the indicated end (if 
there is one) and moving in the appropriate direction.
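
A sketch of the two orderings side by side, assuming i and j always cut
out the contiguous portion first. The names `reverse_then_stride` and
`stride_then_reverse` are invented here: the first follows the
s[i:j:-k] == s[i:j:-1][::k] rule above, the second is the explicit
'take every kth and reverse' alternative:

```python
def reverse_then_stride(s, i=None, j=None, k=1):
    """Start at the right end of s[i:j] when k < 0, then take every |k|'th."""
    portion = s[i:j]
    if k < 0:
        portion = portion[::-1]
    return portion[::abs(k)]


def stride_then_reverse(s, i=None, j=None, k=1):
    """Take every |k|'th item of s[i:j] from the left, then reverse if k < 0."""
    picked = s[i:j][::abs(k)]
    return picked[::-1] if k < 0 else picked


s = '0123456789'
print(reverse_then_stride(s, k=-2))   # 97531
print(stride_then_reverse(s, k=-2))   # 86420
print(reverse_then_stride(s, 2, 7, -2), stride_then_reverse(s, 2, 7, -2))  # 642 642
```

For a non-empty portion the two agree exactly when len(portion) - 1 is
a multiple of |k|, which is why they coincide for k = -1 and for the
five-item portion in the last line.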


---
My quick take on slicing versus indexing. The slice positions of a 
single item are i:(i+1). The average is i.5. Some languages (0-based, 
like Python) round this down to i, others (1-based) round up to i+1.

String 'indexing' is really unit slicing: s[i] == s[i:i+1].  Any
sequence can be sliced.  True indexing requires that the members of the
sequence either be Python objects (tuples, lists) or usefully convert to 
such (bytes, other arrays, which convert integer members to Python ints).

-- 
Terry Jan Reedy


From joshua at landau.ws  Mon Oct 28 22:06:25 2013
From: joshua at landau.ws (Joshua Landau)
Date: Mon, 28 Oct 2013 21:06:25 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAN1F8qV1GHwwwwC6Z+3D+nJFHxTEVpn=eaEuMrF9Go0+bGb0BA@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CAN1F8qW0mqAvOA_i=tX7gnzUjGO7KxtMefWAxZ2y8vdEO=k6uA@mail.gmail.com>
 <CAN1F8qV1GHwwwwC6Z+3D+nJFHxTEVpn=eaEuMrF9Go0+bGb0BA@mail.gmail.com>
Message-ID: <CAN1F8qXEAUTA=fJ_ApDNT0o6taneiAk5X1-rFyk-n7sFKP2KZg@mail.gmail.com>

On 28 October 2013 17:15, Joshua Landau <joshua at landau.ws> wrote:
> <suggested using "~" instead of "-">

# Here's a quick mock-up of my idea.

class NotSliced(list):
    def __getitem__(self, itm):
        if isinstance(itm, slice):
            start, stop, step = itm.start, itm.stop, itm.step

            if start is None: start = 0
            if stop  is None: stop  = ~0
            if step  is None: step  = 1

            if start < 0: start += len(self) + 1
            if stop  < 0: stop  += len(self) + 1

            if step > 0:
                return NotSliced(super().__getitem__(slice(start, stop, step)))

            else:
                return NotSliced(super().__getitem__(slice(stop, start))[::step])

        else:
            return super().__getitem__(itm)

ns = NotSliced([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

[ns[i] for i in range(10)]
#>>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# See why this is a much better mapping?
[list(ns)[-i] for i in range(10)]
[ns[~i] for i in range(10)]
#>>> [0, 9, 8, 7, 6, 5, 4, 3, 2, 1]
#>>> [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

ns[~6:7]
list(ns)[-6:7]
#>>> [4, 5, 6]
#>>> [4, 5, 6]

ns[~4:~0][::-1]
ns[~0:~4:-1]
#>>> []
#>>> [9, 8, 7, 6]

ns[~4:~0][::-2]
ns[~0:~4:-2]
#>>> []
#>>> [9, 7]


# Here's something that makes me really feel this is natural.

ns[2:~2]
#>>> [2, 3, 4, 5, 6, 7]

ns[1:~1]
#>>> [1, 2, 3, 4, 5, 6, 7, 8]

ns[0:~0]
#>>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# VERSUS (!!!)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9][2:-2]
#>>> [2, 3, 4, 5, 6, 7]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9][1:-1]
#>>> [1, 2, 3, 4, 5, 6, 7, 8]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9][0:-0]
#>>> []



# And some more...

ns[~6:6:+1]
ns[6:~6:-1]
#>>> [4, 5]
#>>> [5, 4]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9][-6:6:+1]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9][6:-6:-1]
#>>> [4, 5]
#>>> [6, 5]



# Surely you agree this is much more intuitive.


# Another example from the thread

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
for n in reversed(range(7)):
    print(n, a[:-n])
#>>> 6 [0, 1, 2, 3]
#>>> 5 [0, 1, 2, 3, 4]
#>>> 4 [0, 1, 2, 3, 4, 5]
#>>> 3 [0, 1, 2, 3, 4, 5, 6]
#>>> 2 [0, 1, 2, 3, 4, 5, 6, 7]
#>>> 1 [0, 1, 2, 3, 4, 5, 6, 7, 8]
#>>> 0 []

for n in reversed(range(7)):
    print(n, ns[:~n])
#>>> 6 [0, 1, 2, 3]
#>>> 5 [0, 1, 2, 3, 4]
#>>> 4 [0, 1, 2, 3, 4, 5]
#>>> 3 [0, 1, 2, 3, 4, 5, 6]
#>>> 2 [0, 1, 2, 3, 4, 5, 6, 7]
#>>> 1 [0, 1, 2, 3, 4, 5, 6, 7, 8]
#>>> 0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

From joshua at landau.ws  Mon Oct 28 22:19:32 2013
From: joshua at landau.ws (Joshua Landau)
Date: Mon, 28 Oct 2013 21:19:32 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAN1F8qXEAUTA=fJ_ApDNT0o6taneiAk5X1-rFyk-n7sFKP2KZg@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CAN1F8qW0mqAvOA_i=tX7gnzUjGO7KxtMefWAxZ2y8vdEO=k6uA@mail.gmail.com>
 <CAN1F8qV1GHwwwwC6Z+3D+nJFHxTEVpn=eaEuMrF9Go0+bGb0BA@mail.gmail.com>
 <CAN1F8qXEAUTA=fJ_ApDNT0o6taneiAk5X1-rFyk-n7sFKP2KZg@mail.gmail.com>
Message-ID: <CAN1F8qUA7Cu0Po=sfx_DHvE0Nw+gLcmNR10CMVGo7ts=QOYvDw@mail.gmail.com>

On 28 October 2013 21:06, Joshua Landau <joshua at landau.ws> wrote:
> On 28 October 2013 17:15, Joshua Landau <joshua at landau.ws> wrote:
>> <suggested using "~" instead of "-">
>
> # Here's a quick mock-up of my idea.
>
> class NotSliced(list):
...

# And a minor bugfix and correction:

class NotSliced(list):
    def __getitem__(self, itm):
        if isinstance(itm, slice):
            start, stop, step = itm.start, itm.stop, itm.step

            if step  is None: step  = 1
            if start is None: start = ~0 if step < 0 else  0
            if stop  is None: stop  =  0 if step < 0 else ~0

            if start < 0: start += len(self) + 1
            if stop  < 0: stop  += len(self) + 1

            if step > 0:
                return NotSliced(super().__getitem__(slice(start, stop, step)))

            else:
                return NotSliced(super().__getitem__(slice(stop, start))[::step])

        else:
            return super().__getitem__(itm)

ns = NotSliced([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# This came out wrong last time. I should be more careful...
ns[~4:~0][::-1]
ns[~0:~4:-1]
#>>> [9, 8, 7, 6]
#>>> [9, 8, 7, 6]

ns[~4:~0][::-2]
ns[~0:~4:-2]
#>>> [9, 7]
#>>> [9, 7]

From greg.ewing at canterbury.ac.nz  Mon Oct 28 22:29:35 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 29 Oct 2013 10:29:35 +1300
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <20131028140322.59e344c0@anarchist>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <20131028140322.59e344c0@anarchist>
Message-ID: <526ED73F.5070506@canterbury.ac.nz>

Barry Warsaw wrote:
> On Oct 28, 2013, at 11:00 PM, Nick Coghlan wrote:
> 
>>"[<i:" closed start of slice (default)
>>"[i<:" open start of slice
>>":>j]" open end of slice (default)
>>":j>]" closed end of slice
>>":>j:k]" open end of slice with step
>>":j>:k]" closed end of slice with step
>
> Sorry, I'm -1 here.  ...  I don't think this syntax will make it easier to
> understand at a glance (or even upon some deeper inspection).

I agree that this looks far too cluttered.

Joshua's ~ idea shows that we don't need separate syntax for
"from the start" and "from the end", just something that means
"from the other end". Also we want a character that doesn't
look too obtrusive and doesn't already have a meaning the
way we're using it. How about:

    a[^i:j]
    a[i:^j]
    a[^i:^j]
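
Python has no ^ prefix for indices, so what follows is only a sketch of how
such a marker might behave. It assumes ^i counts i places in from the far
end, so that ^0 is the end itself (as ~0 is in Joshua's mock-up). The names
FromEnd, resolve and other_end_slice are made up for illustration.

```python
class FromEnd:
    """Hypothetical stand-in for the proposed ^i marker."""

    def __init__(self, offset):
        if offset < 0:
            raise ValueError("offset must be non-negative")
        self.offset = offset


def resolve(index, length):
    """Turn a plain int or a FromEnd marker into an ordinary position."""
    if isinstance(index, FromEnd):
        return max(length - index.offset, 0)
    if index < 0:
        raise IndexError("negative positions are not allowed in this sketch")
    return index


def other_end_slice(seq, i, j):
    """Emulate a[i:j] where either bound may be measured from the far end."""
    return seq[resolve(i, len(seq)):resolve(j, len(seq))]


a = list(range(10))
print(other_end_slice(a, FromEnd(3), FromEnd(0)))   # stands in for a[^3:^0]
print(other_end_slice(a, 2, FromEnd(0)))            # stands in for a[2:^0]
```

With this reading an offset of zero selects everything up to the end, so the
a[:-0] surprise discussed elsewhere in the thread does not arise.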

-- 
Greg

From tim.peters at gmail.com  Mon Oct 28 22:31:24 2013
From: tim.peters at gmail.com (Tim Peters)
Date: Mon, 28 Oct 2013 16:31:24 -0500
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <l4mjjo$sb$1@ger.gmane.org>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <20131028022046.GU7989@ando>
 <CAExdVNmPsGUKSaz_qW=sDA20w6tjt3Z+Dr8XYU_+GVsXBwBqsA@mail.gmail.com>
 <20131028120921.GW7989@ando>
 <CAExdVNnerCDk634MD6AiJvpgpRn5xRN6VdRyGi9zFhVDyhQ=Ug@mail.gmail.com>
 <l4mjjo$sb$1@ger.gmane.org>
Message-ID: <CAExdVNmM106z_HNwHa9Cb7KaCyOZjafH4Bc-9o8y6pKaG5V1kg@mail.gmail.com>

>> under the proposal we have:
>>
>>      s[i:j:k] == s[i:j][::k]

[Terry Reedy]
> I think where we went wrong with strides was to have the sign of the stride
> affect the interpretation of i and j (in analogy with ranges).

I think that's right :-)

> The change is
> to correct this by decoupling steps 1. and 2. below. The result is that i
> and j would mean left and right ends of the slice, rather than 'start' and
> 'stop' ends of the slice.

Right.

> I presume s[::-k], k a count, would continue to mean 'reverse and take every
> kth' (ie, take every kth item from the right instead of the left):

That's not what the proposal said, but it's a reasonable alternative.
Maybe that's better because it's "more compatible" with what happens
today.  The hangup for me is that I have no use cases for negative
strides other than -1, so have no real basis for picking one over the
other.  OTOH, since I _don't_ have any use cases, I don't care either
what happens then ;-)

> s[i:j:-k] == s[i:j:-1][::k]

That's a nice form of symmetry too.  Sold ;-)

> (And one would continue to write the alternative, 'take every kth and
> reverse' explicitly as s[i:j:k][::-1].)
>
> Whether selecting or replacing, this proposal makes the rule for indicating
> an arithmetic subsequence to be:
>
> 1. indicate the contiguous slice to work on with left and right endpoints
> (left end i, right end j, i <= j after normalization with same rules as at
> present);
>
> 2. indicate the starting end and direction of movement, left to right
> (default) or right to left (negate k);
>
> 3. indicate whether to pick every member of the slice (k=1, default) or
> every kth (k > 1), starting with the first item at the indicated end (if
> there is one) and moving in the appropriate direction.

Yup!
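
As a rough illustration, the three-step rule quoted above can be written as
an ordinary function. This is only a sketch of the semantics under
discussion, not of current Python behaviour, and the name proposed_slice is
invented here.

```python
def proposed_slice(s, i=None, j=None, k=1):
    """Sketch of the proposed meaning of s[i:j:k]."""
    if k == 0:
        raise ValueError("slice step cannot be zero")
    window = s[i:j]              # 1. contiguous slice: i is left, j is right
    if k < 0:
        window = window[::-1]    # 2. negative k: start from the right end
    return window[::abs(k)]      # 3. take every abs(k)-th item


s = 'abcdefgh'
print(proposed_slice(s, 2, 6, -1))   # the items of s[2:6], reversed
print(proposed_slice(s, 2, 6, -2))   # every 2nd item of that reversed slice
```

The bounds 2 and 6 stay the same whichever direction is chosen, which is the
decoupling Terry describes.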

> ---
> My quick take on slicing versus indexing. The slice positions of a single
> item are i:(i+1). The average is i.5. Some languages (0-based, like Python)
> round this down to i, others (1-based) round up to i+1.

I think they all round down.  For example, Icon uses 1-based indexing,
and supports slicing.  "abc"[1] is "a" in Icon, and so is "abc"[1:2].
0 isn't a valid index in Icon, but it can be used in slicing, where it means
"the position just after the last element".

> String 'indexing' is really unit slicing: s[i] == s[i:i+1].  Any sequence
> can be sliced.  True indexing requires that the members of the sequence either
> be Python objects (tuples, lists) or usefully convert to such (bytes, other
> arrays, which convert integer members to Python ints).

From ethan at stoneleaf.us  Mon Oct 28 21:42:21 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 28 Oct 2013 13:42:21 -0700
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAHVvXxSBKDXx_NgzdrhGPH4U7neyJ88sBPQHgCoSBN8C5mWmOg@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
 <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
 <CAHVvXxSu0MAdX1CCMjPeLpyTt7eTTc4XuRTLQeAg-LHCgZ6g-g@mail.gmail.com>
 <CAP1=2W5MUHWwRT21_oeKr+co6=nnPq2AmGWiVw7Pw0-VqjQ=Rw@mail.gmail.com>
 <CAHVvXxSBKDXx_NgzdrhGPH4U7neyJ88sBPQHgCoSBN8C5mWmOg@mail.gmail.com>
Message-ID: <526ECC2D.9020901@stoneleaf.us>

On 10/28/2013 01:20 PM, Oscar Benjamin wrote:
> On 28 October 2013 15:04, Brett Cannon <brett at python.org> wrote:
>>
>> I'm not saying not to talk to them, but I also don't think we should
>> necessarily not change it because no one uses it either. If it's wide spread
>> then sure, we just live with it. It's always a balancing act of fixing for
>> future code vs. pain of current code. I'm just saying we shouldn't dismiss
>> changing this out of hand because you are so far the only person who has
>> relied on this.
>
> Anyone who has used negative strides and non-default start/stop is
> relying on it. It has been a core language feature since long before I
> started using Python.

I have code that relies on this.


> Also I'm not the only person to point out that a more common problem
> is with wraparound when doing something like a[:n]. That is a more
> significant problem with slicing and I don't understand why the
> emphasis here is all on the negative strides rather than negative
> indexing. Using negative indices to mean "from the end" is a mistake
> and it leads to things like this:
>
>--> a = 'abcdefg'
>--> for n in reversed(range(7)):
> ...     print(n, a[:-n])
> ...
> 6 a
> 5 ab
> 4 abc
> 3 abcd
> 2 abcde
> 1 abcdef
> 0

I've been bitten by this more than once.  :(
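
One way to avoid this trap in current Python is to compute the end position
explicitly instead of negating n. A small sketch follows; drop_last is a
made-up helper name.

```python
def drop_last(seq, n):
    """Return seq without its last n items; n == 0 returns all of seq."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Clamp at 0 so that n > len(seq) gives an empty result rather than
    # a negative position that would count from the end again.
    return seq[:max(len(seq) - n, 0)]


a = 'abcdefg'
for n in reversed(range(7)):
    print(n, drop_last(a, n))
```

Unlike a[:-n], the n == 0 case here yields the whole string.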


> Something like the matlab/pike syntax would fix these problems as well
> as making it possible to use the same indices for negative stride
> slices. That would be worth a deprecation process in my opinion. The
> specific suggestion so far does not have enough of an advantage to
> justify breaking anyone's code IMO.

+1


>> As for the rush, it's because 3.4b1 is approaching and if this slips to 3.5
>> that's 1.5 years of deprecation time lost for something that doesn't have a
>> syntactic break to help you discover the change in semantics.

The difference between 7.5 years and 6.0 years of deprecation time doesn't seem
that significant to me.  Besides, can't we add the deprecation to the docs whenever?

--
~Ethan~

From tjreedy at udel.edu  Mon Oct 28 22:41:27 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 28 Oct 2013 17:41:27 -0400
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAExdVNkP8SBVdKvdN0R1WwMzPypfUWmkibOCCY89GtSy6oiV3A@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <20131028022046.GU7989@ando>
 <CAExdVNmPsGUKSaz_qW=sDA20w6tjt3Z+Dr8XYU_+GVsXBwBqsA@mail.gmail.com>
 <20131028120921.GW7989@ando>
 <CAExdVNnerCDk634MD6AiJvpgpRn5xRN6VdRyGi9zFhVDyhQ=Ug@mail.gmail.com>
 <CAGu0AnuB7FhAhboynCwZycv0UAfdeJT_uYjCGFPQO+CSpRzHcQ@mail.gmail.com>
 <CAExdVNkP8SBVdKvdN0R1WwMzPypfUWmkibOCCY89GtSy6oiV3A@mail.gmail.com>
Message-ID: <l4mllu$ori$1@ger.gmane.org>

On 10/28/2013 4:56 PM, Tim Peters wrote:

> [Bruce Leban]
>> It's definitely not "finally clear" as it's a change in semantics.
>
> Of course it is.
>
>> What about negative strides other than -1?
>
> All strides start the same way, by (conceptually) selecting s[i:j]
> first.  Then the stride is applied to that contiguous slice, starting
> with the first element and taking every abs(k) element thereafter.
> Finally, if k is negative, that sequence is reversed.  k=1, k=-1, k=2,
> k=-2, ..., all the same.

I think this is wrong. Conceptually reverse first, if indicated by 
negative stride, then select: see my previous post for my rationale, and 
below.

>> Which of these is expected?
>>
>> (A) '012345678'[::-2] == '86420'
>>      '0123456789'[::-2] == '97531'

This is the current behavior and I think that [::k] should continue to 
work as it does. I believe Guido only suggested deprecating negative 
strides with non-default endpoints, which implies negative strides 
*with* default endpoints should continue as are. We should not break 
more than necessary.

>> (B) '012345678'[::-2] == '86420'
>>      '0123456789'[::-2] == '86420'

> B.

Aside from all else, I find A) more intuitive. It certainly strikes me 
as more likely to be wanted, though either is obviously rare.

>> If (A) I can get the (B) result by writing [::2][::-1]
>> but if (B), I'm forced to write:
>>
>> s[0 if len(s) % 2 == 1 else 1::2]
>>
>> or something equally ugly.

No, just reverse the slices.

 >>> '0123456789'[::-1][::2]
'97531'
 >>> '012345678'[::-1][::2]
'86420'

If we were to make the change, I think the docs, at least the tutorial, 
should say that s[i:j:-k] could mean either s[i:j:-1][::k] or 
s[i:j:k][::-1] and that it does mean the former, so if one wants the 
latter, spell it out.
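
The two readings can be compared directly in current Python with default
endpoints. A short sketch; the helper names are invented here.

```python
def reverse_then_stride(s, k):
    """Reading (A): reverse the slice, then take every k-th item."""
    return s[::-1][::k]


def stride_then_reverse(s, k):
    """Reading (B): take every k-th item, then reverse the result."""
    return s[::k][::-1]


for s in ('012345678', '0123456789'):
    print(s, reverse_then_stride(s, 2), stride_then_reverse(s, 2))
    # Reading (A) is what s[::-k] already does today.
    assert reverse_then_stride(s, 2) == s[::-2]
```

The readings agree on the 9-character string and differ on the 10-character
one, because only reading (A) is guaranteed to include the last item.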

> You're assuming something here you haven't said.  The easiest way to
> get '97531' is to type '97531' ;-)  If your agenda is the general
> "return every 2nd element starting with the last element", then the
> obvious way to do that under the proposal is to write [::-1][::2].
> You can't seriously claim that's harder than the "[::2][::-1]" you
> presented as the obvious way to "get the (B) result" given (A).
>
>> (Also, (A) is the current behavior and switching to (B) would break any
>> existing use of strides < -1.)
>
> Did you notice that Guido titled this thread "Where did we go wrong
> with negative stride?".;-)

I did, and I explained exactly where I think we went wrong, which was to 
make the interpretation of i and j depend on the sign of k. Undoing this 
does not mandate B instead of A.

-- 
Terry Jan Reedy


From tim.peters at gmail.com  Mon Oct 28 22:55:09 2013
From: tim.peters at gmail.com (Tim Peters)
Date: Mon, 28 Oct 2013 16:55:09 -0500
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <l4mllu$ori$1@ger.gmane.org>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <20131028022046.GU7989@ando>
 <CAExdVNmPsGUKSaz_qW=sDA20w6tjt3Z+Dr8XYU_+GVsXBwBqsA@mail.gmail.com>
 <20131028120921.GW7989@ando>
 <CAExdVNnerCDk634MD6AiJvpgpRn5xRN6VdRyGi9zFhVDyhQ=Ug@mail.gmail.com>
 <CAGu0AnuB7FhAhboynCwZycv0UAfdeJT_uYjCGFPQO+CSpRzHcQ@mail.gmail.com>
 <CAExdVNkP8SBVdKvdN0R1WwMzPypfUWmkibOCCY89GtSy6oiV3A@mail.gmail.com>
 <l4mllu$ori$1@ger.gmane.org>
Message-ID: <CAExdVN=3yeL5CJcVBo4rMwAnJteK+TPxzuZDeGU4Lo3L-nhTOA@mail.gmail.com>

[Tim]
>> All strides start the same way, by (conceptually) selecting s[i:j]
>> first.  Then the stride is applied to that contiguous slice, starting
>> with the first element and taking every abs(k) element thereafter.
>> Finally, if k is negative, that sequence is reversed.  k=1, k=-1, k=2,
>> k=-2, ..., all the same.

[Terry]
> I think this is wrong. Conceptually reverse first, if indicated by negative
> stride, then select: see my previous post for my rationale, and below.

I already replied to your previous post, and agreed with you :-)

> ... I think that [::k] should continue to work
> as it does. I believe Guido only suggested deprecating negative strides with
> non-default endpoints, which implies negative strides *with* default
> endpoints should continue as are. We should not break more than necessary.

Take "yes" for an answer ;-)

>> ...
>> Did you notice that Guido titled this thread "Where did we go wrong
>> with negative stride?".;-)

> I did,

And did you notice that I posed that question to someone else? ;-)

> and I explained exactly where I think we went wrong, which was to
> make the interpretation of i and j depend on the sign of k. Undoing this
> does not mandate B instead of A.

Agreed.

From ncoghlan at gmail.com  Mon Oct 28 23:29:35 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 29 Oct 2013 08:29:35 +1000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAHVvXxSBKDXx_NgzdrhGPH4U7neyJ88sBPQHgCoSBN8C5mWmOg@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
 <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
 <CAHVvXxSu0MAdX1CCMjPeLpyTt7eTTc4XuRTLQeAg-LHCgZ6g-g@mail.gmail.com>
 <CAP1=2W5MUHWwRT21_oeKr+co6=nnPq2AmGWiVw7Pw0-VqjQ=Rw@mail.gmail.com>
 <CAHVvXxSBKDXx_NgzdrhGPH4U7neyJ88sBPQHgCoSBN8C5mWmOg@mail.gmail.com>
Message-ID: <CADiSq7d9FaMABacJHet5PHVTkdTg46S65kQHMQd+Bqjqpsx=Hg@mail.gmail.com>

On 29 Oct 2013 06:21, "Oscar Benjamin" <oscar.j.benjamin at gmail.com> wrote:
>
> On 28 October 2013 15:04, Brett Cannon <brett at python.org> wrote:
> >
> > On Mon, Oct 28, 2013 at 10:49 AM, Oscar Benjamin
> > <oscar.j.benjamin at gmail.com> wrote:
> >>
> >> On 28 October 2013 13:57, Brett Cannon <brett at python.org> wrote:
> >>
> >> I think that the numpy mailing lists should be consulted before any
> >> decisions are made. As Antoine says: if you've never noticed this
> >> behaviour before then it obviously doesn't matter to you that much so
> >> why the rush to deprecate it?
> >
> > > I'm not saying not to talk to them, but I also don't think we should
> > > necessarily not change it because no one uses it either. If it's wide spread
> > > then sure, we just live with it. It's always a balancing act of fixing for
> > > future code vs. pain of current code. I'm just saying we shouldn't dismiss
> > > changing this out of hand because you are so far the only person who has
> > > relied on this.
>
> Anyone who has used negative strides and non-default start/stop is
> relying on it. It has been a core language feature since long before I
> started using Python.
>
> Also I'm not the only person to point out that a more common problem
> is with wraparound when doing something like a[:n]. That is a more
> significant problem with slicing and I don't understand why the
> emphasis here is all on the negative strides rather than negative
> indexing. Using negative indices to mean "from the end" is a mistake
> and it leads to things like this:
>
> >>> a = 'abcdefg'
> >>> for n in reversed(range(7)):
> ...     print(n, a[:-n])
> ...
> 6 a
> 5 ab
> 4 abc
> 3 abcd
> 2 abcde
> 1 abcdef
> 0
>
> You can do something like a[:-n or None] to correctly handle zero but
> that still does the wrong thing when n is negative.
>
> Also why do you get an error when your index goes off one end of the
> array but not when it goes off the other?
>
> >>> a[len(a)]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> IndexError: string index out of range
> >>> a[-1]
> 'g'
>
> I have never been in a situation where I was writing code and didn't
> know whether I wanted to slice/index from the end or the beginning at
> coding time. I would much rather that a[-1] be an error and have an
> explicit syntax for indexing from the end.
>
> I have never found this wraparound to be useful and I think that if
> there is a proposal to change slicing in a backward incompatible way
> then it should be to something that is significantly better by solving
> these real problems. I have often had bugs or been forced to write
> awkward code because of these. The negative slicing indices may be a
> bit harder to reason about but it has never actually caused me any
> problems.
>
> Something like the matlab/pike syntax would fix these problems as well
> as making it possible to use the same indices for negative stride
> slices. That would be worth a deprecation process in my opinion. The
> specific suggestion so far does not have enough of an advantage to
> justify breaking anyone's code IMO.
>
> > > As for the rush, it's because 3.4b1 is approaching and if this slips to 3.5
> > > that's 1.5 years of deprecation time lost for something that doesn't have a
> > > syntactic break to help you discover the change in semantics.
> >>
> >> > +1 on doing a deprecation in 3.4.
> >>
> >> -1 on any deprecation without a clear plan for a better syntax. Simply
> >> changing the semantics of the current syntax would bring in who knows
> >> how many off-by-one errors for virtually no benefit.
> >
> > > The deprecation would be in there from now until Python 4 so it wouldn't be
> > > sudden (remember that we are on a roughly 18 month release cycle, so if this
> > > went into 3.4 that's 7.5 years until this changes in Python 4). And there's
> > > already a future-compatible way to change your code to get the same results
> > > in the end that just require more explicit steps/code.
>
> This argument would be more persuasive if you said: "the new better
> syntax that solves many of the slicing problems will be introduced
> *now*, and the old style/syntax will be deprecated later".

Indeed. I like Terry's proposed semantics, so if we can give that a new
syntax, we can just give range and slice appropriate "reverse=False"
keyword arguments like sorted and list.sort, and never have to deprecate
negative strides (although negative strides would be disallowed when
reverse=True).

For example:

s[i:j:k] - normal forward slice
s[i:j:<<k] - reversed slice (with i and j as left/right rather than
start/stop)

Reversing with unit stride could be:
s[i:j:<<]

When reverse=True, start, stop and step for the range or slice would be
calculated as follows:

start = len(s)-1 if j is None else j-1
stop = -1 if i is None else i-1
step = -k

It doesn't solve the -j vs len(s)-j problem for the end index, but I think
it's still more intuitive for the reasons Tim and Terry gave.
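
The calculation above can be tried out with range(). This is a minimal
sketch, assuming i and j are either None or ordinary non-negative positions
within the sequence; reversed_indices is a made-up name. Note that the
computed stop of -1 only makes sense as a range() bound, since inside a
slice it would be read as "one from the end".

```python
def reversed_indices(length, i=None, j=None, k=1):
    """Indices visited by the proposed reversed slice s[i:j:<<k] (sketch)."""
    start = length - 1 if j is None else j - 1
    stop = -1 if i is None else i - 1
    return range(start, stop, -k)


s = '0123456789'
picked = ''.join(s[n] for n in reversed_indices(len(s), 2, 7, 2))
print(picked)
```

The result matches s[2:7][::-1][::2] in current Python, that is, the
"reverse first, then select" semantics.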

Cheers,
Nick.

>
>
> Oscar
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas

From tjreedy at udel.edu  Mon Oct 28 23:39:11 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 28 Oct 2013 18:39:11 -0400
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAHVvXxSBKDXx_NgzdrhGPH4U7neyJ88sBPQHgCoSBN8C5mWmOg@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
 <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
 <CAHVvXxSu0MAdX1CCMjPeLpyTt7eTTc4XuRTLQeAg-LHCgZ6g-g@mail.gmail.com>
 <CAP1=2W5MUHWwRT21_oeKr+co6=nnPq2AmGWiVw7Pw0-VqjQ=Rw@mail.gmail.com>
 <CAHVvXxSBKDXx_NgzdrhGPH4U7neyJ88sBPQHgCoSBN8C5mWmOg@mail.gmail.com>
Message-ID: <l4mp26$utl$1@ger.gmane.org>

On 10/28/2013 4:20 PM, Oscar Benjamin wrote:

> Also I'm not the only person to point out that a more common problem
> is with wraparound when doing something like a[:n].

I think it a mistake to think in terms of 'wraparound'. This implies to 
me that there is a mod len(s) operation applied, and there is not. A 
negative index or slice position, -n, is simply an abbreviation for 
len(s) - n.  Besides being faster to write, the abbreviation runs about 
3x as fast with 3.3.2 on my machine.

 >>> timeit.timeit('pass', "s='abcde'")
0.02394336495171956
 >>> timeit.timeit('pass', "s='abcde'")
0.02382040032352961
.024 timeit overhead

 >>> timeit.timeit('s[-3]', "s='abcde'")
0.06969358444349899
 >>> timeit.timeit('s[-3]', "s='abcde'")
0.06534832190172146
.068 - .024 = .044 net

 >>> timeit.timeit('s[len(s)-3]', "s='abcde'")
0.15656133106750403
 >>> timeit.timeit('s[len(s)-3]', "s='abcde'")
0.15518289758767878
.156 - .024 = .132 net

The trick works because Python, unlike some other languages, does not 
allow negative indexing from the start of the array.  If Python had 
required an explicit end marker from the beginning, conditional code 
would  be required if the sign were unknown. If Python gained one today, 
it would have to be optional for back compatibility.
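
Both points are easy to check. The sketch below confirms that s[-n] and
s[len(s) - n] select the same item, that no modulo is applied (an index too
far to the left is an error rather than a wraparound), and re-runs the
timing comparison. Absolute timings depend on the machine and the Python
version, so none are shown here.

```python
import timeit

s = 'abcde'

# -n is shorthand for len(s) - n.
for n in range(1, len(s) + 1):
    assert s[-n] == s[len(s) - n]

# No "mod len(s)" is applied: -6 is out of range for a 5-character string.
try:
    s[-6]
except IndexError as exc:
    print("IndexError:", exc)

# Timing comparison in the style of the runs above.
setup = "s = 'abcde'"
overhead = timeit.timeit('pass', setup, number=100000)
short = timeit.timeit('s[-3]', setup, number=100000)
explicit = timeit.timeit('s[len(s)-3]', setup, number=100000)
print(overhead, short, explicit)
```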

-- 
Terry Jan Reedy


From guido at python.org  Mon Oct 28 23:41:02 2013
From: guido at python.org (Guido van Rossum)
Date: Mon, 28 Oct 2013 15:41:02 -0700
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CADiSq7d9FaMABacJHet5PHVTkdTg46S65kQHMQd+Bqjqpsx=Hg@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
 <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
 <CAHVvXxSu0MAdX1CCMjPeLpyTt7eTTc4XuRTLQeAg-LHCgZ6g-g@mail.gmail.com>
 <CAP1=2W5MUHWwRT21_oeKr+co6=nnPq2AmGWiVw7Pw0-VqjQ=Rw@mail.gmail.com>
 <CAHVvXxSBKDXx_NgzdrhGPH4U7neyJ88sBPQHgCoSBN8C5mWmOg@mail.gmail.com>
 <CADiSq7d9FaMABacJHet5PHVTkdTg46S65kQHMQd+Bqjqpsx=Hg@mail.gmail.com>
Message-ID: <CAP7+vJ+qyY09+=sbRgrfHMZ77BLbyEv-au-kv8FTonwvyRy-Ag@mail.gmail.com>

I'm not sure I like new syntax. We'd still have to find a way to represent
this with slice() and also with range().


On Mon, Oct 28, 2013 at 3:29 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

>
> On 29 Oct 2013 06:21, "Oscar Benjamin" <oscar.j.benjamin at gmail.com> wrote:
> >
> > On 28 October 2013 15:04, Brett Cannon <brett at python.org> wrote:
> > >
> > > On Mon, Oct 28, 2013 at 10:49 AM, Oscar Benjamin
> > > <oscar.j.benjamin at gmail.com> wrote:
> > >>
> > >> On 28 October 2013 13:57, Brett Cannon <brett at python.org> wrote:
> > >>
> > >> I think that the numpy mailing lists should be consulted before any
> > >> decisions are made. As Antoine says: if you've never noticed this
> > >> behaviour before then it obviously doesn't matter to you that much so
> > >> why the rush to deprecate it?
> > >
> > > > I'm not saying not to talk to them, but I also don't think we should
> > > > necessarily not change it because no one uses it either. If it's wide spread
> > > > then sure, we just live with it. It's always a balancing act of fixing for
> > > > future code vs. pain of current code. I'm just saying we shouldn't dismiss
> > > > changing this out of hand because you are so far the only person who has
> > > > relied on this.
> >
> > Anyone who has used negative strides and non-default start/stop is
> > relying on it. It has been a core language feature since long before I
> > started using Python.
> >
> > Also I'm not the only person to point out that a more common problem
> > is with wraparound when doing something like a[:n]. That is a more
> > significant problem with slicing and I don't understand why the
> > > emphasis here is all on the negative strides rather than negative
> > indexing. Using negative indices to mean "from the end" is a mistake
> > and it leads to things like this:
> >
> > >>> a = 'abcdefg'
> > >>> for n in reversed(range(7)):
> > ...     print(n, a[:-n])
> > ...
> > 6 a
> > 5 ab
> > 4 abc
> > 3 abcd
> > 2 abcde
> > 1 abcdef
> > 0
> >
> > You can do something like a[:-n or None] to correctly handle zero but
> > that still does the wrong thing when n is negative.
> >
> > Also why do you get an error when your index goes off one end of the
> > array but not when it goes off the other?
> >
> > >>> a[len(a)]
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > IndexError: string index out of range
> > >>> a[-1]
> > 'g'
> >
> > I have never been in a situation where I was writing code and didn't
> > know whether I wanted to slice/index from the end or the beginning at
> > coding time. I would much rather that a[-1] be an error and have an
> > explicit syntax for indexing from the end.
> >
> > I have never found this wraparound to be useful and I think that if
> > there is a proposal to change slicing in a backward incompatible way
> > then it should be to something that is significantly better by solving
> > these real problems. I have often had bugs or been forced to write
> > awkward code because of these. The negative slicing indices may be a
> > bit harder to reason about but it has never actually caused me any
> > problems.
> >
> > Something like the matlab/pike syntax would fix these problems as well
> > as making it possible to use the same indices for negative stride
> > slices. That would be worth a deprecation process in my opinion. The
> > specific suggestion so far does not have enough of an advantage to
> > justify breaking anyone's code IMO.
> >
> > > > As for the rush, it's because 3.4b1 is approaching and if this slips to 3.5
> > > > that's 1.5 years of deprecation time lost for something that doesn't have a
> > > > syntactic break to help you discover the change in semantics.
> > >>
> > >> > +1 on doing a deprecation in 3.4.
> > >>
> > >> -1 on any deprecation without a clear plan for a better syntax. Simply
> > >> changing the semantics of the current syntax would bring in who knows
> > >> how many off-by-one errors for virtually no benefit.
> > >
> > > > The deprecation would be in there from now until Python 4 so it wouldn't be
> > > > sudden (remember that we are on a roughly 18 month release cycle, so if this
> > > > went into 3.4 that's 7.5 years until this changes in Python 4). And there's
> > > > already a future-compatible way to change your code to get the same results
> > > > in the end that just require more explicit steps/code.
> >
> > This argument would be more persuasive if you said: "the new better
> > syntax that solves many of the slicing problems will be introduced
> > *now*, and the old style/syntax will be deprecated later".
>
> Indeed. I like Terry's proposed semantics, so if we can give that a new
> syntax, we can just give range and slice appropriate "reverse=False"
> keyword arguments like sorted and list.sort, and never have to deprecate
> negative strides (although negative strides would be disallowed when
> reverse=True).
>
> For example:
>
> s[i:j:k] - normal forward slice
> s[i:j:<<k] - reversed slice (with i and j as left/right rather than
> start/stop)
>
> Reversing with unit stride could be:
> s[i:j:<<]
>
> When reverse=True, start, stop and step for the range or slice would be
> calculated as follows:
>
> start = len(s)-1 if j is None else j-1
> stop = -1 if i is None else i-1
> step = -k
>
> It doesn't solve the -j vs len(s)-j problem for the end index, but I think
> it's still more intuitive for the reasons Tim and Terry gave.
>
> Cheers,
> Nick.
>
> >
> >
> > Oscar


-- 
--Guido van Rossum (python.org/~guido)

From ncoghlan at gmail.com  Tue Oct 29 00:13:18 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 29 Oct 2013 09:13:18 +1000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7+vJ+qyY09+=sbRgrfHMZ77BLbyEv-au-kv8FTonwvyRy-Ag@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
 <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
 <CAHVvXxSu0MAdX1CCMjPeLpyTt7eTTc4XuRTLQeAg-LHCgZ6g-g@mail.gmail.com>
 <CAP1=2W5MUHWwRT21_oeKr+co6=nnPq2AmGWiVw7Pw0-VqjQ=Rw@mail.gmail.com>
 <CAHVvXxSBKDXx_NgzdrhGPH4U7neyJ88sBPQHgCoSBN8C5mWmOg@mail.gmail.com>
 <CADiSq7d9FaMABacJHet5PHVTkdTg46S65kQHMQd+Bqjqpsx=Hg@mail.gmail.com>
 <CAP7+vJ+qyY09+=sbRgrfHMZ77BLbyEv-au-kv8FTonwvyRy-Ag@mail.gmail.com>
Message-ID: <CADiSq7eJ-nbiUZfZqjdkTS4W1y4MLWk8rSBWR_kZp1dMtV94wg@mail.gmail.com>

On 29 Oct 2013 08:41, "Guido van Rossum" <guido at python.org> wrote:
>
> I'm not sure I like new syntax. We'd still have to find a way to represent
> this with slice() and also with range().

Those are much easier: we can just add a "reverse=False" keyword-only
argument.

However, I realised that given the need to appropriately document these
function signatures and the precedent set by sorted (where the reverse flag
is essentially an optimisation trick that avoids a separate reversal
operation) the cleaner interpretation of such an argument is for:

    range(i, j, k, reverse=True)

to effectively mean:

    range(i, j, k)[::-1]

and for:

    s[slice(i, j, k, reverse=True)]

to effectively mean:

    s[i:j:k][::-1]

range and slice would handle the appropriate start/stop/step calculations
under the hood and hence be backwards compatible with existing container
implementations and other code.
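
A minimal sketch of those "under the hood" calculations in today's Python,
for illustration only. The names reversed_range and reversed_slice are made
up for this note (neither builtin accepts a reverse argument), and the slice
helper has to be told the sequence length up front, which a real
slice(..., reverse=True) would presumably only learn when it is applied:

```python
def reversed_range(i, j, k=1):
    """Hypothetical stand-in for range(i, j, k, reverse=True)."""
    # Slicing a range returns another range, so nothing is materialised.
    return range(i, j, k)[::-1]


def reversed_slice(length, i=None, j=None, k=None):
    """Hypothetical stand-in for slice(i, j, k, reverse=True).

    Returns an ordinary slice that selects the items of s[i:j:k] in
    reverse order, for a sequence s with len(s) == length.
    """
    r = range(*slice(i, j, k).indices(length))[::-1]
    if len(r) == 0:
        return slice(0, 0, 1)
    # A stop of -1 would be read as "the last item", so use None to let
    # a right-to-left slice run off the left end.
    stop = r.stop if r.stop >= 0 else None
    return slice(r.start, stop, r.step)


s = "0123456789"
forward = s[0:10:3]                              # '0369'
backward = s[reversed_slice(len(s), 0, 10, 3)]   # '9630'
```

Because the result is a plain slice object, it should work unchanged with
existing containers, including as an assignment target.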

This approach also means we could avoid addressing the slice reversal
syntax question for 3.4, and revisit it in the 3.5 time frame (and ditto
for deprecating negative strides). However, the idea of just allowing
keyword args to be passed to the slice builtin in the slice syntax did
occur to me:

    s[i:j:k:reverse=True]

Cheers,
Nick.

>
>
> On Mon, Oct 28, 2013 at 3:29 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>>
>> On 29 Oct 2013 06:21, "Oscar Benjamin" <oscar.j.benjamin at gmail.com> wrote:
>> >
>> > On 28 October 2013 15:04, Brett Cannon <brett at python.org> wrote:
>> > >
>> > > On Mon, Oct 28, 2013 at 10:49 AM, Oscar Benjamin
>> > > <oscar.j.benjamin at gmail.com> wrote:
>> > >>
>> > >> On 28 October 2013 13:57, Brett Cannon <brett at python.org> wrote:
>> > >>
>> > >> I think that the numpy mailing lists should be consulted before any
>> > >> decisions are made. As Antoine says: if you've never noticed this
>> > >> behaviour before then it obviously doesn't matter to you that much so
>> > >> why the rush to deprecate it?
>> > >
>> > > I'm not saying not to talk to them, but I also don't think we should
>> > > necessarily not change it because no one uses it either. If it's widespread
>> > > then sure, we just live with it. It's always a balancing act of fixing for
>> > > future code vs. pain of current code. I'm just saying we shouldn't dismiss
>> > > changing this out of hand because you are so far the only person who has
>> > > relied on this.
>> >
>> > Anyone who has used negative strides and non-default start/stop is
>> > relying on it. It has been a core language feature since long before I
>> > started using Python.
>> >
>> > Also I'm not the only person to point out that a more common problem
>> > is with wraparound when doing something like a[:n]. That is a more
>> > significant problem with slicing and I don't understand why the
>> > emphasis here is all on the negative strides rather than negative
>> > indexing. Using negative indices to mean "from the end" is a mistake
>> > and it leads to things like this:
>> >
>> > >>> a = 'abcdefg'
>> > >>> for n in reversed(range(7)):
>> > ...     print(n, a[:-n])
>> > ...
>> > 6 a
>> > 5 ab
>> > 4 abc
>> > 3 abcd
>> > 2 abcde
>> > 1 abcdef
>> > 0
>> >
>> > You can do something like a[:-n or None] to correctly handle zero but
>> > that still does the wrong thing when n is negative.
>> >
>> > Also why do you get an error when your index goes off one end of the
>> > array but not when it goes off the other?
>> >
>> > >>> a[len(a)]
>> > Traceback (most recent call last):
>> >   File "<stdin>", line 1, in <module>
>> > IndexError: string index out of range
>> > >>> a[-1]
>> > 'g'
>> >
>> > I have never been in a situation where I was writing code and didn't
>> > know whether I wanted to slice/index from the end or the beginning at
>> > coding time. I would much rather that a[-1] be an error and have an
>> > explicit syntax for indexing from the end.
>> >
>> > I have never found this wraparound to be useful and I think that if
>> > there is a proposal to change slicing in a backward incompatible way
>> > then it should be to something that is significantly better by solving
>> > these real problems. I have often had bugs or been forced to write
>> > awkward code because of these. The negative slicing indices may be a
>> > bit harder to reason about but it has never actually caused me any
>> > problems.
>> >
>> > Something like the matlab/pike syntax would fix these problems as well
>> > as making it possible to use the same indices for negative stride
>> > slices. That would be worth a deprecation process in my opinion. The
>> > specific suggestion so far does not have enough of an advantage to
>> > justify breaking anyone's code IMO.
>> >
>> > > As for the rush, it's because 3.4b1 is approaching and if this slips to 3.5
>> > > that's 1.5 years of deprecation time lost for something that doesn't have a
>> > > syntactic break to help you discover the change in semantics.
>> > >>
>> > >> > +1 on doing a deprecation in 3.4.
>> > >>
>> > >> -1 on any deprecation without a clear plan for a better syntax. Simply
>> > >> changing the semantics of the current syntax would bring in who knows
>> > >> how many off-by-one errors for virtually no benefit.
>> > >
>> > > The deprecation would be in there from now until Python 4 so it wouldn't be
>> > > sudden (remember that we are on a roughly 18 month release cycle, so if this
>> > > went into 3.4 that's 7.5 years until this changes in Python 4). And there's
>> > > already a future-compatible way to change your code to get the same results
>> > > in the end that just require more explicit steps/code.
>> >
>> > This argument would be more persuasive if you said: "the new better
>> > syntax that solves many of the slicing problems will be introduced
>> > *now*, and the old style/syntax will be deprecated later".
>>
>> Indeed. I like Terry's proposed semantics, so if we can give that a new
>> syntax, we can just give range and slice appropriate "reverse=False"
>> keyword arguments like sorted and list.sort, and never have to deprecate
>> negative strides (although negative strides would be disallowed when
>> reverse=True).
>>
>> For example:
>>
>> s[i:j:k] - normal forward slice
>> s[i:j:<<k] - reversed slice (with i and j as left/right rather than
>> start/stop)
>>
>> Reversing with unit stride could be:
>> s[i:j:<<]
>>
>> When reverse=True, start, stop and step for the range or slice would be
>> calculated as follows:
>>
>> start = len(s)-1 if j is None else j-1
>> stop = -1 if i is None else i-1
>> step = -k
>>
>> It doesn't solve the -j vs len(s)-j problem for the end index, but I
>> think it's still more intuitive for the reasons Tim and Terry gave.
>>
>> Cheers,
>> Nick.
>>
>> >
>> >
>> > Oscar
>> > _______________________________________________
>> > Python-ideas mailing list
>> > Python-ideas at python.org
>> > https://mail.python.org/mailman/listinfo/python-ideas
>>
>>
>
>
>
> --
> --Guido van Rossum (python.org/~guido)

From oscar.j.benjamin at gmail.com  Tue Oct 29 00:18:08 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Mon, 28 Oct 2013 23:18:08 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <l4mp26$utl$1@ger.gmane.org>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
 <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
 <CAHVvXxSu0MAdX1CCMjPeLpyTt7eTTc4XuRTLQeAg-LHCgZ6g-g@mail.gmail.com>
 <CAP1=2W5MUHWwRT21_oeKr+co6=nnPq2AmGWiVw7Pw0-VqjQ=Rw@mail.gmail.com>
 <CAHVvXxSBKDXx_NgzdrhGPH4U7neyJ88sBPQHgCoSBN8C5mWmOg@mail.gmail.com>
 <l4mp26$utl$1@ger.gmane.org>
Message-ID: <CAHVvXxSHV3S9+aaSktYZ86ghips=GztoQozUVYKss--oxG94_A@mail.gmail.com>

On 28 October 2013 22:39, Terry Reedy <tjreedy at udel.edu> wrote:
> On 10/28/2013 4:20 PM, Oscar Benjamin wrote:
>
>> Also I'm not the only person to point out that a more common problem
>> is with wraparound when doing something like a[:n].
>
>
> I think it a mistake to think in terms of 'wraparound'. This implies to me
> that there is a mod len(s) operation applied, and there is not. A negative
> index or slice position, -n, is simply an abbreviation for len(s) - n.
> Besides being faster to write, the abbreviation runs about 3x as fast with
> 3.3.2 on my machine.

I realise that it doesn't use modulo arithmetic. As I said earlier I
would be able to find uses for the current behaviour if it did.
However it does wraparound in some sense when the sign changes.

> The trick works because Python, unlike some other languages, does not allow
> negative indexing from the start of the array.  If Python had required an
> explicit end marker from the beginning, conditional code would  be required
> if the sign were unknown. If Python gained one today, it would have to be
> optional for back compatibility.

(I don't know if I understand what you mean.) Have you ever written
code where you didn't know if you wanted to index from the start or
the end of a sequence? I haven't and I use slicing/indexing
extensively. I have to write conditional code to handle the annoyingly
permissive current behaviour. Things that should be an error such as
passing in a negative index don't produce an error so I have to check
for it myself.
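
For concreteness, the kind of checking described above might look like the
following. This is only a sketch: drop_last and strict_index are names made
up for this note, not existing or proposed functions.

```python
def drop_last(seq, n):
    """Return seq without its last n items, rejecting out-of-range n.

    Hypothetical helper: unlike seq[:-n], n == 0 returns the whole
    sequence and a negative n is an error rather than a wraparound.
    """
    if not 0 <= n <= len(seq):
        raise ValueError("n must be between 0 and len(seq), got %r" % (n,))
    return seq[:len(seq) - n]


def strict_index(seq, i):
    """Index from the start only; a negative i raises instead of wrapping."""
    if i < 0:
        raise IndexError("negative index %r not allowed" % (i,))
    return seq[i]


a = "abcdefg"
whole = drop_last(a, 0)   # 'abcdefg', whereas a[:-0] is ''
head = drop_last(a, 2)    # 'abcde'
```

Every caller that wants these semantics currently has to carry checks like
these around.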


Oscar

From flying-sheep at web.de  Tue Oct 29 00:17:32 2013
From: flying-sheep at web.de (Philipp A.)
Date: Tue, 29 Oct 2013 00:17:32 +0100
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP1=2W5MUHWwRT21_oeKr+co6=nnPq2AmGWiVw7Pw0-VqjQ=Rw@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
 <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
 <CAHVvXxSu0MAdX1CCMjPeLpyTt7eTTc4XuRTLQeAg-LHCgZ6g-g@mail.gmail.com>
 <CAP1=2W5MUHWwRT21_oeKr+co6=nnPq2AmGWiVw7Pw0-VqjQ=Rw@mail.gmail.com>
Message-ID: <CAN8d9gmJ5EGa_9ttE-UovL7ASTMK9p4EWw4YiNQH1w2G-j0RPg@mail.gmail.com>

Am 28.10.2013 16:08 schrieb "Brett Cannon" <brett at python.org>:
> The deprecation would be in there from now until Python 4 so it wouldn't
> be sudden (remember that we are on a roughly 18 month release cycle, so if
> this went into 3.4 that's 7.5 years until this changes in Python 4).

I don't get your calculation: after 3.9 clearly follows 3.10, as versions
aren't decimal numbers, but tuples of integers.

So we have 1.5*X years, with X being any number from 1 to infinity that
Guido deems suitable.

@proposal:
-1 for explicit impliciticity in slicing syntax, as it's as complicated as
it sounds (when phrased like I just did) and noisier than obfuscated C

+1 for deprecating negative slicing, and teaching people to use reversed.

But I think we should consider adding some sort of slice view function,
since list[::2] already creates a copy, and reversed(list[::2]) creates two.
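
A rough sketch of what such a function could look like, assuming a plain
generator is acceptable. The name slice_view and its reverse flag are
invented for this note, and a generator is not a true view (it has no len()
and cannot be indexed), but it does avoid both copies:

```python
def slice_view(seq, start=None, stop=None, step=None, reverse=False):
    """Lazily yield the items of seq[start:stop:step] without copying.

    Hypothetical helper; with reverse=True the same items come out in the
    opposite order, like reversed(seq[start:stop:step]) minus the copies.
    """
    # slice.indices() normalises None and negative values for this length.
    indices = range(*slice(start, stop, step).indices(len(seq)))
    if reverse:
        indices = reversed(indices)
    for index in indices:
        yield seq[index]


data = list(range(10))
evens = list(slice_view(data, step=2))
evens_backwards = list(slice_view(data, step=2, reverse=True))
```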

From ron3200 at gmail.com  Tue Oct 29 00:51:56 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Mon, 28 Oct 2013 18:51:56 -0500
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAExdVNmM106z_HNwHa9Cb7KaCyOZjafH4Bc-9o8y6pKaG5V1kg@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <20131028022046.GU7989@ando>
 <CAExdVNmPsGUKSaz_qW=sDA20w6tjt3Z+Dr8XYU_+GVsXBwBqsA@mail.gmail.com>
 <20131028120921.GW7989@ando>
 <CAExdVNnerCDk634MD6AiJvpgpRn5xRN6VdRyGi9zFhVDyhQ=Ug@mail.gmail.com>
 <l4mjjo$sb$1@ger.gmane.org>
 <CAExdVNmM106z_HNwHa9Cb7KaCyOZjafH4Bc-9o8y6pKaG5V1kg@mail.gmail.com>
Message-ID: <l4mtaj$cnk$1@ger.gmane.org>



On 10/28/2013 04:31 PM, Tim Peters wrote:
> That's not what the proposal said, but it's a reasonable alternative.
> Maybe that's better because it's "more compatible" with what happens
> today.  The hangup for me is that I have no use cases for negative
> strides other than -1, so have no real basis for picking one over the
> other.  OTOH, since I_don't_  have any use cases, I don't care either
> what happens then;-)
>
>> >s[i:j:-k] == s[i:j:-1][::k]
> That's a nice form of symmetry too.  Sold;-)

+1  Looks good to me.


We could add a new_slice object without any problems now.  You just need to 
be explicit when using it.

(This works now)

 >>> "abcdefg"[slice(1,5,1)]
'bcde'


And this could work and not cause any compatibility issues.

     "abcdefg"[new_slice(1, 5, -2)]
     "ec"


It would offer an alternative for those who need or want it now and help 
with the change over when/if the time comes.
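
A minimal sketch of the semantics in question, written as a plain function
rather than a slice-like object. apply_new_slice is a made-up name, and the
rule it implements (i and j always name the left and right ends, the sign of
k only picks the direction) is my reading of the proposal as amended above:

```python
def apply_new_slice(seq, i, j, k):
    """Return the items a hypothetical new_slice(i, j, k) would select.

    i and j delimit the contiguous chunk seq[i:j]; a negative k walks
    that chunk from its right end, taking every abs(k)-th item.
    """
    chunk = seq[i:j]
    if k < 0:
        return chunk[::-1][::-k]
    return chunk[::k]


proposed = apply_new_slice("abcdefg", 1, 5, -2)   # 'ec'
current = "abcdefg"[1:5:-2]                       # '' with today's rules
```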

We also need to remember that slicing is also used for inserting things.

 >>> a = list("python")
 >>> b = list("PYTHON")
 >>> a[::2] = b[::2]
 >>> a
['P', 'y', 'T', 'h', 'O', 'n']


Cheers,
     Ron

From abarnert at yahoo.com  Tue Oct 29 01:14:00 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 28 Oct 2013 17:14:00 -0700
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <l4mtaj$cnk$1@ger.gmane.org>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <20131028022046.GU7989@ando>
 <CAExdVNmPsGUKSaz_qW=sDA20w6tjt3Z+Dr8XYU_+GVsXBwBqsA@mail.gmail.com>
 <20131028120921.GW7989@ando>
 <CAExdVNnerCDk634MD6AiJvpgpRn5xRN6VdRyGi9zFhVDyhQ=Ug@mail.gmail.com>
 <l4mjjo$sb$1@ger.gmane.org>
 <CAExdVNmM106z_HNwHa9Cb7KaCyOZjafH4Bc-9o8y6pKaG5V1kg@mail.gmail.com>
 <l4mtaj$cnk$1@ger.gmane.org>
Message-ID: <C44BB268-05B3-43CA-95BF-49320FE95D10@yahoo.com>

On Oct 28, 2013, at 16:51, Ron Adam <ron3200 at gmail.com> wrote:

> We also need to remember that slicing is also used for inserting things.
> 
> >>> a = list("python")
> >>> b = list("PYTHON")
> >>> a[::2] = b[::2]
> >>> a
> ['P', 'y', 'T', 'h', 'O', 'n']

I was about to write the same thing. Half the mails so far have said things like "you don't need to do [i:j:k] because you can do [m:n:o][::p]". But that doesn't work with assignment; you're just assigning to the temporary copy of the first slice.

And I think people will be bitten by that. People are _already_ bitten by that today, because they saw somewhere on StackOverflow that foo[i:j][::-1] is easier to understand than foo[j-1:i-1:-1] and tried to assign to it (presumably on the assumption that index and slice assignment must work like C++ and other languages that return "references" to the values that can be assigned into) and don't understand why it had no effect.

Today, people who ask this question are opening a useful door. You can explain to them exactly what slicing does, including how __setitem__ works, and they come out of it knowing how to assign to the range that they wanted.
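
A small demonstration of the difference, using a throwaway list (the
variable names are only for illustration):

```python
a = list("abcdefg")

# The first subscript builds a temporary list; the assignment lands in
# that temporary, which is then discarded. a itself is untouched.
a[1:5][::-1] = "WXYZ"
after_chained = list(a)

# A single extended slice goes through a.__setitem__ and does modify a.
# The items of a[1:5] in reverse order are indices 4, 3, 2, 1, which is
# the slice a[4:0:-1].
a[4:0:-1] = "WXYZ"
after_single = list(a)
```

After the second assignment a is ['a', 'Z', 'Y', 'X', 'W', 'f', 'g'].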

Never mind that these people have no real need for what they're writing and are just screwing around with language features because it's neat; we don't want to say that anyone who learns that way shouldn't be learning Python, do we?

Anyway, a change that makes it impossible to assign to the range looks like a hole in the language. All you can say is that if you wanted to get the slice you could rewrite it this way, but there's no way to rewrite it in terms of setting a slice, but don't worry, we're pretty sure you'll never need to.

From alexander.belopolsky at gmail.com  Tue Oct 29 01:45:13 2013
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 28 Oct 2013 20:45:13 -0400
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7+vJ+qyY09+=sbRgrfHMZ77BLbyEv-au-kv8FTonwvyRy-Ag@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
 <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
 <CAHVvXxSu0MAdX1CCMjPeLpyTt7eTTc4XuRTLQeAg-LHCgZ6g-g@mail.gmail.com>
 <CAP1=2W5MUHWwRT21_oeKr+co6=nnPq2AmGWiVw7Pw0-VqjQ=Rw@mail.gmail.com>
 <CAHVvXxSBKDXx_NgzdrhGPH4U7neyJ88sBPQHgCoSBN8C5mWmOg@mail.gmail.com>
 <CADiSq7d9FaMABacJHet5PHVTkdTg46S65kQHMQd+Bqjqpsx=Hg@mail.gmail.com>
 <CAP7+vJ+qyY09+=sbRgrfHMZ77BLbyEv-au-kv8FTonwvyRy-Ag@mail.gmail.com>
Message-ID: <CAP7h-xYtiNYmz=a-ZzzBOE0eO-_m1UA-OQVHkqqfV3P6J3tQfQ@mail.gmail.com>

On Mon, Oct 28, 2013 at 6:41 PM, Guido van Rossum <guido at python.org> wrote:
> I'm not sure I like new syntax.

Neither do I, but I've never liked the current extended slicing syntax either.

> We'd still have to find a way to represent this with slice() and also with
> range().

These seem easy: slice(i, j, k, reverse=True) and range(i, j, k, reverse=True).

FWIW, I won't miss extended slicing syntax if it goes away in Python
4.  I find a[slice(i, j, step=k, reverse=True)] more readable than
a[i:j:-k].  Alternatively, we can allow keyword-argument-like syntax
inside []:  a[i:j, step=k, reverse=True] can become syntactic sugar for
a[slice(i, j, step=k, reverse=True)].

From alexander.belopolsky at gmail.com  Tue Oct 29 01:47:56 2013
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 28 Oct 2013 20:47:56 -0400
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CADiSq7eJ-nbiUZfZqjdkTS4W1y4MLWk8rSBWR_kZp1dMtV94wg@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
 <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
 <CAHVvXxSu0MAdX1CCMjPeLpyTt7eTTc4XuRTLQeAg-LHCgZ6g-g@mail.gmail.com>
 <CAP1=2W5MUHWwRT21_oeKr+co6=nnPq2AmGWiVw7Pw0-VqjQ=Rw@mail.gmail.com>
 <CAHVvXxSBKDXx_NgzdrhGPH4U7neyJ88sBPQHgCoSBN8C5mWmOg@mail.gmail.com>
 <CADiSq7d9FaMABacJHet5PHVTkdTg46S65kQHMQd+Bqjqpsx=Hg@mail.gmail.com>
 <CAP7+vJ+qyY09+=sbRgrfHMZ77BLbyEv-au-kv8FTonwvyRy-Ag@mail.gmail.com>
 <CADiSq7eJ-nbiUZfZqjdkTS4W1y4MLWk8rSBWR_kZp1dMtV94wg@mail.gmail.com>
Message-ID: <CAP7h-xY6brpRtDO9kZup1v9ZXN8xVxCEavVCkwDV5zBr-Jb+bw@mail.gmail.com>

On Mon, Oct 28, 2013 at 7:13 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
..
> to effectively mean:
>
>     range(i, j, k)[::-1]
>
> and for:
>
>     s[slice(i, j, k, reverse=True)]
>
> to effectively mean:
>
>     s[i:j:k][::-1]
>
> range and slice would handle the appropriate start/stop/step calculations
> under the hood and hence be backwards compatible with existing container
> implementations and other code.
>
> This approach also means we could avoid addressing the slice reversal syntax
> question for 3.4, and revisit it in the 3.5 time frame (and ditto for
> deprecating negative strides). However, the idea of just allowing keyword
> args to be passed to the slice builtin in the slice syntax did occur to me:
>
>     s[i:j:k:reverse=True]

+1

In fact, I suggested the same before reading your e-mail.

From tim.peters at gmail.com  Tue Oct 29 01:41:37 2013
From: tim.peters at gmail.com (Tim Peters)
Date: Mon, 28 Oct 2013 19:41:37 -0500
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <C44BB268-05B3-43CA-95BF-49320FE95D10@yahoo.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <20131028022046.GU7989@ando>
 <CAExdVNmPsGUKSaz_qW=sDA20w6tjt3Z+Dr8XYU_+GVsXBwBqsA@mail.gmail.com>
 <20131028120921.GW7989@ando>
 <CAExdVNnerCDk634MD6AiJvpgpRn5xRN6VdRyGi9zFhVDyhQ=Ug@mail.gmail.com>
 <l4mjjo$sb$1@ger.gmane.org>
 <CAExdVNmM106z_HNwHa9Cb7KaCyOZjafH4Bc-9o8y6pKaG5V1kg@mail.gmail.com>
 <l4mtaj$cnk$1@ger.gmane.org> <C44BB268-05B3-43CA-95BF-49320FE95D10@yahoo.com>
Message-ID: <CAExdVNkfhr-5_tJ6egD=H_ZtLL6=P5VMk94L3hdqX+YMh4+=4Q@mail.gmail.com>

[Ron Adam]
>> We also need to remember that slicing is also used for inserting things.
>>
>> >>> a = list("python")
>> >>> b = list("PYTHON")
>> >>> a[::2] = b[::2]
>> >>> a
>> ['P', 'y', 'T', 'h', 'O', 'n']

[Andrew Barnert]
> I was about to write the same thing. Half the mails so far have said things like "you don't
> need to do [i:j:k] because you can do [m:n:o][::p]".

You must be missing most of the messages, then ;-)

>  But that doesn't work with assignment; you're just assigning to the temporary copy
> of the first slice.
> ...

Do you have a specific example of a currently-working slice assignment
that couldn't easily be done under proposed alternatives?  I can't
think of one, under "my" proposal as amended by Terry.  Ron's example
is no problem under any of them (because no proposal so far has
suggested changing the current meaning of [::2]).

When you see things like

    s[i:j:k] == s[i:j][::k]

*nobody* is suggesting using the spelling on the RHS.  They're
pointing out a pleasant mathematical equivalence.
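
For what it's worth, the equivalence can be checked by brute force against
today's semantics. The snippet below is only a quick sanity check on one
short string, not a proof:

```python
import itertools

s = "abcdefg"
bounds = [None] + list(range(-9, 10))

# With a positive stride the identity already holds for every i and j.
holds_for_positive_k = all(
    s[i:j:k] == s[i:j][::k]
    for i, j in itertools.product(bounds, repeat=2)
    for k in (1, 2, 3)
)

# With a negative stride it does not hold today: i and j swap roles and
# shift by one, which is what the proposals are trying to tidy up.
today = s[5:1:-1]         # 'fedc'
two_step = s[1:5][::-1]   # 'edcb'
```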

From ron3200 at gmail.com  Tue Oct 29 02:34:29 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Mon, 28 Oct 2013 20:34:29 -0500
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
Message-ID: <l4n3as$bii$1@ger.gmane.org>



On 10/28/2013 08:00 AM, Nick Coghlan wrote:
> In this vein, I started wondering if it might be worth trying to come up
> with a syntax to control whether the ends of a slice were open or closed.
>
> Since mismatched paren types would be too confusing, perhaps abusing some
> binary operators as Chris suggested could help:
>
> "[<i:" closed start of slice (default)
> "[i<:" open start of slice
> ":>j]" open end of slice (default)
> ":j>]" closed end of slice
> ":>j:k]" open end of slice with step
> ":j>:k]" closed end of slice with step
>
> Default slice: "[<0:-1>:1]"
> Reversed slice: "[<-1:0>:-1]"
>
> This makes it possible to cleanly include the final element as a closed
> range, rather than needing to add or subtract 1 (and avoids the zero trap
> when indexing from the end).

I think a reverse index object could be easier to understand.  For now it 
could be just a subclass of int.  Then 0 and rx(0) would be distinguishable 
from each other.  (-i and rx(i) would be too.)

     seq[0:rx(0)]        Default slice.
     seq[0:rx(0):-1]     Reversed slice.  (compare to above)

     seq[rx(5): rx(0)]   The last 5 items.


A syntax could be added later.  (Insert preferred syntax below.)

     seq[\5:\0]           The last 5 items



How about this example, which would probably use names instead of the
integers in real code.

     >>> "abcdefg"[3:10]       # 10 is past the end.  (works fine)
     'defg'

Sliding the range 5 to the left...

     >>> "abcdefg"[-2:5]       # -2 is before the beginning?  (Nope)
     ''                        # The wrap around gotcha!

The same situation happens when indexing from the right side [-i:-j], and 
sliding the range to the right.  Once j >= 0, it breaks.


It would be nice if these worked the same on both ends.  A reverse index 
object could fix both of these cases.
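
Here is a rough sketch of how that could behave. Builtin sequences would
treat an int subclass as an ordinary int, so the sketch uses a made-up
wrapper class (RxSeq) to do the resolving, and it only handles unit-step
slices:

```python
class rx(int):
    """Hypothetical reverse index: rx(n) means 'n items before the end'."""


class RxSeq:
    """Hypothetical wrapper whose slicing understands rx and never wraps."""

    def __init__(self, seq):
        self._seq = seq

    def _resolve(self, pos, default):
        if pos is None:
            return default
        n = len(self._seq)
        if isinstance(pos, rx):
            pos = n - int(pos)
        # Clamp instead of wrapping, so a window can slide past either end.
        return max(0, min(n, pos))

    def __getitem__(self, key):
        if not isinstance(key, slice) or key.step not in (None, 1):
            raise TypeError("this sketch only handles unit-step slices")
        start = self._resolve(key.start, 0)
        stop = self._resolve(key.stop, len(self._seq))
        return self._seq[start:stop]


seq = RxSeq("abcdefg")
last_five = seq[rx(5):rx(0)]   # 'cdefg'
slid_left = seq[-2:5]          # 'abcde' rather than ''
```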

Cheers,
    Ron



From steve at pearwood.info  Tue Oct 29 03:25:43 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 29 Oct 2013 13:25:43 +1100
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <l4mjjo$sb$1@ger.gmane.org>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <20131028022046.GU7989@ando>
 <CAExdVNmPsGUKSaz_qW=sDA20w6tjt3Z+Dr8XYU_+GVsXBwBqsA@mail.gmail.com>
 <20131028120921.GW7989@ando>
 <CAExdVNnerCDk634MD6AiJvpgpRn5xRN6VdRyGi9zFhVDyhQ=Ug@mail.gmail.com>
 <l4mjjo$sb$1@ger.gmane.org>
Message-ID: <20131029022542.GY7989@ando>

On Mon, Oct 28, 2013 at 05:06:09PM -0400, Terry Reedy wrote:
> On 10/28/2013 2:49 PM, Tim Peters wrote:
> 
> >under the proposal we have:
> >
> >     s[i:j:k] == s[i:j][::k]
> 
> I think where we went wrong with strides was to have the sign of the 
> stride affect the interpretation of i and j (in analogy with ranges). 
> The change is to correct this by decoupling steps 1. and 2. below. The 
> result is that i and j would mean left and right ends of the slice, 
> rather than 'start' and 'stop' ends of the slice.

Sorry Terry, your paragraph above is ambiguous to me. It sounds like you 
are saying that having slices work by analogy with range was a mistake. 
Are you suggesting to break the analogy between slicing and range? That 
is, range continues to work the way it currently does, but change 
slice?


> Whether selecting or replacing, this proposal makes the rule for 
> indicating an arithmetic subsequence to be:
> 
> 1. indicate the contiguous slice to work on with left and right 
> endpoints (left end i, right end j, i <= j after normalization with same 
> rules as at present);
> 
> 2. indicate the starting end and direction of movement, left to right 
> (default) or right to left (negate k);
> 
> 3. indicate whether to pick every member of the slice (k=1, default) or 
> every kth (k > 1), starting with the first item at the indicated end (if 
> there is one) and moving in the appropriate direction.

"pick every kth element" works for k=1 as well as k > 1, no need for a 
special case here. Every 1th element is every element :-)


> My quick take on slicing versus indexing. The slice positions of a 
> single item are i:(i+1). The average is i.5. Some languages (0-based, 
> like Python) round this down to i, others (1-based) round up to i+1.

I don't think it's helpful to talk about averaging or rounding the 
indexes. Better to talk about whether indexes are included or excluded, 
or whether the interval is open (end points are excluded) or closed (end 
points are included).

Ruby provides both closed and half-open ranges:

2..5 => 2, 3, 4, 5
2...5 => 2, 3, 4

(If you think that the choice of .. versus ... is backwards, you're not 
alone.) Ruby has no syntax for stride, but range objects have a step(k) 
method that returns every kth value.

http://www.ruby-doc.org/core-1.9.3/Range.html
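
For comparison, a closed range is easy to build on top of Python's half-open
one. closed_range is a made-up helper name used only for this illustration:

```python
def closed_range(start, stop, step=1):
    """Like Ruby's start..stop: both end points are included."""
    # Push the excluded end one unit further in the direction of travel.
    return range(start, stop + (1 if step > 0 else -1), step)


closed = list(closed_range(2, 5))   # [2, 3, 4, 5], like Ruby's 2..5
half_open = list(range(2, 5))       # [2, 3, 4],    like Ruby's 2...5
```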

 
-- 
Steven

From python at mrabarnett.plus.com  Tue Oct 29 04:43:28 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Tue, 29 Oct 2013 03:43:28 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <l4n3as$bii$1@ger.gmane.org>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org>
Message-ID: <526F2EE0.9010705@mrabarnett.plus.com>

On 29/10/2013 01:34, Ron Adam wrote:
>
>
> On 10/28/2013 08:00 AM, Nick Coghlan wrote:
>> In this vein, I started wondering if it might be worth trying to come up
>> with a syntax to control whether the ends of a slice were open or closed.
>>
>> Since mismatched paren types would be too confusing, perhaps abusing some
>> binary operators as Chris suggested could help:
>>
>> "[<i:" closed start of slice (default)
>> "[i<:" open start of slice
>> ":>j]" open end of slice (default)
>> ":j>]" closed end of slice
>> ":>j:k]" open end of slice with step
>> ":j>:k]" closed end of slice with step
>>
>> Default slice: "[<0:-1>:1]"
>> Reversed slice: "[<-1:0>:-1]"
>>
>> This makes it possible to cleanly include the final element as a closed
>> range, rather than needing to add or subtract 1 (and avoids the zero trap
>> when indexing from the end).
>
> I think a reverse index object could be easier to understand.  For now it
> could be just a subclass of int.  Then 0 and rx(0) would be distinguishable
> from each other.  (-i and rx(i) would be too.)
>
>       seq[0:rx(0)]        Default slice.
>       seq[0:rx(0):-1]     Reversed slice.  (compare to above)
>
>       seq[rx(5): rx(0)]   The last 5 items.
>
>
> A syntax could be added later.  (Insert preferred syntax below.)
>
>       seq[\5:\0]           The last 5 items
>
If you're going to have a reverse index object, shouldn't you also have
an index object?

I don't like the idea of counting from one end with one type and from
the other end with another type.

But if you're really set on having different types of some kind, how about
real counting from the left and imaginary counting from the right:

     seq[5j : 0j] # The last 5 items

     seq[1 : 1j] # From second to second-from-last

>
>
> How about this example, which would probably use names instead of the
> integers in real code.
>
>       >>> "abcdefg"[3:10]       # 10 is past the end.  (works fine)
>       'defg'
>
> Sliding the range 5 to the left...
>
>       >>> "abcdefg"[-2:5]       # -2 is before the beginning?  (Nope)
>       ''                        # The wrap around gotcha!
>
> The same situation happens when indexing from the right side [-i:-j], and
> sliding the range to the right.  Once j >= 0, it breaks.
>
>
> It would be nice if these worked the same on both ends.  A reverse index
> object could fix both of these cases.
>
If you don't want a negative int to count from the right, then the
clearest choice I've seen so far is, IMHO, 'end':

     seq[end - 5 : end] # The last 5 items

     seq[1 : end - 1] # From second to second-from-last

I don't know the best way to handle it, but here's an idea: do it in
the syntax:

     subscript: subscript_test | [subscript_test] ':' [subscript_test] [sliceop]
     subscript_test: test | 'end' '-' test
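
Without new syntax, the behaviour can be approximated with a sentinel
object. This is only a sketch: end, _End and cut are names made up for this
note, and clamping an over-long "end - n" to the start is my assumption
rather than part of the idea above:

```python
class _End:
    """Hypothetical 'end' marker; end - n means n items before the end."""

    def __init__(self, offset=0):
        self.offset = offset

    def __sub__(self, n):
        return _End(self.offset + n)


end = _End()


def _position(pos, length):
    """Turn an int or an 'end - n' marker into a non-negative position."""
    if isinstance(pos, _End):
        return max(0, length - pos.offset)
    if pos < 0:
        raise IndexError("negative position %r; use end - n" % (pos,))
    return pos


def cut(seq, start, stop):
    """Return seq[start:stop] where only 'end - n' counts from the right."""
    return seq[_position(start, len(seq)):_position(stop, len(seq))]


last_five = cut("abcdefg", end - 5, end)   # 'cdefg'
middle = cut("abcdefg", 1, end - 1)        # 'bcdef'
```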


From abarnert at yahoo.com  Tue Oct 29 05:15:52 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 28 Oct 2013 21:15:52 -0700
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAExdVNkfhr-5_tJ6egD=H_ZtLL6=P5VMk94L3hdqX+YMh4+=4Q@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <20131028022046.GU7989@ando>
 <CAExdVNmPsGUKSaz_qW=sDA20w6tjt3Z+Dr8XYU_+GVsXBwBqsA@mail.gmail.com>
 <20131028120921.GW7989@ando>
 <CAExdVNnerCDk634MD6AiJvpgpRn5xRN6VdRyGi9zFhVDyhQ=Ug@mail.gmail.com>
 <l4mjjo$sb$1@ger.gmane.org>
 <CAExdVNmM106z_HNwHa9Cb7KaCyOZjafH4Bc-9o8y6pKaG5V1kg@mail.gmail.com>
 <l4mtaj$cnk$1@ger.gmane.org> <C44BB268-05B3-43CA-95BF-49320FE95D10@yahoo.com>
 <CAExdVNkfhr-5_tJ6egD=H_ZtLL6=P5VMk94L3hdqX+YMh4+=4Q@mail.gmail.com>
Message-ID: <CA2E63C1-C8C5-47C5-9BE9-918ECDCCBB74@yahoo.com>

On Oct 28, 2013, at 17:41, Tim Peters <tim.peters at gmail.com> wrote:

> [Ron Adam]
>>> We also need to remember that slicing is also used for inserting things.
>>> 
>>> >>> a = list("python")
>>> >>> b = list("PYTHON")
>>> >>> a[::2] = b[::2]
>>> >>> a
>>> ['P', 'y', 'T', 'h', 'O', 'n']
> 
> [Andrew Barnert]
>> I was about to write the same thing. Half the mails so far have said things like "you don't
>> need to do [i: j:k] because you can do [m:n:o][::p]".
> 
> You must be missing most of the messages, then ;-)

For example, the whole sub discussion starting with Bruce Leban's post, which I'll quote here:

> (A) '012345678'[::-2] == '86420'
>     '0123456789'[::-2] == '97531'
> 
> or:
> 
> (B) '012345678'[::-2] == '86420'
>     '0123456789'[::-2] == '86420'
> 
> If (A) I can get the (B) result by writing [::2][::-1] but if (B), I'm forced to write:
> 
> s[0 if len(s) % 2 == 1 else 1::2]

The idea is that A (today's behavior) is fine because you can get the B result (one of the proposals, which Bruce apparently didn't like) with two slices. 

Someone then pointed out that the proposal is equally fine because, despite what Bruce suggested, you can get the A result with two slices.

But if you take away the ability to specify A in a single slice, you take away the ability to assign to A.
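
A small check of that point with plain lists, under current semantics
(the variable names are just for illustration):

```python
s = list("0123456789")

# One extended slice: the assignment writes straight into s.
s[::-2] = "abcde"
print(s)  # ['0', 'e', '2', 'd', '4', 'c', '6', 'b', '8', 'a']

t = list("0123456789")

# Two chained slices: t[::2] builds a new list, and the assignment
# only modifies that temporary, so t itself is unchanged.
t[::2][::-1] = "abcde"
print(t)  # ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
```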

Imagine I'd written s[:-5:-2]=1, 3 (or equivalently s[:5:-2]) and the language changed to make this now replace the 8 and 6 instead of the 9 and 7. How would I change my code to get the previous behavior back?

It's quite possible no one has ever intentionally written such code. Or, even if they _have_, that they shouldn't have. (You can hardly call something readable if you have to sit down and work through what it would do to various sequences.) And correctly supporting such nonexistent code has been a burden on every custom sequence ever implemented.

So, a proposal that makes strides < -1 with non-None end points into an error (as some of them have) seems reasonable.

But I think a proposal that changes the meaning of such slices into something different (as some of them have) is a lot riskier. (Especially since many custom sequences that didn't implement slice assignment in the clever way would have to be rewritten.)

>> But that doesn't work with assignment; you're just assigning to the temporary copy
>> of the first slice.
>> ...
> 
> Do you have a specific example of a currently-working slice assignment
> that couldn't easily be done under proposed alternatives?

s[:-4:-2]=1, 2

This replaces the last and antepenultimate elements, whether s is even or odd.

I suppose you could mechanically convert it to this:

s[-mid+2::2]=reversed((1,2))

But I don't know that I'd call that "easy".

The question is whether this is realistic code anyone would ever intentionally write.




From abarnert at yahoo.com  Tue Oct 29 05:18:28 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Mon, 28 Oct 2013 21:18:28 -0700
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CA2E63C1-C8C5-47C5-9BE9-918ECDCCBB74@yahoo.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <20131028022046.GU7989@ando>
 <CAExdVNmPsGUKSaz_qW=sDA20w6tjt3Z+Dr8XYU_+GVsXBwBqsA@mail.gmail.com>
 <20131028120921.GW7989@ando>
 <CAExdVNnerCDk634MD6AiJvpgpRn5xRN6VdRyGi9zFhVDyhQ=Ug@mail.gmail.com>
 <l4mjjo$sb$1@ger.gmane.org>
 <CAExdVNmM106z_HNwHa9Cb7KaCyOZjafH4Bc-9o8y6pKaG5V1kg@mail.gmail.com>
 <l4mtaj$cnk$1@ger.gmane.org> <C44BB268-05B3-43CA-95BF-49320FE95D10@yahoo.com>
 <CAExdVNkfhr-5_tJ6egD=H_ZtLL6=P5VMk94L3hdqX+YMh4+=4Q@mail.gmail.com>
 <CA2E63C1-C8C5-47C5-9BE9-918ECDCCBB74@yahoo.com>
Message-ID: <693E065F-3DCB-4E02-99FD-90D06A9D75D0@yahoo.com>

Sorry, accidentally hit Send in mid-edit... Please see the fixed version of the last segment below.

Sent from a random iPhone

On Oct 28, 2013, at 21:15, Andrew Barnert <abarnert at yahoo.com> wrote:

> On Oct 28, 2013, at 17:41, Tim Peters <tim.peters at gmail.com> wrote:
> 
>> [Ron Adam]
>>>> We also need to remember that slicing is also used for inserting things.
>>>> 
>>>>>>> a = list("python")
>>>>>>> b = list("PYTHON")
>>>>>>> a[::2] = b[::2]
>>>>>>> a
>>>> ['P', 'y', 'T', 'h', 'O', 'n']
>> 
>> [Andrew Barnert]
>>> I was about to write the same thing. Half the mails so far have said things like "you don't
>>> need to do [i: j:k] because you can do [m:n:o][::p]".
>> 
>> You must be missing most of the messages, then ;-)
> 
> For example, the whole sub discussion starting with Bruce Leban's post, which I'll quote here:
> 
>> (A) '012345678'[::-2] == '86420'
>>     '0123456789'[::-2] == '97531'
>> 
>> or:
>> 
>> (B) '012345678'[::-2] == '86420'
>>     '0123456789'[::-2] == '86420'
>> 
>> If (A) I can get the (B) result by writing [::2][::-1] but if (B), I'm forced to write:
>> 
>> s[0 if len(s) % 2 == 1 else 1::2]
> 
> The idea is that A (today's behavior) is fine because you can get the B result (one of the proposals, which Bruce apparently didn't like) with two slices. 
> 
> Someone then pointed out that the proposal is equally fine because, despite what Bruce suggested, you can get the A result with two slices.
> 
> But if you take away the ability to specify A in a single slice, you take away the ability to assign to A.
> 
> Imagine I'd written s[:-5:-2]=1, 3 (or equivalently s[:5:-2]) and the language changed to make this now replace the 8 and 6 instead of the 9 and 7. How would I change my code to get the previous behavior back?
> 
> It's quite possible no one has ever intentionally written such code. Or, even if they _have_, that they shouldn't have. (You can hardly call something readable if you have to sit down and work through what it would do to various sequences.) And correctly supporting such nonexistent code has been a burden on every custom sequence ever implemented.
> 
> So, a proposal that makes strides < -1 with non-None end points into an error (as some of them have) seems reasonable.
> 
> But I think a proposal that changes the meaning of such slices into something different (as some of them have) is a lot riskier. (Especially since many custom sequences that didn't implement slice assignment in the clever way would have to be rewritten.)
> 
>>> But that doesn't work with assignment; you're just assigning to the temporary copy
>>> of the first slice.
>>> ...
>> 
>> Do you have a specific example of a currently-working slice assignment
>> that couldn't easily be done under proposed alternatives?
> 
> s[:-4:-2]=1, 2
> 
> This replaces the last and antepenultimate elements, whether s is even or odd.
> 
> I suppose you could mechanically convert it to this:

s[-3::2]=reversed((1, 2))
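
A quick way to convince yourself the two spellings agree under current
semantics, for any list of length 3 or more (the function names here are
only for the illustration):

```python
def current(seq):
    s = list(seq)
    s[:-4:-2] = 1, 2              # last element gets 1, antepenultimate gets 2
    return s


def rewritten(seq):
    s = list(seq)
    s[-3::2] = reversed((1, 2))   # same two positions, visited left to right
    return s


print(current(range(9)))    # [0, 1, 2, 3, 4, 5, 2, 7, 1]
print(current(range(10)))   # [0, 1, 2, 3, 4, 5, 6, 2, 8, 1]

# Both spellings agree for odd and even lengths alike.
assert all(current(range(n)) == rewritten(range(n)) for n in range(3, 12))
```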

> But I don't know that I'd call that "easy".
> 
> The question is whether this is realistic code anyone would ever intentionally write.

From greg.ewing at canterbury.ac.nz  Tue Oct 29 05:31:36 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 29 Oct 2013 17:31:36 +1300
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7+vJ+qyY09+=sbRgrfHMZ77BLbyEv-au-kv8FTonwvyRy-Ag@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
 <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
 <CAHVvXxSu0MAdX1CCMjPeLpyTt7eTTc4XuRTLQeAg-LHCgZ6g-g@mail.gmail.com>
 <CAP1=2W5MUHWwRT21_oeKr+co6=nnPq2AmGWiVw7Pw0-VqjQ=Rw@mail.gmail.com>
 <CAHVvXxSBKDXx_NgzdrhGPH4U7neyJ88sBPQHgCoSBN8C5mWmOg@mail.gmail.com>
 <CADiSq7d9FaMABacJHet5PHVTkdTg46S65kQHMQd+Bqjqpsx=Hg@mail.gmail.com>
 <CAP7+vJ+qyY09+=sbRgrfHMZ77BLbyEv-au-kv8FTonwvyRy-Ag@mail.gmail.com>
Message-ID: <526F3A28.3080203@canterbury.ac.nz>

Guido van Rossum wrote:
> I'm not sure I like new syntax. We'd still have to find a way to 
> represent this with slice() and also with range().

If we allowed slice[...] to create slice objects, any new
indexing syntax would carry over to that.

Similarly we could use range[...] to create ranges using
slice syntax.
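
Something close to this can be sketched today with helper objects whose
__getitem__ simply hands back what it was given. The names `make_slice`
and `make_range` below are hypothetical stand-ins for the slice[...] and
range[...] spellings, which do not exist on the built-in types.

```python
class _SliceMaker:
    """Hypothetical helper: indexing it returns the key it was given."""

    def __getitem__(self, key):
        return key


class _RangeMaker:
    """Hypothetical helper: build a range from slice syntax."""

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("expected slice syntax, e.g. make_range[0:10:2]")
        if key.stop is None:
            raise ValueError("a range needs a stop value")
        start = 0 if key.start is None else key.start
        step = 1 if key.step is None else key.step
        return range(start, key.stop, step)


make_slice = _SliceMaker()
make_range = _RangeMaker()

print(make_slice[1:10:2])          # slice(1, 10, 2)
print(list(make_range[2:10:3]))    # [2, 5, 8]
print("abcdefgh"[make_slice[::2]]) # aceg
```

Any new indexing syntax would reach such helpers automatically, since
they only see whatever object the subscript expression produces.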

-- 
Greg

From ron3200 at gmail.com  Tue Oct 29 05:49:01 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Mon, 28 Oct 2013 23:49:01 -0500
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
Message-ID: <l4nenk$d2h$1@ger.gmane.org>



On 10/27/2013 12:04 PM, Guido van Rossum wrote:
> Are we stuck with this forever? If we want to fix this in Python 4 we'd
> have to start deprecating negative stride with non-empty lower/upper bounds
> now. And we'd have to start deprecating negative step for range()
> altogether, recommending reversed(range(lower, upper)) instead.
>
> Thoughts? Is NumPy also affected?

I found this very short but interesting page that explains a little bit 
about how NumPy uses Python's slices.

     http://ilan.schnell-web.net/prog/slicing/

It looks like as long as we don't change the semantics of how slice objects 
get passed it won't affect NumPy.

Using the example from the web page above, we can see how the slice syntax 
already accepts more than one set of indices and/or values separated by 
commas.   ;-)

 >>> class foo:
...    def __getitem__(self, *args):
...       print(args)
...
 >>> x = foo()

 >>> x[2:3, 4:5]
((slice(2, 3, None), slice(4, 5, None)),)


It looks like it's the __getitem__ and __setitem__ methods that complain 
if you send them more than one set of indices or values.  It's not a syntax 
limitation.

If the left and right indices are to be considered separate from the step, 
we can use this existing legal syntax, and just pass the step after a comma.

     a[i:j, k]

And teach __getitem__ and __setitem__ to take the extra value.

Then your proposed relationship becomes the following and it's even clearer 
that (i and j) are separate and not affected by k.

     a[i:j, k] == a[i:j][:, k]


 >>> i, j, k = 0, 9, -1


 >>> x[i:j, k]
((slice(0, 9, None), -1),)

 >>> x[i:j]
(slice(0, 9, None),)

 >>> x[:, k]
((slice(None, None, None), -1),)
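
As a rough sketch of what "teaching __getitem__ to take the extra value"
could look like, here is a list subclass that reads a[i:j, k] as
a[i:j][::k]. The class name `StepList` is made up for this illustration,
and only __getitem__ is handled.

```python
class StepList(list):
    """Hypothetical list subclass accepting a[i:j, k] as a[i:j][::k]."""

    def __getitem__(self, key):
        if isinstance(key, tuple):
            bounds, step = key
            # Select the contiguous part first, then apply the step to it.
            return list.__getitem__(self, bounds)[::step]
        return list.__getitem__(self, key)


a = StepList(range(10))
print(a[2:8, -1])   # [7, 6, 5, 4, 3, 2]
print(a[2:8, -2])   # [7, 5, 3]
print(a[:, -1])     # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(a[2:8:-1])    # [] under the current [i:j:k] rules
```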


Cheers,
   Ron


From tjreedy at udel.edu  Tue Oct 29 05:51:50 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 29 Oct 2013 00:51:50 -0400
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <20131029022542.GY7989@ando>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <20131028022046.GU7989@ando>
 <CAExdVNmPsGUKSaz_qW=sDA20w6tjt3Z+Dr8XYU_+GVsXBwBqsA@mail.gmail.com>
 <20131028120921.GW7989@ando>
 <CAExdVNnerCDk634MD6AiJvpgpRn5xRN6VdRyGi9zFhVDyhQ=Ug@mail.gmail.com>
 <l4mjjo$sb$1@ger.gmane.org> <20131029022542.GY7989@ando>
Message-ID: <l4nesu$eic$1@ger.gmane.org>

On 10/28/2013 10:25 PM, Steven D'Aprano wrote:
> On Mon, Oct 28, 2013 at 05:06:09PM -0400, Terry Reedy wrote:

>> I think where we went wrong with strides was to have the sign of the
>> stride affect the interpretation of i and j (in analogy with ranges).
>> The change is to correct this by decoupling steps 1. and 2. below. The
>> result is that i and j would mean left and right ends of the slice,
>> rather than 'start' and 'stop' ends of the slice.
>
> Sorry Terry, your paragraph above is ambiguous to me. It sounds like you
> are saying that having slices work by analogy with range was a mistake.

I once suggested that slice and range should be consolidated into one 
class. I was told (correctly, I see now) that no, slices and ranges are 
related but different. I have forgotten whatever explanation was given, 
but their parameters have different meanings. For slices, start and stop 
are symmetrical and both mark boundaries between what is included and 
what is excluded. For ranges, start and stop are asymmetrical; one is 
included and the other excluded.

What I said above is that a negative stride is enough to say 'reverse 
the direction of selection (or replacement)'.  It is not actually 
necessary to also switch the endpoint arguments.

> Are you suggesting to break the analogy between slicing and range?

It was already broken, more than I really noticed until today.

> That is, range continues to work the way it currently does, but change
> slice?

Perhaps, but I read Guido's post less than 12 hours ago. Thinking about 
ranges is for another day.

>> Whether selecting or replacing, this proposal makes the rule for
>> indicating an arithmetic subsequence to be:
>>
>> 1. indicate the contiguous slice to work on with left and right
>> endpoints (left end i, right end j, i <= j after normalization with same
>> rules as at present);
>>
>> 2. indicate the starting end and direction of movement, left to right
>> (default) or right to left (negate k);
>>
>> 3. indicate whether to pick every member of the slice (k=1, default) or
>> every kth (k > 1), starting with the first item at the indicated end (if
>> there is one) and moving in the appropriate direction.
>
> "pick every kth element" works for k=1 as well as k > 1, no need for a
> special case here. Every 1th element is every element :-)

Right. The reason I special-cased 1 is that -1 is an unambiguous special 
case that can be used to define the -k for k>1 case. Currently, s[::-k] 
== s[::-1][::k]. It has been proposed to change that to s[::k][::-1], 
first by Tim (who changed his mind, at least for now) and perhaps by Nick.
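
The two readings can be compared directly in current Python. The helper
names below are only labels for this illustration.

```python
def current_reverse(s, k):
    """Every k-th item counting from the right end: today's s[::-k]."""
    return s[::-k]


def proposed_reverse(s, k):
    """Every k-th item counting from the left end, then reversed."""
    return s[::k][::-1]


for s in ("012345678", "0123456789"):
    for k in (1, 2, 3):
        # The identity quoted above holds for any length and any k >= 1.
        assert current_reverse(s, k) == s[::-1][::k]
    print(s, current_reverse(s, 2), proposed_reverse(s, 2))

# 012345678 86420 86420
# 0123456789 97531 86420
```

The two readings differ only when (len(s) - 1) is not a multiple of k.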

>> My quick take on slicing versus indexing. The slice positions of a
>> single item are i:(i+1). The average is i.5. Some languages (0-based,
>> like Python) round this down to i, others (1-based) round up to i+1.

Because of my experience drawing graphs with dots representing objects 
and with axes with labelled tick marks, I think of slicing in terms of 
labelled tick marks with objects (or object references) 'centered' 
between the tick marks.

|_|_|
0 1 2

If a character is centered between the tick marks labelled 0 and 1, its 
'coordinate' would be .5. I agree that one could get to the same result 
by simply dropping one of the slice endpoints.

Another reason to think of the objects as being at half coordinates is 
that it explains the count correctly without introducing a spurious 
asymmetry. A slice from i to j includes (j-i)+1 slice positions if you 
include i and j or (j-i)-1 if you do not. It includes j-i half 
coordinates, which is exactly how many items are included in the slice.
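
The counting argument can be checked numerically. This sketch assumes
0 <= i <= j <= len(s), so no normalization of the endpoints is involved.

```python
s = "abcdefgh"
i, j = 2, 6

positions_inclusive = len(range(i, j + 1))   # tick marks i..j:   (j - i) + 1
positions_exclusive = len(range(i + 1, j))   # tick marks strictly between
items = len(s[i:j])                          # objects between the tick marks

print(positions_inclusive, positions_exclusive, items)   # 5 3 4
assert items == j - i
```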

> I don't think it's helpful to talk about averaging or rounding the
> indexes. Better to talk about whether indexes are included or excluded,
> or whether the interval is open (end points are excluded) or closed (end
> points are included).

s[i:j] includes all items between slice positions i and j. I do not 
think that the concept open/closed really applies to slices, as opposed 
to arithmetic interval (whether discrete or continuous). Slicing uses 
slice coordinates, but does not include them in the slice. If one forces 
the concept on slices, the slice interval would be either open or closed 
at both ends. They are definitely not asymmetric, half one, half the 
other. However, the 'length' of a slice is the same as a half-open interval.

-- 
Terry Jan Reedy


From tim.peters at gmail.com  Tue Oct 29 05:39:18 2013
From: tim.peters at gmail.com (Tim Peters)
Date: Mon, 28 Oct 2013 23:39:18 -0500
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CA2E63C1-C8C5-47C5-9BE9-918ECDCCBB74@yahoo.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <20131028022046.GU7989@ando>
 <CAExdVNmPsGUKSaz_qW=sDA20w6tjt3Z+Dr8XYU_+GVsXBwBqsA@mail.gmail.com>
 <20131028120921.GW7989@ando>
 <CAExdVNnerCDk634MD6AiJvpgpRn5xRN6VdRyGi9zFhVDyhQ=Ug@mail.gmail.com>
 <l4mjjo$sb$1@ger.gmane.org>
 <CAExdVNmM106z_HNwHa9Cb7KaCyOZjafH4Bc-9o8y6pKaG5V1kg@mail.gmail.com>
 <l4mtaj$cnk$1@ger.gmane.org> <C44BB268-05B3-43CA-95BF-49320FE95D10@yahoo.com>
 <CAExdVNkfhr-5_tJ6egD=H_ZtLL6=P5VMk94L3hdqX+YMh4+=4Q@mail.gmail.com>
 <CA2E63C1-C8C5-47C5-9BE9-918ECDCCBB74@yahoo.com>
Message-ID: <CAExdVNn41Qh5pL9C-PRgayqWS3a1Zzt0pV1v5tgdN8_qg5zkxg@mail.gmail.com>

...

[Tim]
> Do you have a specific example of a currently-working slice assignment
> that couldn't easily be done under proposed alternatives?

[Andrew Barnert]
> s[:-4:-2]=1, 2
>
> This replaces the last and antepenultimate elements, whether s is even or
> odd.

> I suppose you could mechanically convert it to this:
>
> s[-mid+2::2]=reversed((1,2))
>
> But I don't know that I'd call that "easy".

Under my & Terry's proposal, it would be written

    s[-3::-2] = 1, 2

And, at least to me, it's far more obvious this way that it affects
(all and only) s[-1] and s[-3].  It's immediate from "OK, s[-3:] is
the last three elements, so only those can possibly be affected.  Then
the stride -2 skips the one in the middle, and picks on the last
element first."

Analyzing the current spelling is a royal PITA.  "OK, umm, ah!  The
stride is negative, so the empty part at the start refers to the last
element of the sequence.  Then the second bit is -4, which is one
larger than we'll actually go.  Oops!  No, the stride is negative
here, so -4 is one *smaller* than we'll actually go.  It will stop at
-4+1 = -3.  I think." ;-)
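
For anyone who wants to experiment, the proposed reading can be emulated
today by slicing a range of indices in two steps. This is only a sketch:
`proposed_indices` and `proposed_assign` are hypothetical helper names,
and the semantics assumed are "take [i:j] first, then apply stride k".

```python
def proposed_indices(length, i, j, k):
    """Indices picked by the proposed reading of [i:j:k]: take the
    contiguous slice [i:j] first, then apply stride k to it."""
    return list(range(length)[i:j][::k])


def proposed_assign(s, i, j, k, values):
    """Emulate 's[i:j:k] = values' under the proposed reading."""
    indices = proposed_indices(len(s), i, j, k)
    values = list(values)
    if len(values) != len(indices):
        raise ValueError("wrong number of values for this slice")
    for index, value in zip(indices, values):
        s[index] = value


print(proposed_indices(10, -3, None, -2))   # [9, 7]
print(list(range(10)[:-4:-2]))              # [9, 7], today's spelling

s = list(range(10))
proposed_assign(s, -3, None, -2, (1, 2))
print(s)                                    # [0, 1, 2, 3, 4, 5, 6, 2, 8, 1]
```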

> The question is whether this is realistic code anyone would ever
> intentionally write.

Obviously so, whenever they need to replace the last element
with 1 and the antepenultimate element with 2 ;-)

BTW, do you know of any real code that uses a negative stride other
than -1?  Still looking for a real example of that.

From greg.ewing at canterbury.ac.nz  Tue Oct 29 06:05:36 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 29 Oct 2013 18:05:36 +1300
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <l4nesu$eic$1@ger.gmane.org>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <20131028022046.GU7989@ando>
 <CAExdVNmPsGUKSaz_qW=sDA20w6tjt3Z+Dr8XYU_+GVsXBwBqsA@mail.gmail.com>
 <20131028120921.GW7989@ando>
 <CAExdVNnerCDk634MD6AiJvpgpRn5xRN6VdRyGi9zFhVDyhQ=Ug@mail.gmail.com>
 <l4mjjo$sb$1@ger.gmane.org> <20131029022542.GY7989@ando>
 <l4nesu$eic$1@ger.gmane.org>
Message-ID: <526F4220.1030005@canterbury.ac.nz>

Terry Reedy wrote:
> I have forgotten whatever explanation was given, 
> but their parameters have different meanings. For slices, start and stop 
> are symmetrical and both mark boundaries between what is included and 
> what is excluded. For ranges, start and stop are asymmetrical; one is 
> included and the other excluded.

I don't think that's the difference; you can equally well
think of the parameters to range as delineating a slice of
an infinite list of integers.

The real difference is in how negative numbers are
interpreted: range(-2, 3) gives [-2, -1, 0, 1, 2],
whereas slicing does something special with the -2.
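
A short illustration of that difference (in Python 3, range is lazy, so
list() is used to show its contents):

```python
# For range, -2 is just an ordinary integer.
print(list(range(-2, 3)))   # [-2, -1, 0, 1, 2]

letters = "abcdefg"

# For slicing, -2 means len(letters) - 2 == 5, so [-2:3] is empty.
print(repr(letters[-2:3]))  # ''
print(letters[-2:])         # fg
```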

-- 
Greg

From greg.ewing at canterbury.ac.nz  Tue Oct 29 06:07:53 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 29 Oct 2013 18:07:53 +1300
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <l4nenk$d2h$1@ger.gmane.org>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <l4nenk$d2h$1@ger.gmane.org>
Message-ID: <526F42A9.6080803@canterbury.ac.nz>

Ron Adam wrote:
> If the left and right indices are to be considered separate from the 
> step, we can use this existing legal syntax, and just pass the step 
> after a comma.
> 
>     a[i:j, k]

No, we can't do that, because NumPy uses that for indexing
into a 2-dimensional array.

-- 
Greg

From greg.ewing at canterbury.ac.nz  Tue Oct 29 06:11:59 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 29 Oct 2013 18:11:59 +1300
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAExdVNn41Qh5pL9C-PRgayqWS3a1Zzt0pV1v5tgdN8_qg5zkxg@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <20131028022046.GU7989@ando>
 <CAExdVNmPsGUKSaz_qW=sDA20w6tjt3Z+Dr8XYU_+GVsXBwBqsA@mail.gmail.com>
 <20131028120921.GW7989@ando>
 <CAExdVNnerCDk634MD6AiJvpgpRn5xRN6VdRyGi9zFhVDyhQ=Ug@mail.gmail.com>
 <l4mjjo$sb$1@ger.gmane.org>
 <CAExdVNmM106z_HNwHa9Cb7KaCyOZjafH4Bc-9o8y6pKaG5V1kg@mail.gmail.com>
 <l4mtaj$cnk$1@ger.gmane.org> <C44BB268-05B3-43CA-95BF-49320FE95D10@yahoo.com>
 <CAExdVNkfhr-5_tJ6egD=H_ZtLL6=P5VMk94L3hdqX+YMh4+=4Q@mail.gmail.com>
 <CA2E63C1-C8C5-47C5-9BE9-918ECDCCBB74@yahoo.com>
 <CAExdVNn41Qh5pL9C-PRgayqWS3a1Zzt0pV1v5tgdN8_qg5zkxg@mail.gmail.com>
Message-ID: <526F439F.1010902@canterbury.ac.nz>

Tim Peters wrote:
> BTW, do you know of any real code that uses a negative stride other
> than -1?  Still looking for a real example of that.

I tend to scrupulously avoid writing any such code,
precisely because it's so hard to figure out what it
will mean!

I'm even a bit wary of using -1, preferring to use
reversed() if possible.

-- 
Greg

From ron3200 at gmail.com  Tue Oct 29 06:53:27 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Tue, 29 Oct 2013 00:53:27 -0500
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <526F42A9.6080803@canterbury.ac.nz>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <l4nenk$d2h$1@ger.gmane.org> <526F42A9.6080803@canterbury.ac.nz>
Message-ID: <l4nige$efk$1@ger.gmane.org>


Meant to send this to the list...


On 10/29/2013 12:07 AM, Greg Ewing wrote:
 > Ron Adam wrote:
 >> If the left and right indices are to be considered separate from the
 >> step, we can use this existing legal syntax, and just pass the step after
 >> a comma.
 >>
 >>     a[i:j, k]
 >
 > No, we can't do that, because NumPy uses that for indexing
 > into a 2-dimensional array.

Can you explain why it's an issue?

Currently lists won't accept that, so NumPy isn't using that spelling with 
lists, and it only requires changing the __getitem__ and __setitem__ 
methods on lists (which NumPy can't be using in this way because it 
currently doesn't work), and it doesn't change slice objects, or the slice 
syntax at all.  I can't see how it will affect them.

It's a much smaller change than many of the other suggestions.

Cheers,
    Ron


From rosuav at gmail.com  Tue Oct 29 07:31:25 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 29 Oct 2013 17:31:25 +1100
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <526F2EE0.9010705@mrabarnett.plus.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
Message-ID: <CAPTjJmoC=t=0yoD0PoyUeTq7YTuWzUvOP+6EBjSJvJa_O5g8Kw@mail.gmail.com>

On Tue, Oct 29, 2013 at 2:43 PM, MRAB <python at mrabarnett.plus.com> wrote:
> But if you're really set on having different types of some kind, how about
> real counting from the left and imaginary counting from the right:
>
>     seq[5j : 0j] # The last 5 items
>
>     seq[1 : 1j] # From second to second-from-last

Interesting idea, but is the notion of indexing a list with a float
going to be another huge can of worms?

ChrisA

From greg.ewing at canterbury.ac.nz  Tue Oct 29 08:14:38 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 29 Oct 2013 20:14:38 +1300
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <l4nige$efk$1@ger.gmane.org>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <l4nenk$d2h$1@ger.gmane.org> <526F42A9.6080803@canterbury.ac.nz>
 <l4nige$efk$1@ger.gmane.org>
Message-ID: <526F605E.5090807@canterbury.ac.nz>

Ron Adam wrote:
> Currently lists won't accept that,  So NumPy isn't using that spelling 
> with lists,

You're suggesting that we change the way slicing works
*only* for lists, and not any other indexable type?
What's the point of that?

-- 
Greg

From abarnert at yahoo.com  Tue Oct 29 08:59:07 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 29 Oct 2013 00:59:07 -0700
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAExdVNn41Qh5pL9C-PRgayqWS3a1Zzt0pV1v5tgdN8_qg5zkxg@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <20131028022046.GU7989@ando>
 <CAExdVNmPsGUKSaz_qW=sDA20w6tjt3Z+Dr8XYU_+GVsXBwBqsA@mail.gmail.com>
 <20131028120921.GW7989@ando>
 <CAExdVNnerCDk634MD6AiJvpgpRn5xRN6VdRyGi9zFhVDyhQ=Ug@mail.gmail.com>
 <l4mjjo$sb$1@ger.gmane.org>
 <CAExdVNmM106z_HNwHa9Cb7KaCyOZjafH4Bc-9o8y6pKaG5V1kg@mail.gmail.com>
 <l4mtaj$cnk$1@ger.gmane.org> <C44BB268-05B3-43CA-95BF-49320FE95D10@yahoo.com>
 <CAExdVNkfhr-5_tJ6egD=H_ZtLL6=P5VMk94L3hdqX+YMh4+=4Q@mail.gmail.com>
 <CA2E63C1-C8C5-47C5-9BE9-918ECDCCBB74@yahoo.com>
 <CAExdVNn41Qh5pL9C-PRgayqWS3a1Zzt0pV1v5tgdN8_qg5zkxg@mail.gmail.com>
Message-ID: <26486350-D81C-4EB5-91BE-156219F898EA@yahoo.com>

On Oct 28, 2013, at 21:39, Tim Peters <tim.peters at gmail.com> wrote:

> BTW, do you know of any real code that uses a negative stride other
> than -1?  Still looking for a real example of that.

No. Which is exactly why I suggested just making it illegal instead of giving it a new (somewhat less confusing, but still not obvious) meaning. It doesn't much matter which one is "right" if neither one is useful, does it?

From p.f.moore at gmail.com  Tue Oct 29 11:23:24 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 29 Oct 2013 10:23:24 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7+vJ+qyY09+=sbRgrfHMZ77BLbyEv-au-kv8FTonwvyRy-Ag@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
 <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
 <CAHVvXxSu0MAdX1CCMjPeLpyTt7eTTc4XuRTLQeAg-LHCgZ6g-g@mail.gmail.com>
 <CAP1=2W5MUHWwRT21_oeKr+co6=nnPq2AmGWiVw7Pw0-VqjQ=Rw@mail.gmail.com>
 <CAHVvXxSBKDXx_NgzdrhGPH4U7neyJ88sBPQHgCoSBN8C5mWmOg@mail.gmail.com>
 <CADiSq7d9FaMABacJHet5PHVTkdTg46S65kQHMQd+Bqjqpsx=Hg@mail.gmail.com>
 <CAP7+vJ+qyY09+=sbRgrfHMZ77BLbyEv-au-kv8FTonwvyRy-Ag@mail.gmail.com>
Message-ID: <CACac1F-58Z22nBSK13y_3ddUL7cMu2YwhqJL_DsnH9cS_vnAMA@mail.gmail.com>

On 28 October 2013 22:41, Guido van Rossum <guido at python.org> wrote:
> I'm not sure I like new syntax. We'd still have to find a way to represent
> this with slice() and also with range().

It's a shame there isn't an indexing syntax where you can supply an
iterator that produces the set of indexes you want and returns the
subsequence - then we could experiment with alternative semantics in
user code.

So, for example (silly example, because I don't have the time right
now to define an indexing function that matches any of the proposed
solutions):

    >>> def PrimeSlice():
    >>>    yield 2
    >>>    yield 3
    >>>    yield 5
    >>>    yield 7

    >>> 'abcdefgh'[[PrimeSlice()]]
    'bceg'

But of course, to make this user-definable needs new syntax in the
first place :-(
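
Without new syntax, the closest thing is an ordinary function. The sketch
below uses a hypothetical `select` helper and a generator named
`prime_indices` in place of PrimeSlice. Note that with Python's 0-based
indexing the indices 2, 3, 5, 7 pick 'cdfh'; the 'bceg' shown above
corresponds to counting from 1.

```python
def prime_indices():
    yield 2
    yield 3
    yield 5
    yield 7


def select(seq, indices):
    """Hypothetical helper: return the items of seq at the given indices.

    Strings come back as strings, anything else as a list.
    """
    picked = [seq[i] for i in indices]
    if isinstance(seq, str):
        return "".join(picked)
    return picked


print(select("abcdefgh", prime_indices()))        # cdfh
print(select(list(range(10)), prime_indices()))   # [2, 3, 5, 7]
```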

Paul

From robert.kern at gmail.com  Tue Oct 29 12:26:49 2013
From: robert.kern at gmail.com (Robert Kern)
Date: Tue, 29 Oct 2013 11:26:49 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAExdVNkfhr-5_tJ6egD=H_ZtLL6=P5VMk94L3hdqX+YMh4+=4Q@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAExdVNn8aRkNgTh_475oDMVvYwOL4gg2BwN+1f=f3nUZyDd3Fw@mail.gmail.com>
 <20131028022046.GU7989@ando>
 <CAExdVNmPsGUKSaz_qW=sDA20w6tjt3Z+Dr8XYU_+GVsXBwBqsA@mail.gmail.com>
 <20131028120921.GW7989@ando>
 <CAExdVNnerCDk634MD6AiJvpgpRn5xRN6VdRyGi9zFhVDyhQ=Ug@mail.gmail.com>
 <l4mjjo$sb$1@ger.gmane.org>
 <CAExdVNmM106z_HNwHa9Cb7KaCyOZjafH4Bc-9o8y6pKaG5V1kg@mail.gmail.com>
 <l4mtaj$cnk$1@ger.gmane.org> <C44BB268-05B3-43CA-95BF-49320FE95D10@yahoo.com>
 <CAExdVNkfhr-5_tJ6egD=H_ZtLL6=P5VMk94L3hdqX+YMh4+=4Q@mail.gmail.com>
Message-ID: <l4o61f$823$1@ger.gmane.org>

On 2013-10-29 00:41, Tim Peters wrote:

> When you see things like
>
>      s[i:j;k] = s[i:j][::k]
>
> *nobody* is suggesting using the spelling on the RHS.  They're
> pointing out a pleasant mathematical equivalence.

Actually, I am suggesting that, in the case of negative k. It is much easier to 
learn, read, and reason about the composition of those two operations than any 
[i:j:k] construction, no matter what semantics you apply to that syntax.

That said, I always use this in numpy where slices are practically free.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco


From ncoghlan at gmail.com  Tue Oct 29 14:32:20 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 29 Oct 2013 23:32:20 +1000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CACac1F-58Z22nBSK13y_3ddUL7cMu2YwhqJL_DsnH9cS_vnAMA@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
 <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
 <CAHVvXxSu0MAdX1CCMjPeLpyTt7eTTc4XuRTLQeAg-LHCgZ6g-g@mail.gmail.com>
 <CAP1=2W5MUHWwRT21_oeKr+co6=nnPq2AmGWiVw7Pw0-VqjQ=Rw@mail.gmail.com>
 <CAHVvXxSBKDXx_NgzdrhGPH4U7neyJ88sBPQHgCoSBN8C5mWmOg@mail.gmail.com>
 <CADiSq7d9FaMABacJHet5PHVTkdTg46S65kQHMQd+Bqjqpsx=Hg@mail.gmail.com>
 <CAP7+vJ+qyY09+=sbRgrfHMZ77BLbyEv-au-kv8FTonwvyRy-Ag@mail.gmail.com>
 <CACac1F-58Z22nBSK13y_3ddUL7cMu2YwhqJL_DsnH9cS_vnAMA@mail.gmail.com>
Message-ID: <CADiSq7d5ymFe0yc5VYb1MVnJFWFNKhQPt4cqmJWeCcSUe5geQA@mail.gmail.com>

On 29 October 2013 20:23, Paul Moore <p.f.moore at gmail.com> wrote:
> On 28 October 2013 22:41, Guido van Rossum <guido at python.org> wrote:
>> I'm not sure I like new syntax. We'd still have to find a way to represent
>> this with slice() and also with range().
>
> It's a shame there isn't an indexing syntax where you can supply an
> iterator that produces the set of indexes you want and returns the
> subsequence - then we could experiment with alternative semantics in
> user code.
>
> So, for example (silly example, because I don't have the time right
> now to define an indexing function that matches any of the proposed
> solutions):
>
>     >>> def PrimeSlice():
>     >>>    yield 2
>     >>>    yield 3
>     >>>    yield 5
>     >>>    yield 7
>
>     >>> 'abcdefgh'[[PrimeSlice()]]
>     'bceg'
>
> But of course, to make this user-definable needs new syntax in the
> first place :-(

Tangent: I thought of a list comprehension based syntax for that a
while ago, but decided it wasn't particularly interesting since it's
too hard to provide sensible fallback behaviour for existing
containers: 'abcdefgh'[x for x in PrimeSlice()]


Back on the topic of slicing with negative steps, I did some
experimentation to see what could be done in current Python using a
callable that produces the appropriate slice objects, and it turns out
you can create a quite usable "rslice" callable, provided you pass in
the length when dealing with mismatched signs on the indices (that's
the advantage of the "[i:j][::-k]" interpretation of the reversed
slice - if you want to interpret it as "[i:j:k][::-1]" as I suggested
previously, I believe you would need to know the length of the
sequence in all cases):

def rslice(*slice_args, length=None):
    """For args (i, j, k) computes a slice equivalent to [i:j][::-k]
    (which is not the same as [i:j:-k]!)"""
    forward = slice(*slice_args) # Easiest way to emulate slice arg parsing!
    # Always negate the step (a missing step is treated as 1)
    step = -(1 if forward.step is None else forward.step)
    # Given slice args are closed on the left, open on the right,
    # simply negating the step and swapping left and right will introduce
    # an off-by-one error, so we need to adjust the endpoints to account
    # for the open/closed change
    left = forward.start
    right = forward.stop
    # Check for an empty slice before tinkering with offsets
    if left is not None and right is not None:
        if (left >= 0) != (right >= 0):
            if length is None:
                raise ValueError(
                    "Must supply length for indices of different signs")
            if left < 0:
                left += length
            else:
                right += length
        if left >= right:
            return slice(0, 0, 1)
    stop = left
    if stop is not None:
        # Closed on the left -> open stop value in the reversed slice
        if stop:
            stop -= 1
        else:
            # Converting a start offset of 0 to an end offset of -1 does
            # the wrong thing - need to convert it to None instead
            stop = None
    start = right
    if start is not None:
        # Open on the right -> closed start value in the reversed slice
        if start:
            start -= 1
        else:
            # Converting a stop offset of 0 to a start offset of -1 does
            # the wrong thing - need to convert it to None instead
            start = None
    return slice(start, stop, step)

# Test case
data = range(10)
for i in range(-10, 11):
    for j in range(-10, 11):
        for k in range(1, 11):
            expected = data[i:j][::-k]
            actual = data[rslice(i, j, k, length=len(data))]
            if actual != expected:
                print((i, j, k), actual, expected)
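
To make the off-by-one adjustment above concrete, here is a small sketch
using only builtin slicing on a list. The variable names are purely
illustrative and are not part of rslice itself.

```python
data = list(range(10))

# Reverse the forward slice data[2:8], taking every second item.
two_step = data[2:8][::-2]

# Naively swapping the endpoints and negating the step picks different
# items, because the closed and open ends have swapped as well.
naive = data[8:2:-2]

# Shifting both endpoints down by one restores the match.
adjusted = data[7:1:-2]

# When the forward slice starts at 0, the shifted stop would be -1, which
# wraps around to the last item, so None has to be used instead.
wrapped = data[3:-1:-1]
unbounded = data[3:None:-1]

print(two_step, naive, adjusted, wrapped, unbounded)
```

The last two lines are the reason rslice converts an offset of 0 to None
rather than to -1.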

So, at this point, I still quite like the idea of adding a
"reverse=True" keyword only arg to slice and range (with the semantics
of rslice above), and then revisit the idea of offering syntax for it
in Python 3.5. Since slices are objects, they could store the
"reverse" flag internally, and only apply it when the indices() method
(or the C API equivalent) is called to convert the abstract indices to
real ones for the cases where the info is needed - otherwise they'd do
the calculation above to create a suitable "forward" definition for
maximum compatibility with existing container implementations.
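
For readers less familiar with it, this is roughly what the existing
indices() method does today: it turns the abstract start/stop/step into
concrete values for a given length. The sketch below only uses the current
slice API; the variable names are made up for illustration, and nothing
here implements the proposed "reverse" flag.

```python
length = 10

# Negative start is resolved against the length; missing stop becomes length.
tail = slice(-3, None).indices(length)

# With a negative step the defaults flip: start at the last item and stop
# just before index 0 (reported as -1).
backward = slice(None, None, -1).indices(length)

# Out-of-range bounds are clamped to the length.
clamped = slice(2, 100).indices(length)

print(tail, backward, clamped)

# The resulting triple can be fed straight into range() to get real indices.
print(list(range(*backward)))
```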

A separate keyword only arg like "addlen=True" would also make it
possible to turn off the negative indexing support in a slice object's
indices() method, and switch it to clamping to zero instead.

An alternative to both of those ideas would be to eliminate the
restriction on subclassing slice objects in CPython, then you could
implement slice objects with different indices() method behaviour
(they'd still have to produce a start, stop, step triple though, so
they wouldn't offer the full generality Paul was describing).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From brett at python.org  Tue Oct 29 14:39:23 2013
From: brett at python.org (Brett Cannon)
Date: Tue, 29 Oct 2013 09:39:23 -0400
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAN8d9gmJ5EGa_9ttE-UovL7ASTMK9p4EWw4YiNQH1w2G-j0RPg@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
 <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
 <CAHVvXxSu0MAdX1CCMjPeLpyTt7eTTc4XuRTLQeAg-LHCgZ6g-g@mail.gmail.com>
 <CAP1=2W5MUHWwRT21_oeKr+co6=nnPq2AmGWiVw7Pw0-VqjQ=Rw@mail.gmail.com>
 <CAN8d9gmJ5EGa_9ttE-UovL7ASTMK9p4EWw4YiNQH1w2G-j0RPg@mail.gmail.com>
Message-ID: <CAP1=2W4+gUfcYAD9sJN+ECC53R7eBbcJZxEJXQLQ6OsuoXw5hw@mail.gmail.com>

On Mon, Oct 28, 2013 at 7:17 PM, Philipp A. <flying-sheep at web.de> wrote:

>
> On 28.10.2013 at 16:08, "Brett Cannon" <brett at python.org> wrote:
>
> > The deprecation would be in there from now until Python 4 so it wouldn't
> > be sudden (remember that we are on a roughly 18 month release cycle, so if
> > this went into 3.4 that's 7.5 years until this changes in Python 4).
>
> I don't get your calculation: after 3.9 clearly follows 3.10, as versions
> aren't decimal numbers, but tuples of integers.
>
> So we have 1.5*X years, with X being any number from 1 to infinity that
> Guido deems suitable.
>

Because Guido (and I as well) doesn't like minor version numbers that go
past single digits, so the chances of 3.10 are very slim. That's why I put
a cap on the possible number of years before something gets removed.

-Brett


> @proposal:
> -1 for explicit impliciticity in slicing syntax, as it's as complicated
> as it sounds (when phrased like I just did) and noisier than obfuscated C
>
> +1 for deprecating negative slicing, and teaching people to use reversed.
>
> But I think we should consider adding some sort of slice view function,
> since list[::2] already creates a copy, and reversed(list[::2]) creates two.
>

From breamoreboy at yahoo.co.uk  Tue Oct 29 14:55:24 2013
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Tue, 29 Oct 2013 13:55:24 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CADiSq7d5ymFe0yc5VYb1MVnJFWFNKhQPt4cqmJWeCcSUe5geQA@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
 <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
 <CAHVvXxSu0MAdX1CCMjPeLpyTt7eTTc4XuRTLQeAg-LHCgZ6g-g@mail.gmail.com>
 <CAP1=2W5MUHWwRT21_oeKr+co6=nnPq2AmGWiVw7Pw0-VqjQ=Rw@mail.gmail.com>
 <CAHVvXxSBKDXx_NgzdrhGPH4U7neyJ88sBPQHgCoSBN8C5mWmOg@mail.gmail.com>
 <CADiSq7d9FaMABacJHet5PHVTkdTg46S65kQHMQd+Bqjqpsx=Hg@mail.gmail.com>
 <CAP7+vJ+qyY09+=sbRgrfHMZ77BLbyEv-au-kv8FTonwvyRy-Ag@mail.gmail.com>
 <CACac1F-58Z22nBSK13y_3ddUL7cMu2YwhqJL_DsnH9cS_vnAMA@mail.gmail.com>
 <CADiSq7d5ymFe0yc5VYb1MVnJFWFNKhQPt4cqmJWeCcSUe5geQA@mail.gmail.com>
Message-ID: <l4oen8$k5v$1@ger.gmane.org>

On 29/10/2013 13:32, Nick Coghlan wrote:
>
> So, at this point, I still quite like the idea of adding a
> "reverse=True" keyword only arg to slice and range (with the semantics
> of rslice above), and then revisit the idea of offering syntax for it
> in Python 3.5.
>

I sincerely hope that you meant reverse=False? :)

-- 
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence


From ncoghlan at gmail.com  Tue Oct 29 15:04:30 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 30 Oct 2013 00:04:30 +1000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <l4oen8$k5v$1@ger.gmane.org>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
 <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
 <CAHVvXxSu0MAdX1CCMjPeLpyTt7eTTc4XuRTLQeAg-LHCgZ6g-g@mail.gmail.com>
 <CAP1=2W5MUHWwRT21_oeKr+co6=nnPq2AmGWiVw7Pw0-VqjQ=Rw@mail.gmail.com>
 <CAHVvXxSBKDXx_NgzdrhGPH4U7neyJ88sBPQHgCoSBN8C5mWmOg@mail.gmail.com>
 <CADiSq7d9FaMABacJHet5PHVTkdTg46S65kQHMQd+Bqjqpsx=Hg@mail.gmail.com>
 <CAP7+vJ+qyY09+=sbRgrfHMZ77BLbyEv-au-kv8FTonwvyRy-Ag@mail.gmail.com>
 <CACac1F-58Z22nBSK13y_3ddUL7cMu2YwhqJL_DsnH9cS_vnAMA@mail.gmail.com>
 <CADiSq7d5ymFe0yc5VYb1MVnJFWFNKhQPt4cqmJWeCcSUe5geQA@mail.gmail.com>
 <l4oen8$k5v$1@ger.gmane.org>
Message-ID: <CADiSq7ewvBGY9kXk-xLOdbLJdZCrHbY5MJ=hV+iYYz4Kqy-9Hw@mail.gmail.com>

On 29 October 2013 23:55, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:
> On 29/10/2013 13:32, Nick Coghlan wrote:
>>
>>
>> So, at this point, I still quite like the idea of adding a
>> "reverse=True" keyword only arg to slice and range (with the semantics
>> of rslice above), and then revisit the idea of offering syntax for it
>> in Python 3.5.
>>
>
> I sincerely hope that you meant reverse=False? :)

I could claim that I was referring to the way you would call it (which
is why the subclass idea is also attractive), but yes, that's really
just a typo and the default would be the other way around :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From barry at python.org  Tue Oct 29 15:59:18 2013
From: barry at python.org (Barry Warsaw)
Date: Tue, 29 Oct 2013 10:59:18 -0400
Subject: [Python-ideas] Where did we go wrong with negative stride?
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
 <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
 <CAHVvXxSu0MAdX1CCMjPeLpyTt7eTTc4XuRTLQeAg-LHCgZ6g-g@mail.gmail.com>
 <CAP1=2W5MUHWwRT21_oeKr+co6=nnPq2AmGWiVw7Pw0-VqjQ=Rw@mail.gmail.com>
 <CAN8d9gmJ5EGa_9ttE-UovL7ASTMK9p4EWw4YiNQH1w2G-j0RPg@mail.gmail.com>
 <CAP1=2W4+gUfcYAD9sJN+ECC53R7eBbcJZxEJXQLQ6OsuoXw5hw@mail.gmail.com>
Message-ID: <20131029105918.4524d621@anarchist>

On Oct 29, 2013, at 09:39 AM, Brett Cannon wrote:

>Because Guido (and I as well) doesn't like minor version numbers that go
>past single digits, so the chances of 3.10 are very slim. That's why I put
>a cap on the possible number of years before something gets removed.

And why 2.6.9 is the end of the line for 2.6. :)

-Barry

From ron3200 at gmail.com  Tue Oct 29 18:49:40 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Tue, 29 Oct 2013 12:49:40 -0500
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <526F2EE0.9010705@mrabarnett.plus.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
Message-ID: <l4osfb$9f7$1@ger.gmane.org>



On 10/28/2013 10:43 PM, MRAB wrote:
>> I think a reverse index object could be easier to understand.  For now it
>> could be just a subclass of int.  Then 0 and rx(0) would be distinguishable
>> from each other.  (-i and rx(i) would be too.)
>>
>>       seq[0:rx(0)]        Default slice.
>>       seq[0:rx(0):-1]     Reversed slice.  (compare to above)
>>
>>       seq[rx(5): rx(0)]   The last 5 items.
>>
>>
>> A syntax could be added later.  (Insert preferred syntax below.)
>>
>>       seq[\5:\0]           The last 5 items
>>
> If you're going to have a reverse index object, shouldn't you also have
> an index object?
>
> I don't like the idea of counting from one end with one type and from
> the other end with another type.

It would be possible to make it work both ways by having a direction 
attribute on it which is set with a unary minus operation.

      seq[-ix(5): -ix(0)]

Positive integers would work normally too.  Negative ints would just be to 
the left of the first item rather than the left of the last item.



Just had a thought.  In accounting negative numbers are often represented 
as a positive number in parentheses.

        seq[(5,):(0,)]        Last 5 items.

Unfortunately we need the comma to define a single item tuple.  :-/

But this would work without adding new syntax or a new type.  And the ','
isn't that big of a deal.  It would just take a bit of getting used to it.

Cheers,
     Ron



> But if you're really set on having different types of some kind, how about
> real counting from the left and imaginary counting from the right:
>
>      seq[5j : 0j] # The last 5 items
>
>      seq[1 : 1j] # From second to second-from-last
>
>>
>>
>> How about this example, which would probably use names instead of the
>> integers in real code.
>>
>>       >>> "abcdefg"[3:10]       # 10 is past the end.  (works fine)
>>       'defg'
>>
>> Sliding the range 5 to the left...
>>
>>       >>> "abcdefg"[-2:5]       # -2 is before the beginning?  (Nope)
>>       ''                        # The wrap around gotcha!
>>
>> The same situation happens when indexing from the right side [-i:-j], and
>> sliding the range to the right.  Once j >= 0, it breaks.
>>
>>
>> It would be nice if these worked the same on both ends.  A reverse index
>> object could fix both of these cases.
>>
> If you don't want a negative int to count from the right, then the
> clearest choice I've seen so far is, IMHO, 'end':
>
>      seq[end - 5 : end] # The last 5 items
>
>      seq[1 : end - 1] # From second to second-from-last
>
> I don't know the best way to handle it, but here's an idea: do it in
> the syntax:
>
>      subscript: subscript_test | [subscript_test] ':' [subscript_test]
> [sliceop]
>      subscript_test: test | 'end' '-' test

I think this would work too, but it's not any different than the [\5:\0]
syntax example.  Just a different spelling.

Your example could be done without adding syntax by using an end class, which
is effectively the same as an index class.
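
A minimal sketch of what such an end class might look like, assuming a
helper function does the slicing (a plain class cannot change how builtin
sequences interpret their indices).  The names End, end and eslice are
hypothetical, invented here for illustration only.

```python
class End:
    """Hypothetical marker meaning 'this many items before the end'."""

    def __init__(self, offset=0):
        self.offset = offset

    def __sub__(self, count):
        # end - 5 yields a marker five items before the end.
        return End(self.offset + count)

    def resolve(self, length):
        # Clamp at 0 so an oversized offset cannot wrap around.
        return max(length - self.offset, 0)


end = End()


def eslice(seq, start, stop):
    """Slice seq with bounds that are plain ints, End markers or None."""
    def resolve(bound):
        if bound is None:
            return None
        if isinstance(bound, End):
            return bound.resolve(len(seq))
        # Plain ints always count from the left, so negatives clamp to 0.
        return max(bound, 0)

    return seq[resolve(start):resolve(stop)]


last_five = eslice("abcdefg", end - 5, end)
inner = eslice("abcdefg", 1, end - 1)
shifted = eslice("abcdefg", -2, 5)
print(last_five, inner, shifted)
```

With this split, sliding a range off the left edge (the -2:5 case) clamps
instead of wrapping around, because only End markers count from the right.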

Cheers,
    Ron




From python at mrabarnett.plus.com  Tue Oct 29 19:08:00 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Tue, 29 Oct 2013 18:08:00 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CACac1F-58Z22nBSK13y_3ddUL7cMu2YwhqJL_DsnH9cS_vnAMA@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
 <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
 <CAHVvXxSu0MAdX1CCMjPeLpyTt7eTTc4XuRTLQeAg-LHCgZ6g-g@mail.gmail.com>
 <CAP1=2W5MUHWwRT21_oeKr+co6=nnPq2AmGWiVw7Pw0-VqjQ=Rw@mail.gmail.com>
 <CAHVvXxSBKDXx_NgzdrhGPH4U7neyJ88sBPQHgCoSBN8C5mWmOg@mail.gmail.com>
 <CADiSq7d9FaMABacJHet5PHVTkdTg46S65kQHMQd+Bqjqpsx=Hg@mail.gmail.com>
 <CAP7+vJ+qyY09+=sbRgrfHMZ77BLbyEv-au-kv8FTonwvyRy-Ag@mail.gmail.com>
 <CACac1F-58Z22nBSK13y_3ddUL7cMu2YwhqJL_DsnH9cS_vnAMA@mail.gmail.com>
Message-ID: <526FF980.2070306@mrabarnett.plus.com>

On 29/10/2013 10:23, Paul Moore wrote:
> On 28 October 2013 22:41, Guido van Rossum <guido at python.org> wrote:
>> I'm not sure I like new syntax. We'd still have to find a way to represent
>> this with slice() and also with range().
>
> It's a shame there isn't an indexing syntax where you can supply an
> iterator that produces the set of indexes you want and returns the
> subsequence - then we could experiment with alternative semantics in
> user code.
>
> So, for example (silly example, because I don't have the time right
> now to define an indexing function that matches any of the proposed
> solutions):
>
>      >>> def PrimeSlice():
>      >>>    yield 2
>      >>>    yield 3
>      >>>    yield 5
>      >>>    yield 7
>
>      >>> 'abcdefgh'[[PrimeSlice()]]
>      'bceg'
>
> But of course, to make this user-definable needs new syntax in the
> first place :-(
>
We already use (...) and [...], which leaves {...}:

 >>> 'abcdefgh'{PrimeSlice()}
'bceg'


From p.f.moore at gmail.com  Tue Oct 29 20:32:55 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 29 Oct 2013 19:32:55 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <526FF980.2070306@mrabarnett.plus.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <d8b7ecc1-dbb9-4248-bf20-e77bdb933394@email.android.com>
 <CAP7+vJKGxtHwwO-vcT5RjyATbPo_Es=vekBnwavLQW2762bBwA@mail.gmail.com>
 <CAExdVNmwrp63NDo6ByhJmK4zdfH4NPqZGyU=Z0F1-15wZoRAmg@mail.gmail.com>
 <20131028002009.6e88487b@fsol>
 <CAP1=2W6qUa612hU3bgQ8=8NUqb-37W_Lt1L31Q4E0u2LqpfbKg@mail.gmail.com>
 <CAHVvXxSu0MAdX1CCMjPeLpyTt7eTTc4XuRTLQeAg-LHCgZ6g-g@mail.gmail.com>
 <CAP1=2W5MUHWwRT21_oeKr+co6=nnPq2AmGWiVw7Pw0-VqjQ=Rw@mail.gmail.com>
 <CAHVvXxSBKDXx_NgzdrhGPH4U7neyJ88sBPQHgCoSBN8C5mWmOg@mail.gmail.com>
 <CADiSq7d9FaMABacJHet5PHVTkdTg46S65kQHMQd+Bqjqpsx=Hg@mail.gmail.com>
 <CAP7+vJ+qyY09+=sbRgrfHMZ77BLbyEv-au-kv8FTonwvyRy-Ag@mail.gmail.com>
 <CACac1F-58Z22nBSK13y_3ddUL7cMu2YwhqJL_DsnH9cS_vnAMA@mail.gmail.com>
 <526FF980.2070306@mrabarnett.plus.com>
Message-ID: <CACac1F-A77L8EcJ8zCtRvHfwBSZcixaS+YdsjmMJ1zvdRcFm-A@mail.gmail.com>

On 29 October 2013 18:08, MRAB <python at mrabarnett.plus.com> wrote:
> On 29/10/2013 10:23, Paul Moore wrote:
>>
>> On 28 October 2013 22:41, Guido van Rossum <guido at python.org> wrote:
>>>
>>> I'm not sure I like new syntax. We'd still have to find a way to
>>> represent
>>> this with slice() and also with range().
>>
>>
>> It's a shame there isn't an indexing syntax where you can supply an
>> iterator that produces the set of indexes you want and returns the
>> subsequence - then we could experiment with alternative semantics in
>> user code.
>>
>> So, for example (silly example, because I don't have the time right
>> now to define an indexing function that matches any of the proposed
>> solutions):
>>
>>      >>> def PrimeSlice():
>>      >>>    yield 2
>>      >>>    yield 3
>>      >>>    yield 5
>>      >>>    yield 7
>>
>>      >>> 'abcdefgh'[[PrimeSlice()]]
>>      'bceg'
>>
>> But of course, to make this user-definable needs new syntax in the
>> first place :-(
>>
> We already use (...) and [...], which leaves {...}:
>
>  >>> 'abcdefgh'{PrimeSlice()}
> 'bceg'


You could probably do it by simply adding an extra case to __getitem__
on builtin types: check in order for integer/object with __index__
(single index), Slice object (traditional slice), iterable (series of
arbitrary indices). User defined types would have to implement this in
the same way that they currently have to implement Slice behaviour,
and dictionaries would not behave the same (again, in the same way as
for Slice).

Basically an iterable doesn't need to be any more of a special case
than Slice (in fact Slice is *more* special, because there is syntax
that generates Slice objects).
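
As a rough sketch of that dispatch order in user code (single index, then
slice, then iterable of indices), assuming a wrapper class rather than any
change to the builtin types.  MultiIndexSeq and prime_indices are
hypothetical names used only for this illustration.

```python
import operator
from collections.abc import Iterable


class MultiIndexSeq:
    """Hypothetical wrapper whose __getitem__ also accepts an iterable."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        # Case 1: an integer or any object implementing __index__.
        try:
            position = operator.index(key)
        except TypeError:
            pass
        else:
            return self._data[position]
        # Case 2: a traditional slice object.
        if isinstance(key, slice):
            return self._data[key]
        # Case 3: an iterable producing arbitrary indices.
        if isinstance(key, Iterable):
            items = [self._data[i] for i in key]
            if isinstance(self._data, str):
                return "".join(items)
            return type(self._data)(items)
        raise TypeError("unsupported index type: %r" % type(key).__name__)


def prime_indices():
    yield 2
    yield 3
    yield 5
    yield 7


letters = MultiIndexSeq("abcdefgh")
print(letters[1], letters[1:4], letters[prime_indices()])
```

Note that with ordinary zero-based indices, 2, 3, 5 and 7 select 'cdfh'
from 'abcdefgh'.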

Paul

From python at mrabarnett.plus.com  Tue Oct 29 21:19:18 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Tue, 29 Oct 2013 20:19:18 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <l4osfb$9f7$1@ger.gmane.org>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org>
Message-ID: <52701846.2070604@mrabarnett.plus.com>

On 29/10/2013 17:49, Ron Adam wrote:
> On 10/28/2013 10:43 PM, MRAB wrote:
>>> I think a reverse index object could be easier to understand.  For now it
>>> could be just a subclass of int.  Then 0 and rx(0) would be distinguishable
>>> from each other.  (-i and rx(i) would be too.)
>>>
>>>       seq[0:rx(0)]        Default slice.
>>>       seq[0:rx(0):-1]     Reversed slice.  (compare to above)
>>>
>>>       seq[rx(5): rx(0)]   The last 5 items.
>>>
>>>
>>> A syntax could be added later.  (Insert preferred syntax below.)
>>>
>>>       seq[\5:\0]           The last 5 items
>>>
>> If you're going to have a reverse index object, shouldn't you also have
>> an index object?
>>
>> I don't like the idea of counting from one end with one type and from
>> the other end with another type.
>
> It would be possible to make it work both ways by having a direction
> attribute on it which is set with a unary minus operation.
>
>        seq[-ix(5): -ix(0)]
>
> Positive integers would work normally too.  Negative ints would just be to
> the left of the first item rather than the left of the last item.
>
>
>
> Just had a thought.  In accounting negative numbers are often represented
> as a positive number in parentheses.
>
>          seq[(5,):(0,)]        Last 5 items.
>
> Unfortunately we need the comma to define a single item tuple.  :-/
>
> But this would work without adding new syntax or a new type.  And the ','
> isn't that big of a deal.  It would just take a bit of getting used to it.
>
>
>> But if you're really set on having different types of some kind, how about
>> real counting from the left and imaginary counting from the right:
>>
>>      seq[5j : 0j] # The last 5 items
>>
>>      seq[1 : 1j] # From second to second-from-last
>>
[snip]
Suppose there were two new classes, "index" and "rindex". "index"
counts from the left and "rindex" counts from the right.

You could also use unary ">" and "<":

	>x == index(x)
	<x == rindex(x)

Slicing would be like this:

	seq[<5 : <0] # The last five items
	seq[>1 : <1] # From the second to the second-from-last.

Strictly speaking, str.find and str.index should also return an index
instance. In the case of str.find, if the string wasn't found it would
return >-1 (i.e. index(-1)), which, when used as an index, would raise
an IndexError (index(-1) isn't the same as -1).
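
A minimal sketch of how that could behave, assuming the two classes are
plain int subclasses and the lookup goes through a helper function.  The
names index, rindex, item_at and find are hypothetical, and the choice that
rindex(0) means "one past the last item" is an assumption made to match the
slicing examples above.

```python
class index(int):
    """Hypothetical position counted from the left; never wraps around."""


class rindex(int):
    """Hypothetical position counted from the right; rindex(0) is the end."""


def item_at(seq, pos):
    length = len(seq)
    if isinstance(pos, rindex):
        real = length - int(pos)
    elif isinstance(pos, index):
        real = int(pos)
    else:
        # Plain ints keep today's behaviour, including negative wraparound.
        return seq[pos]
    if not 0 <= real < length:
        raise IndexError("position out of range")
    return seq[real]


def find(text, sub):
    # Same search as str.find, but the result is tagged as an index.
    return index(text.find(sub))


text = "abc"
silent = text[text.find("z")]    # plain -1 quietly selects the last item
last = item_at(text, rindex(1))
try:
    item_at(text, find(text, "z"))
    outcome = "no error"
except IndexError:
    outcome = "IndexError"
print(silent, last, outcome)
```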

In fact, index or rindex instances could end up spreading throughout
the language, to wherever an int is actually an index. (You'd also have
to handle addition and subtraction with indexes, e.g. pos + 1.)

All of which, I suspect, is taking it too far! :-)


From ron3200 at gmail.com  Tue Oct 29 22:25:27 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Tue, 29 Oct 2013 16:25:27 -0500
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <52701846.2070604@mrabarnett.plus.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
Message-ID: <l4p93u$88s$1@ger.gmane.org>



On 10/29/2013 03:19 PM, MRAB wrote:
> On 29/10/2013 17:49, Ron Adam wrote:
>> On 10/28/2013 10:43 PM, MRAB wrote:
>>>> I think a reverse index object could be easier to understand.  For now it
>>>> could be just a subclass of int.  Then 0 and rx(0) would be
>>>> distinguishable
>>>> from each other.  (-i and rx(i) would be too.)
>>>>
>>>>       seq[0:rx(0)]        Default slice.
>>>>       seq[0:rx(0):-1]     Reversed slice.  (compare to above)
>>>>
>>>>       seq[rx(5): rx(0)]   The last 5 items.
>>>>
>>>>
>>>> A syntax could be added later.  (Insert preferred syntax below.)
>>>>
>>>>       seq[\5:\0]           The last 5 items
>>>>
>>> If you're going to have a reverse index object, shouldn't you also have
>>> an index object?
>>>
>>> I don't like the idea of counting from one end with one type and from
>>> the other end with another type.
>>
>> It would be possible to make it work both ways by having a direction
>> attribute on it which is set with a unary minus operation.
>>
>>        seq[-ix(5): -ix(0)]
>>
>> Positive integers would work normally too.  Negative ints would just be to
>> the left of the first item rather than the left of the last item.
>>
>>
>>
>> Just had a thought.  In accounting negative numbers are often represented
>> as a positive number in parentheses.
>>
>>          seq[(5,):(0,)]        Last 5 items.
>>
>> Unfortunately we need the comma to define a single item tuple.  :-/
>>
>> But this would work without adding new syntax or a new type.  And the ','
>> isn't that big of a deal.  It would just take a bit of getting used to it.
>>
>>
>>> But if you're really set on having different types of some kind, how about
>>> real counting from the left and imaginary counting from the right:
>>>
>>>      seq[5j : 0j] # The last 5 items
>>>
>>>      seq[1 : 1j] # From second to second-from-last
>>>
> [snip]
> Suppose there were two new classes, "index" and "rindex". "index"
> counts from the left and "rindex" counts from the right.
>
> You could also use unary ">" and "<":
>
>      >x == index(x)
>      <x == rindex(x)
>
> Slicing would be like this:
>
>      seq[<5 : <0] # The last five items
>      seq[>1 : <1] # From the second to the second-from-last.
>
> Strictly speaking, str.find and str.index should also return an index
> instance. In the case of str.find, if the string wasn't found it would
> return >-1 (i.e. index(-1)), which, when used as an index, would raise
> an IndexError (index(-1) isn't the same as -1).
>
> In fact, index or rindex instances could end up spreading throughout
> the language, to wherever an int is actually an index. (You'd also have
> to handle addition and subtraction with indexes, e.g. pos + 1.)
>
> All of which, I suspect, is taking it too far! :-)

I think it may be the only way to get a clean model of slicing from both 
directions with a 0 based index system.

Cheers,
    Ron






From ncoghlan at gmail.com  Tue Oct 29 23:07:00 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 30 Oct 2013 08:07:00 +1000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <l4p93u$88s$1@ger.gmane.org>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
Message-ID: <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>

On 30 Oct 2013 07:26, "Ron Adam" <ron3200 at gmail.com> wrote:
>
>
>
> On 10/29/2013 03:19 PM, MRAB wrote:
>>
>> On 29/10/2013 17:49, Ron Adam wrote:
>>>
>>> On 10/28/2013 10:43 PM, MRAB wrote:
>>>>>
>>>>> I think a reverse index object could be easier to understand.  For now it
>>>>> could be just a subclass of int.  Then 0 and rx(0) would be
>>>>> distinguishable
>>>>> from each other.  (-i and rx(i) would be too.)
>>>>>
>>>>>       seq[0:rx(0)]        Default slice.
>>>>>       seq[0:rx(0):-1]     Reversed slice.  (compare to above)
>>>>>
>>>>>       seq[rx(5): rx(0)]   The last 5 items.
>>>>>
>>>>>
>>>>> A syntax could be added later.  (Insert preferred syntax below.)
>>>>>
>>>>>       seq[\5:\0]           The last 5 items
>>>>>
>>>> If you're going to have a reverse index object, shouldn't you also have
>>>> an index object?
>>>>
>>>> I don't like the idea of counting from one end with one type and from
>>>> the other end with another type.
>>>
>>>
>>> It would be possible to make it work both ways by having a direction
>>> attribute on it which is set with a unary minus operation.
>>>
>>>        seq[-ix(5): -ix(0)]
>>>
>>> Positive integers would work normally too.  Negative ints would just be to
>>> the left of the first item rather than the left of the last item.
>>>
>>>
>>>
>>> Just had a thought.  In accounting negative numbers are often represented
>>> as a positive number in parentheses.
>>>
>>>          seq[(5,):(0,)]        Last 5 items.
>>>
>>> Unfortunately we need the comma to define a single item tuple.  :-/
>>>
>>> But this would work without adding new syntax or a new type.  And the ','
>>> isn't that big of a deal.  It would just take a bit of getting used to it.
>>>
>>>
>>>> But if you're really set on having different types of some kind, how about
>>>> real counting from the left and imaginary counting from the right:
>>>>
>>>>      seq[5j : 0j] # The last 5 items
>>>>
>>>>      seq[1 : 1j] # From second to second-from-last
>>>>
>> [snip]
>> Suppose there were two new classes, "index" and "rindex". "index"
>> counts from the left and "rindex" counts from the right.
>>
>> You could also use unary ">" and "<":
>>
>>      >x == index(x)
>>      <x == rindex(x)
>>
>> Slicing would be like this:
>>
>>      seq[<5 : <0] # The last five items
>>      seq[>1 : <1] # From the second to the second-from-last.
>>
>> Strictly speaking, str.find and str.index should also return an index
>> instance. In the case of str.find, if the string wasn't found it would
>> return >-1 (i.e. index(-1)), which, when used as an index, would raise
>> an IndexError (index(-1) isn't the same as -1).
>>
>> In fact, index or rindex instances could end up spreading throughout
>> the language, to wherever an int is actually an index. (You'd also have
>> to handle addition and subtraction with indexes, e.g. pos + 1.)
>>
>> All of which, I suspect, is taking it too far! :-)
>
>
> I think it may be the only way to get a clean model of slicing from both
> directions with a 0 based index system.

Isn't all that is needed to prevent the default wraparound behaviour
clamping negative numbers to zero on input?

As in:

def clampleft(start, stop, step):
    if start is not None and start < 0:
        start = 0
    if stop is not None and stop < 0:
        stop = 0
    return slice(start, stop, step)

Similar to rslice and "reverse=False", this could be implemented as a
"range=False" flag (the rationale for the flag name is that in "range",
negative numbers are just negative numbers, without the wraparound
behaviour normally exhibited by the indices calculation in slice objects).
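
A small usage sketch of the recipe above, comparing it with the default
wraparound. The step=None default and the sliding-window loop are
assumptions added for illustration only; they are not part of the recipe.

```python
def clampleft(start, stop, step=None):
    """Clamp negative bounds to zero instead of letting them wrap around."""
    if start is not None and start < 0:
        start = 0
    if stop is not None and stop < 0:
        stop = 0
    return slice(start, stop, step)


data = "abcde"
window = 3  # hypothetical look-behind width

for pos in range(5):
    lo = pos - window  # goes negative near the left edge
    plain = data[lo:pos]                 # negative lo counts from the end
    clamped = data[clampleft(lo, pos)]   # negative lo is treated as 0
    print(pos, repr(plain), repr(clamped))
```

With the plain slice the window is silently empty for pos 1 and 2, because
the negative start wraps to the right-hand end; the clamped version yields
'a' and 'ab' there.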

I think there are two reasonable options that could conceivably be included
in 3.4 at this late stage:

* Make slice subclassable and ensure the C API and stdlib respect an
overridden indices() method

* add a "reverse" flag to both slice and range, and a "range" flag to slice.

Either way, if any changes are going to be made, a PEP should be written up
summarising some of the ideas in this thread, including the clampleft() and
rslice() recipes that work in current versions of Python.

Cheers,
Nick.

>
> Cheers,
>    Ron

From abarnert at yahoo.com  Wed Oct 30 02:04:15 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 29 Oct 2013 18:04:15 -0700
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
Message-ID: <7C705BD6-33BC-47F2-A127-2D63C1E09790@yahoo.com>

On Oct 29, 2013, at 15:07, Nick Coghlan <ncoghlan at gmail.com> wrote:

> Isn't all that is needed to prevent the default wraparound behaviour clamping negative numbers to zero on input?
> 
> As in:
> 
> def clampleft(start, stop, step):
>     if start is not None and start < 0:
>         start = 0
>     if stop is not None and stop < 0:
>         stop = 0
>     return slice(start, stop, step)
> 
Except many of the wraparound cases people complain about are the other way around, negative stop wrapping around to 0. 

You could fix that almost as easily:

def clampright(start, stop, step):
    if start >= 0:
        start = ???
    if stop >= 0:
        stop = None
    return slice(start, stop, step)

Except... What do you set start to if you want to make sure it's past-end? You could force an empty slice (which is the main thing you want) with, e.g., stop=start=0; is that close enough?
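
One way the "force an empty slice" idea could look, as a rough sketch only.
It assumes a positive step, assumes both bounds are meant to count back from
the right-hand end, and adds None checks and a step=None default that the
snippet above does not have.

```python
def clampright(start, stop, step=None):
    """Treat non-negative bounds as 'at or past the right end'.

    Sketch only: assumes a positive step and bounds that are meant to be
    negative offsets from the end of the sequence.
    """
    if start is not None and start >= 0:
        # A start at or past the end can only select nothing.
        return slice(0, 0, step)
    if stop is not None and stop >= 0:
        # A stop at or past the end means "run to the end".
        stop = None
    return slice(start, stop, step)


s = "abcde"
for n in (2, 1, 0):
    last = s[clampright(-n, None)]      # last n items
    but_last = s[clampright(None, -n)]  # everything except the last n
    print(n, repr(last), repr(but_last))
```

For n == 0 this gives an empty "last 0 items" and the whole string for
"all but the last 0", which is the case the plain s[-n:] and s[:-n]
spellings get backwards.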

From ncoghlan at gmail.com  Wed Oct 30 02:34:00 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 30 Oct 2013 11:34:00 +1000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <7C705BD6-33BC-47F2-A127-2D63C1E09790@yahoo.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <7C705BD6-33BC-47F2-A127-2D63C1E09790@yahoo.com>
Message-ID: <CADiSq7c3qqm=STwX7mi+=Hyr+zCO3Z17=zp+eXwE-QDA24E+Bw@mail.gmail.com>

On 30 October 2013 11:04, Andrew Barnert <abarnert at yahoo.com> wrote:
> On Oct 29, 2013, at 15:07, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
> Isn't all that is needed to prevent the default wraparound behaviour
> clamping negative numbers to zero on input?
>
> As in:
>
> def clampleft(start, stop, step):
>     if start is not None and start < 0:
>         start = 0
>     if stop is not None and stop < 0:
>         stop = 0
>     return slice(start, stop, step)
>
> Except many of the wraparound cases people complain about are the other way
> around, negative stop wrapping around to 0.
>
> You could fix that almost as easily:
>
> def clampright(start, stop, step):
>     if start >= 0:
>         start = ???
>     if stop >= 0:
>         stop = None
>     return slice(start, stop, step)
>
> Except... What do you set start to if you want to make sure it's past-end?
> You could force an empty slice (which is the main thing you want) with,
> e.g., stop=start=0; is that close enough?

Yes, that's what I did in the rslice recipe - if it figured out an
empty slice was needed when explicit bounds were involved, it always
returned "slice(0, 0, step)" regardless of the original inputs.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ron3200 at gmail.com  Wed Oct 30 04:09:56 2013
From: ron3200 at gmail.com (Ron Adam)
Date: Tue, 29 Oct 2013 22:09:56 -0500
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
Message-ID: <l4pt9r$r7b$1@ger.gmane.org>



On 10/29/2013 05:07 PM, Nick Coghlan wrote:
>
> On 30 Oct 2013 07:26, "Ron Adam" <ron3200 at gmail.com> wrote:
>
> > I think it may be the only way to get a clean model of slicing from both
> > directions with a 0 based index system.
>
> Isn't all that is needed to prevent the default wraparound behaviour
> clamping negative numbers to zero on input?

Well, you can't have indexing from the other end and clamp indexes to zero
at the same time.

The three situations are...

        index both i and j from the front
        index both i and j from the end
        index i from the front + index j from the end.   (?)

And then there is this one, which is what confuses everyone.

        index i from the end + index j from the front.

It's useful in the current slice semantics where I and J are swapped if K 
is negative.  It works, but is not easy to think about clearly.  It's meant 
to match up with start and stop concepts, rather than left and right.

Another issue is whether or not a slice should raise an IndexError if its
range is outside the length of the sequence.  Both behaviours are useful.
Currently it doesn't on one end and gives the wrong output on the other.
(When moving a slice left or right.)   :-/

For that matter the wraparound behaviour is sometimes useful too.  But not 
if it's only on the left side.

And then there's the idea of open and closed ends.  Which you have an
interest in.  Assuming there are four combinations of that... both-closed,
left-open, right-open, and both-open.



That's a lot of things to be trying to shove into one syntax!



So it seems (to me) we may be better off to just concentrate on writing 
some functions with the desired behaviour(s) and leaving the slice syntax 
question to later.  (But I'm very glad these things are being addressed.)


A function that would cover nearly all of the use cases I can think of...

     # Get items that are within slice range.
     # indexes:  l, r, rl, rr --> left, right, rev-left, rev-right
     # The indexes always use positive numbers.
     # step and width can be either positive or negative.
     # width - chunk to take at each step.  (If it can work cleanly.)

     get_slice(obj, l=None, r=None, rl=None, rr=None, step=1, width=1)


Used as...

     a = get_slice(s, l=i, r=j)      # index from left end

     a = get_slice(s, rl=i, rr=j)    # index from right end

     a = get_slice(s, l=i, rr=j)     # index from both ends


While that signature definition is long and not too pretty, it could be 
wrapped to make more specialised and nicer to use variations.  Or it could 
be hidden away in __getitem__ methods.

     def mid_slice(obj, i, j, k=1):
          """Slice obj i and j distance from ends."""
          return get_slice(obj, l=i, rr=j, step=k)
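
A rough sketch of how such a function might behave, under several
assumptions that go beyond the description above: rl and rr are read as
non-negative distances from the right-hand end, only a positive step is
handled, out-of-range positions are clamped, and the width argument is left
out entirely.

```python
def get_slice(obj, l=None, r=None, rl=None, rr=None, step=1):
    """Slice obj using non-negative offsets measured from either end.

    Sketch only: l/r count from the left end, rl/rr count from the
    right end, and step must be positive.
    """
    if step < 1:
        raise ValueError("this sketch only handles a positive step")
    if (l is not None and rl is not None) or (r is not None and rr is not None):
        raise TypeError("give each boundary from one end only")
    n = len(obj)
    left = l if l is not None else (n - rl if rl is not None else 0)
    right = r if r is not None else (n - rr if rr is not None else n)
    # Clamp so that a position never becomes negative and wraps around.
    left = min(max(left, 0), n)
    right = min(max(right, 0), n)
    return obj[left:right:step]


def mid_slice(obj, i, j, k=1):
    """Slice obj i and j distance from ends."""
    return get_slice(obj, l=i, rr=j, step=k)


s = "abcdef"
print(get_slice(s, l=1, r=4))    # index from left end
print(get_slice(s, rl=3, rr=0))  # index from right end: last three items
print(mid_slice(s, 1, 1))        # index from both ends
```

Because every offset is converted to an absolute position before slicing,
an offset of 0 from the right simply means "the end" and never wraps.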


Instead of using flags for these...

        "closed" "open" "open-right" "open-left" "reversed"
        "raise-err"  "wrap-around"

Would it be possible to have those applied with a context manager while the 
object is being indexed?

      with index_mode(seq, "open", "reversed") as s:
          r = mid_slice(s, i, j)

That could work with any slice syntax we use later.  And makes a nice 
building block for creating specialised slice functions.


> As in:
>
> def clampleft(start, stop, step):
>      if start is not None and start < 0:
>          start = 0
>      if stop is not None and stop < 0:
>          stop = 0
>      return slice(start, stop, step)
>
> Similar to rslice and "reverse=False", this could be implemented as a
> "range=False" flag (the rationale for the flag name is that in "range",
> negative numbers are just negative numbers, without the wraparound
> behaviour normally exhibited by the indices calculation in slice objects).

I know some have mentioned unifying range and slice, even though they 
aren't the same thing...  But it suggests doing...

         seq[range(i,j,k)]

I'm not sure there is any real advantage to that other than testing that
range and slice behave in similar ways.


> I think there are two reasonable options that could conceivably be included
> in 3.4 at this late stage:
>
> * Make slice subclassable and ensure the C API and stdlib respect an
> overridden indices() method

I think that would be good, it would allow some experimentation that may be 
helpful.  Is there any reason to not allow it?


> * add a "reverse" flag to both slice and range, and a "range" flag to slice.
>
> Either way, if any changes are going to be made, a PEP should be written up
> summarising some of the ideas in this thread, including the clampleft() and
> rslice() recipes that work in current versions of Python.

I agree. :-)



Cheers,
    Ron






From ncoghlan at gmail.com  Wed Oct 30 08:39:34 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 30 Oct 2013 17:39:34 +1000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <l4pt9r$r7b$1@ger.gmane.org>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
Message-ID: <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>

On 30 October 2013 13:09, Ron Adam <ron3200 at gmail.com> wrote:
> That's a lot of things to be trying to shove into one syntax!

That's why I no longer think it should be handled as syntax. slice is
a builtin, and keyword arguments are a thing.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From oscar.j.benjamin at gmail.com  Wed Oct 30 10:52:37 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Wed, 30 Oct 2013 09:52:37 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
Message-ID: <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>

On 30 October 2013 07:39, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 30 October 2013 13:09, Ron Adam <ron3200 at gmail.com> wrote:
>> That's a lot of things to be trying to shove into one syntax!
>
> That's why I no longer think it should be handled as syntax. slice is
> a builtin, and keyword arguments are a thing.

I assume that you mean to add a reverse keyword argument to the slice
constructor so that I can do:

    b = a[slice(i, j, reverse=True)]

instead of

    b = a[j-1:i-1:-1]

or

    b = a[i:j][::-1]

Firstly would it not be better to add slice.__reversed__ so that it would be

    b = a[reversed(slice(i, j))]

Secondly I don't think I would ever actually want to use this over the
existing possibilities.

There are real problems with slicing and indexing in Python that lead
to corner cases and bugs but this particular issue is not one of them.
The real problems, including the motivating example at the start of
this thread, are caused by the use of negative indices to mean from
the end. Subtracting 1 from the indices when using a negative stride
isn't a big deal but correctly and robustly handling the wraparound
behaviour is. EAFP only works if invalid inputs raise an error and
this is very often not what happens with slicing and indexing.
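
A short illustration of that point, using only built-in behaviour. The
variable names are arbitrary.

```python
text = "abc"

# str.find() signals "not found" with -1, which is also a valid index.
pos = text.find("z")
wrapped_item = text[pos]      # no error: silently picks the last character

# Indexing past the end does raise, so EAFP works here.
try:
    text[len(text)]
    raised = False
except IndexError:
    raised = True

# Slicing never raises for out-of-range bounds; it just returns less.
past_end = text[5:8]          # empty string
wrapped_slice = text[-2:1]    # start wraps to 1, so this is empty too

print(repr(wrapped_item), raised, repr(past_end), repr(wrapped_slice))
```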


Oscar

From p.f.moore at gmail.com  Wed Oct 30 11:02:48 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 30 Oct 2013 10:02:48 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
Message-ID: <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>

On 30 October 2013 09:52, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:
> Firstly would it not be better to add slice.__reversed__ so that it would be
>
>     b = a[reversed(slice(i, j))]

This won't work, because reversed returns an iterator, not a slice object.

> Secondly I don't think I would ever actually want to use this over the
> existing possibilities.

Agreed, while my usage is pretty trivial, I would definitely use

    b = a[::-1]

over

    b = a[slice(None, None, None, reversed=True)]

I could probably omit some of those None arguments, but I probably
wouldn't simply because I can't remember which are optional.

> There are real problems with slicing and indexing in Python that lead
> to corner cases and bugs but this particular issue is not one of them.
> The real problems, including the motivating example at the start of
> this thread, are caused by the use of negative indices to mean from
> the end.

However, being able to write

    last_n = s[-n:]

is extremely useful. I'm losing track of what is being proposed here,
but I do not want to have to write that as s[len(s)-n:]. Particularly
if "s" is actually a longer variable name, or worse still a calculated
value (which I do a lot).

Paul

From oscar.j.benjamin at gmail.com  Wed Oct 30 11:13:15 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Wed, 30 Oct 2013 10:13:15 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
Message-ID: <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>

On 30 October 2013 10:02, Paul Moore <p.f.moore at gmail.com> wrote:
>> There are real problems with slicing and indexing in Python that lead
>> to corner cases and bugs but this particular issue is not one of them.
>> The real problems, including the motivating example at the start of
>> this thread, are caused by the use of negative indices to mean from
>> the end.
>
> However, being able to write
>
>     last_n = s[-n:]
>
> is extremely useful.

Until you hit the bug where n is 0.

>>> a = 'abcde'
>>> for n in reversed(range(4)):
...     print(n, a[-n:])
...
3 cde
2 de
1 e
0 abcde

This is what I mean by the wraparound behaviour causing corner cases
and bugs. I and others have reported that this is a bigger source of
problems than the off-by-one negative stride issue which has never
caused me any actual problems. Yes I need to think carefully when
writing a negative stride slice but I generally need to think
carefully every time I write any slice particularly a multidimensional
one. The thing that really makes it difficult to reason about slices
is working out whether or not your code is susceptible to wraparound
bugs.

> I'm losing track of what is being proposed here,
> but I do not want to have to write that as s[len(s)-n:]. Particularly
> if "s" is actually a longer variable name, or worse still a calculated
> value (which I do a lot).

But you currently need to write it that way to get the correct behaviour:

>>> for n in reversed(range(4)):
...     print(n, a[len(a)-n:])
...
3 cde
2 de
1 e
0
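
A helper along these lines keeps the len() arithmetic in one place. This is
only a sketch: the name last_n is made up here, and the max() guard is an
addition, since len(a) - n itself goes negative (and wraps) once n exceeds
the length.

```python
def last_n(seq, n):
    """Return the last n items of seq, with n == 0 giving an empty result."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # max() stops len(seq) - n from going negative and wrapping around.
    return seq[max(len(seq) - n, 0):]


a = "abcde"
for n in reversed(range(4)):
    print(n, repr(last_n(a, n)))
```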


Oscar

From ncoghlan at gmail.com  Wed Oct 30 11:18:38 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 30 Oct 2013 20:18:38 +1000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
Message-ID: <CADiSq7dAcEOT_RCi3uC_hOVi=eDJWJC+u-TQ0ONhpMSRcA3bZg@mail.gmail.com>

On 30 October 2013 20:02, Paul Moore <p.f.moore at gmail.com> wrote:
> On 30 October 2013 09:52, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:
>> Firstly would it not be better to add slice.__reversed__ so that it would be
>>
>>     b = a[reversed(slice(i, j))]
>
> This won't work, because reversed returns an iterator, not a slice object.
>
>> Secondly I don't think I would ever actually want to use this over the
>> existing possibilities.
>
> Agreed, while my usage is pretty trivial, I would definitely use
>
>     b = a[::-1]
>
> over
>
>     b = a[slice(None, None, None, reversed=True)]
>
> I could probably omit some of those None arguments, but I probably
> wouldn't simply because I can't remember which are optional.

Why does that give you trouble when it's identical to what you can
omit from the normal slice syntax? (and from range)


>> There are real problems with slicing and indexing in Python that lead
>> to corner cases and bugs but this particular issue is not one of them.
>> The real problems, including the motivating example at the start of
>> this thread, are caused by the use of negative indices to mean from
>> the end.

And this is one of the things my rslice recipe handles correctly.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ncoghlan at gmail.com  Wed Oct 30 11:22:22 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 30 Oct 2013 20:22:22 +1000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
Message-ID: <CADiSq7cAkYBKzNoV2hetR9NwhZF2u-vGEQShd5TGfZeViN3gRg@mail.gmail.com>

On 30 October 2013 20:13, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:
> But you currently need to write it that way to get the correct behaviour:
>
>>>> for n in reversed(range(4)):
> ...     print(n, a[len(a)-n:])
> ...
> 3 cde
> 2 de
> 1 e
> 0

Regardless, my main point is this: slices are just objects. The syntax:

   s[i:j:k]

is just syntactic sugar for:

  s[slice(i, j, k)]

That means that until people have fully explored exactly the semantics
they want in terms of the existing object model, just as I did for
rslice(), then there are *zero* grounds to be discussing syntax
changes that provide those new semantics.
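
The equivalence is easy to check with a throwaway class whose __getitem__
just hands back whatever it was given. The class name ShowIndex is made up
for this sketch.

```python
class ShowIndex:
    """Return the index object that the subscript syntax produces."""

    def __getitem__(self, index):
        return index


probe = ShowIndex()
print(probe[1:10:2])   # slice(1, 10, 2)
print(probe[::-1])     # slice(None, None, -1)

s = "abcdefghij"
# The sugar and the explicit object select the same items.
print(s[2:8:3] == s[slice(2, 8, 3)])

# slice.indices() shows how negative bounds are resolved for a given length.
print(slice(-4, 5).indices(10))   # (6, 5, 1)
```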

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From p.f.moore at gmail.com  Wed Oct 30 12:35:59 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 30 Oct 2013 11:35:59 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CADiSq7dAcEOT_RCi3uC_hOVi=eDJWJC+u-TQ0ONhpMSRcA3bZg@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <CADiSq7dAcEOT_RCi3uC_hOVi=eDJWJC+u-TQ0ONhpMSRcA3bZg@mail.gmail.com>
Message-ID: <CACac1F-12V2hmq=ORnJgTYoAXUvKVfVCK2PA1Qd88Nfs=jUaLg@mail.gmail.com>

On 30 October 2013 10:18, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>     b = a[::-1]
>>
>> over
>>
>>     b = a[slice(None, None, None, reversed=True)]
>>
>> I could probably omit some of those None arguments, but I probably
>> wouldn't simply because I can't remember which are optional.
>
> Why does that give you trouble when it's identical to what you can
> omit from the normal slice syntax? (and from range)

slice(reversed=True)?

I can omit all the arguments in the indexing case (OK, I enter a step
of -1, but that's equivalent to reversed=True and a step of 1, which
is default). And yet currently slice() fails as a minimum of 1
argument is needed.

I'm not saying that it's ill-defined, just that I'd get confused fast.
So "better to be explicit" (but verbose). And [::-1] is clear and
simple (to me, at least).

Paul

From p.f.moore at gmail.com  Wed Oct 30 12:52:13 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 30 Oct 2013 11:52:13 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
Message-ID: <CACac1F-Ejus9F=6L-WudnHpbqEa=sEZBT5U8NM3Pk8HZqhVbnA@mail.gmail.com>

On 30 October 2013 10:13, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:
>> However, being able to write
>>
>>     last_n = s[-n:]
>>
>> is extremely useful.
>
> Until you hit the bug where n is 0.
>
>>>> a = 'abcde'
>>>> for n in reversed(range(4)):
> ...     print(n, a[-n:])
> ...
> 3 cde
> 2 de
> 1 e
> 0 abcde
>
> This is what I mean by the wraparound behaviour causing corner cases
> and bugs. I and others have reported that this is a bigger source of
> problems than the off-by-one negative stride issue which has never
> caused me any actual problems.

OK, fair enough. That has *never* been an issue to me, but nor have
negative strides. So I'm in the same boat as Tim, that I never need
any of this so I don't care how it's implemented :-)

What I do care about is that functionality that I do use (s[:-n] where
n is *not* zero) doesn't get removed because it leads to corner cases
that I don't hit but others do. Adding extra functionality with better
boundary conditions is one thing, removing something that people use
*a lot* without issue is different.

Most of my use cases tend to have constant n - something like "if
filename.endswith('.py'): filename = filename[:-3]". Here, using -2
(or -4, or 3j, or len(filename)-3, or whatever ends up being proposed)
isn't too hard, but it doesn't express the extent as clearly to me. Or
I calculate n based on something that means that n will never be 0
(typically some sort of "does this case apply" check like the endswith
above). And again, -n expresses my intent most clearly and won't
trigger bugs.

I'm not saying you don't have real-world use cases, and real bugs
caused by this behaviour, but I am suggesting that it only bites in
particular types of application.

Paul

From ncoghlan at gmail.com  Wed Oct 30 14:45:49 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 30 Oct 2013 23:45:49 +1000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CADiSq7cAkYBKzNoV2hetR9NwhZF2u-vGEQShd5TGfZeViN3gRg@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
 <CADiSq7cAkYBKzNoV2hetR9NwhZF2u-vGEQShd5TGfZeViN3gRg@mail.gmail.com>
Message-ID: <CADiSq7dn_VeN1xbF90HOJjy6BiKKrM3yjqW3YrRhLhPsUCRd_g@mail.gmail.com>

On 30 October 2013 20:22, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 30 October 2013 20:13, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:
>> But you currently need to write it that way to get the correct behaviour:
>>
>>>>> for n in reversed(range(4)):
>> ...     print(n, a[len(a)-n:])
>> ...
>> 3 cde
>> 2 de
>> 1 e
>> 0
>
> Regardless, my main point is this: slices are just objects. The syntax:
>
>    s[i:j:k]
>
> is just syntactic sugar for:
>
>   s[slice(i, j, k)]
>
> That means that until people have fully explored exactly the semantics
> they want in terms of the existing object model, just as I did for
> rslice(), then there are *zero* grounds to be discussing syntax
> changes that provide those new semantics.

Hmm, looks like my rslice testing was broken. Anyway, I created an
enhanced version, using the "End - idx" notation for indexing from the
end, that actually passes more systematic testing:

https://bitbucket.org/ncoghlan/misc/src/default/rslice.py?at=default

>>> from rslice import rslice, betterslice, End
>>> betterslice(-4, 5)
slice(0, 5, 1)
>>> betterslice(End-4, 5)
slice(-4, 5, 1)
>>> rslice(-4, 5).as_slice(10)
slice(4, -11, -1)
>>> rslice(End-4, 5).as_slice(10)
slice(4, -5, -1)
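
As a quick check of the "slices are just objects" point, here is a minimal
sketch that uses only the built-in slice type (it does not depend on
rslice.py). It shows that the subscript syntax and an explicit slice object
select the same items, and how indices() resolves omitted bounds for a
given length:

```python
s = "abcde"

# The subscript syntax builds a slice object behind the scenes.
assert s[1:4:2] == s[slice(1, 4, 2)] == "bd"

# indices(length) turns a slice into concrete (start, stop, step) values.
forward = slice(None, None, None).indices(len(s))
backward = slice(None, None, -1).indices(len(s))
print(forward)
print(backward)
```

The resolved stop of -1 in the reversed case is an internal "one before
index 0" marker; passing -1 as an explicit stop means something different,
which is part of what this thread is about.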

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From shibturn at gmail.com  Sun Oct 27 18:28:39 2013
From: shibturn at gmail.com (Richard Oudkerk)
Date: Sun, 27 Oct 2013 17:28:39 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
Message-ID: <526D4D47.5090106@gmail.com>



On 27/10/2013 5:04pm, Guido van Rossum wrote:
> Are we stuck with this forever? If we want to fix this in Python 4 we'd
> have to start deprecating negative stride with non-empty lower/upper
> bounds now. And we'd have to start deprecating negative step for range()
> altogether, recommending reversed(range(lower, upper)) instead.

Or recommend using None?

 >>> "abcde"[None:None:-1]
'edcba'
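
A short sketch of why None is useful here: with a negative step there is
no integer stop value that means "run past index 0", because -1 is
interpreted as the last item. The variable names are just for
illustration:

```python
s = "abcde"

whole_reversed = s[None:None:-1]  # same as s[::-1]
down_to_start = s[3:None:-1]      # from index 3 down to and including index 0
surprising = s[3:-1:-1]           # stop=-1 means index 4, so nothing is selected

print(repr(whole_reversed), repr(down_to_start), repr(surprising))
```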

-- 
Richard


From oscar.j.benjamin at gmail.com  Wed Oct 30 15:42:43 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Wed, 30 Oct 2013 14:42:43 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CADiSq7dn_VeN1xbF90HOJjy6BiKKrM3yjqW3YrRhLhPsUCRd_g@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
 <CADiSq7cAkYBKzNoV2hetR9NwhZF2u-vGEQShd5TGfZeViN3gRg@mail.gmail.com>
 <CADiSq7dn_VeN1xbF90HOJjy6BiKKrM3yjqW3YrRhLhPsUCRd_g@mail.gmail.com>
Message-ID: <CAHVvXxTcMKqc9q2mOP=-2gLoH8jVM8u6WSW7xuOj6ppnynWHFg@mail.gmail.com>

On 30 October 2013 13:45, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 30 October 2013 20:22, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>> That means that until people have fully explored exactly the semantics
>> they want in terms of the existing object model, just as I did for
>> rslice(), then there are *zero* grounds to be discussing syntax
>> changes that provide those new semantics.
>
> Hmm, looks like my rslice testing was broken. Anyway, I created an
> enhanced version people using the "End - idx" notation from the end
> that actually passes more systematic testing:
>
> https://bitbucket.org/ncoghlan/misc/src/default/rslice.py?at=default

It took me a while to get to that link. I think bitbucket may be
having server problems.

>>>> from rslice import rslice, betterslice, End
>>>> betterslice(-4, 5)
> slice(0, 5, 1)
>>>> betterslice(End-4, 5)
> slice(-4, 5, 1)
>>>> rslice(-4, 5).as_slice(10)
> slice(4, -11, -1)
>>>> rslice(End-4, 5).as_slice(10)
> slice(4, -5, -1)

I like the idea of a magic End object. I would be happy to see
negative indexing deprecated in favour of that. For this to really be
useful though it needs to apply to ordinary indexing as well as
slicing. If it also becomes an error to use negative indices then you
get proper bounds checking as well as an explicit way to show when
you're indexing from the end which is a substantial improvement.
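
To make that concrete, here is a rough sketch of what I mean. The _FromEnd
class and the strict_getitem() helper are hypothetical names made up for
this example; this End is not the one from Nick's rslice.py:

```python
class _FromEnd:
    """Hypothetical marker meaning "len(seq) - offset"."""

    def __init__(self, offset=0):
        self.offset = offset

    def __sub__(self, n):
        # End - n is n items before the end of the sequence.
        return _FromEnd(self.offset + n)

    def resolve(self, length):
        return length - self.offset


End = _FromEnd()


def strict_getitem(seq, idx):
    """Index seq with bounds checking and no implicit negative wraparound."""
    if isinstance(idx, _FromEnd):
        idx = idx.resolve(len(seq))
    if not 0 <= idx < len(seq):
        raise IndexError("index out of range: %r" % (idx,))
    return seq[idx]


print(strict_getitem("abcde", End - 1))
```

With this, strict_getitem("abcde", -1) raises IndexError instead of
silently returning the last item, and End - 0 is also out of range.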


Oscar

From ncoghlan at gmail.com  Wed Oct 30 16:25:25 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 31 Oct 2013 01:25:25 +1000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAHVvXxTcMKqc9q2mOP=-2gLoH8jVM8u6WSW7xuOj6ppnynWHFg@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
 <CADiSq7cAkYBKzNoV2hetR9NwhZF2u-vGEQShd5TGfZeViN3gRg@mail.gmail.com>
 <CADiSq7dn_VeN1xbF90HOJjy6BiKKrM3yjqW3YrRhLhPsUCRd_g@mail.gmail.com>
 <CAHVvXxTcMKqc9q2mOP=-2gLoH8jVM8u6WSW7xuOj6ppnynWHFg@mail.gmail.com>
Message-ID: <CADiSq7dtxLfWmtNuN7s9Tieb_XKJQsSTJubQx3Yd1J3uZwEf+w@mail.gmail.com>

On 31 Oct 2013 00:43, "Oscar Benjamin" <oscar.j.benjamin at gmail.com> wrote:
>
> On 30 October 2013 13:45, Nick Coghlan <ncoghlan at gmail.com> wrote:
> > On 30 October 2013 20:22, Nick Coghlan <ncoghlan at gmail.com> wrote:
> >>
> >> That means that until people have fully explored exactly the semantics
> >> they want in terms of the existing object model, just as I did for
> >> rslice(), then there are *zero* grounds to be discussing syntax
> >> changes that provide those new semantics.
> >
> > Hmm, looks like my rslice testing was broken. Anyway, I created an
> > enhanced version people using the "End - idx" notation from the end
> > that actually passes more systematic testing:
> >
> > https://bitbucket.org/ncoghlan/misc/src/default/rslice.py?at=default
>
> It took me a while to get to that link. I think bitbucket may be
> having server problems.
>
> >>>> from rslice import rslice, betterslice, End
> >>>> betterslice(-4, 5)
> > slice(0, 5, 1)
> >>>> betterslice(End-4, 5)
> > slice(-4, 5, 1)
> >>>> rslice(-4, 5).as_slice(10)
> > slice(4, -11, -1)
> >>>> rslice(End-4, 5).as_slice(10)
> > slice(4, -5, -1)
>
> I like the idea of a magic End object. I would be happy to see
> negative indexing deprecated in favour of that. For this to really be
> useful though it needs to apply to ordinary indexing as well as
> slicing. If it also becomes an error to use negative indices then you
> get proper bounds checking as well as an explicit way to show when
> you're indexing from the end which is a substantial improvement.

That's much harder to do in a backwards compatible way without introducing
both the index() and rindex() types Ron (I think?) suggested (the End
object in my proof-of-concept is a stripped down rindex type), and even
then it's hard to provide both clamping for slices and an index error for
out of bounds item lookup. They both also have the problem that __index__
isn't allowed to return None.

Regardless, the main thing I got out of writing that proof of concept is
that I'd now be +1 on a patch to make it possible and practical to inherit
from slice objects to override their construction and their indices()
method. Previously I would have asked "What's the point?"
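
For reference, a small sketch of the two restrictions mentioned above as
they stand today. The class names are made up for the example:

```python
import operator

# The built-in slice type cannot currently be used as a base class.
try:
    class MySlice(slice):
        pass
    can_subclass = True
except TypeError:
    can_subclass = False


class BadIndex:
    def __index__(self):
        return None  # not an int, so operator.index() rejects it


# __index__ must return an int; None is not accepted.
try:
    operator.index(BadIndex())
    none_allowed = True
except TypeError:
    none_allowed = False

print(can_subclass, none_allowed)
```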

Cheers,
Nick.

>
>
> Oscar

From techtonik at gmail.com  Wed Oct 30 17:34:35 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Wed, 30 Oct 2013 19:34:35 +0300
Subject: [Python-ideas] os.path.join
Message-ID: <CAPkN8xLgywa5-BJi63BsadMJc+Rfpd1D=qCXQUQuP1hRNf=4tw@mail.gmail.com>

  >>> os.path.join('/static', '/styles/largestyles.css')
  '/styles/largestyles.css'

Is it only me who thinks that the code above is wrong?
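
A minimal sketch of the behaviour in question next to one possible
alternative. posixpath is used so the result is the same on any platform;
join_under() is a hypothetical helper, not an existing function:

```python
import posixpath


def join_under(root, *parts):
    """Hypothetical helper: drop leading slashes so parts stay under root."""
    return posixpath.join(root, *(p.lstrip('/') for p in parts))


documented = posixpath.join('/static', '/styles/largestyles.css')
kept_root = join_under('/static', '/styles/largestyles.css')

print(documented)
print(kept_root)
```

Note that this only handles leading slashes; a component containing '..'
could still point outside the root.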

From geoffspear at gmail.com  Wed Oct 30 17:50:07 2013
From: geoffspear at gmail.com (Geoffrey Spear)
Date: Wed, 30 Oct 2013 12:50:07 -0400
Subject: [Python-ideas] os.path.join
In-Reply-To: <CAPkN8xLgywa5-BJi63BsadMJc+Rfpd1D=qCXQUQuP1hRNf=4tw@mail.gmail.com>
References: <CAPkN8xLgywa5-BJi63BsadMJc+Rfpd1D=qCXQUQuP1hRNf=4tw@mail.gmail.com>
Message-ID: <CAGifb9Fm_NVWZ3oYoCJmAxg_GAiCsWG8h7m3tA9GmmBw=vz-zw@mail.gmail.com>

On Wed, Oct 30, 2013 at 12:34 PM, anatoly techtonik <techtonik at gmail.com> wrote:
>   >>> os.path.join('/static', '/styles/largestyles.css')
>   '/styles/largestyles.css'
>
> Is it only me who thinks that the code above is wrong?

No, the code is obviously wrong. What's your idea? To make the bit
about absolute paths in the documentation all bold, red, and blinking?

From bruce at leapyear.org  Wed Oct 30 18:06:03 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Wed, 30 Oct 2013 10:06:03 -0700
Subject: [Python-ideas] os.path.join
In-Reply-To: <CAPkN8xLgywa5-BJi63BsadMJc+Rfpd1D=qCXQUQuP1hRNf=4tw@mail.gmail.com>
References: <CAPkN8xLgywa5-BJi63BsadMJc+Rfpd1D=qCXQUQuP1hRNf=4tw@mail.gmail.com>
Message-ID: <CAGu0AnunwyJnee1Zxc_0Hfd=Wh4Skp9r0gAPiX=xsqpPcNMZpQ@mail.gmail.com>

I don't know if the code is wrong but if you're asking if the *result* of
join is wrong, I don't think it is. It references the same file as these
commands:

cd /static
cat /styles/largestyles.css

I agree it might be confusing but it's pretty explicitly documented. On the
other hand, this is also documented and it's wrong by the above standard

>>> os.path.join(r'c:\abc', r'\def\g')   # Windows paths
'\\def\\g'

On Windows \def\g is a drive-relative path not an absolute path. To get the
right result you need to do:

>>> drive, path = os.path.splitdrive(r'c:\abc')
>>> drive + os.path.join(path, r'/def/g')
'c:/def/g'

This works even on systems that don't use drive letters. It would be nice
if there was a less clumsy way to do this. It's worse than that because it
also screws up UNC paths

>>> os.path.join(r'\\abc\def\ghi', r'\x\y')
'\\x\\y'

The result references a UNC share of \\x\y rather than a directory of x
which is also wrong. It would be nice if there was a simpler way to get
this right:

>>> os.path.join(r'c:\abc', r'\x\y', keep_drive_unc=True)
'c:\\x\\y'
>>> os.path.join(r'\\abc\def\ghi', r'\x\y', keep_drive_unc=True)
'\\\\abc\\def\\x\\y'
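
The keep_drive_unc parameter above is only a proposal and does not exist.
As a rough sketch of the intended semantics, something like the following
could be written today with ntpath.splitdrive(). The name join_keep_drive
is made up, and the handling of a second argument that carries its own
drive is a simplifying assumption:

```python
import ntpath


def join_keep_drive(base, tail):
    """Hypothetical helper: keep base's drive or UNC share for a rooted tail."""
    base_drive, _ = ntpath.splitdrive(base)
    tail_drive, tail_path = ntpath.splitdrive(tail)
    if tail_drive:
        # Assumption: a tail naming its own drive or share wins outright.
        return tail
    if tail_path[:1] in ('\\', '/'):
        # Rooted but drive-less: re-root on the base's drive or share.
        return base_drive + tail_path
    return ntpath.join(base, tail)


print(join_keep_drive(r'c:\abc', r'\x\y'))
print(join_keep_drive(r'\\abc\def\ghi', r'\x\y'))
```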


--- Bruce
I'm hiring: http://www.cadencemd.com/info/jobs
Latest blog post: Alice's Puzzle Page http://www.vroospeak.com
Learn how hackers think: http://j.mp/gruyere-security


On Wed, Oct 30, 2013 at 9:34 AM, anatoly techtonik <techtonik at gmail.com> wrote:

>   >>> os.path.join('/static', '/styles/largestyles.css')
>   '/styles/largestyles.css'
>
> Is it only me who thinks that the code above is wrong?

From python at mrabarnett.plus.com  Wed Oct 30 19:25:04 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Wed, 30 Oct 2013 18:25:04 +0000
Subject: [Python-ideas] os.path.join
In-Reply-To: <CAGu0AnunwyJnee1Zxc_0Hfd=Wh4Skp9r0gAPiX=xsqpPcNMZpQ@mail.gmail.com>
References: <CAPkN8xLgywa5-BJi63BsadMJc+Rfpd1D=qCXQUQuP1hRNf=4tw@mail.gmail.com>
 <CAGu0AnunwyJnee1Zxc_0Hfd=Wh4Skp9r0gAPiX=xsqpPcNMZpQ@mail.gmail.com>
Message-ID: <52714F00.5070805@mrabarnett.plus.com>

On 30/10/2013 17:06, Bruce Leban wrote:
> I don't know if the code is wrong but if you're asking if the *result*
> of join is wrong, I don't think it is. It references the same file as
> these commands:
>
> cd /static
> cat /styles/largestyles.css
>
> I agree it might be confusing but it's pretty explicitly documented. On
> the other hand, this is also documented and it's wrong by the above standard
>
>  >>> os.path.join(r'c:\abc', r'\def\g')   # Windows paths
> '\\def\\g'
>
> On Windows \def\g is a drive-relative path not an absolute path. To get
> the right result you need to do:
>
>  >>> drive, path = os.path.splitdrive(r'c:\abc')
>  >>> drive + os.path.join(path, r'/def/g')
> 'c:/def/g'
>
> This works even on systems that don't use drive letters. It would be
> nice if there was a less clumsy way to do this. It's worse than that
> because it also screws up UNC paths
>
>  >>> os.path.join(r'\\abc\def\ghi', r'\x\y')
> '\\x\\y'
>
> The result references a UNC share of \\x\y rather than a directory of x
> which is also wrong. It would be nice if there was a simpler way to get
> this right:
>
>  >>> os.path.join(r'c:\abc', r'\x\y', keep_drive_unc=True)
> 'c:\\x\\y'
>  >>> os.path.join(r'\\abc\def\ghi', r'\x\y', keep_drive_unc=True)
> '\\\\abc\\def\\x\\y'
>
Shouldn't that be '\\\\abc\\x\\y'?


From bruce at leapyear.org  Wed Oct 30 20:08:59 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Wed, 30 Oct 2013 12:08:59 -0700
Subject: [Python-ideas] os.path.join
In-Reply-To: <52714F00.5070805@mrabarnett.plus.com>
References: <CAPkN8xLgywa5-BJi63BsadMJc+Rfpd1D=qCXQUQuP1hRNf=4tw@mail.gmail.com>
 <CAGu0AnunwyJnee1Zxc_0Hfd=Wh4Skp9r0gAPiX=xsqpPcNMZpQ@mail.gmail.com>
 <52714F00.5070805@mrabarnett.plus.com>
Message-ID: <CAGu0AnvDJ_bg-jO8zYQEGZRLyetsNuEHvzjMAWcS37yiiCnARQ@mail.gmail.com>

On Oct 30, 2013 11:25 AM, "MRAB" <python at mrabarnett.plus.com> wrote:
>
> On 30/10/2013 17:06, Bruce Leban wrote:
>>
>>  >>> drive, path = os.path.splitdrive(r'c:\abc')
>>  >>> drive + os.path.join(path, r'/def/g')
>> 'c:/def/g'

After I sent this I realized it's more complicated than this. The above code
will fail joining (r'C:\a\b', r'D:\x\y').

>>  >>> os.path.join(r'c:\abc', r'\x\y', keep_drive_unc=True)
>> 'c:\\x\\y'
>>  >>> os.path.join(r'\\abc\def\ghi', r'\x\y', keep_drive_unc=True)
>> '\\\\abc\\def\\x\\y'
>>
> Shouldn't that be '\\\\abc\\x\\y'?

No. A UNC mount point is a server and a share name. \\server on its own
isn't enough. E.g., \\server\C$ is the equivalent of C:
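
A small sketch of the same point using ntpath.splitdrive(), which in
current Python 3 releases treats the server and share together as the
"drive" of a UNC path. The server and share names are placeholders:

```python
import ntpath

unc_drive, unc_rest = ntpath.splitdrive(r'\\server\share\dir\file.txt')
letter_drive, letter_rest = ntpath.splitdrive(r'C:\dir\file.txt')

print(unc_drive, unc_rest)
print(letter_drive, letter_rest)
```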

--- Bruce

From breamoreboy at yahoo.co.uk  Wed Oct 30 20:41:12 2013
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Wed, 30 Oct 2013 19:41:12 +0000
Subject: [Python-ideas] os.path.join
In-Reply-To: <CAPkN8xLgywa5-BJi63BsadMJc+Rfpd1D=qCXQUQuP1hRNf=4tw@mail.gmail.com>
References: <CAPkN8xLgywa5-BJi63BsadMJc+Rfpd1D=qCXQUQuP1hRNf=4tw@mail.gmail.com>
Message-ID: <l4rnch$5b1$2@ger.gmane.org>

On 30/10/2013 16:34, anatoly techtonik wrote:
>    >>> os.path.join('/static', '/styles/largestyles.css')
>    '/styles/largestyles.css'
>
> Is it only me who thinks that the code above is wrong?
>

Is this the appropriate place for such a question?  What is wrong with 
the main Python mailing list, Stackoverflow...?

-- 
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence


From greg.ewing at canterbury.ac.nz  Wed Oct 30 22:11:31 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 31 Oct 2013 10:11:31 +1300
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
Message-ID: <52717603.6000501@canterbury.ac.nz>

Paul Moore wrote:
> I would definitely use
> 
>     b = a[::-1]
> 
> over
> 
>     b = a[Slice(None, None, None, reversed=True)]

Indeed, the whole reason for having slice syntax is that
it's very concise. One of the things I like most about
Python is that I get to write s[a:b] instead of something
like s.substr(a, b).

I would be very disappointed if I were forced to use
the above monstrosity in some cases.

-- 
Greg

From greg.ewing at canterbury.ac.nz  Wed Oct 30 22:44:45 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 31 Oct 2013 10:44:45 +1300
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CADiSq7cAkYBKzNoV2hetR9NwhZF2u-vGEQShd5TGfZeViN3gRg@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
 <CADiSq7cAkYBKzNoV2hetR9NwhZF2u-vGEQShd5TGfZeViN3gRg@mail.gmail.com>
Message-ID: <52717DCD.2050803@canterbury.ac.nz>

Nick Coghlan wrote:

> That means that until people have fully explored exactly the semantics
> they want in terms of the existing object model ... then there are *zero*
> grounds to be discussing syntax changes that provide those new semantics.

I don't think it's possible to decouple the syntactic and
semantic issues that easily.

Consider the problem of how to specify from-the-end indexes
without zero behaving incorrectly. There are a couple of
ways this could be tackled.

One would be to introduce a new type representing an
index from the end. This wouldn't require any new syntax,
but it would be verbose to spell out, so later we would
probably want to consider a new syntax for constructing
this type inside a slice expression.

But if we're willing to consider new syntax, we don't
need a new type -- we can just invent a syntax for
specifying from-the-end indexing directly, and end
up with a simpler design overall.

There would obviously have to be some way of specifying
the same thing that the new syntax specifies using
arguments to slice(), but that would be mostly an
implementation detail. It shouldn't be the driving
force behind the design.
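
For anyone who hasn't run into it, a minimal sketch of the "zero behaving
incorrectly" problem. The function names are invented for the example:

```python
def last_n_negative(seq, n):
    """Take the last n items using a negative index; wrong when n == 0."""
    return seq[-n:]


def last_n_explicit(seq, n):
    """Take the last n items by computing the start from the length."""
    return seq[len(seq) - n:]


a = "abcde"
for n in (2, 1, 0):
    print(n, repr(last_n_negative(a, n)), repr(last_n_explicit(a, n)))
```

For n == 0 the first version returns the whole string, because -0 is just
0, while the second returns the empty string as intended.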

-- 
Greg

From ncoghlan at gmail.com  Wed Oct 30 22:52:27 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 31 Oct 2013 07:52:27 +1000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <52717603.6000501@canterbury.ac.nz>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <52717603.6000501@canterbury.ac.nz>
Message-ID: <CADiSq7dRFkJzhBRsZGhONhiH+R+Hbgkx0SWcPCTdSSnXOPj2vw@mail.gmail.com>

On 31 Oct 2013 07:14, "Greg Ewing" <greg.ewing at canterbury.ac.nz> wrote:
>
> Paul Moore wrote:
>>
>> I would definitely use
>>
>>     b = a[::-1]
>>
>> over
>>
>>     b = a[Slice(None, None, None, reversed=True)]
>
>
> Indeed, the whole reason for having slice syntax is that
> it's very concise. One of the things I like most about
> Python is that I get to write s[a:b] instead of something
> like s.substr(a, b).
>
> I would be very disappointed if I were forced to use
> the above monstrosity in some cases.

You can't have new syntax without defining the desired semantics for that
syntax first. Since slices are just objects, it doesn't make sense to argue
about syntactic details until the desired semantics are actually clear and
demonstrated in an object based proof-of-concept.

Cheers,
Nick.

>
> --
> Greg

From ncoghlan at gmail.com  Wed Oct 30 22:58:28 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 31 Oct 2013 07:58:28 +1000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <52717DCD.2050803@canterbury.ac.nz>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
 <CADiSq7cAkYBKzNoV2hetR9NwhZF2u-vGEQShd5TGfZeViN3gRg@mail.gmail.com>
 <52717DCD.2050803@canterbury.ac.nz>
Message-ID: <CADiSq7d0ORMS-VeP-BPCojPMi-7B1CU-brcWK932Ye0WFhwTgw@mail.gmail.com>

On 31 Oct 2013 07:45, "Greg Ewing" <greg.ewing at canterbury.ac.nz> wrote:
>
> Nick Coghlan wrote:
>
>> That means that until people have fully explored exactly the semantics
>> they want in terms of the existing object model ... then there are *zero*
>> grounds to be discussing syntax changes that provide those new semantics.
>
> I don't think it's possible to decouple the syntactic and
> semantic issues that easily.
>
> Consider the problem of how to specify from-the-end indexes
> without zero behaving incorrectly. There are a couple of
> ways this could be tackled.
>
> One would be to introduce a new type representing an
> index from the end. This wouldn't require any new syntax,
> but it would be verbose to spell out, so later we would
> probably want to consider a new syntax for constructing
> this type inside a slice expression.
>
> But if we're willing to consider new syntax, we don't
> need a new type -- we can just invent a syntax for
> specifying from-the-end indexing directly, and end
> up with a simpler design overall.

It isn't simpler though - since, as you note below, anything we can express
in the syntax *must* be expressible in the slice() API. Slice notation is
currently pure syntactic sugar and it should stay that way.

>
> There would obviously have to be some way of specifying
> the same thing that the new syntax specifies using
> arguments to slice(), but that would be mostly an
> implementation detail. It shouldn't be the driving
> force behind the design.

Who said it was? But defining an object API first lets us define and test
proposed semantics in pure Python, avoiding any reliance on abstract
handwaving.

Cheers,
Nick.

>
>
> --
> Greg

From greg.ewing at canterbury.ac.nz  Wed Oct 30 23:10:35 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 31 Oct 2013 11:10:35 +1300
Subject: [Python-ideas] os.path.join
In-Reply-To: <CAGu0AnunwyJnee1Zxc_0Hfd=Wh4Skp9r0gAPiX=xsqpPcNMZpQ@mail.gmail.com>
References: <CAPkN8xLgywa5-BJi63BsadMJc+Rfpd1D=qCXQUQuP1hRNf=4tw@mail.gmail.com>
 <CAGu0AnunwyJnee1Zxc_0Hfd=Wh4Skp9r0gAPiX=xsqpPcNMZpQ@mail.gmail.com>
Message-ID: <527183DB.6000607@canterbury.ac.nz>

Bruce Leban wrote:
> It references the same file as 
> these commands:
> 
> cd /static
> cat /styles/largestyles.css
> 
> On 
> the other hand, this is also documented and it's wrong by the above standard
> 
>  >>> os.path.join(r'c:\abc', r'\def\g')   # Windows paths
> '\\def\\g'

Actually, it's not -- it gives the same result as the
equivalent series of Windows CLI commands. Whether that's
a *useful* result is another matter. :-(

-- 
Greg

From rymg19 at gmail.com  Wed Oct 30 23:35:57 2013
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Wed, 30 Oct 2013 17:35:57 -0500
Subject: [Python-ideas] Support os.path.join for Windows paths on Posix
Message-ID: <CAO41-mMWWu0wT7bHJHBPAJ6D5FzJaTfWAHw7pKoFSG5VcOygAw@mail.gmail.com>

The recent thread/post/whatever on os.path.join has gotten me thinking. Say
I wanted to join a Windows path...on Ubuntu. This is what I get:

ryan at DevPC-LX:~$ python
Python 2.7.3 (default, Sep 26 2013, 20:03:06)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.path.join('C:\\', 'x.jpg')
'C:\\/x.jpg'
>>>

Isn't something wrong there? My idea: check for \'s in the path. If there
are any, assume \ is the path separator, not /.

-- 
Ryan

From elazarg at gmail.com  Wed Oct 30 23:43:19 2013
From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=)
Date: Thu, 31 Oct 2013 00:43:19 +0200
Subject: [Python-ideas] Support os.path.join for Windows paths on Posix
In-Reply-To: <CAO41-mMWWu0wT7bHJHBPAJ6D5FzJaTfWAHw7pKoFSG5VcOygAw@mail.gmail.com>
References: <CAO41-mMWWu0wT7bHJHBPAJ6D5FzJaTfWAHw7pKoFSG5VcOygAw@mail.gmail.com>
Message-ID: <CAPw6O2QMVu_UghVsPUijUMxLLxkJRHBx61Ei2No9Zwm+gYSgDw@mail.gmail.com>

2013/10/31 Ryan Gonzalez <rymg19 at gmail.com>:
>>>> import os
>>>> os.path.join('C:\\', 'x.jpg')
> 'C:\\/x.jpg'
>>>>
>
> Isn't something wrong there? My idea: check for \'s in the path. If there
> are any, assume \ is the path separator, not /.

No, nothing is wrong:

C:\Dev>cd C:\/temp

C:\Temp>

From tjreedy at udel.edu  Wed Oct 30 23:47:45 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 30 Oct 2013 18:47:45 -0400
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CADiSq7dtxLfWmtNuN7s9Tieb_XKJQsSTJubQx3Yd1J3uZwEf+w@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
 <CADiSq7cAkYBKzNoV2hetR9NwhZF2u-vGEQShd5TGfZeViN3gRg@mail.gmail.com>
 <CADiSq7dn_VeN1xbF90HOJjy6BiKKrM3yjqW3YrRhLhPsUCRd_g@mail.gmail.com>
 <CAHVvXxTcMKqc9q2mOP=-2gLoH8jVM8u6WSW7xuOj6ppnynWHFg@mail.gmail.com>
 <CADiSq7dtxLfWmtNuN7s9Tieb_XKJQsSTJubQx3Yd1J3uZwEf+w@mail.gmail.com>
Message-ID: <l4s2a9$ar5$1@ger.gmane.org>

On 10/30/2013 11:25 AM, Nick Coghlan wrote:
>
> On 31 Oct 2013 00:43, "Oscar Benjamin"

>  > I like the idea of a magic End object. I would be happy to see
>  > negative indexing deprecated in favour of that. For this to really be
>  > useful though it needs to apply to ordinary indexing as well as
>  > slicing. If it also becomes an error to use negative indices then you
>  > get proper bounds checking as well as an explicit way to show when
>  > you're indexing from the end which is a substantial improvement.

I thought of using a magic symbol, $, for that -- a[$-n]. But aside from
the issue of using one of the 2 remaining unused ascii symbols for
something that can already be done, it would not work in a slice call.

> That's much harder to do in a backwards compatible way without
> introducing both the index() and rindex() types Ron (I think?) suggested
> (the End object in my proof-of-concept is a stripped down rindex type),
> and even then it's hard to provide both clamping for slices and an index
> error for out of bounds item lookup. They both also have the problem
> that __index__ isn't allowed to return None.
>
> Regardless, the main thing I got out of writing that proof of concept is
> that I'd now be +1 on a patch to make it possible and practical to
> inherit from slice objects to override their construction and their
> indices() method. Previously I would have asked "What's the point?"

Indeed you did ;-)
 From the fourth message of http://bugs.python.org/issue17279:

"From the current [2013 Feb] python-ideas 'range' thread:
Me: Would it be correct to say (now) that all 4 are intentional 
omissions? and not merely oversights?
Nick: Yes, I think so. People will have to be *real* convincing to 
explain a case where composition isn't a more appropriate solution."

I think one point is that if seq.__getitem__(ob) uses 'if isinstance(ob, 
slice):' instead of 'if type(ob) is slice:', subclass instances will 
work whereas wrapper instances would not. I would make range 
subclassable at the same time.
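
A minimal sketch of the distinction drawn above, assuming a hypothetical sequence class `MySeq` and a hypothetical composition-style `SliceWrapper` (neither name comes from the thread). Current CPython releases do not accept subclasses of the built-in `slice`, so only the real-slice and wrapper cases can be run today; a subclass instance, if it were allowed, would take the same branch as the real slice.

```python
class SliceWrapper:
    """Hypothetical composition-based stand-in for a slice subclass."""

    def __init__(self, *args):
        self.wrapped = slice(*args)


class MySeq:
    """Hypothetical sequence that dispatches on isinstance(ob, slice)."""

    def __init__(self, data):
        self._data = list(data)

    def __getitem__(self, ob):
        if isinstance(ob, slice):
            # A slice (or, if permitted, a slice subclass) takes this branch.
            return self._data[ob]
        # A wrapper is not a slice, so it falls through to plain indexing.
        return self._data[ob]


seq = MySeq("abcdef")
print(seq[slice(1, 4)])

try:
    seq[SliceWrapper(1, 4)]
except TypeError as exc:
    # The underlying list rejects the wrapper as an index.
    print("wrapper rejected:", exc)
```

With composition, the sequence itself would have to know how to unwrap the wrapper; with subclassing, the existing `isinstance` check would be enough.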

-- 
Terry Jan Reedy


From guido at python.org  Wed Oct 30 23:50:39 2013
From: guido at python.org (Guido van Rossum)
Date: Wed, 30 Oct 2013 15:50:39 -0700
Subject: [Python-ideas] Support os.path.join for Windows paths on Posix
In-Reply-To: <CAO41-mMWWu0wT7bHJHBPAJ6D5FzJaTfWAHw7pKoFSG5VcOygAw@mail.gmail.com>
References: <CAO41-mMWWu0wT7bHJHBPAJ6D5FzJaTfWAHw7pKoFSG5VcOygAw@mail.gmail.com>
Message-ID: <CAP7+vJLN_i3iEmPThWAYin6TBWGV29Y2phaRm8M9j7P+Q-a9EA@mail.gmail.com>

No, nothing's wrong. You should use the ntpath module in this case.

You should also be using Python 3. :-)

$ python3
Python 3.4.0a4+ (default:0917f6c62c62, Oct 22 2013, 10:55:35)
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ntpath
>>> ntpath.join('C:\\', 'x.jpg')
'C:\\x.jpg'
>>>
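
As a small illustration of the point above: `os.path` is just whichever of `posixpath` or `ntpath` matches the running platform, and both modules can be imported explicitly anywhere. The sketch below only uses the two paths already shown in this thread.

```python
import ntpath
import posixpath

parts = ('C:\\', 'x.jpg')

# POSIX rules: backslash is an ordinary character, "/" is the separator.
posix_result = posixpath.join(*parts)

# Windows rules: the first part already ends in a separator, so nothing is added.
nt_result = ntpath.join(*parts)

print(repr(posix_result))
print(repr(nt_result))
```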



On Wed, Oct 30, 2013 at 3:35 PM, Ryan Gonzalez <rymg19 at gmail.com> wrote:

> The recent thread/post/whatever on os.path.join has gotten me thinking.
> Say I wanted to join a Windows path...on Ubuntu. This is what I get:
>
> ryan at DevPC-LX:~$ python
> Python 2.7.3 (default, Sep 26 2013, 20:03:06)
> [GCC 4.6.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import os
> >>> os.path.join('C:\\', 'x.jpg')
> 'C:\\/x.jpg'
> >>>
>
> Isn't something wrong there? My idea: check for \'s in the path. If there
> are any, assume \ is the path separator, not /.
>
> --
> Ryan


-- 
--Guido van Rossum (python.org/~guido)

From bruce at leapyear.org  Wed Oct 30 23:51:01 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Wed, 30 Oct 2013 15:51:01 -0700
Subject: [Python-ideas] os.path.join
In-Reply-To: <527183DB.6000607@canterbury.ac.nz>
References: <CAPkN8xLgywa5-BJi63BsadMJc+Rfpd1D=qCXQUQuP1hRNf=4tw@mail.gmail.com>
 <CAGu0AnunwyJnee1Zxc_0Hfd=Wh4Skp9r0gAPiX=xsqpPcNMZpQ@mail.gmail.com>
 <527183DB.6000607@canterbury.ac.nz>
Message-ID: <CAGu0Anttu69FDBjuKuKSq-2fNghc5Ffg1q5Z+gDvMfT3K3NqHQ@mail.gmail.com>

On Wed, Oct 30, 2013 at 3:10 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:

> On the other hand, this is also documented and it's wrong by the above
>> standard
>>
>>  >>> os.path.join(r'c:\abc', r'\def\g')   # Windows paths
>> '\\def\\g'
>>
>
> Actually, it's not -- it gives the same result as the
> equivalent series of Windows CLI commands. Whether that's
> a *useful* result is another matter. :-(
>

I meant cd /D -- which does a true change working directory rather than
bare cd which just changes the directory for the drive but doesn't change
the working drive. Windows maintains separate working directories for each
drive, which frequently surprises users.

Perhaps an os.path.joinw or join_windows_paths would be a good idea.



--- Bruce
I'm hiring: http://www.cadencemd.com/info/jobs
Latest blog post: Alice's Puzzle Page http://www.vroospeak.com
Learn how hackers think: http://j.mp/gruyere-security

From rymg19 at gmail.com  Wed Oct 30 23:58:14 2013
From: rymg19 at gmail.com (Ryan Gonzalez)
Date: Wed, 30 Oct 2013 17:58:14 -0500
Subject: [Python-ideas] Support os.path.join for Windows paths on Posix
In-Reply-To: <CAP7+vJLN_i3iEmPThWAYin6TBWGV29Y2phaRm8M9j7P+Q-a9EA@mail.gmail.com>
References: <CAO41-mMWWu0wT7bHJHBPAJ6D5FzJaTfWAHw7pKoFSG5VcOygAw@mail.gmail.com>
 <CAP7+vJLN_i3iEmPThWAYin6TBWGV29Y2phaRm8M9j7P+Q-a9EA@mail.gmail.com>
Message-ID: <CAO41-mNz6-VSQTgo4P5k=vJs2RPkMfCYZwCzyepSmbp7q2Z1wg@mail.gmail.com>

1. Python 3 doesn't come with Ubuntu
2. I still get irritated by that darn print statement
3. Everything I've written is for Python 2. I'm too lazy to port right now

Plus:




On Wed, Oct 30, 2013 at 5:50 PM, Guido van Rossum <guido at python.org> wrote:

> No, nothing's wrong. You should use the ntpath module in this case.
>
> You should also be using Python 3. :-)
>
> $ python3
> Python 3.4.0a4+ (default:0917f6c62c62, Oct 22 2013, 10:55:35)
> [GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin
>
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import ntpath
> >>> ntpath.join('C:\\', 'x.jpg')
> 'C:\\x.jpg'
> >>>
>
>
>
> On Wed, Oct 30, 2013 at 3:35 PM, Ryan Gonzalez <rymg19 at gmail.com> wrote:
>
>> The recent thread/post/whatever on os.path.join has gotten me thinking.
>> Say I wanted to join a Windows path...on Ubuntu. This is what I get:
>>
>> ryan at DevPC-LX:~$ python
>> Python 2.7.3 (default, Sep 26 2013, 20:03:06)
>> [GCC 4.6.3] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> import os
>> >>> os.path.join('C:\\', 'x.jpg')
>> 'C:\\/x.jpg'
>> >>>
>>
>> Isn't something wrong there? My idea: check for \'s in the path. If there
>> are any, assume \ is the path separator, not /.
>>
>> --
>> Ryan
>
>
> --
> --Guido van Rossum (python.org/~guido)
>



-- 
Ryan

From ericsnowcurrently at gmail.com  Thu Oct 31 00:00:42 2013
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 30 Oct 2013 17:00:42 -0600
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <l4s2a9$ar5$1@ger.gmane.org>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
 <CADiSq7cAkYBKzNoV2hetR9NwhZF2u-vGEQShd5TGfZeViN3gRg@mail.gmail.com>
 <CADiSq7dn_VeN1xbF90HOJjy6BiKKrM3yjqW3YrRhLhPsUCRd_g@mail.gmail.com>
 <CAHVvXxTcMKqc9q2mOP=-2gLoH8jVM8u6WSW7xuOj6ppnynWHFg@mail.gmail.com>
 <CADiSq7dtxLfWmtNuN7s9Tieb_XKJQsSTJubQx3Yd1J3uZwEf+w@mail.gmail.com>
 <l4s2a9$ar5$1@ger.gmane.org>
Message-ID: <CALFfu7Bg1sptCzKofaSaZZrem+TAbNqL-d=8_YxMrCH=Vwp=kQ@mail.gmail.com>

On Wed, Oct 30, 2013 at 4:47 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> I thought of using a magic symbol, $, for that -- a[$-n]. But aside from
> the issue of using one of the 2 remaining unused ascii symbols for
> something that can already be done, it would not work in a slice call.

Is that like where you have 1 more shot on your camera and you don't
want to use it for fear that something more spectacular might show up
afterward?  (and hope that you didn't leave your lens cap on when you
finally take the picture!)  :-)

-eric

From bruce at leapyear.org  Wed Oct 30 23:56:44 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Wed, 30 Oct 2013 15:56:44 -0700
Subject: [Python-ideas] Support os.path.join for Windows paths on Posix
In-Reply-To: <CAPw6O2QMVu_UghVsPUijUMxLLxkJRHBx61Ei2No9Zwm+gYSgDw@mail.gmail.com>
References: <CAO41-mMWWu0wT7bHJHBPAJ6D5FzJaTfWAHw7pKoFSG5VcOygAw@mail.gmail.com>
 <CAPw6O2QMVu_UghVsPUijUMxLLxkJRHBx61Ei2No9Zwm+gYSgDw@mail.gmail.com>
Message-ID: <CAGu0AnuHqOFCf1tHM7RcJFuth+7GaEBY-h0Nj9jE2vUanbsEvQ@mail.gmail.com>

ntpath still gets drive-relative paths wrong on Windows:

>>> ntpath.join(r'\\a\b\c\d', r'\e\f')
'\\e\\f'
# should be r'\\a\b\e\f'

>>> ntpath.join(r'C:\a\b\c\d', r'\e\f')
'\\e\\f'
# should be r'C:\e\f'

(same behavior in Python 2.7 and 3.3)
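
A rough sketch of the behaviour described in the "should be" comments above, assuming a hypothetical helper named `join_keep_drive` (not part of `ntpath`). It relies only on `ntpath.splitdrive`, which in current Python 3 releases also treats a UNC share such as `\\a\b` as the drive part.

```python
import ntpath


def join_keep_drive(base, tail):
    """Hypothetical helper: keep base's drive or UNC share for a rooted tail."""
    base_drive, _ = ntpath.splitdrive(base)
    tail_drive, tail_path = ntpath.splitdrive(tail)
    if not tail_drive and tail_path[:1] in ('\\', '/'):
        # Rooted but drive-less tail: resolve it against the base's drive.
        return base_drive + tail_path
    # Every other combination is left to ntpath.join unchanged.
    return ntpath.join(base, tail)


print(join_keep_drive(r'C:\a\b\c\d', r'\e\f'))
print(join_keep_drive(r'\\a\b\c\d', r'\e\f'))
```

The helper deliberately handles only the rooted, drive-less case discussed here; anything else falls through to the standard function.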

--- Bruce
I'm hiring: http://www.cadencemd.com/info/jobs
Latest blog post: Alice's Puzzle Page http://www.vroospeak.com
Learn how hackers think: http://j.mp/gruyere-security


On Wed, Oct 30, 2013 at 3:43 PM, אלעזר <elazarg at gmail.com> wrote:

> 2013/10/31 Ryan Gonzalez <rymg19 at gmail.com>:
> >>>> import os
> >>>> os.path.join('C:\\', 'x.jpg')
> > 'C:\\/x.jpg'
> >>>>
> >
> > Isn't something wrong there? My idea: check for \'s in the path. If there
> > are any, assume \ is the path separator, not /.
>
> No, nothing is wrong:
>
> C:\Dev>cd C:\/temp
>
> C:\Temp>

From guido at python.org  Thu Oct 31 00:09:35 2013
From: guido at python.org (Guido van Rossum)
Date: Wed, 30 Oct 2013 16:09:35 -0700
Subject: [Python-ideas] Support os.path.join for Windows paths on Posix
In-Reply-To: <CAGu0AnuHqOFCf1tHM7RcJFuth+7GaEBY-h0Nj9jE2vUanbsEvQ@mail.gmail.com>
References: <CAO41-mMWWu0wT7bHJHBPAJ6D5FzJaTfWAHw7pKoFSG5VcOygAw@mail.gmail.com>
 <CAPw6O2QMVu_UghVsPUijUMxLLxkJRHBx61Ei2No9Zwm+gYSgDw@mail.gmail.com>
 <CAGu0AnuHqOFCf1tHM7RcJFuth+7GaEBY-h0Nj9jE2vUanbsEvQ@mail.gmail.com>
Message-ID: <CAP7+vJ+H9rbe87wAj15RWyNoopR2NPB8NUBbwviPB6bp4knnGQ@mail.gmail.com>

Yeah, ntpath doesn't know about UNC paths. :-( We should fix this. We should
also make sure that PEP 428 (pathlib) does this right from day 1.


On Wed, Oct 30, 2013 at 3:56 PM, Bruce Leban <bruce at leapyear.org> wrote:

> ntpath still gets drive-relative paths wrong on Windows:
>
> >>> ntpath.join(r'\\a\b\c\d', r'\e\f')
> '\\e\\f'
> # should be r'\\a\b\e\f'
>
> >>> ntpath.join(r'C:\a\b\c\d', r'\e\f')
> '\\e\\f'
> # should be r'C:\e\f'
>
> (same behavior in Python 2.7 and 3.3)
>
> --- Bruce
> I'm hiring: http://www.cadencemd.com/info/jobs
> Latest blog post: Alice's Puzzle Page http://www.vroospeak.com
> Learn how hackers think: http://j.mp/gruyere-security
>
>
> On Wed, Oct 30, 2013 at 3:43 PM, אלעזר <elazarg at gmail.com> wrote:
>
>> 2013/10/31 Ryan Gonzalez <rymg19 at gmail.com>:
>> >>>> import os
>> >>>> os.path.join('C:\\', 'x.jpg')
>> > 'C:\\/x.jpg'
>> >>>>
>> >
>> > Isn't something wrong there? My idea: check for \'s in the path. If
>> there
>> > are any, assume \ is the path separator, not /.
>>
>> No, nothing is wrong:
>>
>> C:\Dev>cd C:\/temp
>>
>> C:\Temp>


-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Thu Oct 31 00:12:56 2013
From: guido at python.org (Guido van Rossum)
Date: Wed, 30 Oct 2013 16:12:56 -0700
Subject: [Python-ideas] Support os.path.join for Windows paths on Posix
In-Reply-To: <CAP7+vJ+H9rbe87wAj15RWyNoopR2NPB8NUBbwviPB6bp4knnGQ@mail.gmail.com>
References: <CAO41-mMWWu0wT7bHJHBPAJ6D5FzJaTfWAHw7pKoFSG5VcOygAw@mail.gmail.com>
 <CAPw6O2QMVu_UghVsPUijUMxLLxkJRHBx61Ei2No9Zwm+gYSgDw@mail.gmail.com>
 <CAGu0AnuHqOFCf1tHM7RcJFuth+7GaEBY-h0Nj9jE2vUanbsEvQ@mail.gmail.com>
 <CAP7+vJ+H9rbe87wAj15RWyNoopR2NPB8NUBbwviPB6bp4knnGQ@mail.gmail.com>
Message-ID: <CAP7+vJLOLO=p-JC=pLtP_ZSJRj5utrSABdXb+9d=AD0mUTCcWQ@mail.gmail.com>

(Sorry, it's not just UNC paths -- it's all paths with drives.)


On Wed, Oct 30, 2013 at 4:09 PM, Guido van Rossum <guido at python.org> wrote:

> Yeah, ntpath doesn't know about UNC paths. :-( We should fix this. We
> should also make sure that PEP 428 (pathlib) does this right from day 1.
>
>
> On Wed, Oct 30, 2013 at 3:56 PM, Bruce Leban <bruce at leapyear.org> wrote:
>
>> ntpath still gets drive-relative paths wrong on Windows:
>>
>> >>> ntpath.join(r'\\a\b\c\d', r'\e\f')
>> '\\e\\f'
>> # should be r'\\a\b\e\f'
>>
>> >>> ntpath.join(r'C:\a\b\c\d', r'\e\f')
>> '\\e\\f'
>> # should be r'C:\e\f'
>>
>> (same behavior in Python 2.7 and 3.3)
>>
>> --- Bruce
>> I'm hiring: http://www.cadencemd.com/info/jobs
>> Latest blog post: Alice's Puzzle Page http://www.vroospeak.com
>> Learn how hackers think: http://j.mp/gruyere-security
>>
>>
>> On Wed, Oct 30, 2013 at 3:43 PM, אלעזר <elazarg at gmail.com> wrote:
>>
>>> 2013/10/31 Ryan Gonzalez <rymg19 at gmail.com>:
>>> >>>> import os
>>> >>>> os.path.join('C:\\', 'x.jpg')
>>> > 'C:\\/x.jpg'
>>> >>>>
>>> >
>>> > Isn't something wrong there? My idea: check for \'s in the path. If
>>> there
>>> > are any, assume \ is the path separator, not /.
>>>
>>> No, nothing is wrong:
>>>
>>> C:\Dev>cd C:\/temp
>>>
>>> C:\Temp>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>



-- 
--Guido van Rossum (python.org/~guido)

From ben+python at benfinney.id.au  Thu Oct 31 00:23:49 2013
From: ben+python at benfinney.id.au (Ben Finney)
Date: Thu, 31 Oct 2013 10:23:49 +1100
Subject: [Python-ideas] Support os.path.join for Windows paths on Posix
References: <CAO41-mMWWu0wT7bHJHBPAJ6D5FzJaTfWAHw7pKoFSG5VcOygAw@mail.gmail.com>
 <CAP7+vJLN_i3iEmPThWAYin6TBWGV29Y2phaRm8M9j7P+Q-a9EA@mail.gmail.com>
 <CAO41-mNz6-VSQTgo4P5k=vJs2RPkMfCYZwCzyepSmbp7q2Z1wg@mail.gmail.com>
Message-ID: <7wppqmxrzu.fsf@benfinney.id.au>

Ryan Gonzalez <rymg19 at gmail.com> writes:

> 3.Everything I've written is for Python 2. I'm too lazy to port right
> now

Improvements such as are being suggested in this thread will not be made
in Python 2. So this discussion is either about improvements proposed in
Python 3 and later, or it's not worth having.

-- 
 \       "What I resent is that the range of your vision should be the |
  `\                               limit of my action." -- Henry James |
_o__)                                                                  |
Ben Finney


From python at mrabarnett.plus.com  Thu Oct 31 00:59:03 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Wed, 30 Oct 2013 23:59:03 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CALFfu7Bg1sptCzKofaSaZZrem+TAbNqL-d=8_YxMrCH=Vwp=kQ@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
 <CADiSq7cAkYBKzNoV2hetR9NwhZF2u-vGEQShd5TGfZeViN3gRg@mail.gmail.com>
 <CADiSq7dn_VeN1xbF90HOJjy6BiKKrM3yjqW3YrRhLhPsUCRd_g@mail.gmail.com>
 <CAHVvXxTcMKqc9q2mOP=-2gLoH8jVM8u6WSW7xuOj6ppnynWHFg@mail.gmail.com>
 <CADiSq7dtxLfWmtNuN7s9Tieb_XKJQsSTJubQx3Yd1J3uZwEf+w@mail.gmail.com>
 <l4s2a9$ar5$1@ger.gmane.org>
 <CALFfu7Bg1sptCzKofaSaZZrem+TAbNqL-d=8_YxMrCH=Vwp=kQ@mail.gmail.com>
Message-ID: <52719D47.1010507@mrabarnett.plus.com>

On 30/10/2013 23:00, Eric Snow wrote:
> On Wed, Oct 30, 2013 at 4:47 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>> I thought of using a magic symbol, $, for that -- a[$-n]. But aside from
>> the issue of using one of the 2 remaining unused ascii symbols for
>> something that can already be done, it would not work in a slice call.
>
> Is that like where you have 1 more shot on your camera and you don't
> want to use it for fear that something more spectacular might show up
> afterward?  (and hope that you didn't leave your lens cap on when you
> finally take the picture!)  :-)
>
I don't think it's that bad; I count 3: "!", "$" and "?". :-)


From python at mrabarnett.plus.com  Thu Oct 31 01:06:03 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 31 Oct 2013 00:06:03 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CADiSq7dRFkJzhBRsZGhONhiH+R+Hbgkx0SWcPCTdSSnXOPj2vw@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <526D4FF7.6010106@mrabarnett.plus.com>
 <CAP7+vJK8N8Vg0SMhT=JQ4coK-Yd9Gpt4W9-FaSDSAbSFWudDhw@mail.gmail.com>
 <l4julq$tnd$1@ger.gmane.org> <526DA58B.7080504@canterbury.ac.nz>
 <CAPTjJmp+H_SfXwn+JEvDp7bewWYxET8CRcZwzyvXrHkJCtGobA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <52717603.6000501@canterbury.ac.nz>
 <CADiSq7dRFkJzhBRsZGhONhiH+R+Hbgkx0SWcPCTdSSnXOPj2vw@mail.gmail.com>
Message-ID: <52719EEB.2060904@mrabarnett.plus.com>

On 30/10/2013 21:52, Nick Coghlan wrote:
>
> On 31 Oct 2013 07:14, "Greg Ewing" <greg.ewing at canterbury.ac.nz
> <mailto:greg.ewing at canterbury.ac.nz>> wrote:
>  >
>  > Paul Moore wrote:
>  >>
>  >> I would definitely use
>  >>
>  >>     b = a[::-1]
>  >>
>  >> over
>  >>
>  >>     b = a[Slice(None, None, None, reversed=True)]
>  >
>  >
>  > Indeed, the whole reason for having slice syntax is that
>  > it's very concise. One of the things I like most about
>  > Python is that I get to write s[a:b] instead of something
>  > like s.substr(a, b).
>  >
>  > I would be very disappointed if I were forced to use
>  > the above monstrosity in some cases.
>
> You can't have new syntax without defining the desired semantics for
> that syntax first. Since slices are just objects, it doesn't make sense
> to argue about syntactic details until the desired semantics are
> actually clear and demonstrated in an object based proof-of-concept.
>
How about a new function "rev" which returns the reverse of its argument:

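def rev(arg):
    # str(reversed(arg)) would give the iterator's repr, so join the
    # reversed characters instead.
    if isinstance(arg, str):
        return ''.join(reversed(arg))

    # For list, tuple, etc., rebuild the same type from the reversed iterator.
    return type(arg)(reversed(arg))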

The disadvantage is that it would be slicing and then reversing, so 2
steps, which is less efficient.
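
To make the two-step cost concrete, here is a small usage sketch. It repeats the `rev` function from above so that it runs on its own; the list `a` and the chosen bounds are illustrative only.

```python
def rev(arg):
    # Strings need an explicit join; other sequences are rebuilt by type.
    if isinstance(arg, str):
        return ''.join(reversed(arg))
    return type(arg)(reversed(arg))


a = [0, 1, 2, 3, 4, 5]

# Two steps: take the forward slice, then reverse the copy.
two_step = rev(a[1:4])

# One step: the equivalent negative-stride slice, with both bounds shifted by one.
one_step = a[3:0:-1]

print(two_step, one_step)
print(rev('abc'), rev((1, 2, 3)))
```

The two-step form keeps the familiar forward bounds `1:4`, while the negative-stride form needs `3:0`, which is the off-by-one adjustment this thread is about.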

From elazarg at gmail.com  Thu Oct 31 01:05:17 2013
From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=)
Date: Thu, 31 Oct 2013 02:05:17 +0200
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <52719D47.1010507@mrabarnett.plus.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
 <CADiSq7cAkYBKzNoV2hetR9NwhZF2u-vGEQShd5TGfZeViN3gRg@mail.gmail.com>
 <CADiSq7dn_VeN1xbF90HOJjy6BiKKrM3yjqW3YrRhLhPsUCRd_g@mail.gmail.com>
 <CAHVvXxTcMKqc9q2mOP=-2gLoH8jVM8u6WSW7xuOj6ppnynWHFg@mail.gmail.com>
 <CADiSq7dtxLfWmtNuN7s9Tieb_XKJQsSTJubQx3Yd1J3uZwEf+w@mail.gmail.com>
 <l4s2a9$ar5$1@ger.gmane.org>
 <CALFfu7Bg1sptCzKofaSaZZrem+TAbNqL-d=8_YxMrCH=Vwp=kQ@mail.gmail.com>
 <52719D47.1010507@mrabarnett.plus.com>
Message-ID: <CAPw6O2TNYzVbBr9dVZJL-LEdPiQ2XzZUw2R3nyR7DJgm4pmEnA@mail.gmail.com>

2013/10/31 MRAB <python at mrabarnett.plus.com>:
> On 30/10/2013 23:00, Eric Snow wrote:
>>
>> On Wed, Oct 30, 2013 at 4:47 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>>>
>>> I thought of using a magic symbol, $, for that -- a[$-n]. But aside from
>>> the issue of using one of the 2 remaining unused ascii symbols for
>>> something that can already be done, it would not work in a slice call.
>>
>>
>> Is that like where you have 1 more shot on your camera and you don't
>> want to use it for fear that something more spectacular might show up
>> afterward?  (and hope that you didn't leave your lens cap on when you
>> finally take the picture!)  :-)
>>
> I don't think it's that bad; I count 3: "!", "$" and "?". :-)
>
Can't it be done by adding a __sub__ method to len?

a[:len-n]

Readable and short.

From alexander.belopolsky at gmail.com  Thu Oct 31 01:17:00 2013
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Wed, 30 Oct 2013 20:17:00 -0400
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <52719D47.1010507@mrabarnett.plus.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
 <CADiSq7cAkYBKzNoV2hetR9NwhZF2u-vGEQShd5TGfZeViN3gRg@mail.gmail.com>
 <CADiSq7dn_VeN1xbF90HOJjy6BiKKrM3yjqW3YrRhLhPsUCRd_g@mail.gmail.com>
 <CAHVvXxTcMKqc9q2mOP=-2gLoH8jVM8u6WSW7xuOj6ppnynWHFg@mail.gmail.com>
 <CADiSq7dtxLfWmtNuN7s9Tieb_XKJQsSTJubQx3Yd1J3uZwEf+w@mail.gmail.com>
 <l4s2a9$ar5$1@ger.gmane.org>
 <CALFfu7Bg1sptCzKofaSaZZrem+TAbNqL-d=8_YxMrCH=Vwp=kQ@mail.gmail.com>
 <52719D47.1010507@mrabarnett.plus.com>
Message-ID: <CAP7h-xZAgv2yRg_BATOdMjtG+oMkuOmUdtef3_MC-6v642bVFg@mail.gmail.com>

On Wed, Oct 30, 2013 at 7:59 PM, MRAB <python at mrabarnett.plus.com> wrote:

> I don't think it's that bad; I count 3: "!", "$" and "?". :-)


Wasn't use of "`" dropped from Python 3?  This makes it 4!

From elazarg at gmail.com  Thu Oct 31 02:01:12 2013
From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=)
Date: Thu, 31 Oct 2013 03:01:12 +0200
Subject: [Python-ideas] Allow attribute references for decimalinteger
Message-ID: <CAPw6O2S9ZT7Fg40qDB4tdaZqMXYdyqM1XvWUVzNimiYvwzCWOQ@mail.gmail.com>

There's an unnecessary corner case regarding integer literals and attributes:
>>> 1..real
1.0
>>> 1.0.real
1.0
>>> 1..real
1.0
>>> 1. .real
1.0
>>> .1.real
0.1
>>> 1 .real
1
>>> (1).real
1
>>> 1.real
  File "<stdin>", line 1
    1.real
         ^
SyntaxError: invalid syntax

Why does it fail? To my human eyes it seems (almost) completely
unambiguous. Is it a lexing thing? I couldn't find an explanation (or
any reference at all, although there should be) in the docs.
The only ambiguity I can see is 1.j - but the desired meaning is
clear: 1j (just like 1.0j).

It may confuse beginners; it made me believe that there's no such
thing as 1.__class__ a couple of years ago. I guess it's bad for code
generators too.

I suggest making "1.identifier" legal, and adding `j` and `J`
properties to numbers.Number to mean the sensible thing (so 0.j is not
a special syntax as it is today).

Elazar

From greg.ewing at canterbury.ac.nz  Thu Oct 31 02:16:23 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 31 Oct 2013 14:16:23 +1300
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <52719D47.1010507@mrabarnett.plus.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
 <CADiSq7cAkYBKzNoV2hetR9NwhZF2u-vGEQShd5TGfZeViN3gRg@mail.gmail.com>
 <CADiSq7dn_VeN1xbF90HOJjy6BiKKrM3yjqW3YrRhLhPsUCRd_g@mail.gmail.com>
 <CAHVvXxTcMKqc9q2mOP=-2gLoH8jVM8u6WSW7xuOj6ppnynWHFg@mail.gmail.com>
 <CADiSq7dtxLfWmtNuN7s9Tieb_XKJQsSTJubQx3Yd1J3uZwEf+w@mail.gmail.com>
 <l4s2a9$ar5$1@ger.gmane.org>
 <CALFfu7Bg1sptCzKofaSaZZrem+TAbNqL-d=8_YxMrCH=Vwp=kQ@mail.gmail.com>
 <52719D47.1010507@mrabarnett.plus.com>
Message-ID: <5271AF67.3050407@canterbury.ac.nz>

On 31/10/13 12:59, MRAB wrote:

> I don't think it's that bad; I count 3: "!", "$" and "?". :-)

And we also have ` in reserve if we get really desperate.

Hmmm... backquote... backwards indexing...

(Ducks as Tim Peters throws a bucket of grit that he's
cleaned off his monitor.)

-- 
Greg


From alexander.belopolsky at gmail.com  Thu Oct 31 02:17:12 2013
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Wed, 30 Oct 2013 21:17:12 -0400
Subject: [Python-ideas] Allow attribute references for decimalinteger
In-Reply-To: <CAPw6O2S9ZT7Fg40qDB4tdaZqMXYdyqM1XvWUVzNimiYvwzCWOQ@mail.gmail.com>
References: <CAPw6O2S9ZT7Fg40qDB4tdaZqMXYdyqM1XvWUVzNimiYvwzCWOQ@mail.gmail.com>
Message-ID: <CAP7h-xYJ=3cWXNNwccWj7K_SQ0ouhZsm4=k5bVRKq7tc=gX_2Q@mail.gmail.com>

On Wed, Oct 30, 2013 at 9:01 PM, אלעזר <elazarg at gmail.com> wrote:

> >>> 1.real
>   File "<stdin>", line 1
>     1.real
>          ^
> SyntaxError: invalid syntax
>
> Why does it fail? To my human eyes it seems (almost) completely
> unambiguous. Is it a lexing thing?
>

Yes.  The first . following a digit is absorbed into the same float token.  To
make . its own token the number must be complete by the time the tokenizer
sees it.  I don't think your proposal is implementable without making the
parser significantly more complicated.
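
The tokenizer's view can be inspected with the standard `tokenize` module. The sketch below uses a hypothetical helper, `significant_tokens`, and only feeds it spellings that are already valid, so it shows where the dot ends up without depending on how a given Python version reports the error for `1.real`.

```python
import io
import tokenize


def significant_tokens(source):
    """Return the NUMBER, OP and NAME token strings for one line of source."""
    wanted = {tokenize.NUMBER, tokenize.OP, tokenize.NAME}
    readline = io.StringIO(source + "\n").readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type in wanted]


# The first dot is swallowed by the float literal "1."; the second is the operator.
print(significant_tokens("1..real"))

# A space ends the integer literal early, so the dot is an operator.
print(significant_tokens("1 .real"))

# Parentheses have the same effect.
print(significant_tokens("(1).real"))
```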

> The only ambiguity I can see is 1.j  ..

What would 1.e50 mean under your proposal?  Currently we have

>>> 1.e50
1e+50

From python at mrabarnett.plus.com  Thu Oct 31 02:32:18 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 31 Oct 2013 01:32:18 +0000
Subject: [Python-ideas] Allow attribute references for decimalinteger
In-Reply-To: <CAPw6O2S9ZT7Fg40qDB4tdaZqMXYdyqM1XvWUVzNimiYvwzCWOQ@mail.gmail.com>
References: <CAPw6O2S9ZT7Fg40qDB4tdaZqMXYdyqM1XvWUVzNimiYvwzCWOQ@mail.gmail.com>
Message-ID: <5271B322.1000108@mrabarnett.plus.com>

On 31/10/2013 01:01, אלעזר wrote:
> There's an unnecessary corner case regarding integer literals and attributes:
>>>> 1..real
> 1.0
>>>> 1.0.real
> 1.0
>>>> 1..real
> 1.0
>>>> 1. .real
> 1.0
>>>> .1.real
> 0.1
>>>> 1 .real
> 1
>>>> (1).real
> 1
>>>> 1.real
>    File "<stdin>", line 1
>      1.real
>           ^
> SyntaxError: invalid syntax
>
> Why does it fail? To my human eyes it seems (almost) completely
> unambiguous. Is it a lexing thing? I couldn't find an explanation (or
> any reference at all, although there should be) in the docs.
> The only ambiguity I can see is 1.j - but the desired meaning is
> clear: 1j (just like 1.0j).
>
It'll make the lexer more complicated if it can't tell whether the "."
is part of a float literal or not, and, anyway, it's already the case
that 1.j == 1.0j, not (1).j, so saying that 1.real == (1).real would be
inconsistent.
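
A quick way to check which spellings are accepted and what they evaluate to. This is a sketch using only the built-in `compile` and `eval`; the helper name `parses` is made up for illustration.

```python
def parses(source):
    """Report whether source compiles as a single expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# "1." is read as a float literal, leaving "real" with no dot before it.
print(parses("1.real"))

# These spellings end the integer literal before the dot.
print(parses("1 .real"), parses("(1).real"))

# Existing float-style literals that the proposal would have to keep working.
print(eval("1.j") == 1.0j, eval("1.e50") == 1e50)
```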

> It may confuse beginners; it made me believe that there's no such
> thing as 1.__class__ a couple of years ago. I guess it's bad for code
> generators too.
>
> I suggest making "1.identifier" legal, and adding `j` and `J`
> properties to numbers.Number to mean the sensible thing (so 0.j is not
> a special syntax as it is today).
>
I'd prefer it if Python insisted that a decimal point be preceded and
followed by a digit, but changing it might break existing code. It's
one of those changes that could've been made in Python 3, I suppose,
but it's not something I'm losing any sleep over! :-)

From python at mrabarnett.plus.com  Thu Oct 31 02:33:45 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 31 Oct 2013 01:33:45 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAP7h-xZAgv2yRg_BATOdMjtG+oMkuOmUdtef3_MC-6v642bVFg@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
 <CADiSq7cAkYBKzNoV2hetR9NwhZF2u-vGEQShd5TGfZeViN3gRg@mail.gmail.com>
 <CADiSq7dn_VeN1xbF90HOJjy6BiKKrM3yjqW3YrRhLhPsUCRd_g@mail.gmail.com>
 <CAHVvXxTcMKqc9q2mOP=-2gLoH8jVM8u6WSW7xuOj6ppnynWHFg@mail.gmail.com>
 <CADiSq7dtxLfWmtNuN7s9Tieb_XKJQsSTJubQx3Yd1J3uZwEf+w@mail.gmail.com>
 <l4s2a9$ar5$1@ger.gmane.org>
 <CALFfu7Bg1sptCzKofaSaZZrem+TAbNqL-d=8_YxMrCH=Vwp=kQ@mail.gmail.com>
 <52719D47.1010507@mrabarnett.plus.com>
 <CAP7h-xZAgv2yRg_BATOdMjtG+oMkuOmUdtef3_MC-6v642bVFg@mail.gmail.com>
Message-ID: <5271B379.1040009@mrabarnett.plus.com>

On 31/10/2013 00:17, Alexander Belopolsky wrote:
>
> On Wed, Oct 30, 2013 at 7:59 PM, MRAB <python at mrabarnett.plus.com
> <mailto:python at mrabarnett.plus.com>> wrote:
>
>     I don't think it's that bad; I count 3: "!", "$" and "?". :-)
>
>
> Wasn't use of "`" dropped from Python 3?  This makes it 4!
>
Wasn't one of the reasons it was dropped because it looked too much
like "'"? Well, it still does! :-)


From python at mrabarnett.plus.com  Thu Oct 31 02:36:21 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 31 Oct 2013 01:36:21 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <CAPw6O2TNYzVbBr9dVZJL-LEdPiQ2XzZUw2R3nyR7DJgm4pmEnA@mail.gmail.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
 <CADiSq7cAkYBKzNoV2hetR9NwhZF2u-vGEQShd5TGfZeViN3gRg@mail.gmail.com>
 <CADiSq7dn_VeN1xbF90HOJjy6BiKKrM3yjqW3YrRhLhPsUCRd_g@mail.gmail.com>
 <CAHVvXxTcMKqc9q2mOP=-2gLoH8jVM8u6WSW7xuOj6ppnynWHFg@mail.gmail.com>
 <CADiSq7dtxLfWmtNuN7s9Tieb_XKJQsSTJubQx3Yd1J3uZwEf+w@mail.gmail.com>
 <l4s2a9$ar5$1@ger.gmane.org>
 <CALFfu7Bg1sptCzKofaSaZZrem+TAbNqL-d=8_YxMrCH=Vwp=kQ@mail.gmail.com>
 <52719D47.1010507@mrabarnett.plus.com>
 <CAPw6O2TNYzVbBr9dVZJL-LEdPiQ2XzZUw2R3nyR7DJgm4pmEnA@mail.gmail.com>
Message-ID: <5271B415.8080607@mrabarnett.plus.com>

On 31/10/2013 00:05, אלעזר wrote:
> 2013/10/31 MRAB <python at mrabarnett.plus.com>:
>> On 30/10/2013 23:00, Eric Snow wrote:
>>>
>>> On Wed, Oct 30, 2013 at 4:47 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>>>>
>>>> I thought of using a magic symbol, $, for that -- a[$-n]. But aside from
>>>> the issue of using one of the 2 remaining unused ascii symbols for
>>>> something that can already be done, it would not work in a slice call.
>>>
>>>
>>> Is that like where you have 1 more shot on your camera and you don't
>>> want to use it for fear that something more spectacular might show up
>>> afterward?  (and hope that you didn't leave your lens cap on when you
>>> finally take the picture!)  :-)
>>>
>> I don't think it's that bad; I count 3: "!", "$" and "?". :-)
>>
> Can't it be done by adding a __sub__ method to len?
>
> a[:len-n]
>
> Readable and short.
>
-1

I don't like how it makes that function special. I'd much prefer "end"
(or "End") instead.


From elazarg at gmail.com  Thu Oct 31 02:55:10 2013
From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=)
Date: Thu, 31 Oct 2013 03:55:10 +0200
Subject: [Python-ideas] Allow attribute references for decimalinteger
In-Reply-To: <5271B322.1000108@mrabarnett.plus.com>
References: <CAPw6O2S9ZT7Fg40qDB4tdaZqMXYdyqM1XvWUVzNimiYvwzCWOQ@mail.gmail.com>
 <5271B322.1000108@mrabarnett.plus.com>
Message-ID: <CAPw6O2Ss=6R4Nwmvb3vGR2=5sOvbWSjL4OTdxLZ0ibcR8RQp5w@mail.gmail.com>

[MRAB?]
> it's already the case that 1.j == 1.0j, not (1).j, so saying that 1.real == (1).real would be inconsistent.

I don't understand. In what sense is 1.j != (1).j? The latter raises an
AttributeError, which I suggest changing.

[Alexander Belopolsky]
> What would 1.e50 mean under your proposal?

Well that seems to kill it :-(. I should have dug deeper before
proposing this idea.

I assume it's fixable by making it a special syntax itself, or by
raising an AttributeError (I really doubt it appears anywhere) or by
making int's __getattribute__ handle it in a special way, but I
wouldn't suggest these ideas seriously.

The docs should mention this issue, however, perhaps in an endnote.
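
The ambiguity comes from how the tokenizer reads a digit followed by a
dot. A minimal sketch, assuming a current CPython 3 (the helper name
`parses` is made up for this illustration):

```python
import ast


def parses(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# "1." is already a complete float literal, so these are plain numbers:
assert ast.literal_eval("1.e50") == 1e50   # a float, not attribute "e50" of 1
assert ast.literal_eval("1.j") == 1j       # an imaginary literal

# An attribute reference on an int literal needs parentheses or a space.
assert not parses("1.real")
assert parses("(1).real") and parses("1 .real")
assert (1).real == 1

# ints have no attribute named "j", hence the AttributeError for (1).j.
assert not hasattr(1, "j")
```

So `1.e50` and `1.j` already have a meaning as literals, which is why
giving `1.real` the meaning `(1).real` would clash with existing syntax.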

From tjreedy at udel.edu  Thu Oct 31 03:05:26 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 30 Oct 2013 22:05:26 -0400
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <52719D47.1010507@mrabarnett.plus.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
 <CADiSq7cAkYBKzNoV2hetR9NwhZF2u-vGEQShd5TGfZeViN3gRg@mail.gmail.com>
 <CADiSq7dn_VeN1xbF90HOJjy6BiKKrM3yjqW3YrRhLhPsUCRd_g@mail.gmail.com>
 <CAHVvXxTcMKqc9q2mOP=-2gLoH8jVM8u6WSW7xuOj6ppnynWHFg@mail.gmail.com>
 <CADiSq7dtxLfWmtNuN7s9Tieb_XKJQsSTJubQx3Yd1J3uZwEf+w@mail.gmail.com>
 <l4s2a9$ar5$1@ger.gmane.org>
 <CALFfu7Bg1sptCzKofaSaZZrem+TAbNqL-d=8_YxMrCH=Vwp=kQ@mail.gmail.com>
 <52719D47.1010507@mrabarnett.plus.com>
Message-ID: <l4sdsu$tt7$1@ger.gmane.org>

On 10/30/2013 7:59 PM, MRAB wrote:
> On 30/10/2013 23:00, Eric Snow wrote:
>> On Wed, Oct 30, 2013 at 4:47 PM, Terry Reedy
>> <tjreedy at udel.edu> wrote:
>>> I thought of using a magic symbol, $, for that -- a[$-n]. But aside from
>>> the issue of using one of the 2 remaining unused ascii symbols for
>>> something that can already be done, it would not work in a slice call.
>>
>> Is that like where you have 1 more shot on your camera and you don't
>> want to use it for fear that something more spectacular might show up
>> afterward?  (and hope that you didn't leave your lens cap on when you
>> finally take the picture!)  :-)
>>
> I don't think it's that bad; I count 3: "!", "$" and "?". :-)
 >>> 2 != 3
True


-- 
Terry Jan Reedy


From python at mrabarnett.plus.com  Thu Oct 31 03:06:35 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 31 Oct 2013 02:06:35 +0000
Subject: [Python-ideas] Allow attribute references for decimalinteger
In-Reply-To: <CAPw6O2Ss=6R4Nwmvb3vGR2=5sOvbWSjL4OTdxLZ0ibcR8RQp5w@mail.gmail.com>
References: <CAPw6O2S9ZT7Fg40qDB4tdaZqMXYdyqM1XvWUVzNimiYvwzCWOQ@mail.gmail.com>
 <5271B322.1000108@mrabarnett.plus.com>
 <CAPw6O2Ss=6R4Nwmvb3vGR2=5sOvbWSjL4OTdxLZ0ibcR8RQp5w@mail.gmail.com>
Message-ID: <5271BB2B.9060703@mrabarnett.plus.com>

On 31/10/2013 01:55, אלעזר wrote:
> [MRAB?]
>> it's already the case that 1.j == 1.0j, not (1).j, so saying that 1.real == (1).real would be inconsistent.
>
> I don't understand. In what sense is 1.j != (1).j? The latter raises an
> AttributeError, which I suggest changing.
>
Ah, I see what you mean.

-1

I don't like the idea of a float or int having its imaginary equivalent
as an attribute. It just feels 'wrong' to me!

> [Alexander Belopolsky]
>> What would 1.e50 mean under your proposal?
>
> Well that seems to kill it :-(. I should have dug deeper before
> proposing this idea.
>
> I assume it's fixable by making it a special syntax itself, or by
> raising an AttributeError (I really doubt it appears anywhere) or by
> making int's __getattribute__ handle it in a special way, but I
> wouldn't suggest these ideas seriously.
>
> The docs should mention this issue, however, perhaps in an endnote.
>


From elazarg at gmail.com  Thu Oct 31 03:13:33 2013
From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=)
Date: Thu, 31 Oct 2013 04:13:33 +0200
Subject: [Python-ideas] Allow attribute references for decimalinteger
In-Reply-To: <5271BB2B.9060703@mrabarnett.plus.com>
References: <CAPw6O2S9ZT7Fg40qDB4tdaZqMXYdyqM1XvWUVzNimiYvwzCWOQ@mail.gmail.com>
 <5271B322.1000108@mrabarnett.plus.com>
 <CAPw6O2Ss=6R4Nwmvb3vGR2=5sOvbWSjL4OTdxLZ0ibcR8RQp5w@mail.gmail.com>
 <5271BB2B.9060703@mrabarnett.plus.com>
Message-ID: <CAPw6O2TPHd3w-V-tQv7gAsiuciH16DLYgqFur5iE7SPk0Aq_SA@mail.gmail.com>

2013/10/31 MRAB <python at mrabarnett.plus.com>:
> On 31/10/2013 01:55, אלעזר wrote:
>
> -1
>
> I don't like the idea of a float or int having its imaginary equivalent
> as an attribute. It just feels 'wrong' to me!
>
Oh. I think I understand: it should be named __j__ ;-)

From ethan at stoneleaf.us  Thu Oct 31 02:43:14 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 30 Oct 2013 18:43:14 -0700
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <5271B415.8080607@mrabarnett.plus.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
 <CADiSq7cAkYBKzNoV2hetR9NwhZF2u-vGEQShd5TGfZeViN3gRg@mail.gmail.com>
 <CADiSq7dn_VeN1xbF90HOJjy6BiKKrM3yjqW3YrRhLhPsUCRd_g@mail.gmail.com>
 <CAHVvXxTcMKqc9q2mOP=-2gLoH8jVM8u6WSW7xuOj6ppnynWHFg@mail.gmail.com>
 <CADiSq7dtxLfWmtNuN7s9Tieb_XKJQsSTJubQx3Yd1J3uZwEf+w@mail.gmail.com>
 <l4s2a9$ar5$1@ger.gmane.org>
 <CALFfu7Bg1sptCzKofaSaZZrem+TAbNqL-d=8_YxMrCH=Vwp=kQ@mail.gmail.com>
 <52719D47.1010507@mrabarnett.plus.com>
 <CAPw6O2TNYzVbBr9dVZJL-LEdPiQ2XzZUw2R3nyR7DJgm4pmEnA@mail.gmail.com>
 <5271B415.8080607@mrabarnett.plus.com>
Message-ID: <5271B5B2.7050209@stoneleaf.us>

On 10/30/2013 06:36 PM, MRAB wrote:
> On 31/10/2013 00:05, אלעזר wrote:
>> 2013/10/31 MRAB <python at mrabarnett.plus.com>:
>>> On 30/10/2013 23:00, Eric Snow wrote:
>>>>
>>>> On Wed, Oct 30, 2013 at 4:47 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>>>>>
>>>>> I thought of using a magic symbol, $, for that -- a[$-n]. But aside from
>>>>> the issue of using one of the 2 remaining unused ascii symbols for
>>>>> something that can already be done, it would not work in a slice call.
>>>>
>>>>
>>>> Is that like where you have 1 more shot on your camera and you don't
>>>> want to use it for fear that something more spectacular might show up
>>>> afterward?  (and hope that you didn't leave your lens cap on when you
>>>> finally take the picture!)  :-)
>>>>
>>> I don't think it's that bad; I count 3: "!", "$" and "?". :-)
>>>
>> Can't it be done by adding a __sub__ method to len?
>>
>> a[:len-n]
>>
>> Readable and short.
>>
> -1
>
> I don't like how it makes that function special.

Not only that, but len wouldn't know what it was subtracting from.

--
~Ethan~

From steve at pearwood.info  Thu Oct 31 03:56:54 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 31 Oct 2013 13:56:54 +1100
Subject: [Python-ideas] Support os.path.join for Windows paths on Posix
In-Reply-To: <CAO41-mMWWu0wT7bHJHBPAJ6D5FzJaTfWAHw7pKoFSG5VcOygAw@mail.gmail.com>
References: <CAO41-mMWWu0wT7bHJHBPAJ6D5FzJaTfWAHw7pKoFSG5VcOygAw@mail.gmail.com>
Message-ID: <20131031025653.GB18730@ando>

On Wed, Oct 30, 2013 at 05:35:57PM -0500, Ryan Gonzalez wrote:
> The recent thread/post/whatever on os.path.join has gotten me thinking. Say
> I wanted to join a Windows path...on Ubuntu. This is what I get:
[...]
> Isn't something wrong there? My idea: check for \'s in the path. If there
> are any, assume \ is the path separator, not /.

Others have already pointed out that using ntpath is the right solution 
for this problem, but I wanted to mention that the suggestion to guess 
the path separator based on the path is risky. Backslashes are legal in 
file and directory names on POSIX systems (Mac, Unix, Linux etc), which 
means that sometimes os.path.join will be given a POSIX path containing 
backslashes, in which case it will wrongly guess it is a Windows path.

os.path.join("ab\\c", "d", "file.txt")

on POSIX ought to give the path containing two directories and a file 
name 'ab\\c/d/file.txt', but with the guesser it would generate a single 
file name 'ab\\c\\d\\file.txt'.
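
The difference is easy to see by calling the two path flavours directly;
both `posixpath` and `ntpath` can be imported on any platform. A small
sketch (the `guessing_join` helper is hypothetical and only mimics the
proposed guesser; it is not part of the standard library):

```python
import ntpath
import posixpath


def guessing_join(first, *rest):
    """Hypothetical guesser: assume Windows style if `first` has a backslash."""
    flavour = ntpath if "\\" in first else posixpath
    return flavour.join(first, *rest)


# Joining a Windows path explicitly works on any platform.
windows = ntpath.join("C:\\Users", "me", "file.txt")

# On POSIX a backslash is an ordinary character inside a name.
posix = posixpath.join("ab\\c", "d", "file.txt")

# The guesser turns the same arguments into one name full of backslashes.
guessed = guessing_join("ab\\c", "d", "file.txt")

print(windows)  # C:\Users\me\file.txt
print(posix)    # ab\c/d/file.txt
print(guessed)  # ab\c\d\file.txt
```

Choosing the flavour explicitly keeps the caller in control, whereas the
guesser silently changes the meaning of a legal POSIX name.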



-- 
Steven

From python at mrabarnett.plus.com  Thu Oct 31 03:57:34 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 31 Oct 2013 02:57:34 +0000
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <5271B5B2.7050209@stoneleaf.us>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
 <CADiSq7cAkYBKzNoV2hetR9NwhZF2u-vGEQShd5TGfZeViN3gRg@mail.gmail.com>
 <CADiSq7dn_VeN1xbF90HOJjy6BiKKrM3yjqW3YrRhLhPsUCRd_g@mail.gmail.com>
 <CAHVvXxTcMKqc9q2mOP=-2gLoH8jVM8u6WSW7xuOj6ppnynWHFg@mail.gmail.com>
 <CADiSq7dtxLfWmtNuN7s9Tieb_XKJQsSTJubQx3Yd1J3uZwEf+w@mail.gmail.com>
 <l4s2a9$ar5$1@ger.gmane.org>
 <CALFfu7Bg1sptCzKofaSaZZrem+TAbNqL-d=8_YxMrCH=Vwp=kQ@mail.gmail.com>
 <52719D47.1010507@mrabarnett.plus.com>
 <CAPw6O2TNYzVbBr9dVZJL-LEdPiQ2XzZUw2R3nyR7DJgm4pmEnA@mail.gmail.com>
 <5271B415.8080607@mrabarnett.plus.com> <5271B5B2.7050209@stoneleaf.us>
Message-ID: <5271C71E.2060804@mrabarnett.plus.com>

On 31/10/2013 01:43, Ethan Furman wrote:
> On 10/30/2013 06:36 PM, MRAB wrote:
>> On 31/10/2013 00:05, אלעזר wrote:
>>> 2013/10/31 MRAB <python at mrabarnett.plus.com>:
>>>> On 30/10/2013 23:00, Eric Snow wrote:
>>>>>
>>>>> On Wed, Oct 30, 2013 at 4:47 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>>>>>>
>>>>>> I thought of using a magic symbol, $, for that -- a[$-n]. But aside from
>>>>>> the issue of using one of the 2 remaining unused ascii symbols for
>>>>>> something that can already be done, it would not work in a slice call.
>>>>>
>>>>>
>>>>> Is that like where you have 1 more shot on your camera and you don't
>>>>> want to use it for fear that something more spectacular might show up
>>>>> afterward?  (and hope that you didn't leave your lens cap on when you
>>>>> finally take the picture!)  :-)
>>>>>
>>>> I don't think it's that bad; I count 3: "!", "$" and "?". :-)
>>>>
>>> Can't it be done by adding a __sub__ method to len?
>>>
>>> a[:len-n]
>>>
>>> Readable and short.
>>>
>> -1
>>
>> I don't like how it makes that function special.
>
> Not only that, but len wouldn't know what it was subtracting from.
>
But you could have an "End" class, something like this:

class End:
     def __init__(self, offset=0):
         self.offset = offset

     def __sub__(self, offset):
         return End(self.offset - offset)

     def __add__(self, offset):
         return End(self.offset + offset)

     def __str__(self):
         if self.offset < 0:
             return 'End - {}'.format(-self.offset)

         if self.offset > 0:
             return 'End + {}'.format(self.offset)

         return 'End'

Unfortunately, all those methods that expect an index would have to be
modified. :-(
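
To make the cost concrete, here is a rough sketch of what modifying one
such method might look like. The `resolve` method, the `END` instance and
the `EndAwareList` class are all assumptions invented for this
illustration; nothing like them exists in the standard library.

```python
class End:
    """Marker meaning "the length of the sequence, plus an offset"."""

    def __init__(self, offset=0):
        self.offset = offset

    def __sub__(self, offset):
        return End(self.offset - offset)

    def __add__(self, offset):
        return End(self.offset + offset)

    def resolve(self, length):
        # Hypothetical hook: turn the marker into a concrete index.
        return length + self.offset


END = End()  # an instance, so that "END - n" works without a metaclass


class EndAwareList(list):
    """A list whose __getitem__ understands End markers (only that method)."""

    def _resolve(self, value):
        return value.resolve(len(self)) if isinstance(value, End) else value

    def __getitem__(self, index):
        if isinstance(index, slice):
            index = slice(self._resolve(index.start),
                          self._resolve(index.stop),
                          index.step)
        else:
            index = self._resolve(index)
        return super().__getitem__(index)


a = EndAwareList("hello")
print(a[END - 1])     # o
print(a[:END - 2])    # ['h', 'e', 'l']
print(a[:END - 0])    # ['h', 'e', 'l', 'l', 'o']
print(a[:-0])         # [] because -0 is just 0
```

The last two lines show the case the marker is meant to fix: with n == 0,
`a[:END - n]` keeps everything while `a[:-n]` is empty. Every other
index-taking method (`__setitem__`, `__delitem__`, `insert`, `pop`, ...)
would need the same treatment.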


From greg.ewing at canterbury.ac.nz  Thu Oct 31 04:53:57 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 31 Oct 2013 16:53:57 +1300
Subject: [Python-ideas] Allow attribute references for decimalinteger
In-Reply-To: <CAPw6O2Ss=6R4Nwmvb3vGR2=5sOvbWSjL4OTdxLZ0ibcR8RQp5w@mail.gmail.com>
References: <CAPw6O2S9ZT7Fg40qDB4tdaZqMXYdyqM1XvWUVzNimiYvwzCWOQ@mail.gmail.com>
 <5271B322.1000108@mrabarnett.plus.com>
 <CAPw6O2Ss=6R4Nwmvb3vGR2=5sOvbWSjL4OTdxLZ0ibcR8RQp5w@mail.gmail.com>
Message-ID: <5271D455.2090507@canterbury.ac.nz>

On 31/10/13 14:55, אלעזר wrote:
> [Alexander Belopolsky]
>> What would 1.e50 mean under your proposal?
>
> Well that seems to kill it :-(.

In hindsight, things might have been better if the decimal point
were required to have at least one digit after it, but it's
probably too late to change that now.

-- 
Greg


From abarnert at yahoo.com  Thu Oct 31 07:06:50 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Wed, 30 Oct 2013 23:06:50 -0700
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <l4s2a9$ar5$1@ger.gmane.org>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
 <CADiSq7cAkYBKzNoV2hetR9NwhZF2u-vGEQShd5TGfZeViN3gRg@mail.gmail.com>
 <CADiSq7dn_VeN1xbF90HOJjy6BiKKrM3yjqW3YrRhLhPsUCRd_g@mail.gmail.com>
 <CAHVvXxTcMKqc9q2mOP=-2gLoH8jVM8u6WSW7xuOj6ppnynWHFg@mail.gmail.com>
 <CADiSq7dtxLfWmtNuN7s9Tieb_XKJQsSTJubQx3Yd1J3uZwEf+w@mail.gmail.com>
 <l4s2a9$ar5$1@ger.gmane.org>
Message-ID: <0C20E3A4-6750-4E4C-857F-6D81C30983B6@yahoo.com>

On Oct 30, 2013, at 15:47, Terry Reedy <tjreedy at udel.edu> wrote:

> I think one point is that if seq.__getitem__(ob) uses 'if isinstance(ob, slice):' instead of 'if type(ob) is slice:', subclass instances will work whereas wrapper instances would not.

Why would anyone use isinstance(ob, slice)? If you can write "start, stop, step = ob.indices(length)" without getting a TypeError, what else do you care about? (Code that _does_ care about the difference probably wouldn't work correctly with any custom slice object anyway.)

If there really is an issue, we could easily add a collections.abc.Slice.

Or... This may be a heretical and/or just stupid idea, but what about reviving the old names __getslice__ and friends (now taking a slice object instead of 2.x's start and stop)? Then the interpreter calls __getslice__ if you use slicing syntax, or if you use indexing syntax with an instance of abc.Slice (or anything but a numbers.Number even?). That way the code is only in one place instead of having to be written in each sequence class.
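
A minimal sketch of the duck-typed approach described above: the
`__getitem__` below never checks for the `slice` type and only asks
whether the index has an `indices()` method. The `Squares` and
`WrappedSlice` classes are invented for this example.

```python
import operator


class Squares:
    """The first `n` square numbers, computed on demand."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, index):
        indices = getattr(index, "indices", None)
        if indices is not None:
            # Anything with a slice-like indices() method is accepted.
            start, stop, step = indices(len(self))
            return [i * i for i in range(start, stop, step)]
        i = operator.index(index)
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError("Squares index out of range")
        return i * i


class WrappedSlice:
    """A wrapper around a slice, deliberately not a subclass of it."""

    def __init__(self, *args):
        self._slice = slice(*args)

    def indices(self, length):
        return self._slice.indices(length)


sq = Squares(10)
print(sq[2:5])                  # [4, 9, 16]
print(sq[WrappedSlice(2, 5)])   # [4, 9, 16]
print(sq[-1])                   # 81
```

With an `isinstance(ob, slice)` or `type(ob) is slice` test, the
`WrappedSlice` call would instead fall through to the integer branch and
fail with a TypeError.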

From abarnert at yahoo.com  Thu Oct 31 08:18:33 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 31 Oct 2013 00:18:33 -0700 (PDT)
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <0C20E3A4-6750-4E4C-857F-6D81C30983B6@yahoo.com>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CADiSq7du4OjdoPRH8tQeJwEcCu34DowhR8jz-Qd4x=TqhuYh1w@mail.gmail.com>
 <l4n3as$bii$1@ger.gmane.org> <526F2EE0.9010705@mrabarnett.plus.com>
 <l4osfb$9f7$1@ger.gmane.org> <52701846.2070604@mrabarnett.plus.com>
 <l4p93u$88s$1@ger.gmane.org>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
 <CADiSq7cAkYBKzNoV2hetR9NwhZF2u-vGEQShd5TGfZeViN3gRg@mail.gmail.com>
 <CADiSq7dn_VeN1xbF90HOJjy6BiKKrM3yjqW3YrRhLhPsUCRd_g@mail.gmail.com>
 <CAHVvXxTcMKqc9q2mOP=-2gLoH8jVM8u6WSW7xuOj6ppnynWHFg@mail.gmail.com>
 <CADiSq7dtxLfWmtNuN7s9Tieb_XKJQsSTJubQx3Yd1J3uZwEf+w@mail.gmail.com>
 <l4s2a9$ar5$1@ger.gmane.org> <0C20E3A4-6750-4E4C-857F-6D81C30983B6@yahoo.com>
Message-ID: <1383203913.36934.YahooMailNeo@web184705.mail.ne1.yahoo.com>

This reminded me of something related.

Quasi-sequences -- things that implement the implicit sequence protocol (being indexable with contiguous integers starting from 0, so they're good enough to be used as iterables even though they don't define __iter__), but aren't Sequences (e.g., because they're lazy and/or infinite and therefore can't be Sized) -- work with slices (as long as start and stop are nonnegative). But only because they ignore the indices method and use the start, stop, and step attributes directly. And that will break any meaningful subclass of slice (except those that only deviate from the base class when given negative indices).

One option is to allow these quasi-sequences to call indices(None), which (in slice; subclasses could do something different if they wanted) would raise an IndexError if its start or stop were negative, otherwise act as if it were given an infinite length. (This would also make such quasi-sequences easier to write, and more consistent.)

Here's an example (which may be kind of silly, but someone wrote it, and it works, and it's in a project I maintain...): a LazyList that wraps up an iterator and acts like a quasi-sequence -- you can index it and slice it, and even mutate it; the first time you try to get/set/del an index higher than all that have been accessed so far, it moves an appropriate number of values from the stored iterator to a list, then just does the get/set/del on that list.


For example, if Squares(i) is an iterator that's like (n*n for n in itertools.count(i)) but with a useful repr:

>>> ll = LazyList(Squares(0))

>>> ll
LazyList(Squares(0))
>>> ll[1:-1]
IndexError: LazyList indices cannot be negative
>>> ll[1]
1
>>> ll
LazyList(0, 1, Squares(2))
>>> del ll[2:6:2]

>>> ll
LazyList(0, 1, 9, Squares(5))
>>> ll[5:2:-1]
[49, 36, 25]
>>> ll
LazyList(0, 1, 9, 25, 36, 49, Squares(8))

For an even simpler example, here's an InfiniteRange class:

>>> r = InfiniteRange(2)
>>> r
InfiniteRange(2)
>>> r[0]
2
>>> r[11::2]
InfiniteRange(13, 2)
>>> r[11:15:2]
range(13, 17, 2)
>>> r[15:11:-2]
range(17, 13, -2)
>>> r[:-1]
IndexError: InfiniteRange indices cannot be negative
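
For readers who want to try this, here is one possible sketch that
reproduces the InfiniteRange session above. It is not the code from the
project mentioned; the constructor signature and the handling of reversed
slices are assumptions made for the illustration.

```python
import operator


class InfiniteRange:
    """An endless arithmetic progression: start, start + step, ..."""

    def __init__(self, start=0, step=1):
        if step == 0:
            raise ValueError("step must not be zero")
        self.start = start
        self.step = step

    def __repr__(self):
        if self.step == 1:
            return f"InfiniteRange({self.start})"
        return f"InfiniteRange({self.start}, {self.step})"

    def _value(self, i):
        return self.start + i * self.step

    def __getitem__(self, index):
        if not hasattr(index, "indices"):
            i = operator.index(index)
            if i < 0:
                raise IndexError("InfiniteRange indices cannot be negative")
            return self._value(i)
        # There is no length to pass to index.indices(), so the raw
        # start/stop/step attributes are read directly.
        start, stop = index.start, index.stop
        step = 1 if index.step is None else index.step
        if step == 0:
            raise ValueError("slice step cannot be zero")
        if any(b is not None and b < 0 for b in (start, stop)):
            raise IndexError("InfiniteRange indices cannot be negative")
        if step > 0:
            first = self._value(0 if start is None else start)
            if stop is None:
                return InfiniteRange(first, self.step * step)
            return range(first, self._value(stop), self.step * step)
        if start is None:
            raise IndexError("a reversed slice needs an explicit start")
        # With a negative step and no stop, run down to index 0 inclusive.
        end = self._value(-1 if stop is None else stop)
        return range(self._value(start), end, self.step * step)
```

Because `__getitem__` reads `index.start`, `index.stop` and `index.step`
itself, a slice-like object that customises only `indices()` would be
ignored here, which is the problem an `indices(None)` call would avoid.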

----- Original Message -----
> From: Andrew Barnert <abarnert at yahoo.com>
> To: Terry Reedy <tjreedy at udel.edu>
> Cc: "python-ideas at python.org" <python-ideas at python.org>
> Sent: Wednesday, October 30, 2013 11:06 PM
> Subject: Re: [Python-ideas] Where did we go wrong with negative stride?
> 
> On Oct 30, 2013, at 15:47, Terry Reedy <tjreedy at udel.edu> wrote:
> 
>>  I think one point is that if seq.__getitem__(ob) uses 'if 
> isinstance(ob, slice):' instead of 'if type(ob) is slice:', subclass 
> instances will work whereas wrapper instances would not.
> 
> Why would anyone use isinstance(ob, slice)? If you can write "start, stop, 
> step = ob.indices(length)" without getting a TypeError, what else do you care 
> about? (Code that _does_ care about the difference probably wouldn't work 
> correctly with any custom slice object anyway.)
> 
> If there really is an issue, we could easily add a collections.abc.Slice.
> 
> Or... This may be a heretical and/or just stupid idea, but what about reviving 
> the old names __getslice__ and friends (now taking a slice object instead of 
> 2.x's start and stop)? Then the interpreter calls __getslice__ if you use 
> slicing syntax, or if you use indexing syntax with an instance of abc.Slice (or 
> anything but a numbers.Number even?). That way the code is only in one place 
> instead of having to be written in each sequence class.
> 

From techtonik at gmail.com  Thu Oct 31 11:24:19 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Thu, 31 Oct 2013 13:24:19 +0300
Subject: [Python-ideas] os.path.join
In-Reply-To: <CAGifb9Fm_NVWZ3oYoCJmAxg_GAiCsWG8h7m3tA9GmmBw=vz-zw@mail.gmail.com>
References: <CAPkN8xLgywa5-BJi63BsadMJc+Rfpd1D=qCXQUQuP1hRNf=4tw@mail.gmail.com>
 <CAGifb9Fm_NVWZ3oYoCJmAxg_GAiCsWG8h7m3tA9GmmBw=vz-zw@mail.gmail.com>
Message-ID: <CAPkN8xLC+dYS-OUhfap2mM3r=2cuusVAxyDoReQ68ZzdtzUjLA@mail.gmail.com>

On Wed, Oct 30, 2013 at 7:50 PM, Geoffrey Spear <geoffspear at gmail.com> wrote:
> On Wed, Oct 30, 2013 at 12:34 PM, anatoly techtonik <techtonik at gmail.com> wrote:
>>   >>> os.path.join('/static', '/styles/largestyles.css')
>>   '/styles/largestyles.css'
>>
>> Is it only me who thinks that the code above is wrong?
>
> No, the code is obviously wrong. What's your idea? To make the bit
> about absolute paths in the documentation all bold, red, and blinking?

No. This won't help. The idea is to file a request for a change in
behavior for future versions of Python. With a rename if necessary.
Perhaps even for this - http://www.python.org/dev/peps/pep-0428/

And also the idea is to listen to arguments in defense of the current
behavior.

And also, what do people think about a library with cross-platform path
operations? Meaning that, given the full set of path styles found on the
internet, this library would normalize operations on those paths and
provide path handling rules (functions) that work identically on
every platform. In cases where that is impossible, the documentation
would describe the problem.
--
anatoly t.

From techtonik at gmail.com  Thu Oct 31 11:30:02 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Thu, 31 Oct 2013 13:30:02 +0300
Subject: [Python-ideas] os.path.join
In-Reply-To: <CAGu0AnunwyJnee1Zxc_0Hfd=Wh4Skp9r0gAPiX=xsqpPcNMZpQ@mail.gmail.com>
References: <CAPkN8xLgywa5-BJi63BsadMJc+Rfpd1D=qCXQUQuP1hRNf=4tw@mail.gmail.com>
 <CAGu0AnunwyJnee1Zxc_0Hfd=Wh4Skp9r0gAPiX=xsqpPcNMZpQ@mail.gmail.com>
Message-ID: <CAPkN8xKxd+MtdZ1Nf49GSZhmT=Aw61OvCohKPTeHXe5j7QW1Yw@mail.gmail.com>

On Wed, Oct 30, 2013 at 8:06 PM, Bruce Leban <bruce at leapyear.org> wrote:
> I don't know if the code is wrong but if you're asking if the *result* of
> join is wrong, I don't think it is. It references the same file as these
> commands:
>
> cd /static
> cat /styles/largestyles.css
>
> I agree it might be confusing but it's pretty explicitly documented.

Yes. It is confusing.

1. How often is the operation of joining absolute paths needed?
2. What is the expected result of this operation?

For me, as a user, the answer to 1 is 'never'; for 2, I'd expect the 2nd
path to be treated as a relative one. Think of the 2nd path as
an absolute path from the mountpoint specified in the 1st.
--
anatoly t.

From techtonik at gmail.com  Thu Oct 31 11:31:16 2013
From: techtonik at gmail.com (anatoly techtonik)
Date: Thu, 31 Oct 2013 13:31:16 +0300
Subject: [Python-ideas] os.path.join
In-Reply-To: <l4rnch$5b1$2@ger.gmane.org>
References: <CAPkN8xLgywa5-BJi63BsadMJc+Rfpd1D=qCXQUQuP1hRNf=4tw@mail.gmail.com>
 <l4rnch$5b1$2@ger.gmane.org>
Message-ID: <CAPkN8xJ1mtWkFiJiRE9pZB9vixnCpxYhdfDsM7hb04UDX5XevQ@mail.gmail.com>

On Wed, Oct 30, 2013 at 10:41 PM, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:
> On 30/10/2013 16:34, anatoly techtonik wrote:
>>
>>    >>> os.path.join('/static', '/styles/largestyles.css')
>>    '/styles/largestyles.css'
>>
>> Is it only me who thinks that the code above is wrong?
>>
>
> Is this the appropriate place for such a question?  What is wrong with the
> main Python mailing list, Stackoverflow...?
>
> --
> Python is the second best programming language in the world.
> But the best has yet to be invented.  Christian Tismer

Both Python ML and SO are bad for inventing new languages.
--
anatoly t.

From p.f.moore at gmail.com  Thu Oct 31 13:56:53 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 31 Oct 2013 12:56:53 +0000
Subject: [Python-ideas] os.path.join
In-Reply-To: <CAPkN8xKxd+MtdZ1Nf49GSZhmT=Aw61OvCohKPTeHXe5j7QW1Yw@mail.gmail.com>
References: <CAPkN8xLgywa5-BJi63BsadMJc+Rfpd1D=qCXQUQuP1hRNf=4tw@mail.gmail.com>
 <CAGu0AnunwyJnee1Zxc_0Hfd=Wh4Skp9r0gAPiX=xsqpPcNMZpQ@mail.gmail.com>
 <CAPkN8xKxd+MtdZ1Nf49GSZhmT=Aw61OvCohKPTeHXe5j7QW1Yw@mail.gmail.com>
Message-ID: <CACac1F_eGf4mPybtuM98MhDs6aLmVrNCxJ2XKRy3ZAi9ChBxSA@mail.gmail.com>

On 31 October 2013 10:30, anatoly techtonik <techtonik at gmail.com> wrote:
>> I agree it might be confusing but it's pretty explicitly documented.
>
> Yes. It is confusing.
>
> 1. How often is the operation of joining absolute paths needed?

Infrequently, but occasionally. Usually the first argument will be a
fixed value which is a "base path" and the second will be a
user-supplied (or similar) value which is to be interpreted as
relative to the base, unless it's an absolute path when it's to be
used unchanged.

> 2. What is the expected result of this operation?

Exactly what Python currently does.

> For me, as a user, the answer to 1 is 'never'; for 2, I'd expect the 2nd
> path to be treated as a relative one. Think of the 2nd path as
> an absolute path from the mountpoint specified in the 1st.

I would never want this behaviour in any real application I have encountered.
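
Both readings can be put side by side in a short sketch. It uses
`posixpath` so the output is the same on every platform; `join_under` is
a hypothetical helper showing the "mountpoint" interpretation and is not
part of os.path.

```python
import posixpath


def join_under(base, user_path):
    """Hypothetical helper: always keep the result below `base`."""
    return posixpath.join(base, user_path.lstrip("/"))


base = "/static"

# Current behaviour: a relative second argument is appended to the base...
print(posixpath.join(base, "styles/largestyles.css"))   # /static/styles/largestyles.css
# ...while an absolute one replaces it.
print(posixpath.join(base, "/styles/largestyles.css"))  # /styles/largestyles.css

# The "mountpoint" reading has to be requested explicitly.
print(join_under(base, "/styles/largestyles.css"))      # /static/styles/largestyles.css
```

Note that `join_under` only strips leading slashes; it does nothing about
`..` components, so it is not a safe way to confine untrusted input.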

Paul

From elazarg at gmail.com  Thu Oct 31 14:37:36 2013
From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=)
Date: Thu, 31 Oct 2013 15:37:36 +0200
Subject: [Python-ideas] Where did we go wrong with negative stride?
In-Reply-To: <5271B5B2.7050209@stoneleaf.us>
References: <CAP7+vJJy2C8-wVKM9zsA0-SKZmk-FXBo1jepLwtR+3Ed+AxNgA@mail.gmail.com>
 <CADiSq7da_+R9cV34=ENKheUkMW60Mdzf3ZB1ydjR6=3pz0p5GQ@mail.gmail.com>
 <l4pt9r$r7b$1@ger.gmane.org>
 <CADiSq7d5Cx2yYtYJ5k=k=Bd3TnSm4fb+-QwQ3JW7_6KwinMaoA@mail.gmail.com>
 <CAHVvXxQP=BHh-iGgg-WstJmZh=Y2nWXiV8sepNDcFHZNmWqYcg@mail.gmail.com>
 <CACac1F9fKSXmBy8+Td5ku7zdVuc7iS6HZgpdqaQ_Hg6U6=MQGw@mail.gmail.com>
 <CAHVvXxRFnzny508J70k9_e-hf5bSB2Lw5PbGkmxkB_Jway+9aw@mail.gmail.com>
 <CADiSq7cAkYBKzNoV2hetR9NwhZF2u-vGEQShd5TGfZeViN3gRg@mail.gmail.com>
 <CADiSq7dn_VeN1xbF90HOJjy6BiKKrM3yjqW3YrRhLhPsUCRd_g@mail.gmail.com>
 <CAHVvXxTcMKqc9q2mOP=-2gLoH8jVM8u6WSW7xuOj6ppnynWHFg@mail.gmail.com>
 <CADiSq7dtxLfWmtNuN7s9Tieb_XKJQsSTJubQx3Yd1J3uZwEf+w@mail.gmail.com>
 <l4s2a9$ar5$1@ger.gmane.org>
 <CALFfu7Bg1sptCzKofaSaZZrem+TAbNqL-d=8_YxMrCH=Vwp=kQ@mail.gmail.com>
 <52719D47.1010507@mrabarnett.plus.com>
 <CAPw6O2TNYzVbBr9dVZJL-LEdPiQ2XzZUw2R3nyR7DJgm4pmEnA@mail.gmail.com>
 <5271B415.8080607@mrabarnett.plus.com>
 <5271B5B2.7050209@stoneleaf.us>
Message-ID: <CAPw6O2RD37BqgU05F8F9apbNnYnjUEaQ-RP8BZcrNMct4c4XCw@mail.gmail.com>

2013/10/31 Ethan Furman <ethan at stoneleaf.us>:
> On 10/30/2013 06:36 PM, MRAB wrote:
>>
>> On 31/10/2013 00:05, אלעזר wrote:
>>>
>>> 2013/10/31 MRAB <python at mrabarnett.plus.com>:
>>>>
>>>> On 30/10/2013 23:00, Eric Snow wrote:
>>>>>
>>>>>
>>>>> On Wed, Oct 30, 2013 at 4:47 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>>>>>>
>>>>>>
>>>>>> I thought of using a magic symbol, $, for that -- a[$-n]. But aside
>>>>>> from
>>>>>> the issue of using one of the 2 remaining unused ascii symbols for
>>>>>> something that can already be done, it would not work in a slice call.
>>>>>
>>>>>
>>>>>
>>>>> Is that like where you have 1 more shot on your camera and you don't
>>>>> want to use it for fear that something more spectacular might show up
>>>>> afterward?  (and hope that you didn't leave your lens cap on when you
>>>>> finally take the picture!)  :-)
>>>>>
>>>> I don't think it's that bad; I count 3: "!", "$" and "?". :-)
>>>>
>>> Can't it be done by adding a __sub__ method to len?
>>>
>>> a[:len-n]
>>>
>>> Readable and short.
>>>
>> -1
>>
>> I don't like how it makes that function special.
>
>
> Not only that, but len wouldn't know what it was subtracting from.
>
But that doesn't matter; the operation will return the same End object
discussed here.

Perhaps we can get this End object by adding two tokens: ":-" and "[-". So

a[-3:-5] == a[slice(End-3, End-5, None)]

although it will turn a[-3] into a[End-3]. I don't think that's a
problem if the latter behaves in the same way as the former (i.e.
End-3 is an instance of a subtype of int).

Note that with an End object (regardless of whether it's called
"End", "len-x" or ":-x") we can get End/5. I think that's a nice thing
to have.

One more thing: End-5 should be callable, so it can be passed around.

(End-3)("hello") == len("hello")-3
(End-0)("hello") == len("hello")

This way End is a generalization of len, making len somewhat redundant.
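
A rough sketch of the callable variant, assuming End is an instance of a
small helper class (named `_End` here purely for illustration) rather
than a builtin:

```python
class _End:
    """Callable marker: applied to a sequence, gives len(seq) + offset."""

    def __init__(self, offset=0):
        self.offset = offset

    def __add__(self, n):
        return _End(self.offset + n)

    def __sub__(self, n):
        return _End(self.offset - n)

    def __call__(self, seq):
        return len(seq) + self.offset

    def __repr__(self):
        if self.offset < 0:
            return f"End - {-self.offset}"
        if self.offset > 0:
            return f"End + {self.offset}"
        return "End"


End = _End()

assert (End - 3)("hello") == len("hello") - 3
assert (End - 0)("hello") == len("hello")

# Being callable, the marker can be passed around like any function.
last_positions = list(map(End - 1, ["a", "bcd"]))
print(last_positions)  # [0, 2]
```

In this sketch `End` on its own already behaves like `len`, which is the
sense in which it generalizes it.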

From breamoreboy at yahoo.co.uk  Thu Oct 31 15:07:46 2013
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Thu, 31 Oct 2013 14:07:46 +0000
Subject: [Python-ideas] os.path.join
In-Reply-To: <CAPkN8xJ1mtWkFiJiRE9pZB9vixnCpxYhdfDsM7hb04UDX5XevQ@mail.gmail.com>
References: <CAPkN8xLgywa5-BJi63BsadMJc+Rfpd1D=qCXQUQuP1hRNf=4tw@mail.gmail.com>
 <l4rnch$5b1$2@ger.gmane.org>
 <CAPkN8xJ1mtWkFiJiRE9pZB9vixnCpxYhdfDsM7hb04UDX5XevQ@mail.gmail.com>
Message-ID: <l4to7d$6ee$1@ger.gmane.org>

On 31/10/2013 10:31, anatoly techtonik wrote:
> On Wed, Oct 30, 2013 at 10:41 PM, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:
>> On 30/10/2013 16:34, anatoly techtonik wrote:
>>>
>>>     >>> os.path.join('/static', '/styles/largestyles.css')
>>>     '/styles/largestyles.css'
>>>
>>> Is it only me who thinks that the code above is wrong?
>>>
>>
>> Is this the appropriate place for such a question?  What is wrong with the
>> main Python mailing list, Stackoverflow...?
>>
>> --
>> Python is the second best programming language in the world.
>> But the best has yet to be invented.  Christian Tismer
>
> Both Python ML and SO are bad for inventing new languages.
> --
> anatoly t.
>

I'm completely baffled by your comment, so please explain yourself.

-- 
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence


From flying-sheep at web.de  Thu Oct 31 15:35:00 2013
From: flying-sheep at web.de (Philipp A.)
Date: Thu, 31 Oct 2013 15:35:00 +0100
Subject: [Python-ideas] Allow attribute references for decimalinteger
In-Reply-To: <5271D455.2090507@canterbury.ac.nz>
References: <CAPw6O2S9ZT7Fg40qDB4tdaZqMXYdyqM1XvWUVzNimiYvwzCWOQ@mail.gmail.com>
 <5271B322.1000108@mrabarnett.plus.com>
 <CAPw6O2Ss=6R4Nwmvb3vGR2=5sOvbWSjL4OTdxLZ0ibcR8RQp5w@mail.gmail.com>
 <5271D455.2090507@canterbury.ac.nz>
Message-ID: <CAN8d9gmp=E3+ONVh0YBf_xxdjBoeu9bpgVY57_J1FSWBWCEHdg@mail.gmail.com>

2013/10/31 Greg Ewing <greg.ewing at canterbury.ac.nz>

> In hindsight, things might have been better if the decimal point
> were required to have at least one digit after it, but it's
> probably too late to change that now.


i disagree. i like writing "1." for "float(1)" and ".1" for "1/10". what's
the point of redundant zeros?

From rosuav at gmail.com  Thu Oct 31 16:00:13 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 1 Nov 2013 02:00:13 +1100
Subject: [Python-ideas] Allow attribute references for decimalinteger
In-Reply-To: <CAN8d9gmp=E3+ONVh0YBf_xxdjBoeu9bpgVY57_J1FSWBWCEHdg@mail.gmail.com>
References: <CAPw6O2S9ZT7Fg40qDB4tdaZqMXYdyqM1XvWUVzNimiYvwzCWOQ@mail.gmail.com>
 <5271B322.1000108@mrabarnett.plus.com>
 <CAPw6O2Ss=6R4Nwmvb3vGR2=5sOvbWSjL4OTdxLZ0ibcR8RQp5w@mail.gmail.com>
 <5271D455.2090507@canterbury.ac.nz>
 <CAN8d9gmp=E3+ONVh0YBf_xxdjBoeu9bpgVY57_J1FSWBWCEHdg@mail.gmail.com>
Message-ID: <CAPTjJmp8WmHMpWdVOaMOnVXBMmA17jy1u-sOhbn8wH23X0KnAw@mail.gmail.com>

On Fri, Nov 1, 2013 at 1:35 AM, Philipp A. <flying-sheep at web.de> wrote:
> 2013/10/31 Greg Ewing <greg.ewing at canterbury.ac.nz>
>>
>> In hindsight, things might have been better if the decimal point
>> were required to have at least one digit after it, but it's
>> probably too late to change that now.
>
>
> i disagree. i like writing "1." for "float(1)" and ".1" for "1/10". what's
> the point of redundant zeros?

".1" is not under challenge (the requirement suggested is merely a
digit after, nothing about before). I agree that "1." for 1.0 is a
useful shorthand, but if the parser had originally been written to
disallow it, I doubt there'd be very strong call to make it more free.

ChrisA

From ronaldoussoren at mac.com  Thu Oct 31 16:10:46 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Thu, 31 Oct 2013 16:10:46 +0100
Subject: [Python-ideas] Allow attribute references for decimalinteger
In-Reply-To: <CAN8d9gmp=E3+ONVh0YBf_xxdjBoeu9bpgVY57_J1FSWBWCEHdg@mail.gmail.com>
References: <CAPw6O2S9ZT7Fg40qDB4tdaZqMXYdyqM1XvWUVzNimiYvwzCWOQ@mail.gmail.com>
 <5271B322.1000108@mrabarnett.plus.com>
 <CAPw6O2Ss=6R4Nwmvb3vGR2=5sOvbWSjL4OTdxLZ0ibcR8RQp5w@mail.gmail.com>
 <5271D455.2090507@canterbury.ac.nz>
 <CAN8d9gmp=E3+ONVh0YBf_xxdjBoeu9bpgVY57_J1FSWBWCEHdg@mail.gmail.com>
Message-ID: <D0BF07E6-D5A5-4537-8E0A-766E04AE6F66@mac.com>


On 31 Oct, 2013, at 15:35, Philipp A. <flying-sheep at web.de> wrote:

> 2013/10/31 Greg Ewing <greg.ewing at canterbury.ac.nz>
> In hindsight, things might have been better if the decimal point
> were required to have at least one digit after it, but it's
> probably too late to change that now.
> 
> i disagree. i like writing "1." for "float(1)" and ".1" for "1/10". what's the point of redundant zeros?

Increased readability. 

Ronald


From ethan at stoneleaf.us  Thu Oct 31 16:15:10 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 31 Oct 2013 08:15:10 -0700
Subject: [Python-ideas] os.path.join
In-Reply-To: <CAPkN8xKxd+MtdZ1Nf49GSZhmT=Aw61OvCohKPTeHXe5j7QW1Yw@mail.gmail.com>
References: <CAPkN8xLgywa5-BJi63BsadMJc+Rfpd1D=qCXQUQuP1hRNf=4tw@mail.gmail.com>
 <CAGu0AnunwyJnee1Zxc_0Hfd=Wh4Skp9r0gAPiX=xsqpPcNMZpQ@mail.gmail.com>
 <CAPkN8xKxd+MtdZ1Nf49GSZhmT=Aw61OvCohKPTeHXe5j7QW1Yw@mail.gmail.com>
Message-ID: <527273FE.2060104@stoneleaf.us>

On 10/31/2013 03:30 AM, anatoly techtonik wrote:
> On Wed, Oct 30, 2013 at 8:06 PM, Bruce Leban <bruce at leapyear.org> wrote:
>> I don't know if the code is wrong but if you're asking if the *result* of
>> join is wrong, I don't think it is. It references the same file as these
>> commands:
>>
>> cd /static
>> cat /styles/largestyles.css
>
> 2. What is the expected result of this operation?
>
> for 2 I'd expect 2nd path to be treated as relative one.

If 2 is a relative path, it shouldn't be leading with a slash.

--
~Ethan~

From ethan at stoneleaf.us  Thu Oct 31 16:16:50 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 31 Oct 2013 08:16:50 -0700
Subject: [Python-ideas] os.path.join
In-Reply-To: <CAPkN8xJ1mtWkFiJiRE9pZB9vixnCpxYhdfDsM7hb04UDX5XevQ@mail.gmail.com>
References: <CAPkN8xLgywa5-BJi63BsadMJc+Rfpd1D=qCXQUQuP1hRNf=4tw@mail.gmail.com>
 <l4rnch$5b1$2@ger.gmane.org>
 <CAPkN8xJ1mtWkFiJiRE9pZB9vixnCpxYhdfDsM7hb04UDX5XevQ@mail.gmail.com>
Message-ID: <52727462.3050101@stoneleaf.us>

On 10/31/2013 03:31 AM, anatoly techtonik wrote:
> On Wed, Oct 30, 2013 at 10:41 PM, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:
>>
>> Is this the appropriate place for such a question?  What is wrong with the
>> main Python mailing list, Stackoverflow...?
>
> Both Python ML and SO are bad for inventing new languages.

If you're inventing a new language, why are you wasting time on a Python venue?

--
~Ethan~

From g.brandl at gmx.net  Thu Oct 31 17:06:01 2013
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 31 Oct 2013 17:06:01 +0100
Subject: [Python-ideas] os.path.join
In-Reply-To: <CACac1F_eGf4mPybtuM98MhDs6aLmVrNCxJ2XKRy3ZAi9ChBxSA@mail.gmail.com>
References: <CAPkN8xLgywa5-BJi63BsadMJc+Rfpd1D=qCXQUQuP1hRNf=4tw@mail.gmail.com>
 <CAGu0AnunwyJnee1Zxc_0Hfd=Wh4Skp9r0gAPiX=xsqpPcNMZpQ@mail.gmail.com>
 <CAPkN8xKxd+MtdZ1Nf49GSZhmT=Aw61OvCohKPTeHXe5j7QW1Yw@mail.gmail.com>
 <CACac1F_eGf4mPybtuM98MhDs6aLmVrNCxJ2XKRy3ZAi9ChBxSA@mail.gmail.com>
Message-ID: <l4tv2n$1mi$1@ger.gmane.org>

Am 31.10.2013 13:56, schrieb Paul Moore:
> On 31 October 2013 10:30, anatoly techtonik <techtonik at gmail.com> wrote:
>>> I agree it might be confusing but it's pretty explicitly documented.
>>
>> Yes. It is confusing.
>>
>> 1. How often is the operation of joining absolute paths needed?
> 
> Infrequently, but occasionally. Usually the first argument will be a
> fixed value which is a "base path" and the second will be a
> user-supplied (or similar) value which is to be interpreted as
> relative to the base, unless it's an absolute path when it's to be
> used unchanged.

Exactly.  Absolute paths are different from relative paths, and it should
be made clear to everyone quite early that "foo" is different from "/foo".

I hope nobody would expect path.join('C:\\xyz', 'D:\\abc') to result in
'C:\\xyz\\D:\\abc' on Windows.
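
A short demonstration of the behaviour under discussion, using the
posixpath and ntpath modules directly so that it gives the same results
on any platform. The variable names are only for this example.

```python
import ntpath
import posixpath

# An absolute second argument discards everything before it.
absolute_second = posixpath.join("/static", "/styles/largestyles.css")

# A relative second argument is appended to the base.
relative_second = posixpath.join("/static", "styles/largestyles.css")

# On Windows, a second argument on a different drive also replaces the first.
other_drive = ntpath.join("C:\\xyz", "D:\\abc")

print(absolute_second)   # /styles/largestyles.css
print(relative_second)   # /static/styles/largestyles.css
print(other_drive)       # D:\abc
```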

Georg


From abarnert at yahoo.com  Thu Oct 31 17:26:28 2013
From: abarnert at yahoo.com (Andrew Barnert)
Date: Thu, 31 Oct 2013 09:26:28 -0700
Subject: [Python-ideas] os.path.join
In-Reply-To: <CACac1F_eGf4mPybtuM98MhDs6aLmVrNCxJ2XKRy3ZAi9ChBxSA@mail.gmail.com>
References: <CAPkN8xLgywa5-BJi63BsadMJc+Rfpd1D=qCXQUQuP1hRNf=4tw@mail.gmail.com>
 <CAGu0AnunwyJnee1Zxc_0Hfd=Wh4Skp9r0gAPiX=xsqpPcNMZpQ@mail.gmail.com>
 <CAPkN8xKxd+MtdZ1Nf49GSZhmT=Aw61OvCohKPTeHXe5j7QW1Yw@mail.gmail.com>
 <CACac1F_eGf4mPybtuM98MhDs6aLmVrNCxJ2XKRy3ZAi9ChBxSA@mail.gmail.com>
Message-ID: <6406A2F6-4AE6-4B9E-9047-600CA0658F8F@yahoo.com>

On Oct 31, 2013, at 5:56, Paul Moore <p.f.moore at gmail.com> wrote:

> On 31 October 2013 10:30, anatoly techtonik <techtonik at gmail.com> wrote:
>>> I agree it might be confusing but it's pretty explicitly documented.
>> 
>> Yes. It is confusing.
>> 
>> 1. How often is the operation of joining absolute paths needed?
> 
> Infrequently, but occasionally. Usually the first argument will be a
> fixed value which is a "base path" and the second will be a
> user-supplied (or similar) value which is to be interpreted as
> relative to the base, unless it's an absolute path when it's to be
> used unchanged.

Agreed. Any command line tool that takes an optional base path in a flag arg and paths to files in positional args works that way--or it works by first chdir-ing to the base path and then using the paths, which names the same files. It would be very surprising if it didn't. Even the way base URLs and href URLs on web pages are combined is based on this behavior.

Languages that don't do it this way are surprising. For example, Ruby's File.join always treats the argument as a relative path. So I had to write my own method that did the equivalent of "p2 if p2.startswith(os.sep) else p1.join(p2)". (Perl of course has at least 5 ways to do it, 2 that act like Python, 1 that acts like Ruby, and 2 that double up the separator with whatever meaning that happens to have on each platform--but that isn't surprising; it's Perl.)
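
To make the two behaviours concrete, here is a rough Python sketch. Both
function names are made up for this example; join_always_relative only
approximates the "always relative" style described above and is not a
port of any other language's implementation.

```python
import posixpath


def join_always_relative(base, path):
    """Hypothetical: treat path as relative even if it starts with '/'."""
    return posixpath.join(base, path.lstrip("/"))


def join_or_replace(base, path):
    """Spell out what posixpath.join does with an absolute second argument."""
    return path if path.startswith("/") else posixpath.join(base, path)


always_relative = join_always_relative("/static", "/styles/largestyles.css")
python_style = join_or_replace("/static", "/styles/largestyles.css")

print(always_relative)  # /static/styles/largestyles.css
print(python_style)     # /styles/largestyles.css
```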

>> 2. What is the expected result of this operation?
> 
> Exactly what Python currently does.
> 
>> For me, as a user, the answer to 1 is 'never', for 2 I'd expect 2nd
>> path to be treated as relative one. Thinking about this as 2nd path is
>> an absolute path from the mountpoint specified in the 1st.
> 
> I would never want this behaviour in any real application I have encountered.

Agreed.

I've never seen anyone argue that the other behavior would be more "natural". I _have_ seen an argument that it's more "secure", but this seems like a silly argument. After all, Ruby's File.join doesn't stop you from joining "../../../etc", so why should it stop you from joining "/etc"? And, if there _is_ a good reason to stop you, why does it return a path that's likely to silently work (but not in the way the user intended) rather than raise? And what if you're writing a command line tool intended for system administration rather than a web app? Even with a web app, if you run inside a chroot or similar jail, how do you provide access to the entire jail? It seems like the kind of "security" feature that PHP hacks would devise, like using extra quoting so an attacker has to throw an extra quote in if he wants to inject SQL...
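
If the goal really is to keep user-supplied paths under a base directory,
an explicit check is one option. The sketch below is only that: is_within
is a hypothetical helper, it assumes an absolute POSIX base path, and it
is purely lexical, so it does not account for symlinks.

```python
import posixpath


def is_within(base, candidate):
    """Return True if base joined with candidate stays under base (lexically)."""
    base = posixpath.normpath(base)
    target = posixpath.normpath(posixpath.join(base, candidate))
    # commonpath compares whole path components, not string prefixes.
    return posixpath.commonpath([base, target]) == base


inside = is_within("/srv/static", "css/site.css")
dotdot_escape = is_within("/srv/static", "../../../etc")
absolute_escape = is_within("/srv/static", "/etc")
```

With a check like this, both "../../../etc" and "/etc" are rejected the
same way, and the caller can raise instead of silently using a different
path.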


From janzert at janzert.com  Thu Oct 31 17:46:36 2013
From: janzert at janzert.com (Janzert)
Date: Thu, 31 Oct 2013 12:46:36 -0400
Subject: [Python-ideas] os.path.join
In-Reply-To: <CAPkN8xLC+dYS-OUhfap2mM3r=2cuusVAxyDoReQ68ZzdtzUjLA@mail.gmail.com>
References: <CAPkN8xLgywa5-BJi63BsadMJc+Rfpd1D=qCXQUQuP1hRNf=4tw@mail.gmail.com>
 <CAGifb9Fm_NVWZ3oYoCJmAxg_GAiCsWG8h7m3tA9GmmBw=vz-zw@mail.gmail.com>
 <CAPkN8xLC+dYS-OUhfap2mM3r=2cuusVAxyDoReQ68ZzdtzUjLA@mail.gmail.com>
Message-ID: <l4u1hl$316$1@ger.gmane.org>

On 10/31/2013 6:24 AM, anatoly techtonik wrote:
>
> And also the idea is to listen to arguments to protect current behavior.
>

Explicitly saying that you're trolling is rather poor form for you.

Janzert


From flying-sheep at web.de  Thu Oct 31 18:35:57 2013
From: flying-sheep at web.de (Philipp A.)
Date: Thu, 31 Oct 2013 18:35:57 +0100
Subject: [Python-ideas] Allow attribute references for decimalinteger
In-Reply-To: <CAPTjJmp8WmHMpWdVOaMOnVXBMmA17jy1u-sOhbn8wH23X0KnAw@mail.gmail.com>
References: <CAPw6O2S9ZT7Fg40qDB4tdaZqMXYdyqM1XvWUVzNimiYvwzCWOQ@mail.gmail.com>
 <5271B322.1000108@mrabarnett.plus.com>
 <CAPw6O2Ss=6R4Nwmvb3vGR2=5sOvbWSjL4OTdxLZ0ibcR8RQp5w@mail.gmail.com>
 <5271D455.2090507@canterbury.ac.nz>
 <CAN8d9gmp=E3+ONVh0YBf_xxdjBoeu9bpgVY57_J1FSWBWCEHdg@mail.gmail.com>
 <CAPTjJmp8WmHMpWdVOaMOnVXBMmA17jy1u-sOhbn8wH23X0KnAw@mail.gmail.com>
Message-ID: <CAN8d9g=dsC8P0RccpNqpES+C5rfaDzx8c2jMpBGXx7hf5ZnENw@mail.gmail.com>

2013/10/31 Chris Angelico <rosuav at gmail.com>

> I agree that "1." for 1.0 is a useful shorthand, but if the parser had
> originally been written to disallow it, I doubt there'd be very strong call
> to make it more free.
>

good point. i should remember to always think like this in here.

From greg.ewing at canterbury.ac.nz  Thu Oct 31 23:59:12 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 01 Nov 2013 11:59:12 +1300
Subject: [Python-ideas] Allow attribute references for decimalinteger
In-Reply-To: <CAN8d9gmp=E3+ONVh0YBf_xxdjBoeu9bpgVY57_J1FSWBWCEHdg@mail.gmail.com>
References: <CAPw6O2S9ZT7Fg40qDB4tdaZqMXYdyqM1XvWUVzNimiYvwzCWOQ@mail.gmail.com>
 <5271B322.1000108@mrabarnett.plus.com>
 <CAPw6O2Ss=6R4Nwmvb3vGR2=5sOvbWSjL4OTdxLZ0ibcR8RQp5w@mail.gmail.com>
 <5271D455.2090507@canterbury.ac.nz>
 <CAN8d9gmp=E3+ONVh0YBf_xxdjBoeu9bpgVY57_J1FSWBWCEHdg@mail.gmail.com>
Message-ID: <5272E0C0.5090401@canterbury.ac.nz>

Philipp A. wrote:
> 2013/10/31 Greg Ewing <greg.ewing at canterbury.ac.nz 
> <mailto:greg.ewing at canterbury.ac.nz>>
> 
>     In hindsight, things might have been better if the decimal point
>     were required to have at least one digit after it, 
> 
> i disagree. i like writing "1." for "float(1)" and ".1" for "1/10". 
> what's the point of redundant zeros?

I meant that it would be better for the lexer, as it
would remove the ambiguity.
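
As a small illustration of the ambiguity: because "1." is itself a valid
float literal, the lexer consumes the dot as part of the number, so an
attribute reference directly on a decimal integer does not parse. The
helper name parses is made up for this example.

```python
def parses(source):
    """Return True if source compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# "1." is read as a float literal, leaving "real" dangling.
direct = parses("1.real")

# Ways to separate the integer from the attribute dot.
with_space = parses("1 .real")
with_parens = parses("(1).real")

# Here the first dot ends the float "1." and the second starts the attribute.
double_dot = parses("1..real")
```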

Personally I prefer to write '1.0' rather than '1.' in the
interests of readability, so it wouldn't bother me,
but I understand that others may feel differently.

Having said that, nowadays I'm not sure that there is
much reason to ever write '1.' rather than just '1',
since ints get promoted to floats in most contexts where
it's necessary. This is especially true now that the
division operator works sanely.
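
A few Python 3 lines showing the point about promotion; the variable
names are just for this example.

```python
import math

half = 1 / 2          # true division of two ints gives a float
total = 1 + 0.5       # the int is promoted in mixed arithmetic
root = math.sqrt(4)   # math functions accept ints and return floats
floor_half = 1 // 2   # floor division is spelled explicitly

print(half, total, root, floor_half)  # 0.5 1.5 2.0 0
```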

-- 
Greg