From mark at qtrac.eu  Wed Sep  8 18:50:29 2010
From: mark at qtrac.eu (Mark Summerfield)
Date: Wed, 8 Sep 2010 17:50:29 +0100
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
Message-ID: <20100908175029.6617ae3b@dino>

Hi,

I can't see a _nice_ way of splitting a with statement over multiple
lines:

class FakeContext:
    def __init__(self, name):
        self.name = name
    def __enter__(self):
        print("enter", self.name)
    def __exit__(self, *args):
        print("exit", self.name)

with FakeContext("a") as a, FakeContext("b") as b:
    pass # works fine


with FakeContext("a") as a,
     FakeContext("b") as b:
    pass # syntax error


with (FakeContext("a") as a,
      FakeContext("b") as b):
    pass # syntax error

The use case where this mattered to me was this:

    with open(args.actual, encoding="utf-8") as afh,
         open(args.expected, encoding="utf-8") as efh:
        actual = [line.rstrip("\n\r") for line in afh.readlines()]
        expected = [line.rstrip("\n\r") for line in efh.readlines()]

Naturally, I could split the line in an ugly place:

    with open(args.actual, encoding="utf-8") as afh, open(args.expected,
	    encoding="utf-8") as efh:

but it seems a shame to do so. Or am I missing something?

I'm using Python 3.1.2.

-- 
Mark Summerfield, Qtrac Ltd, www.qtrac.eu
    C++, Python, Qt, PyQt - training and consultancy
        "Rapid GUI Programming with Python and Qt" - ISBN 0132354187
            http://www.qtrac.eu/pyqtbook.html


From nathan at cmu.edu  Wed Sep  8 19:00:25 2010
From: nathan at cmu.edu (Nathan Schneider)
Date: Wed, 8 Sep 2010 13:00:25 -0400
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
In-Reply-To: <20100908175029.6617ae3b@dino>
References: <20100908175029.6617ae3b@dino>
Message-ID: <AANLkTinm-P0yz01P-iU7A-kNpopWJeYEkvsL47BN7h+G@mail.gmail.com>

Mark,

I have approached these cases by using the backslash line-continuation operator:

with FakeContext("a") as a, \
   FakeContext("b") as b:
   pass

Nathan

On Wed, Sep 8, 2010 at 12:50 PM, Mark Summerfield <mark at qtrac.eu> wrote:
> Hi,
>
> I can't see a _nice_ way of splitting a with statement over multiple
> lines:
>
> class FakeContext:
>     def __init__(self, name):
>         self.name = name
>     def __enter__(self):
>         print("enter", self.name)
>     def __exit__(self, *args):
>         print("exit", self.name)
>
> with FakeContext("a") as a, FakeContext("b") as b:
>     pass # works fine
>
>
> with FakeContext("a") as a,
>      FakeContext("b") as b:
>     pass # syntax error
>
>
> with (FakeContext("a") as a,
>       FakeContext("b") as b):
>     pass # syntax error
>
> The use case where this mattered to me was this:
>
>     with open(args.actual, encoding="utf-8") as afh,
>          open(args.expected, encoding="utf-8") as efh:
>         actual = [line.rstrip("\n\r") for line in afh.readlines()]
>         expected = [line.rstrip("\n\r") for line in efh.readlines()]
>
> Naturally, I could split the line in an ugly place:
>
>     with open(args.actual, encoding="utf-8") as afh, open(args.expected,
>             encoding="utf-8") as efh:
>
> but it seems a shame to do so. Or am I missing something?
>
> I'm using Python 3.1.2.
>
> --
> Mark Summerfield, Qtrac Ltd, www.qtrac.eu
>     C++, Python, Qt, PyQt - training and consultancy
>         "Rapid GUI Programming with Python and Qt" - ISBN 0132354187
>             http://www.qtrac.eu/pyqtbook.html
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>


From mwm-keyword-python.b4bdba at mired.org  Wed Sep  8 19:04:00 2010
From: mwm-keyword-python.b4bdba at mired.org (Mike Meyer)
Date: Wed, 8 Sep 2010 13:04:00 -0400
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
In-Reply-To: <20100908175029.6617ae3b@dino>
References: <20100908175029.6617ae3b@dino>
Message-ID: <20100908130400.75ec0a60@bhuda.mired.org>

On Wed, 8 Sep 2010 17:50:29 +0100
Mark Summerfield <mark at qtrac.eu> wrote:

> Hi,
> 
> I can't see a _nice_ way of splitting a with statement over multiple
> lines:
> 
> class FakeContext:
>     def __init__(self, name):
>         self.name = name
>     def __enter__(self):
>         print("enter", self.name)
>     def __exit__(self, *args):
>         print("exit", self.name)
> 
> with FakeContext("a") as a, FakeContext("b") as b:
>     pass # works fine
> 
> 
> with FakeContext("a") as a,
>      FakeContext("b") as b:
>     pass # syntax error
> 
> 
> with (FakeContext("a") as a,
>       FakeContext("b") as b):
>     pass # syntax error

How about:

with FakeContext("a") as a:
 with FakeContext("B") as b:

If the double-indent bothers you, using two two-space indents might be
acceptable in this case.

       <mike
-- 
Mike Meyer <mwm at mired.org>		http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org


From g.brandl at gmx.net  Wed Sep  8 20:07:56 2010
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 08 Sep 2010 20:07:56 +0200
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
In-Reply-To: <20100908175029.6617ae3b@dino>
References: <20100908175029.6617ae3b@dino>
Message-ID: <i68jkk$dnf$1@dough.gmane.org>

On 08.09.2010 18:50, Mark Summerfield wrote:
> Hi,
> 
> I can't see a _nice_ way of splitting a with statement over multiple
> lines:
> 
> class FakeContext:
>     def __init__(self, name):
>         self.name = name
>     def __enter__(self):
>         print("enter", self.name)
>     def __exit__(self, *args):
>         print("exit", self.name)
> 
> with FakeContext("a") as a, FakeContext("b") as b:
>     pass # works fine
> 
> 
> with FakeContext("a") as a,
>      FakeContext("b") as b:
>     pass # syntax error
> 
> 
> with (FakeContext("a") as a,
>       FakeContext("b") as b):
>     pass # syntax error

In addition to the backslash hint already given, I'd like to explain why
this version isn't allowed: the parser couldn't distinguish between a
multi-context with and an expression in parentheses.

(The case of import, where parens can be used around the import list,
is different: no arbitrary expression is allowed there.)
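
For example (a sketch of the ambiguity; FakeContext as defined earlier
in the thread):

# Today this parses as "with <tuple>:" and fails only at runtime,
# because a tuple is not a context manager.  Allowing "as" bindings
# inside the parens would give the parser no way to tell the two
# readings apart from the opening tokens.
with (FakeContext("a"), FakeContext("b")):
    pass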

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.



From ncoghlan at gmail.com  Wed Sep  8 23:30:26 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 9 Sep 2010 07:30:26 +1000
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
In-Reply-To: <i68jkk$dnf$1@dough.gmane.org>
References: <20100908175029.6617ae3b@dino>
	<i68jkk$dnf$1@dough.gmane.org>
Message-ID: <AANLkTineJgzc+Sd3n=fNfKZZpykh=0E_=3NFy-dNCXJ+@mail.gmail.com>

On Thu, Sep 9, 2010 at 4:07 AM, Georg Brandl <g.brandl at gmx.net> wrote:
> In addition to the backslash hint already given, I'd like to explain why
> this version isn't allowed: the parser couldn't distinguish between a
> multi-context with and an expression in parentheses.
>
> (The case of import, where parens can be used around the import list,
> is different: no arbitrary expression is allowed there.)

I've sometimes wondered if we should consider the idea of making line
continuation implicit between keywords and their associated colons.
I've never seriously investigated the implications for the parser,
though.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From python at mrabarnett.plus.com  Thu Sep  9 00:17:11 2010
From: python at mrabarnett.plus.com (MRAB)
Date: Wed, 08 Sep 2010 23:17:11 +0100
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
In-Reply-To: <AANLkTineJgzc+Sd3n=fNfKZZpykh=0E_=3NFy-dNCXJ+@mail.gmail.com>
References: <20100908175029.6617ae3b@dino>	<i68jkk$dnf$1@dough.gmane.org>
	<AANLkTineJgzc+Sd3n=fNfKZZpykh=0E_=3NFy-dNCXJ+@mail.gmail.com>
Message-ID: <4C880B67.5070607@mrabarnett.plus.com>

On 08/09/2010 22:30, Nick Coghlan wrote:
> On Thu, Sep 9, 2010 at 4:07 AM, Georg Brandl<g.brandl at gmx.net>  wrote:
>> In addition to the backslash hint already given, I'd like to explain why
>> this version isn't allowed: the parser couldn't distinguish between a
>> multi-context with and an expression in parentheses.
>>
>> (The case of import, where parens can be used around the import list,
>> is different: no arbitrary expression is allowed there.)
>
> I've sometimes wondered if we should consider the idea of making line
> continuation implicit between keywords and their associated colons.
> I've never seriously investigated the implications for the parser,
> though.
>
If a colon was omitted by mistake, how much later would the parser
report a syntax error?
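
(A small runnable illustration of the concern: today the parser flags a
missing colon on the very line where it occurs.)

# Current behaviour: the SyntaxError points at line 1, right where the
# colon is missing.  With implicit continuation up to the colon, the
# parser would have to keep reading and could report the error much
# later, far from the real mistake.
src = "if x == 1\n    print('one')\n"
try:
    compile(src, "<demo>", "exec")
except SyntaxError as e:
    print(e.lineno, e.msg)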


From greg.ewing at canterbury.ac.nz  Thu Sep  9 01:19:47 2010
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 09 Sep 2010 11:19:47 +1200
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
In-Reply-To: <4C880B67.5070607@mrabarnett.plus.com>
References: <20100908175029.6617ae3b@dino> <i68jkk$dnf$1@dough.gmane.org>
	<AANLkTineJgzc+Sd3n=fNfKZZpykh=0E_=3NFy-dNCXJ+@mail.gmail.com>
	<4C880B67.5070607@mrabarnett.plus.com>
Message-ID: <4C881A13.4060709@canterbury.ac.nz>

MRAB wrote:
> On 08/09/2010 22:30, Nick Coghlan wrote:
> 
>> I've sometimes wondered if we should consider the idea of making line
>> continuation implicit between keywords and their associated colons.
>>
> If a colon was omitted by mistake, how much later would the parser
> report a syntax error?

It might be best to allow this only if the continuation
lines are indented at least as far as the starting line.

-- 
Greg


From mikegraham at gmail.com  Thu Sep  9 01:47:50 2010
From: mikegraham at gmail.com (Mike Graham)
Date: Wed, 8 Sep 2010 19:47:50 -0400
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
In-Reply-To: <AANLkTineJgzc+Sd3n=fNfKZZpykh=0E_=3NFy-dNCXJ+@mail.gmail.com>
References: <20100908175029.6617ae3b@dino> <i68jkk$dnf$1@dough.gmane.org>
	<AANLkTineJgzc+Sd3n=fNfKZZpykh=0E_=3NFy-dNCXJ+@mail.gmail.com>
Message-ID: <AANLkTikVrUkzP-=77-86ggbK6DoXTSMZEFg7+Eokooj8@mail.gmail.com>

On Wed, Sep 8, 2010 at 5:30 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> I've sometimes wondered if we should consider the idea of making line
> continuation implicit between keywords and their associated colons.

This would also have the nice aesthetic quality of making colons serve
a purpose.


From greg at krypto.org  Thu Sep  9 07:05:35 2010
From: greg at krypto.org (Gregory P. Smith)
Date: Wed, 8 Sep 2010 22:05:35 -0700
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
In-Reply-To: <AANLkTinm-P0yz01P-iU7A-kNpopWJeYEkvsL47BN7h+G@mail.gmail.com>
References: <20100908175029.6617ae3b@dino>
	<AANLkTinm-P0yz01P-iU7A-kNpopWJeYEkvsL47BN7h+G@mail.gmail.com>
Message-ID: <AANLkTimG_uqaJYjpNRN1BM4xzp+gOnDBV7Eodiev+tK8@mail.gmail.com>

On Wed, Sep 8, 2010 at 10:00 AM, Nathan Schneider <nathan at cmu.edu> wrote:

> Mark,
>
> I have approached these cases by using the backslash line-continuation
> operator:
>
> with FakeContext("a") as a, \
>   FakeContext("b") as b:
>   pass
>
> Nathan
>

I'm in the "\ is evil" at all costs camp so I'd suggest either the nested
with statements or alternatively do this:

fc = FakeContext
with fc("a") as a, fc("b") as b:
    pass


> On Wed, Sep 8, 2010 at 12:50 PM, Mark Summerfield <mark at qtrac.eu> wrote:
> > Hi,
> >
> > I can't see a _nice_ way of splitting a with statement over multiple
> > lines:
> >
> > class FakeContext:
> >    def __init__(self, name):
> >        self.name = name
> >    def __enter__(self):
> >        print("enter", self.name)
> >    def __exit__(self, *args):
> >        print("exit", self.name)
> >
> > with FakeContext("a") as a, FakeContext("b") as b:
> >    pass # works fine
> >
> >
> > with FakeContext("a") as a,
> >     FakeContext("b") as b:
> >    pass # syntax error
> >
> >
> > with (FakeContext("a") as a,
> >      FakeContext("b") as b):
> >    pass # syntax error
> >
> > The use case where this mattered to me was this:
> >
> >    with open(args.actual, encoding="utf-8") as afh,
> >         open(args.expected, encoding="utf-8") as efh:
> >        actual = [line.rstrip("\n\r") for line in afh.readlines()]
> >        expected = [line.rstrip("\n\r") for line in efh.readlines()]
> >
> > Naturally, I could split the line in an ugly place:
> >
> >    with open(args.actual, encoding="utf-8") as afh, open(args.expected,
> >            encoding="utf-8") as efh:
> >
> > but it seems a shame to do so. Or am I missing something?
> >
> > I'm using Python 3.1.2.
> >
> > --
> > Mark Summerfield, Qtrac Ltd, www.qtrac.eu
> >    C++, Python, Qt, PyQt - training and consultancy
> >        "Rapid GUI Programming with Python and Qt" - ISBN 0132354187
> >            http://www.qtrac.eu/pyqtbook.html
> > _______________________________________________
> > Python-ideas mailing list
> > Python-ideas at python.org
> > http://mail.python.org/mailman/listinfo/python-ideas
> >
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>

From mark at qtrac.eu  Thu Sep  9 07:49:51 2010
From: mark at qtrac.eu (Mark Summerfield)
Date: Thu, 9 Sep 2010 06:49:51 +0100
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
In-Reply-To: <AANLkTinm-P0yz01P-iU7A-kNpopWJeYEkvsL47BN7h+G@mail.gmail.com>
References: <20100908175029.6617ae3b@dino>
	<AANLkTinm-P0yz01P-iU7A-kNpopWJeYEkvsL47BN7h+G@mail.gmail.com>
Message-ID: <20100909064951.1e1b4df3@dino>

Hi Nathan,

On Wed, 8 Sep 2010 13:00:25 -0400
Nathan Schneider <nathan at cmu.edu> wrote:
> Mark,
> 
> I have approached these cases by using the backslash
> line-continuation operator:
> 
> with FakeContext("a") as a, \
>    FakeContext("b") as b:
>    pass

Yes, of course, and that's the way I've done it. But it seems a pity to
do it this way when the documentation explicitly discourages the use of
the backslash for line continuation:
http://docs.python.org/py3k/howto/doanddont.html
(look at the very last item)


> 
> Nathan
> 
> On Wed, Sep 8, 2010 at 12:50 PM, Mark Summerfield <mark at qtrac.eu>
> wrote:
> > Hi,
> >
> > I can't see a _nice_ way of splitting a with statement over multiple
> > lines:
> >
> > class FakeContext:
> >     def __init__(self, name):
> >         self.name = name
> >     def __enter__(self):
> >         print("enter", self.name)
> >     def __exit__(self, *args):
> >         print("exit", self.name)
> >
> > with FakeContext("a") as a, FakeContext("b") as b:
> >     pass # works fine
> >
> >
> > with FakeContext("a") as a,
> >      FakeContext("b") as b:
> >     pass # syntax error
> >
> >
> > with (FakeContext("a") as a,
> >       FakeContext("b") as b):
> >     pass # syntax error
> >
> > The use case where this mattered to me was this:
> >
> >     with open(args.actual, encoding="utf-8") as afh,
> >          open(args.expected, encoding="utf-8") as efh:
> >         actual = [line.rstrip("\n\r") for line in afh.readlines()]
> >         expected = [line.rstrip("\n\r") for line in efh.readlines()]
> >
> > Naturally, I could split the line in an ugly place:
> >
> >     with open(args.actual, encoding="utf-8") as afh, open(args.expected,
> >             encoding="utf-8") as efh:
> >
> > but it seems a shame to do so. Or am I missing something?
> >
> > I'm using Python 3.1.2.
> >
> > --
> > Mark Summerfield, Qtrac Ltd, www.qtrac.eu
> >     C++, Python, Qt, PyQt - training and consultancy
> >         "Rapid GUI Programming with Python and Qt" - ISBN 0132354187
> >             http://www.qtrac.eu/pyqtbook.html
> > _______________________________________________
> > Python-ideas mailing list
> > Python-ideas at python.org
> > http://mail.python.org/mailman/listinfo/python-ideas
> >



-- 
Mark Summerfield, Qtrac Ltd, www.qtrac.eu
    C++, Python, Qt, PyQt - training and consultancy
        "Programming in Python 3" - ISBN 0321680561
            http://www.qtrac.eu/py3book.html


From ben+python at benfinney.id.au  Thu Sep  9 09:55:38 2010
From: ben+python at benfinney.id.au (Ben Finney)
Date: Thu, 09 Sep 2010 17:55:38 +1000
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
References: <20100908175029.6617ae3b@dino>
	<AANLkTinm-P0yz01P-iU7A-kNpopWJeYEkvsL47BN7h+G@mail.gmail.com>
	<AANLkTimG_uqaJYjpNRN1BM4xzp+gOnDBV7Eodiev+tK8@mail.gmail.com>
Message-ID: <87k4mv9wqt.fsf@benfinney.id.au>

"Gregory P. Smith" <greg at krypto.org>
writes:

> On Wed, Sep 8, 2010 at 10:00 AM, Nathan Schneider <nathan at cmu.edu> wrote:
> > I have approached these cases by using the backslash line-continuation
> > operator:
> >
> > with FakeContext("a") as a, \
> >   FakeContext("b") as b:
> >   pass
>
> I'm in the "\ is evil" at all costs camp [...]

I agree, especially when we have a much neater continuation mechanism
that could work just fine here::

    with (FakeContext("a") as a,
             FakeContext("b") as b):
         pass

-- 
 \      "[Entrenched media corporations will] maintain the status quo, |
  `\       or die trying. Either is better than actually WORKING for a |
_o__)                  living." --ringsnake.livejournal.com, 2007-11-12 |
Ben Finney



From andy at insectnation.org  Thu Sep  9 11:06:25 2010
From: andy at insectnation.org (Andy Buckley)
Date: Thu, 09 Sep 2010 10:06:25 +0100
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
In-Reply-To: <AANLkTikVrUkzP-=77-86ggbK6DoXTSMZEFg7+Eokooj8@mail.gmail.com>
References: <20100908175029.6617ae3b@dino>
	<i68jkk$dnf$1@dough.gmane.org>	<AANLkTineJgzc+Sd3n=fNfKZZpykh=0E_=3NFy-dNCXJ+@mail.gmail.com>
	<AANLkTikVrUkzP-=77-86ggbK6DoXTSMZEFg7+Eokooj8@mail.gmail.com>
Message-ID: <4C88A391.5070209@insectnation.org>

On 09/09/10 00:47, Mike Graham wrote:
> On Wed, Sep 8, 2010 at 5:30 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> I've sometimes wondered if we should consider the idea of making line
>> continuation implicit between keywords and their associated colons.
> 
> This would also have the nice aesthetic quality of making colons serve
> a purpose.

Good point! I'm regularly niggled that backslash continuations are
needed for long conditional statements where parentheses are not
logically necessary (and look disturbingly unpythonic). There's no
ambiguity in allowing statements to extend until the colon, particularly
if Greg's "at least as far" indentation rule is applied. +1 from me.

Andy



From g.brandl at gmx.net  Thu Sep  9 14:08:25 2010
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 09 Sep 2010 14:08:25 +0200
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
In-Reply-To: <4C881A13.4060709@canterbury.ac.nz>
References: <20100908175029.6617ae3b@dino>
	<i68jkk$dnf$1@dough.gmane.org>	<AANLkTineJgzc+Sd3n=fNfKZZpykh=0E_=3NFy-dNCXJ+@mail.gmail.com>	<4C880B67.5070607@mrabarnett.plus.com>
	<4C881A13.4060709@canterbury.ac.nz>
Message-ID: <i6aiuj$k2r$1@dough.gmane.org>

On 09.09.2010 01:19, Greg Ewing wrote:
> MRAB wrote:
>> On 08/09/2010 22:30, Nick Coghlan wrote:
>> 
>>> I've sometimes wondered if we should consider the idea of making line
>>> continuation implicit between keywords and their associated colons.
>>>
>> If a colon was omitted by mistake, how much later would the parser
>> report a syntax error?
> 
> It might be best to allow this only if the continuation
> lines are indented at least as far as the starting line.

That is dangerous; it makes the whitespace rules more complicated.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.



From g.brandl at gmx.net  Thu Sep  9 14:14:50 2010
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 09 Sep 2010 14:14:50 +0200
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
In-Reply-To: <20100909064951.1e1b4df3@dino>
References: <20100908175029.6617ae3b@dino>	<AANLkTinm-P0yz01P-iU7A-kNpopWJeYEkvsL47BN7h+G@mail.gmail.com>
	<20100909064951.1e1b4df3@dino>
Message-ID: <i6ajal$maa$1@dough.gmane.org>

On 09.09.2010 07:49, Mark Summerfield wrote:
> Hi Nathan,
> 
> On Wed, 8 Sep 2010 13:00:25 -0400
> Nathan Schneider <nathan at cmu.edu> wrote:
>> Mark,
>> 
>> I have approached these cases by using the backslash
>> line-continuation operator:
>> 
>> with FakeContext("a") as a, \
>>    FakeContext("b") as b:
>>    pass
> 
> Yes, of course, and that's the way I've done it. But it seems a pity to
> do it this way when the documentation explicitly discourages the use of
> the backslash for line continuation:
> http://docs.python.org/py3k/howto/doanddont.html
> (look at the very last item)

Which is actually factually incorrect and should be rewritten.  The only
situation where stray whitespace after a backslash is valid syntax is
within a string literal (and there, there is no alternative).

So at least the "stray whitespace leads to silently buggy code" reason
not to use backslashes is wrong.
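
(A quick check of that claim, runnable as-is:)

# A backslash followed by stray whitespace is an immediate SyntaxError
# outside a string literal -- nothing fails silently:
try:
    compile("x = 1 + \\ \n2\n", "<demo>", "exec")
except SyntaxError as e:
    print("rejected:", e.msg)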

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.



From g.brandl at gmx.net  Thu Sep  9 14:17:37 2010
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 09 Sep 2010 14:17:37 +0200
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
In-Reply-To: <87k4mv9wqt.fsf@benfinney.id.au>
References: <20100908175029.6617ae3b@dino>	<AANLkTinm-P0yz01P-iU7A-kNpopWJeYEkvsL47BN7h+G@mail.gmail.com>	<AANLkTimG_uqaJYjpNRN1BM4xzp+gOnDBV7Eodiev+tK8@mail.gmail.com>
	<87k4mv9wqt.fsf@benfinney.id.au>
Message-ID: <i6ajfr$maa$3@dough.gmane.org>

On 09.09.2010 09:55, Ben Finney wrote:
> "Gregory P. Smith" <greg at krypto.org>
> writes:
> 
>> On Wed, Sep 8, 2010 at 10:00 AM, Nathan Schneider <nathan at cmu.edu> wrote:
>> > I have approached these cases by using the backslash line-continuation
>> > operator:
>> >
>> > with FakeContext("a") as a, \
>> >   FakeContext("b") as b:
>> >   pass
>>
>> I'm in the "\ is evil" at all costs camp [...]
> 
> I agree, especially when we have a much neater continuation mechanism
> that could work just fine here::
> 
>     with (FakeContext("a") as a,
>           FakeContext("b") as b):
>         pass

No, it could not work just fine.  You are basically banning tuples from the
context expression (remember that the "as" clause is optional).

Maybe one could argue that this is not a problem because tuples are not
context managers anyway, but how would this work then:

i = 0 or 1
with (a, b)[i]:

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.



From ncoghlan at gmail.com  Thu Sep  9 14:53:37 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 9 Sep 2010 22:53:37 +1000
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
In-Reply-To: <i6aiuj$k2r$1@dough.gmane.org>
References: <20100908175029.6617ae3b@dino> <i68jkk$dnf$1@dough.gmane.org>
	<AANLkTineJgzc+Sd3n=fNfKZZpykh=0E_=3NFy-dNCXJ+@mail.gmail.com>
	<4C880B67.5070607@mrabarnett.plus.com>
	<4C881A13.4060709@canterbury.ac.nz> <i6aiuj$k2r$1@dough.gmane.org>
Message-ID: <AANLkTimcrM4y0-9gTtR_9g1H7kY09rjTUnu7L9jYCcod@mail.gmail.com>

On Thu, Sep 9, 2010 at 10:08 PM, Georg Brandl <g.brandl at gmx.net> wrote:
> On 09.09.2010 01:19, Greg Ewing wrote:
>> MRAB wrote:
>>> On 08/09/2010 22:30, Nick Coghlan wrote:
>>>
>>>> I've sometimes wondered if we should consider the idea of making line
>>>> continuation implicit between keywords and their associated colons.
>>>>
>>> If a colon was omitted by mistake, how much later would the parser
>>> report a syntax error?
>>
>> It might be best to allow this only if the continuation
>> lines are indented at least as far as the starting line.
>
> That is dangerous, it makes the whitespace rules more complicated.

I'm actually not sure it is even *possible* in general to implement my
suggestion given the deliberate limitations of Python's parser.
Parentheses normally work their indentation-ignoring magic by dropping
down into expression evaluation scope where indentation isn't
significant (import is a special case where this doesn't quite happen,
but it's a rather constrained one).

This is definitely a wart in the with statement syntax, but it really
isn't clear how best to resolve it.

You can at least use parentheses in the individual context
expressions, even though you can't wrap the whole thing:

>>> from contextlib import contextmanager
>>> @contextmanager
... def FakeContext(a):
...   yield a
...
>>> with FakeContext(1) as x, (
...      FakeContext(2)) as y:
...   print(x, y)
...
1 2


Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From grosser.meister.morti at gmx.net  Thu Sep  9 15:02:24 2010
From: grosser.meister.morti at gmx.net (Mathias Panzenböck)
Date: Thu, 09 Sep 2010 15:02:24 +0200
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
In-Reply-To: <i6ajfr$maa$3@dough.gmane.org>
References: <20100908175029.6617ae3b@dino>	<AANLkTinm-P0yz01P-iU7A-kNpopWJeYEkvsL47BN7h+G@mail.gmail.com>	<AANLkTimG_uqaJYjpNRN1BM4xzp+gOnDBV7Eodiev+tK8@mail.gmail.com>	<87k4mv9wqt.fsf@benfinney.id.au>
	<i6ajfr$maa$3@dough.gmane.org>
Message-ID: <4C88DAE0.9070607@gmx.net>

On 09/09/2010 02:17 PM, Georg Brandl wrote:
> On 09.09.2010 09:55, Ben Finney wrote:
>> "Gregory P. Smith"<greg at krypto.org>
>> writes:
>>
>>> On Wed, Sep 8, 2010 at 10:00 AM, Nathan Schneider<nathan at cmu.edu>  wrote:
>>>> I have approached these cases by using the backslash line-continuation
>>>> operator:
>>>>
>>>> with FakeContext("a") as a, \
>>>>    FakeContext("b") as b:
>>>>    pass
>>>
>>> I'm in the "\ is evil" at all costs camp [...]
>>
>> I agree, especially when we have a much neater continuation mechanism
>> that could work just fine here::
>>
>>      with (FakeContext("a") as a,
>>            FakeContext("b") as b):
>>          pass
>
> No, it could not work just fine.  You are basically banning tuples from the
> context expression (remember that the "as" clause is optional).
>
> Maybe one could argue that this is not a problem because tuples are not
> context managers anyway, but how would this work then:
>
> i = 0 or 1
> with (a, b)[i]:
>
> Georg
>

Just write:
with ((a, b)[i]):

It's ugly but it would work. ;)

	-panzi


From mal at egenix.com  Thu Sep  9 15:32:15 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 09 Sep 2010 15:32:15 +0200
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
In-Reply-To: <20100908175029.6617ae3b@dino>
References: <20100908175029.6617ae3b@dino>
Message-ID: <4C88E1DF.3090502@egenix.com>

Mark Summerfield wrote:
> Hi,
> 
> I can't see a _nice_ way of splitting a with statement over multiple
> lines:
>
> class FakeContext:
>     def __init__(self, name):
>         self.name = name
>     def __enter__(self):
>         print("enter", self.name)
>     def __exit__(self, *args):
>         print("exit", self.name)
>
> with FakeContext("a") as a, FakeContext("b") as b:
>     pass # works fine
>
>
> with FakeContext("a") as a,
>      FakeContext("b") as b:
>     pass # syntax error
>
>
> with (FakeContext("a") as a,
>       FakeContext("b") as b):
>     pass # syntax error
>
> The use case where this mattered to me was this:
>
>     with open(args.actual, encoding="utf-8") as afh,
>          open(args.expected, encoding="utf-8") as efh:
>         actual = [line.rstrip("\n\r") for line in afh.readlines()]
>         expected = [line.rstrip("\n\r") for line in efh.readlines()]
> 
> Naturally, I could split the line in an ugly place:
> 
>     with open(args.actual, encoding="utf-8") as afh, open(args.expected,
> 	    encoding="utf-8") as efh:
> 
> but it seems a shame to do so. Or am I missing something?

Why do you need to put everything on one line?

afh = open(args.actual, encoding="utf-8")
efh = open(args.expected, encoding="utf-8")

with afh, efh:
   ...

In the context of files, the only purpose of the with statement
is to close them when leaving the block.

>>> a = open('/etc/passwd')
>>> b = open('/etc/group')
>>> with a,b: print a.readline(), b.readline()
...
at:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash
at:!:25:

>>> a
<closed file '/etc/passwd', mode 'r' at 0x7f0093e62390>
>>> b
<closed file '/etc/group', mode 'r' at 0x7f0093e62420>

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 09 2010)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2010-08-19: Released mxODBC 3.1.0              http://python.egenix.com/
2010-09-15: DZUG Tagung, Dresden, Germany                   6 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/


From fuzzyman at voidspace.org.uk  Thu Sep  9 15:41:52 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Thu, 9 Sep 2010 14:41:52 +0100
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
In-Reply-To: <4C88E1DF.3090502@egenix.com>
References: <20100908175029.6617ae3b@dino>
	<4C88E1DF.3090502@egenix.com>
Message-ID: <AANLkTi=sHcw1T7+e=R3ZCKQTOrO_BjMkLCFTFQu-Yu5a@mail.gmail.com>

On 9 September 2010 14:32, M.-A. Lemburg <mal at egenix.com> wrote:

> [snip...]
> Why do you need to put everything on one line ?
>
> afh = open(args.actual, encoding="utf-8")
> efh = open(args.expected, encoding="utf-8")
>
> with afh, efh:
>   ...
>
> In the context of files, the only purpose of the with statement
> is to close them when leaving the block.
>
> >>> a = open('/etc/passwd')
> >>> b = open('/etc/group')
>

If my understanding is correct (which is perhaps unlikely...), using a
single line will close a if opening b fails. Whereas doing them separately
before the with statement risks leaving the first un-exited if creating the
second fails.
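
A minimal sketch of the difference (hypothetical paths; assumes path_a
opens successfully while path_b does not):

def one_line(path_a, path_b):
    # The first file is already "entered" before open(path_b) runs,
    # so its __exit__ closes it when the second open() raises.
    try:
        with open(path_a) as a, open(path_b) as b:
            pass
    except IOError:
        print("first file closed?", a.closed)   # True

def two_step(path_a, path_b):
    # The second open() raises before any with block is entered,
    # so the first file stays open until the GC reclaims it.
    a = open(path_a)
    b = open(path_b)
    with a, b:
        pass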

Michael


> >>> with a,b: print a.readline(), b.readline()
> ...
> at:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash
> at:!:25:
>
> >>> a
> <closed file '/etc/passwd', mode 'r' at 0x7f0093e62390>
> >>> b
> <closed file '/etc/group', mode 'r' at 0x7f0093e62420>
>
> --
> Marc-Andre Lemburg
> eGenix.com
>
> Professional Python Services directly from the Source  (#1, Sep 09 2010)
> >>> Python/Zope Consulting and Support ...        http://www.egenix.com/
> >>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
> >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
> ________________________________________________________________________
> 2010-08-19: Released mxODBC 3.1.0              http://python.egenix.com/
> 2010-09-15: DZUG Tagung, Dresden, Germany                   6 days to go
>
> ::: Try our new mxODBC.Connect Python Database Interface for free ! ::::
>
>
>   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
>    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
>           Registered at Amtsgericht Duesseldorf: HRB 46611
>               http://www.egenix.com/company/contact/
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>



-- 
http://www.voidspace.org.uk

From mal at egenix.com  Thu Sep  9 15:53:49 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 09 Sep 2010 15:53:49 +0200
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
In-Reply-To: <AANLkTi=sHcw1T7+e=R3ZCKQTOrO_BjMkLCFTFQu-Yu5a@mail.gmail.com>
References: <20100908175029.6617ae3b@dino>	<4C88E1DF.3090502@egenix.com>
	<AANLkTi=sHcw1T7+e=R3ZCKQTOrO_BjMkLCFTFQu-Yu5a@mail.gmail.com>
Message-ID: <4C88E6ED.8000807@egenix.com>

Michael Foord wrote:
> On 9 September 2010 14:32, M.-A. Lemburg <mal at egenix.com> wrote:
> 
>> [snip...]
>> Why do you need to put everything on one line ?
>>
>> afh = open(args.actual, encoding="utf-8")
>> efh = open(args.expected, encoding="utf-8")
>>
>> with afh, efh:
>>   ...
>>
>> In the context of files, the only purpose of the with statement
>> is to close them when leaving the block.
>>
>>>>> a = open('/etc/passwd')
>>>>> b = open('/etc/group')
>>
> 
> If my understanding is correct (which is perhaps unlikely...), using a
> single line will close a if opening b fails. Whereas doing them separately
> before the with statement risks leaving the first un-exited if creating the
> second fails.

Right, but if you stuff everything on a single line, your
error handling will have a hard time figuring out which of
the two failed to open.
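
A sketch of what that separate error handling could look like (args as
in Mark's example; the messages and exit policy are illustrative only):

import sys

try:
    afh = open(args.actual, encoding="utf-8")
except IOError as e:
    sys.exit("cannot open actual file: %s" % e)
try:
    efh = open(args.expected, encoding="utf-8")
except IOError as e:
    afh.close()   # don't leak the first file
    sys.exit("cannot open expected file: %s" % e)

with afh, efh:
    ...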

I was under the impression that Mark wanted to "protect" the
inner block of the with statement, not the context manager
creation itself.

As usual: hiding away too much stuff in your closet makes things
look tidy, but causes a hell of a mess if you ever need to open
it again :-)

> Michael
> 
> 
>>>>> with a,b: print a.readline(), b.readline()
>> ...
>> at:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash
>> at:!:25:
>>
>>>>> a
>> <closed file '/etc/passwd', mode 'r' at 0x7f0093e62390>
>>>>> b
>> <closed file '/etc/group', mode 'r' at 0x7f0093e62420>
>>
>> --
>> Marc-Andre Lemburg
>> eGenix.com
>>
>> Professional Python Services directly from the Source  (#1, Sep 09 2010)
>>>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
>> ________________________________________________________________________
>> 2010-08-19: Released mxODBC 3.1.0              http://python.egenix.com/
>> 2010-09-15: DZUG Tagung, Dresden, Germany                   6 days to go
>>
>> ::: Try our new mxODBC.Connect Python Database Interface for free ! ::::
>>
>>
>>   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
>>    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
>>           Registered at Amtsgericht Duesseldorf: HRB 46611
>>               http://www.egenix.com/company/contact/
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> http://mail.python.org/mailman/listinfo/python-ideas
>>
> 
> 
> 

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 09 2010)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2010-08-19: Released mxODBC 3.1.0              http://python.egenix.com/
2010-09-15: DZUG Tagung, Dresden, Germany                   6 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! ::::


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/


From mark at qtrac.eu  Thu Sep  9 16:13:54 2010
From: mark at qtrac.eu (Mark Summerfield)
Date: Thu, 9 Sep 2010 15:13:54 +0100
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
In-Reply-To: <4C88E6ED.8000807@egenix.com>
References: <20100908175029.6617ae3b@dino> <4C88E1DF.3090502@egenix.com>
	<AANLkTi=sHcw1T7+e=R3ZCKQTOrO_BjMkLCFTFQu-Yu5a@mail.gmail.com>
	<4C88E6ED.8000807@egenix.com>
Message-ID: <20100909151354.6d0ce7a8@dino>

On Thu, 09 Sep 2010 15:53:49 +0200
"M.-A. Lemburg" <mal at egenix.com> wrote:
> Michael Foord wrote:
> > On 9 September 2010 14:32, M.-A. Lemburg <mal at egenix.com> wrote:
> > 
> >> [snip...]
> >> Why do you need to put everything on one line ?
> >>
> >> afh = open(args.actual, encoding="utf-8")
> >> efh = open(args.expected, encoding="utf-8")
> >>
> >> with afh, efh:
> >>   ...
> >>
> >> In the context of files, the only purpose of the with statement
> >> is to close them when leaving the block.
> >>
> >>>>> a = open('/etc/passwd')
> >>>>> b = open('/etc/group')
> >>
> > 
> > If my understanding is correct (which is perhaps unlikely...),
> > using a single line will close a if opening b fails. Whereas doing
> > them separately before the with statement risks leaving the first
> > un-exited if creating the second fails.
> 
> Right, but if you stuff everything on a single line, your
> error handling will have a hard time figuring out which of
> the two failed to open.
> 
> I was under the impression that Mark wanted to "protect" the
> inner block of the with statement, not the context manager
> creation itself.

Actually, I was more interested in the aesthetics. I've become
habituated to _never_ using \ continuations and found it unsightly to
need one here.

> As usual: hiding away too much stuff in your closet makes things
> look tidy, but causes a hell of a mess if you ever need to open
> it again :-)

:-)

> 
> > Michael
> > 
> > 
> >>>>> with a,b: print a.readline(), b.readline()
> >> ...
> >> at:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash
> >> at:!:25:
> >>
> >>>>> a
> >> <closed file '/etc/passwd', mode 'r' at 0x7f0093e62390>
> >>>>> b
> >> <closed file '/etc/group', mode 'r' at 0x7f0093e62420>
> >>
> >> --
> >> Marc-Andre Lemburg
> >> eGenix.com
> >>
> >> Professional Python Services directly from the Source  (#1, Sep 09
> >> 2010)
> >>>>> Python/Zope Consulting and Support ...
> >>>>> http://www.egenix.com/
> >>>>> mxODBC.Zope.Database.Adapter ...
> >>>>> http://zope.egenix.com/ mxODBC, mxDateTime,
> >>>>> mxTextTools ...        http://python.egenix.com/
> >> ________________________________________________________________________
> >> 2010-08-19: Released mxODBC 3.1.0
> >> http://python.egenix.com/ 2010-09-15
> >> <http://python.egenix.com/%0A2010-09-15>: DZUG Tagung, Dresden,
> >> Germany                   6 days to go
> >>
> >> ::: Try our new mxODBC.Connect Python Database Interface for
> >> free ! ::::
> >>
> >>
> >>   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
> >>    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
> >>           Registered at Amtsgericht Duesseldorf: HRB 46611
> >>               http://www.egenix.com/company/contact/
> >> _______________________________________________
> >> Python-ideas mailing list
> >> Python-ideas at python.org
> >> http://mail.python.org/mailman/listinfo/python-ideas
> >>
> > 
> > 
> > 
> 



-- 
Mark Summerfield, Qtrac Ltd, www.qtrac.eu
    C++, Python, Qt, PyQt - training and consultancy
        "Programming in Python 3" - ISBN 0321680561
            http://www.qtrac.eu/py3book.html


From fuzzyman at voidspace.org.uk  Thu Sep  9 16:34:25 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Thu, 9 Sep 2010 15:34:25 +0100
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
In-Reply-To: <4C88E6ED.8000807@egenix.com>
References: <20100908175029.6617ae3b@dino> <4C88E1DF.3090502@egenix.com>
	<AANLkTi=sHcw1T7+e=R3ZCKQTOrO_BjMkLCFTFQu-Yu5a@mail.gmail.com>
	<4C88E6ED.8000807@egenix.com>
Message-ID: <AANLkTimZ1upOMqSCwtgcApJw4Aaz39DW00pos0Yv1oFY@mail.gmail.com>

On 9 September 2010 14:53, M.-A. Lemburg <mal at egenix.com> wrote:

> Michael Foord wrote:
> > On 9 September 2010 14:32, M.-A. Lemburg <mal at egenix.com> wrote:
> >
> >> [snip...]
> >> Why do you need to put everything on one line ?
> >>
> >> afh = open(args.actual, encoding="utf-8")
> >> efh = open(args.expected, encoding="utf-8")
> >>
> >> with afh, efh:
> >>   ...
> >>
> >> In the context of files, the only purpose of the with statement
> >> is to close them when leaving the block.
> >>
> >>>>> a = open('/etc/passwd')
> >>>>> b = open('/etc/group')
> >>
> >
> > If my understanding is correct (which is perhaps unlikely...), using a
> > single line will close a if opening b fails. Whereas doing them
> > separately before the with statement risks leaving the first un-exited
> > if creating the second fails.
>
> Right, but if you stuff everything on a single line, your
> error handling will have a hard time figuring out which of
> the two failed to open.
>

If you *need* to distinguish at a higher level, then you have no choice. I
was really just pointing out that there are *semantic* differences as well,
and in fact the code you posted is less safe than the one-line version. You
lose some of the error handling built-in to context manager creation.

Michael


>
> I was under the impression that Mark wanted to "protect" the
> inner block of the with statement, not the context manager
> creation itself.
>
> As usual: hiding away too much stuff in your closet makes things
> look tidy, but causes a hell of a mess if you ever need to open
> it again :-)
>
> > Michael
> >
> >
> >>>>> with a,b: print a.readline(), b.readline()
> >> ...
> >> at:x:25:25:Batch jobs daemon:/var/spool/atjobs:/bin/bash
> >> at:!:25:
> >>
> >>>>> a
> >> <closed file '/etc/passwd', mode 'r' at 0x7f0093e62390>
> >>>>> b
> >> <closed file '/etc/group', mode 'r' at 0x7f0093e62420>
> >>
> >> --
> >> Marc-Andre Lemburg
> >> eGenix.com
> >>
> >> Professional Python Services directly from the Source  (#1, Sep 09 2010)
> >>>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
> >>>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
> >>>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
> >> ________________________________________________________________________
> >> 2010-08-19: Released mxODBC 3.1.0              http://python.egenix.com/
> >> 2010-09-15: DZUG Tagung, Dresden, Germany                   6 days to go
> >>
> >> ::: Try our new mxODBC.Connect Python Database Interface for free ! ::::
> >>
> >>
> >>   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
> >>    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
> >>           Registered at Amtsgericht Duesseldorf: HRB 46611
> >>               http://www.egenix.com/company/contact/
> >> _______________________________________________
> >> Python-ideas mailing list
> >> Python-ideas at python.org
> >> http://mail.python.org/mailman/listinfo/python-ideas
> >>
> >
> >
> >
>
> --
> Marc-Andre Lemburg
> eGenix.com
>
> Professional Python Services directly from the Source  (#1, Sep 09 2010)
> >>> Python/Zope Consulting and Support ...        http://www.egenix.com/
> >>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
> >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
> ________________________________________________________________________
> 2010-08-19: Released mxODBC 3.1.0              http://python.egenix.com/
> 2010-09-15: DZUG Tagung, Dresden, Germany                   6 days to go
>
> ::: Try our new mxODBC.Connect Python Database Interface for free ! ::::
>
>
>   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
>    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
>           Registered at Amtsgericht Duesseldorf: HRB 46611
>               http://www.egenix.com/company/contact/
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>



-- 
http://www.voidspace.org.uk

From tjreedy at udel.edu  Thu Sep  9 22:55:15 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 09 Sep 2010 16:55:15 -0400
Subject: [Python-ideas] with statement syntax forces ugly line breaks?
In-Reply-To: <i6ajal$maa$1@dough.gmane.org>
References: <20100908175029.6617ae3b@dino>	<AANLkTinm-P0yz01P-iU7A-kNpopWJeYEkvsL47BN7h+G@mail.gmail.com>	<20100909064951.1e1b4df3@dino>
	<i6ajal$maa$1@dough.gmane.org>
Message-ID: <i6bhjj$bgr$1@dough.gmane.org>

On 9/9/2010 8:14 AM, Georg Brandl wrote:
> On 09.09.2010 07:49, Mark Summerfield wrote:
>> Hi Nathan,
>>
>> On Wed, 8 Sep 2010 13:00:25 -0400
>> Nathan Schneider<nathan at cmu.edu>  wrote:
>>> Mark,
>>>
>>> I have approached these cases by using the backslash
>>> line-continuation operator:
>>>
>>> with FakeContext("a") as a, \
>>>     FakeContext("b") as b:
>>>     pass

Adding a space after the backslash makes the line above a SyntaxError.
No silent error here.
>>
>> Yes, of course, and that's the way I've done it. But it seems a pity to
>> do it this way when the documentation explicitly discourages the use of
>> the backslash for line continuation:
>> http://docs.python.org/py3k/howto/doanddont.html
>> (look at the very last item)

If no one uses \ for the end-of-line escape, it should be removed ...
But I am not suggesting that.

> Which is actually factually incorrect and should be rewritten.  The only
> situation where stray whitespace after a backslash is valid syntax is
> within a string literal (and there, there is no alternative).
>
> So at least the "stray whitespace leads to silently buggy code" reason
> not to use backslashes is wrong.
>
> Georg
>


-- 
Terry Jan Reedy



From cool-rr at cool-rr.com  Fri Sep 10 18:37:44 2010
From: cool-rr at cool-rr.com (cool-RR)
Date: Fri, 10 Sep 2010 18:37:44 +0200
Subject: [Python-ideas] Why not f(*my_list, *my_other_list) ?
Message-ID: <AANLkTikTaNMuEFO_pnCS0FNbLHmXX-gCfLYg=RCt_c8F@mail.gmail.com>

I noticed that it's impossible to call a Python function with two starred
argument lists, like this: `f(*my_list, *my_other_list)`. I mean, if someone
wants to feed two lists of arguments into a function, why not?

I understand why you can't have two stars in a function definition; but why
can't you have two (or more) stars in a function call?


Ram.

From python at mrabarnett.plus.com  Fri Sep 10 18:54:33 2010
From: python at mrabarnett.plus.com (MRAB)
Date: Fri, 10 Sep 2010 17:54:33 +0100
Subject: [Python-ideas] Why not f(*my_list, *my_other_list) ?
In-Reply-To: <AANLkTikTaNMuEFO_pnCS0FNbLHmXX-gCfLYg=RCt_c8F@mail.gmail.com>
References: <AANLkTikTaNMuEFO_pnCS0FNbLHmXX-gCfLYg=RCt_c8F@mail.gmail.com>
Message-ID: <4C8A62C9.1040206@mrabarnett.plus.com>

On 10/09/2010 17:37, cool-RR wrote:
> I noticed that it's impossible to call a Python function with two
> starred argument lists, like this: `f(*my_list, *my_other_list)`. I
> mean, if someone wants to feed two lists of arguments into a function,
> why not?
>
> I understand why you can't have two stars in a function definition; but
> why can't you have two (or more) stars in a function call?
>
Would there be any advantage over `f(*(my_list + my_other_list))`?

(Sent to the wrong list originally :-()


From benjamin at python.org  Fri Sep 10 19:03:20 2010
From: benjamin at python.org (Benjamin Peterson)
Date: Fri, 10 Sep 2010 17:03:20 +0000 (UTC)
Subject: [Python-ideas] Why not f(*my_list,*my_other_list) ?
References: <AANLkTikTaNMuEFO_pnCS0FNbLHmXX-gCfLYg=RCt_c8F@mail.gmail.com>
Message-ID: <loom.20100910T190239-534@post.gmane.org>

cool-RR <cool-rr at ...> writes:

> 
> I noticed that it's impossible to call a Python function with two starred
> argument lists, like this: `f(*my_list, *my_other_list)`. I mean, if someone
> wants to feed two lists of arguments into a function, why not?

Okay, so why would you want to?



From phd at phd.pp.ru  Fri Sep 10 18:57:13 2010
From: phd at phd.pp.ru (Oleg Broytman)
Date: Fri, 10 Sep 2010 20:57:13 +0400
Subject: [Python-ideas] Why not f(*my_list, *my_other_list) ?
In-Reply-To: <AANLkTikTaNMuEFO_pnCS0FNbLHmXX-gCfLYg=RCt_c8F@mail.gmail.com>
References: <AANLkTikTaNMuEFO_pnCS0FNbLHmXX-gCfLYg=RCt_c8F@mail.gmail.com>
Message-ID: <20100910165713.GA24612@phd.pp.ru>

On Fri, Sep 10, 2010 at 06:37:44PM +0200, cool-RR wrote:
> f(*my_list, *my_other_list)

   Not every one-liner needs its own syntax. Just call

f(*(my_list + my_other_list))

Oleg.
-- 
     Oleg Broytman            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.


From stefan_ml at behnel.de  Fri Sep 10 19:16:52 2010
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 10 Sep 2010 19:16:52 +0200
Subject: [Python-ideas] Why not f(*my_list,*my_other_list) ?
In-Reply-To: <loom.20100910T190239-534@post.gmane.org>
References: <AANLkTikTaNMuEFO_pnCS0FNbLHmXX-gCfLYg=RCt_c8F@mail.gmail.com>
	<loom.20100910T190239-534@post.gmane.org>
Message-ID: <i6dp64$3lc$2@dough.gmane.org>

Benjamin Peterson, 10.09.2010 19:03:
> cool-RR<cool-rr at ...>  writes:
>
>>
>> I noticed that it's impossible to call a Python function with two starred
>> argument lists, like this: `f(*my_list, *my_other_list)`. I mean, if someone
>> wants to feed two lists of arguments into a function, why not?
>
> Okay, so why would you want to?

Well, it can happen. It doesn't merit a syntax extension, though. You can 
just do

     args_for_f = tuple(my_list) + tuple(my_other_list)

     f(*args_for_f)

(using tuple() here in case both are not really lists)

Stefan



From daniel at stutzbachenterprises.com  Fri Sep 10 19:34:42 2010
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Fri, 10 Sep 2010 12:34:42 -0500
Subject: [Python-ideas] Why not f(*my_list,*my_other_list) ?
In-Reply-To: <i6dp64$3lc$2@dough.gmane.org>
References: <AANLkTikTaNMuEFO_pnCS0FNbLHmXX-gCfLYg=RCt_c8F@mail.gmail.com>
	<loom.20100910T190239-534@post.gmane.org>
	<i6dp64$3lc$2@dough.gmane.org>
Message-ID: <AANLkTimaWf0jMQKfr+WxTBbmarD9CL=sM2t6W4uLCPcJ@mail.gmail.com>

On Fri, Sep 10, 2010 at 12:16 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:

>    args_for_f = tuple(my_list) + tuple(my_other_list)
>    f(*args_for_f)
>

An alternative with better performance is:

from itertools import chain
f(*chain(my_list, my_other_list))
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

From sergio at gruposinternet.com.br  Fri Sep 10 19:43:30 2010
From: sergio at gruposinternet.com.br (Sérgio Surkamp)
Date: Fri, 10 Sep 2010 14:43:30 -0300
Subject: [Python-ideas] Why not f(*my_list, *my_other_list) ?
In-Reply-To: <AANLkTikTaNMuEFO_pnCS0FNbLHmXX-gCfLYg=RCt_c8F@mail.gmail.com>
References: <AANLkTikTaNMuEFO_pnCS0FNbLHmXX-gCfLYg=RCt_c8F@mail.gmail.com>
Message-ID: <20100910144330.640866f2@icedearth.corp.grupos.com.br>

On Fri, 10 Sep 2010 18:37:44 +0200
cool-RR <cool-rr at cool-rr.com> wrote:

> I noticed that it's impossible to call a Python function with two
> starred argument lists, like this: `f(*my_list, *my_other_list)`. I
> mean, if someone wants to feed two lists of arguments into a
> function, why not?
> 
> I understand why you can't have two stars in a function definition;
> but why can't you have two (or more) stars in a function call?
> 
> 
> Ram.

How should the compiler treat that? Put half of the arguments in the
first list and the other half in the second list?

Regards,
-- 
  .:''''':.
.:'        `     Sérgio Surkamp | Network Manager
::    ........   sergio at gruposinternet.com.br
`:.        .:'
  `:,   ,.:'     *Grupos Internet S.A.*
    `: :'        R. Lauro Linhares, 2123 Torre B - Sala 201
     : :         Trindade - Florianópolis - SC
     :.'
     ::          +55 48 3234-4109
     :
     '           http://www.gruposinternet.com.br


From mikegraham at gmail.com  Fri Sep 10 21:28:09 2010
From: mikegraham at gmail.com (Mike Graham)
Date: Fri, 10 Sep 2010 15:28:09 -0400
Subject: [Python-ideas] Why not f(*my_list,*my_other_list) ?
In-Reply-To: <AANLkTimaWf0jMQKfr+WxTBbmarD9CL=sM2t6W4uLCPcJ@mail.gmail.com>
References: <AANLkTikTaNMuEFO_pnCS0FNbLHmXX-gCfLYg=RCt_c8F@mail.gmail.com>
	<loom.20100910T190239-534@post.gmane.org>
	<i6dp64$3lc$2@dough.gmane.org>
	<AANLkTimaWf0jMQKfr+WxTBbmarD9CL=sM2t6W4uLCPcJ@mail.gmail.com>
Message-ID: <AANLkTinWmF2M=k0QeKq871kjU38wMYyRf6bNC40h8N8Z@mail.gmail.com>

On Fri, Sep 10, 2010 at 1:34 PM, Daniel Stutzbach
<daniel at stutzbachenterprises.com> wrote:
> An alternative with better performance is:
>
> from itertools import chain
> f(*chain(my_list, my_other_list))

Maybe.


From tjreedy at udel.edu  Fri Sep 10 23:25:35 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 10 Sep 2010 17:25:35 -0400
Subject: [Python-ideas] Why not f(*my_list, *my_other_list) ?
In-Reply-To: <AANLkTikTaNMuEFO_pnCS0FNbLHmXX-gCfLYg=RCt_c8F@mail.gmail.com>
References: <AANLkTikTaNMuEFO_pnCS0FNbLHmXX-gCfLYg=RCt_c8F@mail.gmail.com>
Message-ID: <i6e7og$5ng$1@dough.gmane.org>

On 9/10/2010 12:37 PM, cool-RR wrote:
> I noticed that it's impossible to call a Python function with two
> starred argument lists, like this: `f(*my_list, *my_other_list)`. I
> mean, if someone wants to feed two lists of arguments into a function,
> why not?
>
> I understand why you can't have two stars in a function definition; but
> why can't you have two (or more) stars in a function call?

Beyond
0. Not needed (as others explained),
some speculations:

1. Calls are designed to mirror definitions. No multiple stars in
definitions means no multiple stars in calls.

2. Multiple stars begin to look like typing errors.

3. No one ever thought to support such.

4. It would make the call process even more complex, and it is slow 
enough already.

5. It might conflict with the current implementation.

-- 
Terry Jan Reedy



From guido at python.org  Sat Sep 11 01:25:04 2010
From: guido at python.org (Guido van Rossum)
Date: Fri, 10 Sep 2010 16:25:04 -0700
Subject: [Python-ideas] [Python-Dev] Python needs a standard
	asynchronous return object
In-Reply-To: <4C8AB874.9010703@openvpn.net>
References: <4C8AB874.9010703@openvpn.net>
Message-ID: <AANLkTinohL6+8JRN6UKCeRKv5-ULUb6bjFZ+_RsewFiV@mail.gmail.com>

Moving to python-ideas.

Have you seen http://www.python.org/dev/peps/pep-3148/ ? That seems
exactly what you want.

--Guido

On Fri, Sep 10, 2010 at 4:00 PM, James Yonan <james at openvpn.net> wrote:
> I'd like to propose that the Python community standardize on a "deferred"
> object for asynchronous return values, modeled after the well-thought-out
> Twisted Deferred class.
>
> With more and more Python libraries implementing asynchronicity (for example
> Futures -- PEP 3148), it's crucial to have a standard deferred object in
> place so that code using a single asynchronous reactor can interoperate with
> different asynchronous libraries.
>
> I think a lot of people don't realize how much cooler and more elegant it is
> to return a deferred object from an asynchronous function rather than using
> a generic callback approach (where you pass a function argument to the
> asynchronous function telling it where to call when the asynchronous
> operation completes).
>
> While asynchronous systems have been shown to have excellent scalability
> properties, the callback-based programming style often used in asynchronous
> programming has been criticized for breaking up the sequential readability
> of program logic.
>
> This problem is elegantly addressed by using Deferred Generators.  Since
> Python 2.5 added enhanced generators (i.e. the capability for "yield" to
> return a value), the infrastructure is now in place to allow an asynchronous
> function to be written in a sequential style, without the use of explicit
> callbacks.
>
> See the following blog article for a nice write-up on the capability:
>
> http://blog.mekk.waw.pl/archives/14-Twisted-inlineCallbacks-and-deferredGenerator.html
>
> Mekk's Twisted Deferred example:
>
> @defer.inlineCallbacks
> def someFunction():
>     a = 1
>     b = yield deferredReturningFunction(a)
>     c = yield anotherDeferredReturningFunction(a, b)
>     defer.returnValue(c)
>
> What's cool about this is that between the two yield statements, the Twisted
> reactor is in control meaning that other pending asynchronous tasks can be
> attended to or the thread's remaining time slice can be yielded to the
> kernel, yet this is all accomplished without the use of multi-threading.
> Another interesting aspect of this approach is that since it leverages on
> Python's enhanced generators, an exception thrown inside either of the
> deferred-returning functions will be propagated through to someFunction()
> where it can be handled with try/except.
>
> Think about what this means -- this sort of emulates the "stackless" design
> pattern you would expect in Erlang or Stackless Python without leaving
> standard Python.  And it's made possible under the hood by Python Enhanced
> Generators.
>
> Needless to say, it would be great to see this coolness be part of the
> standard Python library, instead of having every Python asynchronous library
> implement its own ad-hoc callback system.
>
> James Yonan
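
To make the quoted try/except point concrete, a minimal sketch using
Twisted's real inlineCallbacks decorator; fetch_page here is a stand-in
that fires its Deferreds synchronously, just so the example is
self-contained:

    from twisted.internet import defer

    def fetch_page(url):
        # Stand-in for a real asynchronous fetch: returns an already-fired
        # Deferred, carrying a Failure for the "bad" URL.
        if "bad" in url:
            return defer.fail(IOError("unreachable: %s" % url))
        return defer.succeed("<html>%s</html>" % url)

    @defer.inlineCallbacks
    def fetch_with_fallback(primary, backup):
        try:
            page = yield fetch_page(primary)   # a failure inside the
        except IOError:                        # Deferred is re-raised here
            page = yield fetch_page(backup)
        defer.returnValue(page)

    def show(page):
        print(page)   # "<html>http://backup.example/</html>"

    d = fetch_with_fallback("http://bad.example/", "http://backup.example/")
    d.addCallback(show)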



-- 
--Guido van Rossum (python.org/~guido)


From ncoghlan at gmail.com  Sat Sep 11 02:07:19 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 11 Sep 2010 10:07:19 +1000
Subject: [Python-ideas] [Python-Dev] Python needs a standard
 asynchronous return object
In-Reply-To: <AANLkTinohL6+8JRN6UKCeRKv5-ULUb6bjFZ+_RsewFiV@mail.gmail.com>
References: <4C8AB874.9010703@openvpn.net>
	<AANLkTinohL6+8JRN6UKCeRKv5-ULUb6bjFZ+_RsewFiV@mail.gmail.com>
Message-ID: <AANLkTi=E696ywpwEeXtKw_fi0MZTbEdAyVhG833pRrYy@mail.gmail.com>

On Sat, Sep 11, 2010 at 9:25 AM, Guido van Rossum <guido at python.org> wrote:
> Moving to python-ideas.
>
> Have you seen http://www.python.org/dev/peps/pep-3148/ ? That seems
> exactly what you want.

James did mention that in the post, although he didn't say what
deferreds really added beyond what futures provide, and why the
"add_done_callback" method isn't adequate to provide interoperability
between futures and deferreds (which would be odd, since Brian made
changes to that part of PEP 3148 to help with that interoperability
after discussions with Glyph).

Between PEP 380 and PEP 3148 I'm not really seeing a lot more scope
for standardisation in this space though.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From jnoller at gmail.com  Sat Sep 11 18:03:12 2010
From: jnoller at gmail.com (Jesse Noller)
Date: Sat, 11 Sep 2010 09:03:12 -0700
Subject: [Python-ideas] [Python-Dev] Python needs a standard
 asynchronous return object
In-Reply-To: <AANLkTi=E696ywpwEeXtKw_fi0MZTbEdAyVhG833pRrYy@mail.gmail.com>
References: <4C8AB874.9010703@openvpn.net>
	<AANLkTinohL6+8JRN6UKCeRKv5-ULUb6bjFZ+_RsewFiV@mail.gmail.com>
	<AANLkTi=E696ywpwEeXtKw_fi0MZTbEdAyVhG833pRrYy@mail.gmail.com>
Message-ID: <AANLkTingRm2DVRnG7Zm8sJZbTR5StNmGATGt1QmBUhUh@mail.gmail.com>

On Fri, Sep 10, 2010 at 5:07 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Sat, Sep 11, 2010 at 9:25 AM, Guido van Rossum <guido at python.org> wrote:
>> Moving to python-ideas.
>>
>> Have you seen http://www.python.org/dev/peps/pep-3148/ ? That seems
>> exactly what you want.
>
> James did mention that in the post, although he didn't say what
> deferreds really added beyond what futures provide, and why the
> "add_done_callback" method isn't adequate to provide interoperability
> between futures and deferreds (which would be odd, since Brian made
> changes to that part of PEP 3148 to help with that interoperability
> after discussions with Glyph).
>
> Between PEP 380 and PEP 3148 I'm not really seeing a lot more scope
> for standardisation in this space though.
>
> Cheers,
> Nick.

That was my initial reaction as well, but I'm more than open to
hearing from Jean Paul/Glyph and the other twisted folks on this.


From guido at python.org  Sun Sep 12 04:26:50 2010
From: guido at python.org (Guido van Rossum)
Date: Sat, 11 Sep 2010 19:26:50 -0700
Subject: [Python-ideas] [Python-Dev] Python needs a standard
 asynchronous return object
In-Reply-To: <AANLkTingRm2DVRnG7Zm8sJZbTR5StNmGATGt1QmBUhUh@mail.gmail.com>
References: <4C8AB874.9010703@openvpn.net>
	<AANLkTinohL6+8JRN6UKCeRKv5-ULUb6bjFZ+_RsewFiV@mail.gmail.com>
	<AANLkTi=E696ywpwEeXtKw_fi0MZTbEdAyVhG833pRrYy@mail.gmail.com>
	<AANLkTingRm2DVRnG7Zm8sJZbTR5StNmGATGt1QmBUhUh@mail.gmail.com>
Message-ID: <AANLkTin7eRBcpt1K_RC=buE5BasmTDBwE_TzHr97BAyy@mail.gmail.com>

(Summary: I want to make an apology, and reopen the debate. Possibly
relevant: PEP 342, PEP 380, PEP 3148, PEP 3152.)

On Sat, Sep 11, 2010 at 9:03 AM, Jesse Noller <jnoller at gmail.com> wrote:
> On Fri, Sep 10, 2010 at 5:07 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> On Sat, Sep 11, 2010 at 9:25 AM, Guido van Rossum <guido at python.org> wrote:
>>> Moving to python-ideas.
>>>
>>> Have you seen http://www.python.org/dev/peps/pep-3148/ ? That seems
>>> exactly what you want.
>>
>> James did mention that in the post,

Whoops. I was a bit quick at the trigger there.

>> although he didn't say what
>> deferreds really added beyond what futures provide, and why the
>> "add_done_callback" method isn't adequate to provide interoperability
>> between futures and deferreds (which would be odd, since Brian made
>> changes to that part of PEP 3148 to help with that interoperability
>> after discussions with Glyph).
>>
>> Between PEP 380 and PEP 3148 I'm not really seeing a lot more scope
>> for standardisation in this space though.
>>
>> Cheers,
>> Nick.
>
> That was my initial reaction as well, but I'm more than open to
> hearing from Jean Paul/Glyph and the other twisted folks on this.

Re-reading the OP's post[0] and the blog[1] he references, I notice
that he did not mention PEP 380 (which for the blog's example doesn't
actually add much except adding a nicer way to return a value from a
generator) but he did mention the awesomeness of not needing threads
when using deferreds. He sounds as if the python-dev community had
never heard of that style of handling concurrency, which seems
backwards: the generator-based style of doing it was introduced in PEP
342 which enabled Twisted's inline callbacks. (Though he does mention
Python Enhanced Generators which could be an implicit reference to PEP
342 -- "Coroutines via Enhanced Generators".)

But thinking about this more I don't know that it will be easy to mix
PEP 3148, which is solidly thread-based, with a PEP 342 style
scheduler (whether or not the PEP 380 enhancements are applied, or
even PEP 3152). And if we take the OP's message at face value, his
point isn't so much that Twisted is great, but that in order to
benefit maximally from PEP 342 there needs to be a standard way of
using callbacks. I think that's probably true. And comparing the
blog's examples to PEP 3148, I find Twisted's terminology rather
confusing compared to the PEP's clean Futures API (where IMO you can
ignore almost everything except result()).

Maybe it's possible to write a little framework that lets you create
Futures using either threads, processes (both supported by PEP 3148)
or generators. But I haven't tried it. And maybe the need to use
'yield' for everything that may block when using generators, but not
when using threads or processes, will make this awkward. So maybe
we'll be stuck with at least two Future-like APIs: PEP 3148 and
something else, generator-based. Or maybe PEP 3152.
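
As a rough sketch of what the generator leg of such a framework might
look like (PEP 3148 supplies the executor and Future; the little run()
trampoline is invented here purely for illustration):

    from concurrent.futures import ThreadPoolExecutor

    executor = ThreadPoolExecutor(max_workers=4)

    def run(gen):
        # Drive a generator-based coroutine: each yielded Future resumes
        # the generator from its done-callback instead of blocking.
        def step(value=None, exc=None):
            try:
                future = gen.throw(exc) if exc else gen.send(value)
            except StopIteration:
                return
            def on_done(f):
                try:
                    result = f.result()
                except BaseException as e:
                    step(exc=e)
                else:
                    step(value=result)
            future.add_done_callback(on_done)
        step()

    def slow_double(x):
        import time
        time.sleep(0.1)
        return 2 * x

    def task():
        a = yield executor.submit(slow_double, 21)   # suspends, doesn't block
        print("got %r" % a)

    run(task())

The awkwardness shows up exactly where noted above: forget a 'yield' and
call .result() instead, and you silently block a worker thread.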

So, yes, there may be something here, and let's reopen the discussion.
And I apologize for shooting first and asking questions second.

[0] http://mail.python.org/pipermail/python-dev/2010-September/103576.html
[1] http://blog.mekk.waw.pl/archives/14-Twisted-inlineCallbacks-and-deferredGenerator.html
-- 
--Guido van Rossum (python.org/~guido)


From solipsis at pitrou.net  Sun Sep 12 13:03:38 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 12 Sep 2010 13:03:38 +0200
Subject: [Python-ideas] [Python-Dev] Python needs a standard
	asynchronous return object
References: <4C8AB874.9010703@openvpn.net>
	<AANLkTinohL6+8JRN6UKCeRKv5-ULUb6bjFZ+_RsewFiV@mail.gmail.com>
	<AANLkTi=E696ywpwEeXtKw_fi0MZTbEdAyVhG833pRrYy@mail.gmail.com>
	<AANLkTingRm2DVRnG7Zm8sJZbTR5StNmGATGt1QmBUhUh@mail.gmail.com>
	<AANLkTin7eRBcpt1K_RC=buE5BasmTDBwE_TzHr97BAyy@mail.gmail.com>
Message-ID: <20100912130338.714643f8@pitrou.net>

On Sat, 11 Sep 2010 19:26:50 -0700
Guido van Rossum <guido at python.org> wrote:
> 
> But thinking about this more I don't know that it will be easy to mix
> PEP 3148, which is solidly thread-based, with a PEP 342 style
> scheduler (whether or not the PEP 380 enhancements are applied, or
> even PEP 3152).

I'm not sure why. The implementation is certainly thread-based, but
functions such as `wait(fs, timeout=None, return_when=ALL_COMPLETED)`
could be implemented in terms of a single-threaded event loop / job
scheduler.

Actually, Twisted has a similar primitive in DeferredList, although
more powerful since the DeferredList itself is a Deferred, and can
therefore be further combined, etc.:

http://twistedmatrix.com/documents/10.0.0/api/twisted.internet.defer.DeferredList.html

> And comparing the
> blog's examples to PEP 3148, I find Twisted's terminology rather
> confusing compared to the PEP's clean Futures API (where IMO you can
> ignore almost everything except result()).

Well, apart from the API which may be considered a taste issue (I have
used Deferreds long before I heard about Futures, so perhaps I'm a bit
biased), the following API doc in PEP 3148 shows that the Future model
of callbacks is less rich than Twisted's:

"add_done_callback(fn)

    Attaches a callable fn to the future that will be called when the
    future is cancelled or finishes running. fn will be called with the
    future as its only argument.

    Added callables are called in the order that they were added and
    are always called in a thread belonging to the process that added
    them. If the callable raises an Exception then it will be logged
    and ignored. If the callable raises another BaseException then
    behavior is not defined."

With Twisted Deferreds, when a callback or errback raises an error, its
exception isn't "logged and ignored", it is passed to the remaining
errback chain attached to the Deferred. This is part of what makes
Deferreds more complicated to understand, but it also makes them more
powerful.

Another key point is that a callback can itself return another Deferred
object, in which case the next callback (or errback, in case of error)
will be called only once the other Deferred produces a result. This is
all handled transparently and you can freely mix callbacks that
immediately return a value, and callbacks that return a Deferred whose
final value will be available later. And the other Deferred can have
its own callback/errback chain, etc.

(just for the record, the "final value" of a Deferred is the value
returned by the last callback in the chain)
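
A minimal illustration of that chaining, with an inner Deferred that has
already fired so the example stays synchronous:

    from twisted.internet.defer import Deferred, succeed

    def double(x):
        return x * 2

    def and_then_some(x):
        # A callback may return a Deferred; the outer chain waits for it
        # and hands its final value to the next callback.
        return succeed(x + 1)

    def show(x):
        print(x)

    d = Deferred()
    d.addCallback(double)
    d.addCallback(and_then_some)
    d.addCallback(show)
    d.callback(10)   # show() prints 21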


I think the main reason, though, that people find Deferreds
inconvenient is that they force you to think in terms of
asynchronicity (well, almost: you can of course hack yourself
some code which blocks until a Deferred has a value, but it's
extremely discouraged). They would like to have officially
supported methods like `result(timeout=None)` which make simple things
(like quick scripts to fetch a bunch of URLs) simpler. Twisted is
generally used for server applications where such code is out of the
question (in an async model, that is).
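
The kind of quick script meant here, in the blocking Futures idiom (the
URLs are placeholders):

    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen

    urls = ["http://python.org/", "http://pypi.python.org/"]

    def fetch(url):
        return urlopen(url).read()

    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(fetch, url) for url in urls]
        pages = [f.result() for f in futures]   # simply block until done

    print([len(page) for page in pages])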

Regards

Antoine.




From guido at python.org  Sun Sep 12 17:49:56 2010
From: guido at python.org (Guido van Rossum)
Date: Sun, 12 Sep 2010 08:49:56 -0700
Subject: [Python-ideas] [Python-Dev] Python needs a standard
 asynchronous return object
In-Reply-To: <20100912130338.714643f8@pitrou.net>
References: <4C8AB874.9010703@openvpn.net>
	<AANLkTinohL6+8JRN6UKCeRKv5-ULUb6bjFZ+_RsewFiV@mail.gmail.com>
	<AANLkTi=E696ywpwEeXtKw_fi0MZTbEdAyVhG833pRrYy@mail.gmail.com>
	<AANLkTingRm2DVRnG7Zm8sJZbTR5StNmGATGt1QmBUhUh@mail.gmail.com>
	<AANLkTin7eRBcpt1K_RC=buE5BasmTDBwE_TzHr97BAyy@mail.gmail.com>
	<20100912130338.714643f8@pitrou.net>
Message-ID: <AANLkTimN=Hb3jWDCtKt5PiE_CGJgiMXZyewX7QcxOCc+@mail.gmail.com>

On Sun, Sep 12, 2010 at 4:03 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Sat, 11 Sep 2010 19:26:50 -0700
> Guido van Rossum <guido at python.org> wrote:
>>
>> But thinking about this more I don't know that it will be easy to mix
>> PEP 3148, which is solidly thread-based, with a PEP 342 style
>> scheduler (whether or not the PEP 380 enhancements are applied, or
>> even PEP 3152).
>
> I'm not sure why. The implementation is certainly thread-based, but
> functions such as `wait(fs, timeout=None, return_when=ALL_COMPLETED)`
> could be implemented in termes of a single-threaded event loop / job
> scheduler.

Sure, but the tricky thing is to make it pluggable so that PEP 3148
and Twisted and other frameworks can use it all together, and a single
call will accept a mixture of Futures.

I also worry that "impure" code will have a hard time -- e.g. when
mixing generator-based coroutines and thread-based futures, it would
be quite bad if a coroutine called .result() on a Future or the
.wait() function instead of yielding to the scheduler.

> Actually, Twisted has a similar primitive in DeferredList, although
> more powerful since the DeferredList itself is a Deferred, and can
> therefore be further combined, etc.:
>
> http://twistedmatrix.com/documents/10.0.0/api/twisted.internet.defer.DeferredList.html

This sounds similar to the way you can create derived futures in Java.

>> And comparing the
>> blog's examples to PEP 3148, I find Twisted's terminology rather
>> confusing compared to the PEP's clean Futures API (where IMO you can
>> ignore almost everything except result()).
>
> Well, apart from the API which may be considered a taste issue (I have
> used Deferreds long before I heard about Futures, so perhaps I'm a bit
> biased),

I heard of Deferred long before PEP 3148 was even conceived, but I
find Twisted's terminology terribly confusing while I find the PEP's
names easy to understand.

> the following API doc in PEP 3148 shows that the Future model
> of callbacks is less rich than Twisted's:
>
> "add_done_callback(fn)
>
>     Attaches a callable fn to the future that will be called when the
>     future is cancelled or finishes running. fn will be called with the
>     future as its only argument.
>
>     Added callables are called in the order that they were added and
>     are always called in a thread belonging to the process that added
>     them. If the callable raises an Exception then it will be logged
>     and ignored. If the callable raises another BaseException then
>     behavior is not defined."
>
> With Twisted Deferreds, when a callback or errback raises an error, its
> exception isn't "logged and ignored", it is passed to the remaining
> errback chain attached to the Deferred. This is part of what makes
> Deferreds more complicated to understand, but it also makes them more
> powerful.

Yeah, please do explain why Twisted has so much machinery to handle exceptions?

ISTM that the main difference is that add_done_callback() isn't meant
for callbacks that return a value. So then the exceptions that might
be raised are kind of "out of band". For any API that returns a value
I agree that raising an exception should be handled -- but in the PEP
342 world we can do that by passing exceptions back into coroutine
using throw(), so no separate "success" and "failure" callbacks are
needed.
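
The throw() mechanics in isolation:

    def coroutine():
        try:
            data = yield "request"            # suspended here...
        except ValueError as e:
            print("handled inside coroutine: %s" % e)

    g = coroutine()
    print(g.send(None))          # run to the first yield; prints "request"
    try:
        g.throw(ValueError("bad response"))   # raised at the yield point
    except StopIteration:
        pass   # the coroutine caught the error and finished normally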

> Another key point is that a callback can itself return another Deferred
> object, in which case the next callback (or errback, in case of error)
> will be called only once the other Deferred produces a result. This is
> all handled transparently and you can freely mix callbacks that
> immediately return a value, and callbacks that return a Deferred whose
> final value will be available later. And the other Deferred can have
> its own callback/errback chain, etc.

Yeah, that is part of what makes it so utterly confusing. PEP 380
supports a similar thing but much cleaner, without ever using
callbacks.

> (just for the record, the "final value" of a Deferred is the value
> returned by the last callback in the chain)
>
>
> I think the main reason, though, that people find Deferreds
> inconvenient is that they force you to think in terms of
> asynchronicity (well, almost: you can of course hack yourself
> some code which blocks until a Deferred has a value, but it's
> extremely discouraged). They would like to have officially
> supported methods like `result(timeout=None)` which make simple things
> (like quick scripts to fetch a bunch of URLs) simpler. Twisted is
> generally used for server applications where such code is out of
> question (in an async model, that is).

Actually I think the main reason is historic: Twisted introduced
callback-based asynchronous (thread-less) programming when there was
no alternative in Python, and they invented both the mechanisms and
the terminology as they were figuring it all out. That is no mean
feat. But with PEP 342 (generator-based coroutines) and especially PEP
380 (yield from) there *is* an alternative, and while Twisted has
added APIs to support generators, it hasn't started to deprecate its
other APIs, and its terminology becomes hard to follow for people
(like me, frankly) who first learned this stuff through PEP 342.

-- 
--Guido van Rossum (python.org/~guido)


From solipsis at pitrou.net  Sun Sep 12 18:17:51 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 12 Sep 2010 18:17:51 +0200
Subject: [Python-ideas] [Python-Dev] Python needs a standard
	asynchronous return object
References: <4C8AB874.9010703@openvpn.net>
	<AANLkTinohL6+8JRN6UKCeRKv5-ULUb6bjFZ+_RsewFiV@mail.gmail.com>
	<AANLkTi=E696ywpwEeXtKw_fi0MZTbEdAyVhG833pRrYy@mail.gmail.com>
	<AANLkTingRm2DVRnG7Zm8sJZbTR5StNmGATGt1QmBUhUh@mail.gmail.com>
	<AANLkTin7eRBcpt1K_RC=buE5BasmTDBwE_TzHr97BAyy@mail.gmail.com>
	<20100912130338.714643f8@pitrou.net>
	<AANLkTimN=Hb3jWDCtKt5PiE_CGJgiMXZyewX7QcxOCc+@mail.gmail.com>
Message-ID: <20100912181751.2aa5bb32@pitrou.net>

On Sun, 12 Sep 2010 08:49:56 -0700
Guido van Rossum <guido at python.org> wrote:
> 
> Sure, but the tricky thing is to make it pluggable so that PEP 3148
> and Twisted and other frameworks can use it all together, and a single
> call will accept a mixture of Futures.

Having a common abstraction (Future or Deferred) allows for
scheduling-agnostic libraries which consume and/or produce these
abstractions (*). I'm not sure it is desirable to mix scheduling models
in a single process (let alone a single thread), though.

(*) Of course, the abstraction is somewhat leaky, since being called from
different threads, depending on the scheduling model, could have adverse
consequences.

> ISTM that the main difference is that add_done_callback() isn't meant
> for callbacks that return a value. So then the exceptions that might
> be raised are kind of "out of band".

It implies that it's mostly useful for simple callbacks (which would
e.g. print out a success report, or set an Event to wake up another
thread). The Twisted model allows the major part of processing to occur
in the callbacks themselves, in which case proper error handling and
propagation is mandatory.
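
For instance, the Event case in PEP 3148 terms:

    import threading
    from concurrent.futures import ThreadPoolExecutor

    done = threading.Event()

    def report(future):
        # A "simple callback": nothing consumes its return value.
        print("result: %r" % future.result())
        done.set()

    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(pow, 2, 10)
        future.add_done_callback(report)
        done.wait()   # another thread can sleep here until completion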

Regards

Antoine.




From guido at python.org  Sun Sep 12 18:48:20 2010
From: guido at python.org (Guido van Rossum)
Date: Sun, 12 Sep 2010 09:48:20 -0700
Subject: [Python-ideas] [Python-Dev] Python needs a standard
 asynchronous return object
In-Reply-To: <20100912181751.2aa5bb32@pitrou.net>
References: <4C8AB874.9010703@openvpn.net>
	<AANLkTinohL6+8JRN6UKCeRKv5-ULUb6bjFZ+_RsewFiV@mail.gmail.com>
	<AANLkTi=E696ywpwEeXtKw_fi0MZTbEdAyVhG833pRrYy@mail.gmail.com>
	<AANLkTingRm2DVRnG7Zm8sJZbTR5StNmGATGt1QmBUhUh@mail.gmail.com>
	<AANLkTin7eRBcpt1K_RC=buE5BasmTDBwE_TzHr97BAyy@mail.gmail.com>
	<20100912130338.714643f8@pitrou.net>
	<AANLkTimN=Hb3jWDCtKt5PiE_CGJgiMXZyewX7QcxOCc+@mail.gmail.com>
	<20100912181751.2aa5bb32@pitrou.net>
Message-ID: <AANLkTinc-DWYvoQi0XoX9APSn9jCbGi8QN1Zbb7sipb8@mail.gmail.com>

On Sun, Sep 12, 2010 at 9:17 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Sun, 12 Sep 2010 08:49:56 -0700
> Guido van Rossum <guido at python.org> wrote:
>>
>> Sure, but the tricky thing is to make it pluggable so that PEP 3148
>> and Twisted and other frameworks can use it all together, and a single
>> call will accept a mixture of Futures.
>
> Having a common abstraction (Future or Deferred) allows for
> scheduling-agnostic libraries which consume and/or produce these
> abstractions (*). I'm not sure it is desirable to mix scheduling models
> in a single process (let alone a single thread), though.

IIRC even Twisted supports putting stuff in a thread if you really
need it. And have you looked at Go's Goroutines? They are a hybrid --
they don't map 1:1 to OS threads, but they aren't pure coroutines
either, so that if a goroutine blocks on I/O the others will still
make progress.

> (*) Of course, the abstraction is somewhat leaky, since being called from
> different threads, depending on the scheduling model, could have adverse
> consequences.

Yeah, this is always a problem with pure async frameworks -- if one
callback or coroutine blocks by mistake, the whole world is blocked.
(So Goroutines attempt to fix this; I have no idea how successful they
are.)

>> ISTM that the main difference is that add_done_callback() isn't meant
>> for callbacks that return a value. So then the exceptions that might
>> be raised are kind of "out of band".
>
> It implies that it's mostly useful for simple callbacks (which would
> e.g. print out a success report, or set an Event to wake up another
> thread). The Twisted model allows the major part of processing to occur
> in the callbacks themselves, in which case proper error handling and
> propagation is mandatory.

A generator-based coroutines approach can do this too (just put the
work between the yields in the generator) and has all the proper
exception-propagation stuff built in since PEP 342 (PEP 380 will just
make it easier).

And a Futures-based approach can do it too -- it's not described in
PEP 3148, but you can easily design an API for wrappable Futures.
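
One conceivable shape for that wrapping, sketched on top of
add_done_callback (the then() combinator is invented here, not part of
PEP 3148):

    from concurrent.futures import Future, ThreadPoolExecutor

    def then(future, fn):
        # Produce a new Future for fn(future's result); an exception from
        # either stage lands in the new Future instead of being lost.
        chained = Future()
        def on_done(f):
            try:
                chained.set_result(fn(f.result()))
            except BaseException as e:
                chained.set_exception(e)
        future.add_done_callback(on_done)
        return chained

    with ThreadPoolExecutor(max_workers=1) as executor:
        f = then(executor.submit(pow, 2, 10), lambda r: r + 1)
        print(f.result())   # 1025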

-- 
--Guido van Rossum (python.org/~guido)


From yoavglazner at gmail.com  Mon Sep 13 14:09:23 2010
From: yoavglazner at gmail.com (yoav glazner)
Date: Mon, 13 Sep 2010 14:09:23 +0200
Subject: [Python-ideas] Why not break cycles with one __del__?
Message-ID: <AANLkTikfUX4pOqL-kr54ua68MObzuhOFXN9c5GK1AmN8@mail.gmail.com>

Hi!

I was thinking, why not let python gc break cycles with only one
object.__del__ ?
I don't see a problem with calling the __del__ method and then proceed
as usual (break the cycle if it wasn't already broken by __del__)

Many Thanks,

Yoav Glazner

From jimjjewett at gmail.com  Mon Sep 13 18:16:36 2010
From: jimjjewett at gmail.com (Jim Jewett)
Date: Mon, 13 Sep 2010 12:16:36 -0400
Subject: [Python-ideas] Why not break cycles with one __del__?
In-Reply-To: <AANLkTikfUX4pOqL-kr54ua68MObzuhOFXN9c5GK1AmN8@mail.gmail.com>
References: <AANLkTikfUX4pOqL-kr54ua68MObzuhOFXN9c5GK1AmN8@mail.gmail.com>
Message-ID: <AANLkTiky2W6k7sxsjFA4-jqS5r9w0rk3dfVmn2oF6Gdt@mail.gmail.com>

On Mon, Sep 13, 2010 at 8:09 AM, yoav glazner <yoavglazner at gmail.com> wrote:
> why not let python gc break cycles with only one
> object.__del__ ?

If you can point to the code that prevents this, please report a bug.

The last time I checked, there were proposals to either add a
__close__ or weaken __del__ to handle multi-__del__ cycles -- but
single-__del__ cycles were already handled OK.

-jJ


From solipsis at pitrou.net  Mon Sep 13 19:05:49 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 13 Sep 2010 19:05:49 +0200
Subject: [Python-ideas] Why not break cycles with one __del__?
References: <AANLkTikfUX4pOqL-kr54ua68MObzuhOFXN9c5GK1AmN8@mail.gmail.com>
	<AANLkTiky2W6k7sxsjFA4-jqS5r9w0rk3dfVmn2oF6Gdt@mail.gmail.com>
Message-ID: <20100913190549.15f218ce@pitrou.net>

On Mon, 13 Sep 2010 12:16:36 -0400
Jim Jewett <jimjjewett at gmail.com> wrote:
> 
> The last time I checked, there were proposals to either add a
> __close__ or weaken __del__ to handle multi-__del__ cycles -- but
> single-__del__ cycles were already handled OK.

They aren't:

>>> class C(list):
...   def __del__(self): pass
... 
>>> c = C()
>>> c.append(c)
>>> del c
>>> import gc
>>> gc.collect()
1
>>> gc.garbage
[[[...]]]
>>> type(gc.garbage[0])
<class '__main__.C'>





From tim.peters at gmail.com  Mon Sep 13 19:25:54 2010
From: tim.peters at gmail.com (Tim Peters)
Date: Mon, 13 Sep 2010 13:25:54 -0400
Subject: [Python-ideas] Why not break cycles with one __del__?
In-Reply-To: <20100913190549.15f218ce@pitrou.net>
References: <AANLkTikfUX4pOqL-kr54ua68MObzuhOFXN9c5GK1AmN8@mail.gmail.com>
	<AANLkTiky2W6k7sxsjFA4-jqS5r9w0rk3dfVmn2oF6Gdt@mail.gmail.com>
	<20100913190549.15f218ce@pitrou.net>
Message-ID: <AANLkTikALR0FSkL8jzYnh2atPEDz=63P8V3oaKQNh3NC@mail.gmail.com>

[Jim Jewett]
>> The last time I checked ...
>> single-__del__ cycles were already handled OK.

[Antoine Pitrou]
> They aren't: ...

Antoine's right, unless things have changed dramatically since last
time I was intimate with that code.  CPython's "cyclic garbage
detection" makes no attempt to analyze cycle structure.  It infers
that all trash it sees must be in cycles simply because the trash
hasn't already been collected by the regular refcount-based gc.  The
presence of __del__ on a trash object then disqualifies it from
further analysis, but there's no analysis of cycle structure
regardless.

Of course it doesn't _have_ to be that way.  Nobody cared enough yet
to add a pile of new code to special-case cycles with a single
__del__.


From benjamin at python.org  Mon Sep 13 21:22:02 2010
From: benjamin at python.org (Benjamin)
Date: Mon, 13 Sep 2010 19:22:02 +0000 (UTC)
Subject: [Python-ideas] Why not break cycles with one __del__?
References: <AANLkTikfUX4pOqL-kr54ua68MObzuhOFXN9c5GK1AmN8@mail.gmail.com>
	<AANLkTiky2W6k7sxsjFA4-jqS5r9w0rk3dfVmn2oF6Gdt@mail.gmail.com>
	<20100913190549.15f218ce@pitrou.net>
	<AANLkTikALR0FSkL8jzYnh2atPEDz=63P8V3oaKQNh3NC@mail.gmail.com>
Message-ID: <loom.20100913T212138-458@post.gmane.org>

Tim Peters <tim.peters at ...> writes:
> Of course it doesn't _have_ to be that way.  Nobody cared enough yet
> to add a pile of new code to special-case cycles with a single
> __del__.

And hopefully no one will. That would be very brittle. 






From solipsis at pitrou.net  Mon Sep 13 22:28:08 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 13 Sep 2010 22:28:08 +0200
Subject: [Python-ideas] Why not break cycles with one __del__?
References: <AANLkTikfUX4pOqL-kr54ua68MObzuhOFXN9c5GK1AmN8@mail.gmail.com>
	<AANLkTiky2W6k7sxsjFA4-jqS5r9w0rk3dfVmn2oF6Gdt@mail.gmail.com>
	<20100913190549.15f218ce@pitrou.net>
	<AANLkTikALR0FSkL8jzYnh2atPEDz=63P8V3oaKQNh3NC@mail.gmail.com>
	<loom.20100913T212138-458@post.gmane.org>
Message-ID: <20100913222808.2459784a@pitrou.net>

On Mon, 13 Sep 2010 19:22:02 +0000 (UTC)
Benjamin <benjamin at python.org> wrote:
> Tim Peters <tim.peters at ...> writes:
> > Of course it doesn't _have_ to be that way.  Nobody cared enough yet
> > to add a pile of new code to special-case cycles with a single
> > __del__.
> 
> And hopefully no one will. That would be very brittle. 

Why would it be?





From fuzzyman at voidspace.org.uk  Mon Sep 13 22:36:35 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Mon, 13 Sep 2010 21:36:35 +0100
Subject: [Python-ideas] Why not break cycles with one __del__?
In-Reply-To: <loom.20100913T212138-458@post.gmane.org>
References: <AANLkTikfUX4pOqL-kr54ua68MObzuhOFXN9c5GK1AmN8@mail.gmail.com>
	<AANLkTiky2W6k7sxsjFA4-jqS5r9w0rk3dfVmn2oF6Gdt@mail.gmail.com>
	<20100913190549.15f218ce@pitrou.net>
	<AANLkTikALR0FSkL8jzYnh2atPEDz=63P8V3oaKQNh3NC@mail.gmail.com>
	<loom.20100913T212138-458@post.gmane.org>
Message-ID: <AANLkTimb8ag+WnOx_t6K_bJJJy218tdUL_-YVnQBnyHE@mail.gmail.com>

On 13 September 2010 20:22, Benjamin <benjamin at python.org> wrote:

> Tim Peters <tim.peters at ...> writes:
> > Of course it doesn't _have_ to be that way.  Nobody cared enough yet
> > to add a pile of new code to special-case cycles with a single
> > __del__.
>
> And hopefully no one will. That would be very brittle.
>
>
More brittle than what PyPy, IronPython (and presumably Jython) do? (Which
is to make cycles collectable by arbitrarily breaking them, IIUC.)

Michael





-- 
http://www.voidspace.org.uk

From yoavglazner at gmail.com  Mon Sep 13 22:56:09 2010
From: yoavglazner at gmail.com (yoav glazner)
Date: Mon, 13 Sep 2010 22:56:09 +0200
Subject: [Python-ideas] Why not break cycles with one __del__?
In-Reply-To: <AANLkTimb8ag+WnOx_t6K_bJJJy218tdUL_-YVnQBnyHE@mail.gmail.com>
References: <AANLkTikfUX4pOqL-kr54ua68MObzuhOFXN9c5GK1AmN8@mail.gmail.com>
	<AANLkTiky2W6k7sxsjFA4-jqS5r9w0rk3dfVmn2oF6Gdt@mail.gmail.com>
	<20100913190549.15f218ce@pitrou.net>
	<AANLkTikALR0FSkL8jzYnh2atPEDz=63P8V3oaKQNh3NC@mail.gmail.com>
	<loom.20100913T212138-458@post.gmane.org>
	<AANLkTimb8ag+WnOx_t6K_bJJJy218tdUL_-YVnQBnyHE@mail.gmail.com>
Message-ID: <AANLkTinVNyOSh9iD11+4fysUkfpTvJ_1z_6jBJ3Vr_+g@mail.gmail.com>

> And hopefully no one will. That would be very brittle.

Why do you hope for that? That is the "one obvious way to do it".

From benjamin at python.org  Mon Sep 13 23:31:45 2010
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 13 Sep 2010 21:31:45 +0000 (UTC)
Subject: [Python-ideas] Why not break cycles with one __del__?
References: <AANLkTikfUX4pOqL-kr54ua68MObzuhOFXN9c5GK1AmN8@mail.gmail.com>
	<AANLkTiky2W6k7sxsjFA4-jqS5r9w0rk3dfVmn2oF6Gdt@mail.gmail.com>
	<20100913190549.15f218ce@pitrou.net>
	<AANLkTikALR0FSkL8jzYnh2atPEDz=63P8V3oaKQNh3NC@mail.gmail.com>
	<loom.20100913T212138-458@post.gmane.org>
	<20100913222808.2459784a@pitrou.net>
Message-ID: <loom.20100913T233114-697@post.gmane.org>

Antoine Pitrou <solipsis at ...> writes:

> 
> On Mon, 13 Sep 2010 19:22:02 +0000 (UTC)
> Benjamin <benjamin at ...> wrote:
> > Tim Peters <tim.peters at ...> writes:
> > > Of course it doesn't _have_ to be that way.  Nobody cared enough yet
> > > to add a pile of new code to special-case cycles with a single
> > > __del__.
> > 
> > And hopefully no one will. That would be very brittle. 
> 
> Why would it be?

Because if your cycle suddenly had more than one __del__, it would stop being
collected.






From ncoghlan at gmail.com  Mon Sep 13 23:39:00 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 14 Sep 2010 07:39:00 +1000
Subject: [Python-ideas] Why not break cycles with one __del__?
In-Reply-To: <AANLkTikALR0FSkL8jzYnh2atPEDz=63P8V3oaKQNh3NC@mail.gmail.com>
References: <AANLkTikfUX4pOqL-kr54ua68MObzuhOFXN9c5GK1AmN8@mail.gmail.com>
	<AANLkTiky2W6k7sxsjFA4-jqS5r9w0rk3dfVmn2oF6Gdt@mail.gmail.com>
	<20100913190549.15f218ce@pitrou.net>
	<AANLkTikALR0FSkL8jzYnh2atPEDz=63P8V3oaKQNh3NC@mail.gmail.com>
Message-ID: <AANLkTinVvSKxKUNgyZE5_PyZiHT55FPvj7omB9O=bmnD@mail.gmail.com>

On Tue, Sep 14, 2010 at 3:25 AM, Tim Peters <tim.peters at gmail.com> wrote:
> [Jim Jewett]
>>> The last time I checked ...
>>> single-__del__ cycles were already handled OK.
>
> [Antoine Pitrou]
>> They aren't: ...
>
> Antoine's right, unless things have changed dramatically since last
> time I was intimate with that code. ?CPython's "cyclic garbage
> detection" makes no attempt to analyze cycle structure. ?It infers
> that all trash it sees must be in cycles simply because the trash
> hasn't already been collected by the regular refcount-based gc. ?The
> presence of __del__ on a trash object then disqualifies it from
> further analysis, but there's no analysis of cycle structure
> regardless.

I had a skim through that code last night, and as far as I can tell it
still works that way. However, it should be noted that the cyclic GC
actually does release everything *else* in the cycle - it's solely the
objects with __del__ methods that remain alive.

There does appear to be a *little* bit of structural analysis going on -
it looks like the "finalizers" list ends up containing both objects
with __del__ methods, as well as all other objects in the cyclic trash
that are reachable from the objects with __del__ methods.

> Of course it doesn't _have_ to be that way.  Nobody cared enough yet
> to add a pile of new code to special-case cycles with a single
> __del__.

Just from skimming the code, I wonder if, once finalizers has been
figured out, the GC could further partition that list into "to_delete"
(no __del__ method), "to_finalize" (__del__ method, but all referrers
in cycle have no __del__ method) and "uncollectable" (multiple __del__
methods in cycle). Alternatively, when building finalizers, build two
lists: one for objects with __del__ methods and one for objects that
are reachable from objects with __del__ methods. Objects that appear
only in the first list could safely have their finalisers invoked,
while those that also appear in the latter could not.

This is definitely a case of "code talks" though - there's no
fundamental problem with the idea, but also no great incentive for
anyone to code it when __del__ is comparatively easy to avoid
(although not trivial, see Raymond's recent modifications to
OrderedDict to avoid exactly this issue).

Or, accept that __del__ is evil, and try to come up with a workable
proposal for that better weakref callback based scheme Jim mentioned.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From greg.ewing at canterbury.ac.nz  Tue Sep 14 04:44:25 2010
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 14 Sep 2010 14:44:25 +1200
Subject: [Python-ideas] Why not break cycles with one __del__?
In-Reply-To: <AANLkTinVvSKxKUNgyZE5_PyZiHT55FPvj7omB9O=bmnD@mail.gmail.com>
References: <AANLkTikfUX4pOqL-kr54ua68MObzuhOFXN9c5GK1AmN8@mail.gmail.com>
	<AANLkTiky2W6k7sxsjFA4-jqS5r9w0rk3dfVmn2oF6Gdt@mail.gmail.com>
	<20100913190549.15f218ce@pitrou.net>
	<AANLkTikALR0FSkL8jzYnh2atPEDz=63P8V3oaKQNh3NC@mail.gmail.com>
	<AANLkTinVvSKxKUNgyZE5_PyZiHT55FPvj7omB9O=bmnD@mail.gmail.com>
Message-ID: <4C8EE189.40408@canterbury.ac.nz>

Nick Coghlan wrote:
> Alternatively, when building finalizers, build two
> lists: one for objects with __del__ methods and one for objects that
> are reachable from objects with __del__ methods.

But since it's a cycle, isn't *everything* in the cycle
going to be reachable from everything else?

-- 
Greg


From tim.peters at gmail.com  Tue Sep 14 05:04:08 2010
From: tim.peters at gmail.com (Tim Peters)
Date: Mon, 13 Sep 2010 23:04:08 -0400
Subject: [Python-ideas] Why not break cycles with one __del__?
In-Reply-To: <4C8EE189.40408@canterbury.ac.nz>
References: <AANLkTikfUX4pOqL-kr54ua68MObzuhOFXN9c5GK1AmN8@mail.gmail.com>
	<AANLkTiky2W6k7sxsjFA4-jqS5r9w0rk3dfVmn2oF6Gdt@mail.gmail.com>
	<20100913190549.15f218ce@pitrou.net>
	<AANLkTikALR0FSkL8jzYnh2atPEDz=63P8V3oaKQNh3NC@mail.gmail.com>
	<AANLkTinVvSKxKUNgyZE5_PyZiHT55FPvj7omB9O=bmnD@mail.gmail.com>
	<4C8EE189.40408@canterbury.ac.nz>
Message-ID: <AANLkTimJ_BmVwqoDNjmzRihZ3NE-p6pqoGjZDduUmLpp@mail.gmail.com>

[Nick Coghlan]
>> Alternatively, when building finalizers, build two
>> lists: one for objects with __del__ methods and one for objects that
>> are reachable from objects with __del__ methods.

[Greg Ewing]
> But since it's a cycle, isn't *everything* in the cycle
> going to be reachable from everything else?

Note that I was sloppy in saying that CPython's cyclic gc only sees
trash objects in cycles.  More accurately, it sees trash objects in
cycles, and objects (which may or may not be in cycles) reachable only
from trash objects in cycles.  For example, if objects A and B point
to each other, that's a cycle.  If A also happens to point to D, where
D has a __del__ method, and nothing else points to D, then that's a
case where D is not in a cycle, but is nevertheless trash if A and B
are trash.  And if A and B lack finalizers, then CPython's cyclic gc
will reclaim D, despite that it does have a __del__.

That pattern is exploitable too.  If, e.g., you have some resource R
that needs to be cleaned up, owned by an object A that may participate
in cycles, it's often possible to put R in a different, very simple
object with a __del__ method, and have A point to that latter object
instead.


From guido at python.org  Tue Sep 14 05:07:10 2010
From: guido at python.org (Guido van Rossum)
Date: Mon, 13 Sep 2010 20:07:10 -0700
Subject: [Python-ideas] Why not break cycles with one __del__?
In-Reply-To: <AANLkTimJ_BmVwqoDNjmzRihZ3NE-p6pqoGjZDduUmLpp@mail.gmail.com>
References: <AANLkTikfUX4pOqL-kr54ua68MObzuhOFXN9c5GK1AmN8@mail.gmail.com>
	<AANLkTiky2W6k7sxsjFA4-jqS5r9w0rk3dfVmn2oF6Gdt@mail.gmail.com>
	<20100913190549.15f218ce@pitrou.net>
	<AANLkTikALR0FSkL8jzYnh2atPEDz=63P8V3oaKQNh3NC@mail.gmail.com>
	<AANLkTinVvSKxKUNgyZE5_PyZiHT55FPvj7omB9O=bmnD@mail.gmail.com>
	<4C8EE189.40408@canterbury.ac.nz>
	<AANLkTimJ_BmVwqoDNjmzRihZ3NE-p6pqoGjZDduUmLpp@mail.gmail.com>
Message-ID: <AANLkTi=07LWmOBc14wNGmANurKQ2qPZQ=T7zmkoDu60+@mail.gmail.com>

On Mon, Sep 13, 2010 at 8:04 PM, Tim Peters <tim.peters at gmail.com> wrote:
> [Nick Coghlan]
>>> Alternatively, when building finalizers, build two
>>> lists: one for objects with __del__ methods and one for objects that
>>> are reachable from objects with __del__ methods.
>
> [Greg Ewing]
>> But since it's a cycle, isn't *everything* in the cycle
>> going to be reachable from everything else?
>
> Note that I was sloppy in saying that CPython's cyclic gc only sees
> trash objects in cycles.  More accurately, it sees trash objects in
> cycles, and objects (which may or may not be in cycles) reachable only
> from trash objects in cycles.  For example, if objects A and B point
> to each other, that's a cycle.  If A also happens to point to D, where
> D has a __del__ method, and nothing else points to D, then that's a
> case where D is not in a cycle, but is nevertheless trash if A and B
> are trash.  And if A and B lack finalizers, then CPython's cyclic gc
> will reclaim D, despite that it does have a __del__.
>
> That pattern is exploitable too.  If, e.g., you have some resource R
> that needs to be cleaned up, owned by an object A that may participate
> in cycles, it's often possible to put R in a different, very simple
> object with a __del__ method, and have A point to that latter object
> instead.

Yeah, I think we even recommended this pattern at some point. ISTR we
designed the new io library to exploit it.

-- 
--Guido van Rossum (python.org/~guido)


From greg.ewing at canterbury.ac.nz  Tue Sep 14 06:16:37 2010
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 14 Sep 2010 16:16:37 +1200
Subject: [Python-ideas] Using * in indexes
Message-ID: <4C8EF725.3050807@canterbury.ac.nz>

I just found myself writing a method like this:

   def __getitem__(self, index):
     return self.data[(Ellipsis,) + index + (slice(None),)]

I would have liked to write it like this:

    self.data[..., index, :]

because that would make it much easier to see what's
being done. However, that won't work if index is itself
a tuple of index elements.

So I'd like to be able to do this:

    self.data[..., *index, :]
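
Spelling out the equivalence with a concrete array (numpy is used here
only as a stand-in for self.data):

    import numpy as np

    a = np.arange(24).reshape(2, 3, 4)
    index = (1, 2)

    manual = a[(Ellipsis,) + index + (slice(None),)]   # today's spelling
    direct = a[..., 1, 2, :]          # what a[..., *index, :] would mean
    print(np.array_equal(manual, direct))              # True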

-- 
Greg


From scott+python-ideas at scottdial.com  Tue Sep 14 07:12:37 2010
From: scott+python-ideas at scottdial.com (Scott Dial)
Date: Tue, 14 Sep 2010 01:12:37 -0400
Subject: [Python-ideas] Why not break cycles with one __del__?
In-Reply-To: <AANLkTi=07LWmOBc14wNGmANurKQ2qPZQ=T7zmkoDu60+@mail.gmail.com>
References: <AANLkTikfUX4pOqL-kr54ua68MObzuhOFXN9c5GK1AmN8@mail.gmail.com>	<AANLkTiky2W6k7sxsjFA4-jqS5r9w0rk3dfVmn2oF6Gdt@mail.gmail.com>	<20100913190549.15f218ce@pitrou.net>	<AANLkTikALR0FSkL8jzYnh2atPEDz=63P8V3oaKQNh3NC@mail.gmail.com>	<AANLkTinVvSKxKUNgyZE5_PyZiHT55FPvj7omB9O=bmnD@mail.gmail.com>	<4C8EE189.40408@canterbury.ac.nz>	<AANLkTimJ_BmVwqoDNjmzRihZ3NE-p6pqoGjZDduUmLpp@mail.gmail.com>
	<AANLkTi=07LWmOBc14wNGmANurKQ2qPZQ=T7zmkoDu60+@mail.gmail.com>
Message-ID: <4C8F0445.2000905@scottdial.com>

On 9/13/2010 11:07 PM, Guido van Rossum wrote:
> On Mon, Sep 13, 2010 at 8:04 PM, Tim Peters <tim.peters at gmail.com> wrote:
>> [Nick Coghlan]
>>>> Alternatively, when building finalizers, build two
>>>> lists: one for objects with __del__ methods and one for objects that
>>>> are reachable from objects with __del__ methods.
>>
>> [Greg Ewing]
>>> But since it's a cycle, isn't *everything* in the cycle
>>> going to be reachable from everything else?
>>
>> That pattern is exploitable too.  If, e.g., you have some resource R
>> that needs to be cleaned up, owned by an object A that may participate
>> in cycles, it's often possible to put R in a different, very simple
>> object with a __del__ method, and have A point to that latter object
>> instead.
> 
> Yeah, I think we even recommended this pattern at some point. ISTR we
> designed the new io library to exploit it.
> 

Yes, this topic came up some while back on this list and Tim's solution
is exactly the design pattern I suggested then:

http://mail.python.org/pipermail/python-ideas/2009-October/006222.html

-- 
Scott Dial
scott at scottdial.com
scodial at cs.indiana.edu


From ncoghlan at gmail.com  Tue Sep 14 11:51:19 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 14 Sep 2010 19:51:19 +1000
Subject: [Python-ideas] Why not break cycles with one __del__?
In-Reply-To: <4C8EE189.40408@canterbury.ac.nz>
References: <AANLkTikfUX4pOqL-kr54ua68MObzuhOFXN9c5GK1AmN8@mail.gmail.com>
	<AANLkTiky2W6k7sxsjFA4-jqS5r9w0rk3dfVmn2oF6Gdt@mail.gmail.com>
	<20100913190549.15f218ce@pitrou.net>
	<AANLkTikALR0FSkL8jzYnh2atPEDz=63P8V3oaKQNh3NC@mail.gmail.com>
	<AANLkTinVvSKxKUNgyZE5_PyZiHT55FPvj7omB9O=bmnD@mail.gmail.com>
	<4C8EE189.40408@canterbury.ac.nz>
Message-ID: <AANLkTin3pwShxd0f29ni-aMf6cPgftApPUqRg7qQ3NET@mail.gmail.com>

On Tue, Sep 14, 2010 at 12:44 PM, Greg Ewing
<greg.ewing at canterbury.ac.nz> wrote:
> Nick Coghlan wrote:
>>
>> Alternatively, when building finalizers, build two
>> lists: one for objects with __del__ methods and one for objects that
>> are reachable from objects with __del__ methods.
>
> But since it's a cycle, isn't *everything* in the cycle
> going to be reachable from everything else?

In addition to what Tim said, there may be more than one cycle being
collected. So you can have situations like objects, A, B C in one
cycle and D, E, F in a different cycle. Suppose A, B and D all have
__del__ methods. Then your two lists would be:

__del__ method: A, B, D
Reachable from objects with __del__ method: A, B, C, E, F

It's just another way of viewing what the OP described: cycles
containing only a single object with __del__ don't actually have an
ordering problem, so you can just call it before you destroy any of
the objects.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From mikegraham at gmail.com  Tue Sep 14 15:54:49 2010
From: mikegraham at gmail.com (Mike Graham)
Date: Tue, 14 Sep 2010 09:54:49 -0400
Subject: [Python-ideas] Using * in indexes
In-Reply-To: <4C8EF725.3050807@canterbury.ac.nz>
References: <4C8EF725.3050807@canterbury.ac.nz>
Message-ID: <AANLkTikLqtMGkeCreoPcUCrfdExMF7aXGs5fFGmQnX-D@mail.gmail.com>

On Tue, Sep 14, 2010 at 12:16 AM, Greg Ewing
<greg.ewing at canterbury.ac.nz> wrote:
> I just found myself writing a method like this:
>
>   def __getitem__(self, index):
>     return self.data[(Ellipsis,) + index + (slice(None),)]
>
> I would have liked to write it like this:
>
>   self.data[..., index, :]
>
> because that would make it much easier to see what's
> being done. However, that won't work if index is itself
> a tuple of index elements.
>
> So I'd like to be able to do this:
>
>   self.data[..., *index, :]

If in indexes, why not when making other tuples?

Mike


From alexander.belopolsky at gmail.com  Tue Sep 14 16:09:05 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Tue, 14 Sep 2010 10:09:05 -0400
Subject: [Python-ideas] Using * in indexes
In-Reply-To: <AANLkTikLqtMGkeCreoPcUCrfdExMF7aXGs5fFGmQnX-D@mail.gmail.com>
References: <4C8EF725.3050807@canterbury.ac.nz>
	<AANLkTikLqtMGkeCreoPcUCrfdExMF7aXGs5fFGmQnX-D@mail.gmail.com>
Message-ID: <AANLkTinw1S3q3+r+OsDx877tiEFmpMzXyee8m6aAPTvN@mail.gmail.com>

On Tue, Sep 14, 2010 at 9:54 AM, Mike Graham <mikegraham at gmail.com> wrote:
..
>> So I'd like to be able to do this:
>>
>>   self.data[..., *index, :]
>
> If in indexes, why not when making other tuples?

I believe this and other unpacking generalizations are implemented in
issue #2292: http://bugs.python.org/issue2292


From greg.ewing at canterbury.ac.nz  Wed Sep 15 00:15:10 2010
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 15 Sep 2010 10:15:10 +1200
Subject: [Python-ideas] Using * in indexes
In-Reply-To: <AANLkTikLqtMGkeCreoPcUCrfdExMF7aXGs5fFGmQnX-D@mail.gmail.com>
References: <4C8EF725.3050807@canterbury.ac.nz>
	<AANLkTikLqtMGkeCreoPcUCrfdExMF7aXGs5fFGmQnX-D@mail.gmail.com>
Message-ID: <4C8FF3EE.9020209@canterbury.ac.nz>

Mike Graham wrote:
> On Tue, Sep 14, 2010 at 12:16 AM, Greg Ewing
> <greg.ewing at canterbury.ac.nz> wrote:
> 
>>  self.data[..., *index, :]
> 
> If in indexes, why not when making other tuples?

It would be handy to be able to use it when making other
tuples, yes. There's a particularly strong motivation
for it in relation to indexes, though, because otherwise
you not only end up having to use ugly (foo,) constructs,
but you lose the ability to use any of the special
indexing syntax.

There's also a performance penalty if you end up having
to look up 'slice' a bunch of times.

-- 
Greg


From greg.ewing at canterbury.ac.nz  Wed Sep 15 00:16:22 2010
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 15 Sep 2010 10:16:22 +1200
Subject: [Python-ideas] Using * in indexes
In-Reply-To: <AANLkTinw1S3q3+r+OsDx877tiEFmpMzXyee8m6aAPTvN@mail.gmail.com>
References: <4C8EF725.3050807@canterbury.ac.nz>
	<AANLkTikLqtMGkeCreoPcUCrfdExMF7aXGs5fFGmQnX-D@mail.gmail.com>
	<AANLkTinw1S3q3+r+OsDx877tiEFmpMzXyee8m6aAPTvN@mail.gmail.com>
Message-ID: <4C8FF436.40305@canterbury.ac.nz>

Alexander Belopolsky wrote:

> I believe this and other unpacking generalizations are implemented in
> issue #2292: http://bugs.python.org/issue2292

Yes, it appears so. Did a PEP for that ever materialise,
or is everyone waiting until after the moratorium?

-- 
Greg


From tjreedy at udel.edu  Wed Sep 15 06:23:18 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 15 Sep 2010 00:23:18 -0400
Subject: [Python-ideas] Using * in indexes
In-Reply-To: <4C8FF436.40305@canterbury.ac.nz>
References: <4C8EF725.3050807@canterbury.ac.nz>	<AANLkTikLqtMGkeCreoPcUCrfdExMF7aXGs5fFGmQnX-D@mail.gmail.com>	<AANLkTinw1S3q3+r+OsDx877tiEFmpMzXyee8m6aAPTvN@mail.gmail.com>
	<4C8FF436.40305@canterbury.ac.nz>
Message-ID: <i6phno$u9q$1@dough.gmane.org>

On 9/14/2010 6:16 PM, Greg Ewing wrote:
> Alexander Belopolsky wrote:
>
>> I believe this and other unpacking generalizations are implemented in
>> issue #2292: http://bugs.python.org/issue2292
>
> Yes, it appears so. Did a PEP for that ever materialise,
> or is everyone waiting until after the moratorium?

The only PEP I know of is the one for what has been done:
http://www.python.org/dev/peps/pep-3132/ Extended Iterable Unpacking
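
For the record, PEP 3132 covers assignment targets only, not calls or
subscripts:

    first, *middle, last = range(5)
    print(first, middle, last)   # 0 [1, 2, 3] 4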


-- 
Terry Jan Reedy



From glyph at twistedmatrix.com  Wed Sep 15 23:56:52 2010
From: glyph at twistedmatrix.com (Glyph Lefkowitz)
Date: Wed, 15 Sep 2010 17:56:52 -0400
Subject: [Python-ideas] [Python-Dev] Python needs a standard
	asynchronous return object
In-Reply-To: <AANLkTin7eRBcpt1K_RC=buE5BasmTDBwE_TzHr97BAyy@mail.gmail.com>
References: <4C8AB874.9010703@openvpn.net>
	<AANLkTinohL6+8JRN6UKCeRKv5-ULUb6bjFZ+_RsewFiV@mail.gmail.com>
	<AANLkTi=E696ywpwEeXtKw_fi0MZTbEdAyVhG833pRrYy@mail.gmail.com>
	<AANLkTingRm2DVRnG7Zm8sJZbTR5StNmGATGt1QmBUhUh@mail.gmail.com>
	<AANLkTin7eRBcpt1K_RC=buE5BasmTDBwE_TzHr97BAyy@mail.gmail.com>
Message-ID: <9AF93392-544C-4539-98B2-19DB2563172D@twistedmatrix.com>

Thanks for the ping about this (I don't think I subscribe to python-ideas, so someone may have to moderate my post in).  Sorry for the delay in responding, but I've been kinda busy and cooking up these examples took a bit of thinking.

And thanks, James, for restarting this discussion.  I obviously find it interesting :).

I'm going to mix in some other stuff I found on the web archives, since it's easiest just to reply in one message.  I'm sorry that this response is a bit sprawling and doesn't have a single clear narrative; the thread thus far didn't seem to lend itself to one.

For those of you who don't want to read my usual novel-length post, you can probably stop shortly after the end of the first block of code examples.

On Sep 11, 2010, at 10:26 PM, Guido van Rossum wrote:

>>> although he didn't say what
>>> deferreds really added beyond what futures provide, and why the
>>> "add_done_callback" method isn't adequate to provide interoperability
>>> between futures and deferreds (which would be odd, since Brian made
>>> changes to that part of PEP 3148 to help with that interoperability
>>> after discussions with Glyph).
>>> 
>>> Between PEP 380 and PEP 3148 I'm not really seeing a lot more scope
>>> for standardisation in this space though.
>>> 
>>> Cheers,
>>> Nick.
>> 
>> That was my initial reaction as well, but I'm more than open to
>> hearing from Jean Paul/Glyph and the other twisted folks on this.

> But thinking about this more I don't know that it will be easy to mix
> PEP 3148, which is solidly thread-based, with a PEP 342 style
> scheduler (whether or not the PEP 380 enhancements are applied, or
> even PEP 3152). And if we take the OP's message at face value, his
> point isn't so much that Twisted is great, but that in order to
> benefit maximally from PEP 342 there needs to be a standard way of
> using callbacks. I think that's probably true. And comparing the
> blog's examples to PEP 3148, I find Twisted's terminology rather
> confusing compared to the PEP's clean Futures API (where IMO you can
> ignore almost everything except result()).

That blog post was written to demonstrate why programs using generators are "... far easier to read and write ..." than ones using Deferreds, so it stands to reason it would choose an example where that helps :).

When you want to write systems that manage varying levels of parallelism within a single computation, generators can start to get pretty hairy and the "normal" Deferred way of doing things looks more straightforward.

Thinking in terms of asynchronicity is tricky, and generators can be a useful tool for promoting that understanding, but they only make it superficially easier.  For example:

>>> def serial():
>>>     results = set()
>>>     for x in ...:
>>>         results.add((yield do_something_async(x)))
>>>     return results

If you're writing an application whose parallelism calls for an asynchronous approach, after all, you presumably don't want to be standing around waiting for each network round trip to complete.  How do you re-write this so that there are always at least N outstanding do_something_async calls running in parallel?

You can sorta do it like this:

>>> def parallel(N):
>>>     results = set()
>>>     outstanding = []
>>>     for x in ...:
>>>         if len(outstanding) > N:
>>>            results.add((yield outstanding.pop(0)))
>>>         else:
>>>            outstanding.append(do_something_async(x))

but that will always block on one particular do_something_async, when you really want to say "let me know when any outstanding call is complete".  So I could handwave about 'yield any_completed(outstanding)'...

>>> def parallel(N):
>>>     results = set()
>>>     outstanding = set()
>>>     for x in ...:
>>>         if len(outstanding) > N:
>>>            results.add((yield any_completed(outstanding)))
>>>         else:
>>>            outstanding.add(do_something_async(x))

but that just begs the question of how you implement any_completed(), and I can't think of a way to do that with generators, without getting into the specifics of some Deferred-or-Future-like asynchronous result object.  You could implement such a function with such primitives, and here's what it looks like with Deferreds:

>>> def any_completed(setOfDeferreds):
>>>     d = Deferred()
>>>     called = []
>>>     def fireme(result, whichDeferred):
>>>         if not called:
>>>             called.append(True)
>>>             setOfDeferreds.remove(whichDeferred)
>>>             d.callback(result)
>>>         return result
>>>     for subd in setOfDeferreds:
>>>         subd.addBoth(fireme, subd)
>>>     return d

Here's how you do the top-level task in Twisted, without generators, in the truly-parallel fashion (keep in mind this combines the functionality of 'any_completed' and 'parallel', so it's a bit shorter):

>>> def parallel(N):
>>>     ds = DeferredSemaphore(N)
>>>     l = []
>>>     def release(result):
>>>         ds.release()
>>>         return result
>>>     def after(sem, it):
>>>         return do_something_async(it)
>>>     for x in ...:
>>>     l.append(ds.acquire().addCallback(after, x).addBoth(release))
>>>     return gatherResults(l).addCallback(set)

Some informal benchmarking has shown this method to be considerably faster (on the order of 1/2 to 1/3 as much CPU time) than at least our own inlineCallbacks generator-scheduling method.  Take this with the usual fist-sized grain of salt that you do any 'informal' benchmarks, but the difference is significant enough that I do try to refactor into this style in my own code, and I have seen performance benefits from doing this on more specific benchmarks.

This is all untested, and that's far too many lines of code to expect to work without testing, but hopefully it gives a pretty good impression of the differences in flavor between the different styles.

> Yeah, please do explain why Twisted has so much machinery to handle exceptions?

There are a lot of different implied questions here, so I'll answer a few of those.

Why does twisted.python.failure exist?  The answer to that is that we wanted an object that represented an exception as raised at a particular point, associated with a particular stack, that could live on without necessarily capturing all the state in that stack.  If you're going to report failures asynchronously, you don't necessarily want to hold a reference to every single thing in a potentially giant stack while you're waiting to send it to some network endpoint.  Also, in 1.5.2 we had no way of chaining exceptions, and this code is that old.  Finally, even if you can chain exceptions, it's a serious performance hit to have to re-raise and re-catch the same exception 4 or 5 times in order to translate it or handle it at many different layers of the stack, so a Failure is intended to encapsulate that state such that it can just be returned, in performance-sensitive areas.  (This is sort of a weak point though, since the performance of Failure itself is so terrible, for unrelated reasons.)
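
To make that concrete, here is a tiny sketch (the scenario is invented,
but Failure(), trap(), and getTraceback() are the real APIs):

    from twisted.python.failure import Failure

    try:
        1 / 0
    except ZeroDivisionError:
        f = Failure()  # snapshots the active exception and its traceback

    # later, long after the except block has exited:
    f.trap(ZeroDivisionError)  # re-raises unless the type matches
    print(f.getTraceback())    # the formatted traceback is still available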

Why is twisted.python.failure such a god damned mess?  The answer to that is ... uh, sorry.  Yes, it is.  We should clean it up.  It was written a long time ago and the equivalent module now could be _much_ shorter, simpler, and less of a performance problem.  It just never seems to be the highest priority.  Maybe after we're done porting to py3 :).  My one defense here is that it's still a slight improvement over the stdlib 'traceback' module ;-).

Why do Deferreds have an errback chain rather than just handing you an exception object in the callback chain?  Basically, this is for the same reason that Python has exceptions instead of just making you check return codes.  We wanted it to be easy to say:

    d = getPage("http://...")
    def ok(page):
        doSomething(...)
    d.addCallback(ok)

and know that the argument to 'ok' would always be what getPage promised (you don't need to typecheck it for exception-ness) and the default error behavior would be to simply bail out with a traceback, not to barrel through your success-path code wreaking havoc.
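
To illustrate the error path (a sketch in the same vein; getPage and
doSomething are placeholders as above, and log.err is Twisted's stock
failure logger):

    from twisted.python import log

    d = getPage("http://...")
    def ok(page):
        doSomething(...)
    def failed(reason):
        # runs only on the error path; without it, the default is to
        # log the traceback and stop, not to barrel into ok()
        log.err(reason)
    d.addCallbacks(ok, failed)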

> ISTM that the main difference is that add_done_callback() isn't meant for callbacks that return a value.


add_done_callback works fine with callbacks that return a value.  If it didn't, I'd be concerned, because then it would have the barrel-through-the-success-path flaw.  But, I assume the idiomatic asynchronous-code-using-Futures would look like this:

    f = some_future_thing(...)
    def my_callback(future):
        result = future.result()
        do_something(result)
    f.add_done_callback(my_callback)

This is one extra line of code as compared to the Twisted version, and chaining involves a bit more gymnastics (somehow creating more futures to return further up the stack, I guess, I haven't thought about it too hard), but it does allow you to handle exceptions with a simple 'except:', rather than calling some exception-handling methods, so I can see why some people would prefer it.
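
In other words, something like this (same made-up names as above, and
SomeError/handle_failure are hypothetical):

    f = some_future_thing(...)
    def my_callback(future):
        try:
            result = future.result()  # re-raises the original exception here
        except SomeError:
            handle_failure()
        else:
            do_something(result)
    f.add_done_callback(my_callback)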

> Maybe it's possible to write a little framework that lets you create Futures using either threads, processes (both supported by PEP 3148) or generators. But I haven't tried it. And maybe the need to use 'yield' for everything that may block when using generators, but not when using threads or processes, will make this awkward.

You've already addressed the main point that I really wanted to mention here, but I'd like to emphasize it.  Blocking and not-blocking are fundamentally different programming styles, and if you sometimes allow blocking on asynchronous results, that means you are effectively always programming in the blocking-and-threaded style and not getting much benefit from the code which does choose to be politely non-blocking.

I was somewhat pleased with the changes made to the Futures PEP because you could use them as an asynchronous result, and have things that implemented the Future API but raised an exception if you tried to wait on them.  That would at least allow some layer of stdlib compatibility.  If you are disciplined and careful, this would let you write async code which used a common interoperability mechanism, and if you weren't careful, it would blow up when you tried to use it the wrong way.
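
A sketch of what I mean, using the PEP 3148 Future as a base class (the
subclass name and behavior here are my own invention):

    from concurrent.futures import Future

    class AsyncOnlyFuture(Future):
        """A Future that blows up rather than block."""
        def result(self, timeout=None):
            if not self.done():
                raise RuntimeError("would block: use add_done_callback()")
            return Future.result(self)  # already done, so this never blocks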

But - and I am guessing that this is the main thrust of this discussion - I do think that having Deferred in the standard library would be much, much better if we can do that.

> So maybe we'll be stuck with at least two Future-like APIs: PEP 3148 and something else, generator-based.

Having something "generator-based" is, in my opinion, an abstraction inversion.  The things which you are yielding from these generators are asynchronous results.  There should be a specific type for asynchronous results which can be easily interacted with.  Generators are syntactic sugar for doing that interaction in a way which doesn't involve defining tons of little functions.  This is useful, and it makes the concept more accessible, so I don't say "just" syntactic sugar: but nevertheless, the generators need to be 'yield'ing something, and the type of thing that they're yielding is a Deferred-or-something-like-it.

I don't think that this is really two 'Future-like APIs'.  At least, they're not redundant, any more than having both socket.makefile() and socket.recv() is redundant.

If Future had a deferred() method rather than an add_done_callback() method, then it would always be very clear whether you had a synchronous-but-possibly-not-ready or a purely-asynchronous result.  Although it would be equally easy to just have a function that turned a Future into a Deferred by calling add_done_callback().  You can go from any arbitrary Future to a full-featured Deferred, but not the other way around.
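
That adapter is short enough to sketch here (glossing over the detail
that PEP 3148 may run done-callbacks in a worker thread, where you'd
want reactor.callFromThread):

    from twisted.internet.defer import Deferred

    def future_to_deferred(future):
        d = Deferred()
        def done(f):
            try:
                result = f.result()  # cannot block: f is already done
            except Exception:
                d.errback()          # wraps the active exception in a Failure
            else:
                d.callback(result)
        future.add_done_callback(done)
        return d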

> Or maybe PEP 3152.


I don't like PEP 3152 aesthetically on many levels, but I can't deny that it would do the job.  'cocall', though, really?  It would be nice if it read like an actual word, i.e. "yield to" or "invoke" or even just "call" or something.

In another message, where Guido is replying to Antoine:

>> I think the main reason, though, that people find Deferreds inconvenient is that they force you to think in terms of asynchronicity (...)
> 
> Actually I think the main reason is historic: Twisted introduced callback-based asynchronous (thread-less) programming when there was no alternative in Python, and they invented both the mechanisms and the terminology as they were figuring it all out.  That is no mean feat. But with PEP 342 (generator-based coroutines) and especially PEP 380 (yield from) there *is* an alternative, and while Twisted has added APIs to support generators, it hasn't started to deprecate its other APIs, and its terminology becomes hard to follow for people (like me, frankly) who first learned this stuff through PEP 342.

I really have to go with Antoine on this one: people were confused about Deferreds long before PEP 342 came along :).  Given that Javascript environments have mostly adopted the Twisted terminology (oddly, Node.js doesn't, but Dojo and MochiKit both have pretty literal-minded Deferred translations), there are plenty of people who are familiar with the terminology but still get confused.

See the beginning of the message for why we're not deprecating our own APIs.

Once again, sorry for not compressing this down further!  If you got this far, you win a prize :).

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20100915/15ea2789/attachment.html>

From glyph at twistedmatrix.com  Thu Sep 16 00:13:23 2010
From: glyph at twistedmatrix.com (Glyph Lefkowitz)
Date: Wed, 15 Sep 2010 18:13:23 -0400
Subject: [Python-ideas] [Python-Dev] Python needs a standard
	asynchronous return object
In-Reply-To: <20100915220952.2058.14020740.divmod.xquotient.544@localhost.localdomain>
References: <4C8AB874.9010703@openvpn.net>
	<AANLkTinohL6+8JRN6UKCeRKv5-ULUb6bjFZ+_RsewFiV@mail.gmail.com>
	<AANLkTi=E696ywpwEeXtKw_fi0MZTbEdAyVhG833pRrYy@mail.gmail.com>
	<AANLkTingRm2DVRnG7Zm8sJZbTR5StNmGATGt1QmBUhUh@mail.gmail.com>
	<AANLkTin7eRBcpt1K_RC=buE5BasmTDBwE_TzHr97BAyy@mail.gmail.com>
	<9AF93392-544C-4539-98B2-19DB2563172D@twistedmatrix.com>
	<20100915220952.2058.14020740.divmod.xquotient.544@localhost.localdomain>
Message-ID: <FEDAABE8-9356-4429-B337-CEB9EA8FA9A4@twistedmatrix.com>


On Sep 15, 2010, at 6:09 PM, exarkun at twistedmatrix.com wrote:

> 
> Glyph meant this:
> 
>   def parallel(N):
>       ds = DeferredSemaphore(N)
>       l = []
>       for x in ...:
>           l.append(ds.run(do_something_async, x))
>       return gatherResults(l).addCallback(set)
> 
> Jean-Paul

I knew it should have looked shorter and sweeter.  Thanks.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20100915/687db6c5/attachment.html>

From daniel at stutzbachenterprises.com  Thu Sep 16 17:35:14 2010
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Thu, 16 Sep 2010 10:35:14 -0500
Subject: [Python-ideas] list.sort with a int or str key
Message-ID: <AANLkTik27ch-Qkvzs45rjUfwkxsymhq0YQK+xBUac7Fx@mail.gmail.com>

list.sort, sorted, and similar methods currently have a "key" argument that
accepts a callable.  Often, that leads to code looking like this:

mylist.sort(key=lambda x: x[1])
myotherlist.sort(key=lambda x: x.length)

I would like to propose that the "key" parameter be generalized to accept
str and int types, so the above code could be rewritten as follows:

mylist.sort(key=1)
myotherlist.sort(key='length')

I find the latter to be much more readable.  As a bonus, performance for
those cases would also improve.
--
Daniel Stutzbach <http://stutzbachenterprises.com>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20100916/5f43c1ee/attachment.html>

From mwm-keyword-python.b4bdba at mired.org  Thu Sep 16 17:41:37 2010
From: mwm-keyword-python.b4bdba at mired.org (Mike Meyer)
Date: Thu, 16 Sep 2010 11:41:37 -0400
Subject: [Python-ideas] list.sort with a int or str key
In-Reply-To: <AANLkTik27ch-Qkvzs45rjUfwkxsymhq0YQK+xBUac7Fx@mail.gmail.com>
References: <AANLkTik27ch-Qkvzs45rjUfwkxsymhq0YQK+xBUac7Fx@mail.gmail.com>
Message-ID: <20100916114137.51f6f90e@bhuda.mired.org>

On Thu, 16 Sep 2010 10:35:14 -0500
Daniel Stutzbach <daniel at stutzbachenterprises.com> wrote:

> list.sort, sorted, and similar methods currently have a "key" argument that
> accepts a callable.  Often, that leads to code looking like this:
> 
> mylist.sort(key=lambda x: x[1])
> myotherlist.sort(key=lambda x: x.length)
>
> I would like to propose that the "key" parameter be generalized to accept
> str and int types, so the above code could be rewritten as follows:
> 
> mylist.sort(key=1)
> myotherlist.sort(key='length')

-1

I think the idiom using the operator module tools:

mylist.sort(key=itemgetter(1))
mylist.sort(key=attrgetter('length'))

is more readable than your proposal - it makes what's going on
explicit.

	<mike
-- 
Mike Meyer <mwm at mired.org>		http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org


From guido at python.org  Thu Sep 16 17:44:15 2010
From: guido at python.org (Guido van Rossum)
Date: Thu, 16 Sep 2010 08:44:15 -0700
Subject: [Python-ideas] list.sort with a int or str key
In-Reply-To: <AANLkTik27ch-Qkvzs45rjUfwkxsymhq0YQK+xBUac7Fx@mail.gmail.com>
References: <AANLkTik27ch-Qkvzs45rjUfwkxsymhq0YQK+xBUac7Fx@mail.gmail.com>
Message-ID: <AANLkTiniL5EkAVZu5CiKqjc4soZzMiaGo1WLBvLB9Aq=@mail.gmail.com>

On Thu, Sep 16, 2010 at 8:35 AM, Daniel Stutzbach
<daniel at stutzbachenterprises.com> wrote:
> list.sort, sorted, and similar methods currently have a "key" argument that
> accepts a callable.  Often, that leads to code looking like this:
>
> mylist.sort(key=lambda x: x[1])
> myotherlist.sort(key=lambda x: x.length)
>
> I would like to propose that the "key" parameter be generalized to accept
> str and int types, so the above code could be rewritten as follows:
>
> mylist.sort(key=1)
> myotherlist.sort(key='length')
>
> I find the latter to be much more readable.

-1. I think this is too cryptic.

> As a bonus, performance for those cases would also improve.

Have you measured this? Remember that the key function is only called
N times while the number of comparisons (using the values returned
from the key function) is O(N log N).
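
For example, a quick (and admittedly crude) way to measure it:

    import timeit

    setup = ("from operator import itemgetter; "
             "data = [(i, i % 7) for i in range(10000)]")
    for stmt in ("sorted(data, key=lambda x: x[1])",
                 "sorted(data, key=itemgetter(1))"):
        print(stmt, timeit.timeit(stmt, setup=setup, number=100))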

-- 
--Guido van Rossum (python.org/~guido)


From robert.kern at gmail.com  Thu Sep 16 17:51:55 2010
From: robert.kern at gmail.com (Robert Kern)
Date: Thu, 16 Sep 2010 10:51:55 -0500
Subject: [Python-ideas] list.sort with a int or str key
In-Reply-To: <AANLkTik27ch-Qkvzs45rjUfwkxsymhq0YQK+xBUac7Fx@mail.gmail.com>
References: <AANLkTik27ch-Qkvzs45rjUfwkxsymhq0YQK+xBUac7Fx@mail.gmail.com>
Message-ID: <i6teet$mol$1@dough.gmane.org>

On 9/16/10 10:35 AM, Daniel Stutzbach wrote:
> list.sort, sorted, and similar methods currently have a "key" argument that
> accepts a callable.  Often, that leads to code looking like this:
>
> mylist.sort(key=lambda x: x[1])
> myotherlist.sort(key=lambda x: x.length)
>
> I would like to propose that the "key" parameter be generalized to accept str
> and int types, so the above code could be rewritten as follows:
>
> mylist.sort(key=1)
> myotherlist.sort(key='length')
>
> I find the latter to be much more readable.  As a bonus, performance for those
> cases would also improve.

I find the latter significantly less readable because they are special cases 
that I need to remember. Right now, you can achieve the performance and arguably 
better readability using operator.itemgetter() and operator.attrgetter():

   from operator import attrgetter, itemgetter

   mylist.sort(key=itemgetter(1))
   myotherlist.sort(key=attrgetter('length'))

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco



From bruce at leapyear.org  Thu Sep 16 18:05:53 2010
From: bruce at leapyear.org (Bruce Leban)
Date: Thu, 16 Sep 2010 09:05:53 -0700
Subject: [Python-ideas] list.sort with a int or str key
In-Reply-To: <AANLkTik27ch-Qkvzs45rjUfwkxsymhq0YQK+xBUac7Fx@mail.gmail.com>
References: <AANLkTik27ch-Qkvzs45rjUfwkxsymhq0YQK+xBUac7Fx@mail.gmail.com>
Message-ID: <AANLkTikTAVZUL+RKOxLHcaeFYboTdW4Bsoaaj+EicZDf@mail.gmail.com>

-1

key='length' could reasonably mean
    lambda a:a.length
or
    lambda a:a['length']

an explicit lambda or itemgetter/attrgetter is clearer.

--- Bruce
http://www.vroospeak.com
http://j.mp/gruyere-security



On Thu, Sep 16, 2010 at 8:35 AM, Daniel Stutzbach <
daniel at stutzbachenterprises.com> wrote:

> list.sort, sorted, and similar methods currently have a "key" argument that
> accepts a callable.  Often, that leads to code looking like this:
>
> mylist.sort(key=lambda x: x[1])
> myotherlist.sort(key=lambda x: x.length)
>
> I would like to propose that the "key" parameter be generalized to accept
> str and int types, so the above code could be rewritten as follows:
>
> mylist.sort(key=1)
> myotherlist.sort(key='length')
>
> I find the latter to be much more readable.  As a bonus, performance for
> those cases would also improve.
> --
> Daniel Stutzbach <http://stutzbachenterprises.com>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20100916/0ac89de3/attachment.html>

From solipsis at pitrou.net  Thu Sep 16 18:11:29 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 16 Sep 2010 18:11:29 +0200
Subject: [Python-ideas] list.sort with a int or str key
References: <AANLkTik27ch-Qkvzs45rjUfwkxsymhq0YQK+xBUac7Fx@mail.gmail.com>
Message-ID: <20100916181129.5e39c6d4@pitrou.net>

On Thu, 16 Sep 2010 10:35:14 -0500
Daniel Stutzbach
<daniel at stutzbachenterprises.com> wrote:
> list.sort, sorted, and similar methods currently have a "key" argument that
> accepts a callable.  Often, that leads to code looking like this:
> 
> mylist.sort(key=lambda x: x[1])
> myotherlist.sort(key=lambda x: x.length)
> 
> I would like to propose that the "key" parameter be generalized to accept
> str and int types, so the above code could be rewritten as follows:
> 
> mylist.sort(key=1)
> myotherlist.sort(key='length')

It is not obvious whether key='length' should use __getitem__ or
__getattr__. Your example claims attribute lookup but an indexed lookup
would be more consistent with key=1.

I'm quite skeptical towards this. Special cases make things harder to
remember, and foreign code more difficult to read.

Regards

Antoine.




From daniel at stutzbachenterprises.com  Thu Sep 16 18:12:37 2010
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Thu, 16 Sep 2010 11:12:37 -0500
Subject: [Python-ideas] list.sort with a int or str key
In-Reply-To: <AANLkTik27ch-Qkvzs45rjUfwkxsymhq0YQK+xBUac7Fx@mail.gmail.com>
References: <AANLkTik27ch-Qkvzs45rjUfwkxsymhq0YQK+xBUac7Fx@mail.gmail.com>
Message-ID: <AANLkTimB_2bhyvH-dT4JKvv-zxN8dNSDtuLEV+UfVVMh@mail.gmail.com>

Since most everyone else finds it less readable, I withdraw the proposal.

Thanks for the feedback,
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20100916/9a716b53/attachment.html>

From raymond.hettinger at gmail.com  Thu Sep 16 20:28:32 2010
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Thu, 16 Sep 2010 11:28:32 -0700
Subject: [Python-ideas] list.sort with a int or str key
In-Reply-To: <AANLkTik27ch-Qkvzs45rjUfwkxsymhq0YQK+xBUac7Fx@mail.gmail.com>
References: <AANLkTik27ch-Qkvzs45rjUfwkxsymhq0YQK+xBUac7Fx@mail.gmail.com>
Message-ID: <5B7D2EAA-672E-4744-9D11-A9C4CA4CD7D4@gmail.com>


On Sep 16, 2010, at 8:35 AM, Daniel Stutzbach wrote:

> list.sort, sorted, and similar methods currently have a "key" argument that accepts a callable.  Often, that leads to code looking like this:
> 
> mylist.sort(key=lambda x: x[1])
> myotherlist.sort(key=lambda x: x.length)
> 
> I would like to propose that the "key" parameter be generalized to accept str and int types, so the above code could be rewritten as follows:
> 
> mylist.sort(key=1)
> myotherlist.sort(key='length')

-1 

The key= parameter is a protocol that is used across multiple tools: min(), max(), groupby(), nsmallest(), nlargest(), etc.  All of those would need to change to stay in sync.

> I find the latter to be much more readable.

It also becomes harder to learn.

Multiple signatures (int or str or other callable) create more problems than they solve.

>   As a bonus, performance for those cases would also improve.

ISTM, the performance would be about the same as you already get from attrgetter(), itemgetter(), and methodcaller().  Also, those three tools are already more flexible than the proposal, for example:

  attrgetter('lastname', 'firstname')   # key = lambda r: (r.lastname, r.firstname)
  itemgetter(0, 7)                      # key = lambda r: (r[0], r[7])
  methodcaller('get_stats', 'size')     # key = lambda r: r.get_stats('size')

We've already got a way to do it, so the proposal is basically about saving a few characters in exchange for complexifying the protocol with a form of multiple dispatch.


Raymond



From tjreedy at udel.edu  Fri Sep 17 05:11:22 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 16 Sep 2010 23:11:22 -0400
Subject: [Python-ideas] list.sort with a int or str key
In-Reply-To: <5B7D2EAA-672E-4744-9D11-A9C4CA4CD7D4@gmail.com>
References: <AANLkTik27ch-Qkvzs45rjUfwkxsymhq0YQK+xBUac7Fx@mail.gmail.com>
	<5B7D2EAA-672E-4744-9D11-A9C4CA4CD7D4@gmail.com>
Message-ID: <i6um8r$s88$1@dough.gmane.org>

On 9/16/2010 2:28 PM, Raymond Hettinger wrote:

> The key= parameter is a protocol that is used across multiple tools: min(), max(), groupby(), nsmallest(), nlargest(), etc.  All of those would need to change to stay in sync.
...

> ISTM, the performance would be about the same as you already get from attrgetter(), itemgetter(), and methodcaller().  Also, those three tools are already more flexible than the proposal, for example:
>
>  attrgetter('lastname', 'firstname')  # key = lambda r: (r.lastname, r.firstname)
>  itemgetter(0, 7)                     # key = lambda r: (r[0], r[7])
>  methodcaller('get_stats', 'size')    # key = lambda r: r.get_stats('size')

It is easy to not know about these. I think the doc set could usefully 
use an expanded entry on *key functions* (that would be a 
cross-reference link) that includes examples like the above. Currently, 
for example, the min entry has "The optional keyword-only key argument 
specifies a one-argument ordering function like that used for 
list.sort()." but there is no link and going to list.sort only adds 
"that is used to extract a comparison key from each list element: 
key=str.lower. The default value is None." Perhaps we could expand that 
and make the existing cross-references into links.

-- 
Terry Jan Reedy



From masklinn at masklinn.net  Fri Sep 17 06:49:21 2010
From: masklinn at masklinn.net (Masklinn)
Date: Fri, 17 Sep 2010 10:19:21 +0530
Subject: [Python-ideas] list.sort with a int or str key
In-Reply-To: <i6um8r$s88$1@dough.gmane.org>
References: <AANLkTik27ch-Qkvzs45rjUfwkxsymhq0YQK+xBUac7Fx@mail.gmail.com>
	<5B7D2EAA-672E-4744-9D11-A9C4CA4CD7D4@gmail.com>
	<i6um8r$s88$1@dough.gmane.org>
Message-ID: <CE890293-FCFE-4986-A530-92884E00DECE@masklinn.net>

On 2010-09-17, at 08:41, Terry Reedy wrote:
> On 9/16/2010 2:28 PM, Raymond Hettinger wrote:
>> The key= parameter is a protocol that is used across multiple tools: min(), max(), groupby(), nsmallest(), nlargest(), etc.  All of those would need to change to stay in sync.
> ...
> 
>> ISTM, the performance would be about the same as you already get from attrgetter(), itemgetter(), and methodcaller().  Also, those three tools are already more flexible than the proposal, for example:
>> 
>> attrgetter('lastname', 'firstname')  # key = lambda r: (r.lastname, r.firstname)
>> itemgetter(0, 7)                     # key = lambda r: (r[0], r[7])
>> methodcaller('get_stats', 'size')    # key = lambda r: r.get_stats('size')
> 
> It is easy to not know about these. I think the doc set could usefully use an expanded entry on *key functions* (that would be a cross-reference link) that includes examples like the above.

+1, in my experience, the operator module in general is fairly unknown and the attrgetter/itemgetter/methodcaller family criminally so.

It doesn't help that they're kind-of lost in a big bunch of text at the very bottom of the module.

From raymond.hettinger at gmail.com  Fri Sep 17 11:04:04 2010
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Fri, 17 Sep 2010 02:04:04 -0700
Subject: [Python-ideas] list.sort with a int or str key
In-Reply-To: <i6um8r$s88$1@dough.gmane.org>
References: <AANLkTik27ch-Qkvzs45rjUfwkxsymhq0YQK+xBUac7Fx@mail.gmail.com>
	<5B7D2EAA-672E-4744-9D11-A9C4CA4CD7D4@gmail.com>
	<i6um8r$s88$1@dough.gmane.org>
Message-ID: <98438F80-5D4F-48D1-B7E3-37E991F65ED1@gmail.com>


>> ISTM, the performance would be about the same as you already get from attrgetter(), itemgetter(), and methodcaller().  Also, those three tools are already more flexible than the proposal, for example:
>> 
>> attrgetter('lastname', 'firstname')  # key = lambda r: (r.lastname, r.firstname)
>> itemgetter(0, 7)                     # key = lambda r: (r[0], r[7])
>> methodcaller('get_stats', 'size')    # key = lambda r: r.get_stats('size')
> 
> It is easy to not know about these.

FWIW, those and other sorting related topics are covered in the sorting-howto:
http://wiki.python.org/moin/HowTo/Sorting/

We link to that from the main docs for sorted():
http://docs.python.org/library/functions.html#sorted


> I think the doc set could usefully use an expanded entry on *key functions*

That might also make a useful entry to the glossary.


Raymond


P.S.   I don't know that it applies here but one limitation of the docs
is that they can get too voluminous.  Already, it is a significant time
investment just to read the doc page on builtin functions.  You can
kill a whole afternoon just reading the docs for unittest and logging.
The gestalt of the language gets lost when the docs get too fat.
Instead, I like the howto write-ups because they bring together many 
thoughts on a single topic. 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20100917/e587674e/attachment.html>

From ncoghlan at gmail.com  Fri Sep 17 14:14:23 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 17 Sep 2010 22:14:23 +1000
Subject: [Python-ideas] list.sort with a int or str key
In-Reply-To: <i6um8r$s88$1@dough.gmane.org>
References: <AANLkTik27ch-Qkvzs45rjUfwkxsymhq0YQK+xBUac7Fx@mail.gmail.com>
	<5B7D2EAA-672E-4744-9D11-A9C4CA4CD7D4@gmail.com>
	<i6um8r$s88$1@dough.gmane.org>
Message-ID: <AANLkTi=HHWQi2DYqBDD+Yv74vDC8pBQ2e7E62H84-Rfm@mail.gmail.com>

On Fri, Sep 17, 2010 at 1:11 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> It is easy to not know about these. I think the doc set could usefully use
> an expanded entry on *key functions* (that would be a cross-reference link)
> that includes examples like the above. Currently, for example, the min entry
> has "The optional keyword-only key argument specifies a one-argument
> ordering function like that used for list.sort()." but there is no link and
> going to list.sort only adds "that is used to extract a comparison key from
> each list element: key=str.lower. The default value is None." Perhaps we
> could expand that and make the existing cross-references into links.

Tracker issue to capture this idea: http://bugs.python.org/issue9886

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From lie.1296 at gmail.com  Fri Sep 17 16:40:48 2010
From: lie.1296 at gmail.com (Lie Ryan)
Date: Sat, 18 Sep 2010 00:40:48 +1000
Subject: [Python-ideas] Cofunctions: It's alive! Its alive!
In-Reply-To: <AANLkTimHqO_0ZREJLiR3mH_jsPVfTtPXODcNh1F5fodT@mail.gmail.com>
References: <4C5D0759.30606@canterbury.ac.nz>	<AANLkTi=V=WXhSa2LPk6_OGhRDRW91vAGa0eKHT0+HuEu@mail.gmail.com>
	<AANLkTi=8i7pRiC4AiDCxB=B6gE42OWDuuFQjbEY6CBp4@mail.gmail.com>
	<AANLkTikP2d6i+x+=vbmcs7ey3TksoVHFJ1kmxOmApQUU@mail.gmail.com>
	<AANLkTikk595h5VOUvGzQnsqYRL+kXLC3zs=udzxTT1=z@mail.gmail.com>
	<4C60FE37.2020303@canterbury.ac.nz>
	<AANLkTimHqO_0ZREJLiR3mH_jsPVfTtPXODcNh1F5fodT@mail.gmail.com>
Message-ID: <i6vujt$pr3$1@dough.gmane.org>

On 08/11/10 01:57, Guido van Rossum wrote:
> - Would it be sufficient if codef was a decorator instead of a
> keyword? (This new keyword in particular chafes me, since we've been
> so successful at overloading 'def' for so many meanings -- functions,
> methods, class methods, static methods, properties...)

+1. I'd like to see this implemented as a decorator (perhaps with special
casing by the VM if necessary), and see how this cofunction will be used
in wider practice before deciding whether the syntactic sugar is necessary.

The decorator could live as a built-in function or as stdlib module
(from cofunction import cofunction), and be clearly marked as experimental.
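
A strawman of what such a decorator could look like (everything here is
invented for illustration; a real one would presumably need VM support):

    from functools import wraps

    def cofunction(f):
        """Experimental marker so a scheduler can recognise cofunctions."""
        @wraps(f)
        def wrapper(*args, **kwargs):
            return f(*args, **kwargs)  # still just produces a generator
        wrapper.is_cofunction = True
        return wrapper

    @cofunction
    def countdown(n):
        while n:
            yield n
            n -= 1

    print(getattr(countdown, 'is_cofunction', False))  # True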



From raymond.hettinger at gmail.com  Fri Sep 17 21:44:53 2010
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Fri, 17 Sep 2010 12:44:53 -0700
Subject: [Python-ideas] New 3.x restriction in list comprehensions
Message-ID: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com>

In Python2, you can transform:

  r = []
  for x in 2, 4, 6:
       r.append(x*x+1)

into:

   r = [x*x+1 for x in 2, 4, 6]

In Python3, the first still works but the second gives a SyntaxError.
It wants the 2, 4, 6 to have parentheses.

The good parts of the change:
 + it matches what genexps do
 + that simplifies the grammar a bit (listcomps bodies and genexp bodies)
 + a listcomp can be reliably transformed to a genexp

The bad parts:
 + The restriction wasn't necessary (we could undo it)
 + It makes 2-to-3 conversion a bit harder
 + It no longer parallels other paren-free tuple constructions:
        return x, y
        yield x, y
        t = x, y
           ...
 + In particular, it no longer parallels regular for-loop syntax

The last part is the one that seems the most problematic.
If you write for-loops day in and day out with the unrestricted
syntax, you (or at least me) will tend to do the wrong thing when
writing a list comprehension.  It is a bit jarring to get the SyntaxError
when the code looks correct -- it took me a bit of fiddling to figure out
what was going on.

My question for the group is whether it would be a good
idea to drop the new restriction.


Raymond



From raymond.hettinger at gmail.com  Fri Sep 17 22:00:08 2010
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Fri, 17 Sep 2010 13:00:08 -0700
Subject: [Python-ideas] New 3.x restriction on number of keyword arguments
Message-ID: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com>

One of the use cases for named tuples is to have them be automatically created from a SQL query or CSV header.  Sometimes (but not often), those can have a huge number of columns.  In Python 2.x, it worked just fine -- we had a test for a named tuple with 5000 fields.  In Python 3.x, there is a SyntaxError when there are more than 255 fields.

The origin of the change was a hack to fit positional argument counts and keyword-only argument counts in a single oparg in the python opcode encoding.

ISTM, this is an implementation specific hack and there is no reason that other implementations would have the same restriction (unless their starting point is Python's bytecode).  

The good news is that long argument lists are uncommon.  They probably only arise in cases with dynamically created functions and classes.  Most people are unaffected.

The bad news is that an implementation detail has become visible and added a language restriction.  The 255 limit seems weird to me in a version of Python that has gone to lengths to unify ints and longs so that char/short/long boundaries stop manifesting themselves to users.

Is there any support here for trying to get smarter about the keyword-only argument implementation?  The 255 limit does not seem unreasonably low, but then it was once thought that no one would ever need more than 640k of RAM.  If the new restriction isn't necessary, it would be great to remove it.
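
For anyone who wants to see it, the limit is easy to reproduce (this is
CPython 3.1 behavior; other implementations and versions may differ):

    src = "def f(%s): pass" % ", ".join("a%d=None" % i for i in range(300))
    try:
        exec(src)
        print("no limit on this interpreter")
    except SyntaxError as exc:
        print("SyntaxError:", exc)  # "more than 255 arguments" on CPython 3.x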


Raymond 

From matthew.russell at ovi.com  Fri Sep 17 22:03:34 2010
From: matthew.russell at ovi.com (Matthew Russell)
Date: Fri, 17 Sep 2010 21:03:34 +0100
Subject: [Python-ideas] New 3.x restriction in list comprehensions
In-Reply-To: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com>
References: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com>
Message-ID: <1284753814.365.142.camel@stone>

Personally, I tend to always add parens to tuple expressions since
it removes any and all ambiguity about when they're required or not.

I'd actually prefer it if parens were always required, but can
appreciate that might/would offend those who prefer otherwise.

>>> for (a, b) in d.items():
...      process(a, b)

>>> def items(t):
...    return (a, b)

Always using parens means that when refactoring one can avoid the
extra mental step of 'are the parens required in use with python feature
<F>?'

Additionally, in some language features, the use of parens has become
required to squash warts:

>>> try:
...     a = b[k]
... except (KeyError, IndexError), no_item:
...     a = handle(no_item)


Regards,
Matt

On Fri, 2010-09-17 at 12:44 -0700, Raymond Hettinger wrote: 
> In Python2, you can transform:
>   r = []
>   for x in 2, 4, 6:
>        r.append(x*x+1)
> 
> into:
> 
>    r = [x*x+1 for x in 2, 4, 6]
> 
> In Python3, the first still works but the second gives a SyntaxError.
> It wants the 2, 4, 6 to have parentheses.
> 
> The good parts of the change:
>  + it matches what genexps do
>  + that simplifies the grammar a bit (listcomps bodies and genexp bodies)
>  + a listcomp can be reliably transformed to a genexp
> 
> The bad parts:
>  + The restriction wasn't necessary (we could undo it)
>  + It makes 2-to-3 conversion a bit harder
>  + It no longer parallels other paren-free tuple constructions:
>         return x, y
>         yield x, y
>         t = x, y
>            ...
>  + In particular, it no longer parallels regular for-loop syntax
> 
> The last part is the one that seems the most problematic.
> If you write for-loops day in and day out with the unrestricted
> syntax, you (or at least me) will tend to do the wrong thing when
> writing a list comprehension.  It is a bit jarring to get the SyntaxError
> when the code looks correct -- it took me a bit of fiddling to figure out
> what was going on.
> 
> My question for the group is whether it would be a good
> idea to drop the new restriction.
> 
> 
> Raymond
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas


--------------------------------------------------------------
Ovi Mail: Making email access easy
http://mail.ovi.com



From python at mrabarnett.plus.com  Fri Sep 17 22:23:49 2010
From: python at mrabarnett.plus.com (MRAB)
Date: Fri, 17 Sep 2010 21:23:49 +0100
Subject: [Python-ideas] New 3.x restriction on number of keyword
	arguments
In-Reply-To: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com>
References: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com>
Message-ID: <4C93CE55.1030308@mrabarnett.plus.com>

On 17/09/2010 21:00, Raymond Hettinger wrote:
> One of the use cases for named tuples is to have them be
> automatically created from a SQL query or CSV header.  Sometimes (but
> not often), those can have a huge number of columns.  In Python 2.x,
> it worked just fine -- we had a test for a named tuple with 5000
> fields.  In Python 3.x, there is a SyntaxError when there are more
> than 255 fields.
>
> The origin of the change was a hack to fit positional argument counts
> and keyword-only argument counts in a single oparg in the python
> opcode encoding.
>
> ISTM, this is an implementation specific hack and there is no reason
> that other implementations would have the same restriction (unless
> their starting point is Python's bytecode).
>
> The good news is that long argument lists are uncommon.  They
> probably only arise in cases with dynamically created functions and
> classes.  Most people are unaffected.
>
> The bad news is that an implementation detail has become visible and
> added a language restriction.  The 255 limit seems weird to me in a
> version of Python that has gone to lengths to unify ints and longs so
> that char/short/long boundaries stop manifesting themselves to
> users.
>
> Is there any support here for trying to get smarter about the
> keyword-only argument implementation?  The 255 limit does not seem
> unreasonably low, but then it was once thought that no one would ever
> need more than 640k of RAM.  If the new restriction isn't necessary,
> it would be great to remove it.
>
Strings can be any length, lists can be any length, even the humble int
can be any length!

It does seem unPythonic to have a low limit like that.

I think that the implementation hack needs a bit of a rethink if that's
what it's causing, IMHO.


From python at mrabarnett.plus.com  Fri Sep 17 22:27:37 2010
From: python at mrabarnett.plus.com (MRAB)
Date: Fri, 17 Sep 2010 21:27:37 +0100
Subject: [Python-ideas] New 3.x restriction in list comprehensions
In-Reply-To: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com>
References: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com>
Message-ID: <4C93CF39.5090406@mrabarnett.plus.com>

On 17/09/2010 20:44, Raymond Hettinger wrote:
> In Python2, you can transform:
>
>    r = []
>    for x in 2, 4, 6:
>         r.append(x*x+1)
>
> into:
>
>     r = [x*x+1 for x in 2, 4, 6]
>
> In Python3, the first still works but the second gives a SyntaxError.
> It wants the 2, 4, 6 to have parentheses.
>
> The good parts of the change:
>   + it matches what genexps do
>   + that simplifies the grammar a bit (listcomps bodies and genexp bodies)
>   + a listcomp can be reliably transformed to a genexp
>
> The bad parts:
>   + The restriction wasn't necessary (we could undo it)
>   + It makes 2-to-3 conversion a bit harder
>   + It no longer parallels other paren-free tuple constructions:
>          return x, y
>          yield x, y
>          t = x, y
>             ...
>   + In particular, it no longer parallels regular for-loop syntax
>
> The last part is the one that seems the most problematic.
> If you write for-loops day in and day out with the unrestricted
> syntax, you (or at least me) will tend to do the wrong thing when
> writing a list comprehension.  It is a bit jarring to get the SyntaxError
> when the code looks correct -- it took me a bit of fiddling to figure out
> what was going on.
>
> My question for the group is whether it would be a good
> idea to drop the new restriction.
>
Listcomps look more like genexps than for loops, so they should
probably have the same syntax restrictions (or lack thereof), IMHO.


From solipsis at pitrou.net  Fri Sep 17 23:11:46 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 17 Sep 2010 23:11:46 +0200
Subject: [Python-ideas] New 3.x restriction on number of keyword
	arguments
References: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com>
Message-ID: <20100917231146.23f0cef1@pitrou.net>

On Fri, 17 Sep 2010 13:00:08 -0700
Raymond Hettinger
<raymond.hettinger at gmail.com> wrote:
> One of the use cases for named tuples is to have them be automatically created from a SQL
> query or CSV header.  Sometimes (but not often), those can have a huge number of columns.  In 
> Python 2.x, it worked just fine -- we had a test for a named tuple with 5000 fields.  In
> Python 3.x, there is a SyntaxError when there are more than 255 fields.

I don't understand your explanation. You can't pass a namedtuple using
the **kw convention:

>>> import collections
>>> T = collections.namedtuple('a', 'b c d')
>>> t = T(1,2,3)
>>> def f(**a): pass
... 
>>> f(**t)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: f() argument after ** must be a mapping, not a


Besides, even if that worked, you are doing an intermediate conversion
to a dict, which is wasteful. Why not simply pass the namedtuple as a
regular parameter?

> The bad news is that an implementation detail has become visible and added a language
> restriction.  The 255 limit seems weird to me in a version of Python that has gone to lengths
> to unify ints and longs so that char/short/long boundaries stop manifesting themselves to users.

Well, it sounds like a theoretical worry of no practical value to me.
The **kw notation is meant to marshal passing of actual keyword args,
which are going to be explicitly typed in either at the call site or at
the function definition site (ignoring any proxies in-between). Nobody
is going to type more than 255 keyword arguments by hand. And there's
generated code, but since it's generated they can easily find a
workaround anyway.

> If the new restriction isn't necessary, it would be great to remove it.

I assume the restriction is useful since, according to your explanation,
it improves the encoding of opcodes.

Of course, we could switch bytecode to use a standard 32-bit word
size, but someone has to propose a patch.

Regards

Antoine.




From cs at zip.com.au  Fri Sep 17 23:05:46 2010
From: cs at zip.com.au (Cameron Simpson)
Date: Sat, 18 Sep 2010 07:05:46 +1000
Subject: [Python-ideas] New 3.x restriction on number of keyword
	arguments
In-Reply-To: <4C93CE55.1030308@mrabarnett.plus.com>
References: <4C93CE55.1030308@mrabarnett.plus.com>
Message-ID: <20100917210546.GA32088@cskk.homeip.net>

On 17Sep2010 21:23, MRAB <python at mrabarnett.plus.com> wrote:
| On 17/09/2010 21:00, Raymond Hettinger wrote:
| >One of the use cases for named tuples is to have them be
| >automatically created from a SQL query or CSV header.  Sometimes (but
| >not often), those can have a huge number of columns.  In Python 2.x,
| >it worked just fine -- we had a test for a named tuple with 5000
| >fields.  In Python 3.x, there is a SyntaxError when there are more
| >than 255 fields.
| >
| >The origin of the change was a hack to fit positional argument counts
| >and keyword-only argument counts in a single oparg in the python
| >opcode encoding.
[...]
| >Is there any support here for trying to get smarter about the
| >keyword-only argument implementation? [...]
|
| Strings can be any length, lists can be any length, even the humble int
| can be any length!
| It does seem unPythonic to have a low limit like that.

A big +10 from me. Implementation internals should not cause language
level limitations.

If there's an (entirely reasonable IMHO) desire to keep
the opcode small, the count should be encoded in a compact but extendable
form.  (I speak here with no idea how inflexible the opcode readers are.)

As an example, I use a personal encoding for natural numbers scheme
where values below 128 fit in one byte, 128 or more set the top bit on
leading bytes to indicate followon bytes, so values up to 16383 fit in
two bytes and so on arbitrarily. Compact and simple but unbounded.

Is something like that tractable for the Python opcodes?
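
In Python, for concreteness (this is just my scheme above, not anything
the interpreter actually uses):

    def encode_natural(n):
        # big-endian base-128: high bit set on every byte but the last
        out = [n & 0x7F]
        n >>= 7
        while n:
            out.append((n & 0x7F) | 0x80)
            n >>= 7
        return bytes(reversed(out))

    def decode_natural(data):
        n = 0
        for i, b in enumerate(data):
            n = (n << 7) | (b & 0x7F)
            if not (b & 0x80):
                return n, data[i + 1:]
        raise ValueError("truncated encoding")

    assert encode_natural(127) == b'\x7f'        # one byte below 128
    assert encode_natural(16383) == b'\xff\x7f'  # two bytes up to 16383
    assert decode_natural(encode_natural(300))[0] == 300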

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

I am returning this otherwise good typing paper to you because someone has
printed gibberish all over it and put your name at the top.
        - English Professor, Ohio University


From solipsis at pitrou.net  Fri Sep 17 23:21:33 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 17 Sep 2010 23:21:33 +0200
Subject: [Python-ideas] New 3.x restriction on number of keyword
	arguments
References: <4C93CE55.1030308@mrabarnett.plus.com>
	<20100917210546.GA32088@cskk.homeip.net>
Message-ID: <20100917232133.6088424a@pitrou.net>

On Sat, 18 Sep 2010 07:05:46 +1000
Cameron Simpson <cs at zip.com.au> wrote:
> 
> As an example, I use a personal encoding for natural numbers scheme
> where values below 128 fit in one byte, 128 or more set the top bit on
> leading bytes to indicate followon bytes, so values up to 16383 fit in
> two bytes and so on arbitrarily. Compact and simple but unbounded.

Well, you are proposing that we (Python core maintainers) live with
additional complication in one of the most central and critical parts of
the interpreter, just so that we satisfy some theoretical impulse for
"consistency". That doesn't sound reasonable.

(and, sure, the variable-length encoding wouldn't be very complicated;
it would still be more complicated than it needs to be, and that's
already a problem)

For the record, have you been hit by this problem, or do you even think
you might be hit by it in the near future?

Thank you

Antoine.




From tjreedy at udel.edu  Fri Sep 17 23:32:04 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 17 Sep 2010 17:32:04 -0400
Subject: [Python-ideas] New 3.x restriction in list comprehensions
In-Reply-To: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com>
References: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com>
Message-ID: <i70mol$duj$1@dough.gmane.org>

On 9/17/2010 3:44 PM, Raymond Hettinger wrote:
> In Python2, you can transform:
>
>    r = []
>    for x in 2, 4, 6:
>         r.append(x*x+1)

   for x in 2,4,6:
     yield x*x+1

also works in 2/3.x
>
> into:
>
>     r = [x*x+1 for x in 2, 4, 6]
>
> In Python3, the first still works but the second gives a SyntaxError.
> It wants the 2, 4, 6 to have parentheses.
>
> The good parts of the change:
>   + it matches what genexps do

Is the restriction necessary for genexps? If the parser could handle
[x*x+1 for x in 2, 4, 6]
is
(x*x+1 for x in 2, 4, 6)
impossible, perhaps due to paren confusion?

>   + that simplifies the grammar a bit (listcomps bodies and genexp bodies)
>   + a listcomp can be reliably transformed to a genexp
>
> The bad parts:
>   + The restriction wasn't necessary (we could undo it)
>   + It makes 2-to-3 conversion a bit harder
>   + It no longer parallels other paren-free tuple constructions:
>          return x, y
>          yield x, y
>          t = x, y
>             ...
>   + It particular, it no longer parallels regular for-loop syntax
>
> The last part is the one that seems the most problematic.
> If you write for-loops day in and day out with the unrestricted
> syntax, you (or least me) will tend to do the wrong thing when
> writing a list comprehension.  It is a bit jarring to get the SyntaxError
> when the code looks correct -- it took me a bit of fiddling to figure-out
> what was going on.
>
> My question for the group is whether it would be a good
> idea to drop the new restriction.

3.x is in a sense more consistent than 2.x in that converting a for loop 
with a bare tuple always requires addition of parentheses rather than 
just sometimes. Never requiring parens would be even better to me if it 
did not make the implementation too messy.

-- 
Terry Jan Reedy



From tjreedy at udel.edu  Fri Sep 17 23:50:00 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 17 Sep 2010 17:50:00 -0400
Subject: [Python-ideas] New 3.x restriction on number of keyword
	arguments
In-Reply-To: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com>
References: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com>
Message-ID: <i70nq9$hte$1@dough.gmane.org>

On 9/17/2010 4:00 PM, Raymond Hettinger wrote:
> One of the use cases for named tuples is to have them be
> automatically created from a SQL query or CSV header.  Sometimes (but
> not often), those can have a huge number of columns.  In Python 2.x,
> it worked just fine -- we had a test for a named tuple with 5000
> fields.  In Python 3.x, there is a SyntaxError when there are more
> than 255 fields.

So, when the test failed due to the code change, the test was simply 
removed?

> The origin of the change was a hack to fit positional argument counts
> and keyword-only argument counts in a single oparg in the python
> opcode encoding.

I do not remember any discussion of adding such a language restriction, 
though I could have forgotten or missed it. As near as I can tell, it is 
undocumented. While there are undocumented limits to the interpreter, 
like nesting depth, this one is so low that I would consider the 
discrepancy between doc and behavior a bug.

-- 
Terry Jan Reedy



From alexander.belopolsky at gmail.com  Fri Sep 17 23:50:15 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Fri, 17 Sep 2010 17:50:15 -0400
Subject: [Python-ideas] New 3.x restriction on number of keyword
	arguments
In-Reply-To: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com>
References: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com>
Message-ID: <3F05AB9C-2353-429F-8343-9777C4F2F874@gmail.com>





On Sep 17, 2010, at 4:00 PM, Raymond Hettinger <raymond.hettinger at gmail.com> wrote:
..
> 
> Is there any support here for trying to get smarter about the keyword-only argument implementation?  The 255 limit does not seem unreasonably low, but then it was once thought that no one would ever need more than 640k of RAM.  If the new restriction isn't necessary, it would be great to remove 

This has been requested before, but rejected for the lack of a valid use case. See issue 1636.   I think supporting huge named tuples for the benefit of database applications is a valid use case. 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20100917/d76508c0/attachment.html>

From cs at zip.com.au  Fri Sep 17 23:56:55 2010
From: cs at zip.com.au (Cameron Simpson)
Date: Sat, 18 Sep 2010 07:56:55 +1000
Subject: [Python-ideas] New 3.x restriction on number of keyword
	arguments
In-Reply-To: <20100917232133.6088424a@pitrou.net>
References: <20100917232133.6088424a@pitrou.net>
Message-ID: <20100917215655.GA7813@cskk.homeip.net>

On 17Sep2010 23:21, Antoine Pitrou <solipsis at pitrou.net> wrote:
| On Sat, 18 Sep 2010 07:05:46 +1000
| Cameron Simpson <cs at zip.com.au> wrote:
| > As an example, I use a personal encoding for natural numbers scheme
| > where values below 128 fit in one byte, 128 or more set the top bit on
| > leading bytes to indicate followon bytes, so values up to 16383 fit in
| > two bytes and so on arbitrarily. Compact and simple but unbounded.
| 
| Well, you are proposing that we (Python core maintainers) live with
| additional complication in one of the most central and critical parts of
| the interpreter, just so that we satisfy some theoretical impulse for
| "consistency". That doesn't sound reasonable. [...]
| For the record, have you been hit by this problem, or do you even think
| you might be hit by it in the near future?

Me, no. But arbitrary _syntactic_ constraints in an otherwise flexible
language grate. I was only suggesting a compactness-supporting approach,
not lobbying very hard for making the devs use it.

I'm +10 on removing the syntactic constraint, not on hacking the opcode
definitons.

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

Withdrawing in disgust is not the same as conceding.
        - Jon Adams <jadams at sea06f.sea06.navy.mil>


From dirkjan at ochtman.nl  Sat Sep 18 00:00:57 2010
From: dirkjan at ochtman.nl (Dirkjan Ochtman)
Date: Sat, 18 Sep 2010 00:00:57 +0200
Subject: [Python-ideas] New 3.x restriction in list comprehensions
In-Reply-To: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com>
References: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com>
Message-ID: <AANLkTin8M+cRhou9VM4igb8UZ-9zRis4fc4x6zce_WQP@mail.gmail.com>

On Fri, Sep 17, 2010 at 21:44, Raymond Hettinger
<raymond.hettinger at gmail.com> wrote:
> My question for the group is whether it would be a good
> idea to drop the new restriction.

I like the restriction and would actually advocate having it for
regular for-loops too (though that would be a big no-no, I guess).

Here's why I never use them without parenthesis, in python 2:

>>> (1 if True else 3, 4)
(1, 4)
>>> (lambda x: x * x, 6)
(<function <lambda> at 0x100475ed8>, 6)
>>> [i for i in 2, 3]
[2, 3]
>>> (i for i in 2, 3)
  File "<stdin>", line 1
    (i for i in 2, 3)
                 ^
SyntaxError: invalid syntax

And in Python 3:

>>> (1 if True else 3, 4)
(1, 4)
>>> (lambda x: x * x, 6)
(<function <lambda> at 0x7f4ef41785a0>, 6)
>>> [i for i in 2, 3]
  File "<stdin>", line 1
    [i for i in 2, 3]
                 ^
SyntaxError: invalid syntax
>>> (i for i in 2, 3)
  File "<stdin>", line 1
    (i for i in 2, 3)
                 ^
SyntaxError: invalid syntax

Cheers,

Dirkjan


From guido at python.org  Sat Sep 18 02:16:39 2010
From: guido at python.org (Guido van Rossum)
Date: Fri, 17 Sep 2010 17:16:39 -0700
Subject: [Python-ideas] New 3.x restriction on number of keyword
	arguments
In-Reply-To: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com>
References: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com>
Message-ID: <AANLkTin=67t6uyJGigcNmVbSbOFZ1XV0Spvp-yUtP9kj@mail.gmail.com>

On Fri, Sep 17, 2010 at 1:00 PM, Raymond Hettinger
<raymond.hettinger at gmail.com> wrote:
> One of the use cases for named tuples is to have them be automatically created from a SQL query or CSV header.  Sometimes (but not often), those can have a huge number of columns.  In Python 2.x, it worked just fine -- we had a test for a named tuple with 5000 fields.  In Python 3.x, there is a SyntaxError when there are more than 255 fields.
>
> The origin of the change was a hack to fit positional argument counts and keyword-only argument counts in a single oparg in the python opcode encoding.
>
> ISTM, this is an implementation specific hack and there is no reason that other implementations would have the same restriction (unless their starting point is Python's bytecode).
>
> The good news is that long argument lists are uncommon.  They probably only arise in cases with dynamically created functions and classes.  Most people are unaffected.
>
> The bad news is that an implementation detail has become visible and added a language restriction.  The 255 limit seems weird to me in a version of Python that has gone to lengths to unify ints and longs so that char/short/long boundaries stop manifesting themselves to users.
>
> Is there any support here for trying to get smarter about the keyword-only argument implementation?  The 255 limit does not seem unreasonably low, but then it was once thought that no one would ever need more than 640k of RAM.  If the new restriction isn't necessary, it would be great to remove it.

+256 on removing this limit from the language.

I've come across code generators that produced quite insane-looking
code that worked perfectly fine because Python's grammar has no (or
very large) limits, and I consider this a language feature. I've also
written code where there was a good reason to use **kwds in the
function definition and another good reason to pass **kwds to the call
where the kwds passed could be huge.

-- 
--Guido van Rossum (python.org/~guido)


From guido at python.org  Sat Sep 18 02:18:21 2010
From: guido at python.org (Guido van Rossum)
Date: Fri, 17 Sep 2010 17:18:21 -0700
Subject: [Python-ideas] New 3.x restriction in list comprehensions
In-Reply-To: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com>
References: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com>
Message-ID: <AANLkTi=GiLokRBH4XzvhYqQ8Sfz139wkSCxbZWwA0Qe=@mail.gmail.com>

On Fri, Sep 17, 2010 at 12:44 PM, Raymond Hettinger
<raymond.hettinger at gmail.com> wrote:
> In Python2, you can transform:
>
>   r = []
>   for x in 2, 4, 6:
>        r.append(x*x+1)
>
> into:
>
>    r = [x*x+1 for x in 2, 4, 6]
>
> In Python3, the first still works but the second gives a SyntaxError.
> It wants the 2, 4, 6 to have parentheses.
>
> The good parts of the change:
>  + it matches what genexps do
>  + that simplifies the grammar a bit (listcomps bodies and genexp bodies)
>  + a listcomp can be reliably transformed to a genexp
>
> The bad parts:
>  + The restriction wasn't necessary (we could undo it)
>  + It makes 2-to-3 conversion a bit harder
>  + It no longer parallels other paren-free tuple constructions:
>         return x, y
>         yield x, y
>         t = x, y
>            ...
>  + In particular, it no longer parallels regular for-loop syntax
>
> The last part is the one that seems the most problematic.
> If you write for-loops day in and day out with the unrestricted
> syntax, you (or at least me) will tend to do the wrong thing when
> writing a list comprehension.  It is a bit jarring to get the SyntaxError
> when the code looks correct -- it took me a bit of fiddling to figure out
> what was going on.
>
> My question for the group is whether it would be a good
> idea to drop the new restriction.

This was intentional. It parallels genexps and it avoids an ambiguity
(for the human reader -- I know the parser has no problem with it :-).

Please don't change this back. (It would violate the moratorium too...)

-- 
--Guido van Rossum (python.org/~guido)


From ncoghlan at gmail.com  Sat Sep 18 09:28:42 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 18 Sep 2010 17:28:42 +1000
Subject: [Python-ideas] New 3.x restriction on number of keyword
	arguments
In-Reply-To: <20100917231146.23f0cef1@pitrou.net>
References: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com>
	<20100917231146.23f0cef1@pitrou.net>
Message-ID: <AANLkTikVWfNu00SksvpeMx_nxng_wEu0CtHK1rUsoxoA@mail.gmail.com>

On Sat, Sep 18, 2010 at 7:11 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Fri, 17 Sep 2010 13:00:08 -0700
> Raymond Hettinger
> <raymond.hettinger at gmail.com> wrote:
>> One of the use cases for named tuples is to have them be automatically created from a SQL
>> query or CSV header.  Sometimes (but not often), those can have a huge number of columns.  In
>> Python 2.x, it worked just fine -- we had a test for a named tuple with 5000 fields.  In
>> Python 3.x, there is a SyntaxError when there are more than 255 fields.
>
> I don't understand your explanation. You can't pass a namedtuple using
> the **kw convention:

But you do need to *initialise* the named tuple after you create it.
If it's a big tuple, then all of those field values need to be passed
in either as positional arguments or as keyword arguments. A
restriction to 255 parameters means that named tuples with more than
255 fields become a lot less useful.

Merging the parameter count into the opcode as an optimisation when
the number of parameters is < 256 is fine. *Disallowing* parameter
counts > 255 is not.
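
For concreteness, a minimal reproduction of the failure mode (assuming
a 3.1/3.2-era interpreter, as described in this thread):

    from collections import namedtuple

    fields = ["f%d" % i for i in range(300)]
    Big = namedtuple("Big", fields)   # SyntaxError: the generated
                                      # __new__ has 300 parameters
    # row = Big(*range(300))          # what worked on 2.x, even at 5000 fields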

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From ncoghlan at gmail.com  Sat Sep 18 09:39:11 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 18 Sep 2010 17:39:11 +1000
Subject: [Python-ideas] New 3.x restriction in list comprehensions
In-Reply-To: <AANLkTin8M+cRhou9VM4igb8UZ-9zRis4fc4x6zce_WQP@mail.gmail.com>
References: <1F0CB196-F980-4B3D-B2F1-1969C35FE580@gmail.com>
	<AANLkTin8M+cRhou9VM4igb8UZ-9zRis4fc4x6zce_WQP@mail.gmail.com>
Message-ID: <AANLkTik_=ZkQh+ZJ6WgiXN_atpH9s1dMyN4SN+bB8V88@mail.gmail.com>

On Sat, Sep 18, 2010 at 8:00 AM, Dirkjan Ochtman <dirkjan at ochtman.nl> wrote:
> On Fri, Sep 17, 2010 at 21:44, Raymond Hettinger
> <raymond.hettinger at gmail.com> wrote:
>> My question for the group is whether it would be a good
>> idea to drop the new restriction.
>
> I like the restriction and would actually advocate having it for
> regular for-loops too (though that would be a big no-no, I guess).

Yep, I tend to parenthesise tuples even when it isn't strictly
necessary as well. Even if the parser doesn't care, it makes it a lot
easier for human readers (including myself when I have to go back and
read that code). (I have similar objections to people who rely too
heavily on precedence ordering in complicated expressions - even if
the compiler understands them correctly, many readers won't know the
precedence table off by heart. Judicious use of parentheses turns code
those readers would otherwise have to puzzle over into something that
is obviously correct at a glance.)
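
For instance (illustrative):

    r = [x*x + 1 for x in (2, 4, 6)]   # parens required in 3.x anyway
    t = (x, y)                         # optional, but obvious at a glance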

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From greg.ewing at canterbury.ac.nz  Sat Sep 18 10:29:02 2010
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 18 Sep 2010 20:29:02 +1200
Subject: [Python-ideas] New 3.x restriction on number of
	keyword	arguments
In-Reply-To: <20100917210546.GA32088@cskk.homeip.net>
References: <4C93CE55.1030308@mrabarnett.plus.com>
	<20100917210546.GA32088@cskk.homeip.net>
Message-ID: <4C94784E.1040702@canterbury.ac.nz>

Cameron Simpson wrote:

> If there's an (entirely reasonable IMHO) desire to keep
> the opcode small, the count should be encoded in a compact but extendable
> form.

I suspect it's more because it was easier to do it that
way than to track down all the places that assume a bytecode
never has more than one 16-bit operand.

-- 
Greg


From lie.1296 at gmail.com  Sat Sep 18 16:23:59 2010
From: lie.1296 at gmail.com (Lie Ryan)
Date: Sun, 19 Sep 2010 00:23:59 +1000
Subject: [Python-ideas] New 3.x restriction on number of keyword
	arguments
In-Reply-To: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com>
References: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com>
Message-ID: <i72i06$5bs$1@dough.gmane.org>

On 09/18/10 06:00, Raymond Hettinger wrote:
> The good news is that long argument lists are uncommon.  They
> probably only arise in cases with dynamically created functions and
> classes.  Most people are unaffected.

How about showing a Warning when trying to create a large namedtuple?
The Warning could contain a reference to a tracker issue, and explain
that anyone who really, really needs this limitation removed should
say so in that issue. That way we don't complicate the code
unnecessarily without evidence of real usage.

In Python, classes are largely syntactic sugar for a dictionary
anyway; anyone who needs such a large namedtuple should probably
reconsider and use a dictionary, a list, or a real class instead.
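
A sketch of what that might look like inside the namedtuple factory
(the threshold, wording, and helper name are all hypothetical):

    import warnings

    def _warn_if_huge(field_names):
        # 255 mirrors the compiler's argument limit discussed here
        if len(field_names) > 255:
            warnings.warn("namedtuple with %d fields will not compile on "
                          "this interpreter; see the tracker issue about "
                          "the 255-argument limit" % len(field_names))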



From taleinat at gmail.com  Sun Sep 19 11:08:28 2010
From: taleinat at gmail.com (Tal Einat)
Date: Sun, 19 Sep 2010 11:08:28 +0200
Subject: [Python-ideas] New 3.x restriction on number of keyword
	arguments
In-Reply-To: <i72i06$5bs$1@dough.gmane.org>
References: <589C8BF5-F11F-4E10-A7ED-6627EF625E1C@gmail.com>
	<i72i06$5bs$1@dough.gmane.org>
Message-ID: <AANLkTin4dGqYf58e9ZL3Bc5UqUnEmijD+JLiZmDiNHgU@mail.gmail.com>

Lie Ryan wrote:

> [...]

+1 on removing the restriction, just because I find large namedtuples
useful.

I work with large tables of data and often use namedtuples for their
compactness. Python dictionaries have a large memory overhead compared to
tuples. This restriction could seriously hamper my future efforts to migrate
to Python 3.
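
The overhead is easy to see (illustrative; exact numbers vary by build
and version):

    import sys
    from collections import namedtuple

    Point = namedtuple("Point", "x y")
    print(sys.getsizeof(Point(1, 2)))       # tuple-sized record
    print(sys.getsizeof({"x": 1, "y": 2}))  # dict: several times larger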

- Tal Einat

From james at openvpn.net  Mon Sep 20 23:41:35 2010
From: james at openvpn.net (James Yonan)
Date: Mon, 20 Sep 2010 15:41:35 -0600
Subject: [Python-ideas] [Python-Dev] Python needs a standard
 asynchronous return object
Message-ID: <4C97D50F.1000908@openvpn.net>

I think that Glyph hit the nail on the head when he said that "you can 
go from any arbitrary Future to a full-featured Deferred, but not the 
other way around."

This is exactly my concern, and the reason why I think it's important 
for Python to standardize on an async result type that is sufficiently 
general that it can accommodate the different kinds of async semantics 
in common use in the Python world today.

If you don't think this is a problem, just Google for "twisted vs. 
tornado".  While the debate is sometimes passionate and rude, it points 
to the fragmentation that has occurred in the Python async space due to 
the lack of direction from the standard library.  And there's a real 
cost to this fragmentation -- it's not easy to build an application that 
uses different async frameworks when there's no standardized result 
object or reactor model.

My concern is that PEP 3148 was really designed for the purpose of 
thread and process pooling, and that the Future object is designed with 
the minimum functionality required to achieve this end.  The problem is 
that the Future object starts to look like a stripped-down version of a 
Twisted Deferred.  And that raises the question of why we are 
standardizing on the special case and not the general case.

Wouldn't it be better to break this into two problems:

* Develop a full-featured standard async result type and reactor model 
to facilitate interoperability of different async libraries.  This would 
consist of a standard async result type and an abstract base class for a 
reactor model.

* Let PEP 3148 focus on the problem of thread and process pooling and 
leverage on the above async result type.

The semantics that a general async type should support include:

1. Semantics that allow you to define a callback channel for results 
and, optionally, a separate channel for exceptions as well.

2. Semantics that offer the flexibility of working with async results at 
the callback level or at the generator level (having a separate channel 
for exceptions makes it easy for the generator decorator implementation 
(that facilitates "yield function_returning_async_object()") to dispatch 
exceptions into the caller).

3. Semantics that can easily be used to pass results and exceptions back 
from thread or process pools.

4. Semantics that allow for aggregate processing of parallel 
asynchronous results, such as "fire async result when all of the async 
results in an async set have fired" or "fire async result when the first 
result from an async set has fired."

Deferreds presently support all of the above.  My point here is not so 
much that Deferreds should be the standard, but that whatever standard 
is chosen, that the semantics be general enough that different async 
Python libraries/platforms can interoperate.
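
To make points 1 and 3 concrete, here is a toy async result with
separate success and error channels -- a sketch for discussion only,
not Twisted's Deferred and not a proposed API:

    class AsyncResult:
        def __init__(self):
            self._waiters = []
            self._fired = False
            self._is_error = False
            self._value = None

        def add_callbacks(self, on_result, on_error=None):
            self._waiters.append((on_result, on_error))
            if self._fired:
                self._flush()
            return self

        def fire(self, value):              # results channel
            self._settle(value, False)

        def fire_error(self, exc_info):     # separate exceptions channel
            self._settle(exc_info, True)

        def _settle(self, value, is_error):
            self._fired, self._value, self._is_error = True, value, is_error
            self._flush()

        def _flush(self):
            while self._waiters:
                on_result, on_error = self._waiters.pop(0)
                if self._is_error:
                    if on_error is not None:
                        on_error(self._value)
                else:
                    on_result(self._value)

A worker thread or event loop calls fire()/fire_error(); either end of
a thread or process pool could do the same, which is the sense in which
point 3 comes along for free.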

James

> Thanks for the ping about this (I don't think I subscribe to python-ideas, so someone may have to moderate my post in).  Sorry for the delay in responding, but I've been kinda busy and cooking up these examples took a bit of thinking.
> 
> And thanks, James, for restarting this discussion.  I obviously find it interesting :).
> 
> I'm going to mix in some other stuff I found on the web archives, since it's easiest just to reply in one message.  I'm sorry that this response is a bit sprawling and doesn't have a single clear narrative; the thread thus far didn't seem to lend itself to one.
> 
> For those of you who don't want to read my usual novel-length post, you can probably stop shortly after the end of the first block of code examples.
> 
> On Sep 11, 2010, at 10:26 PM, Guido van Rossum wrote:
> 
>>>> although he didn't say what
>>>> deferreds really added beyond what futures provide, and why the
>>>> "add_done_callback" method isn't adequate to provide interoperability
>>>> between futures and deferreds (which would be odd, since Brian made
>>>> changes to that part of PEP 3148 to help with that interoperability
>>>> after discussions with Glyph).
>>>> 
>>>> Between PEP 380 and PEP 3148 I'm not really seeing a lot more scope
>>>> for standardisation in this space though.
>>>> 
>>>> Cheers,
>>>> Nick.
>>> 
>>> That was my initial reaction as well, but I'm more than open to
>>> hearing from Jean Paul/Glyph and the other twisted folks on this.
> 
>> But thinking about this more I don't know that it will be easy to mix
>> PEP 3148, which is solidly thread-based, with a PEP 342 style
>> scheduler (whether or not the PEP 380 enhancements are applied, or
>> even PEP 3152). And if we take the OP's message at face value, his
>> point isn't so much that Twisted is great, but that in order to
>> benefit maximally from PEP 342 there needs to be a standard way of
>> using callbacks. I think that's probably true. And comparing the
>> blog's examples to PEP 3148, I find Twisted's terminology rather
>> confusing compared to the PEP's clean Futures API (where IMO you can
>> ignore almost everything except result()).
> 
> That blog post was written to demonstrate why programs using generators are "... far easier to read and write ..." than ones using Deferreds, so it stands to reason it would choose an example where that helps :).
> 
> When you want to write systems that manage varying levels of parallelism within a single computation, generators can start to get pretty hairy and the "normal" Deferred way of doing things looks more straightforward.
> 
> Thinking in terms of asynchronicity is tricky, and generators can be a useful tool for promoting that understanding, but they only make it superficially easier.  For example:
> 
>>>> def serial():
>>>>     results = set()
>>>>     for x in ...:
>>>>         results.add((yield do_something_async(x)))
>>>>     return results
> 
> If you're writing an application whose parallelism calls for an asynchronous approach, after all, you presumably don't want to be standing around waiting for each network round trip to complete.  How do you re-write this so that there are always at least N outstanding do_something_async calls running in parallel?
> 
> You can sorta do it like this:
> 
>>>> def parallel(N):
>>>>     results = set()
>>>>     outstanding = []
>>>>     for x in ...:
>>>>         outstanding.append(do_something_async(x))
>>>>         if len(outstanding) > N:
>>>>             results.add((yield outstanding.pop(0)))
>>>>     while outstanding:
>>>>         results.add((yield outstanding.pop(0)))
> 
> but that will always block on one particular do_something_async, when you really want to say "let me know when any outstanding call is complete".  So I could handwave about 'yield any_completed(outstanding)'...
> 
>>>> def parallel(N):
>>>>     results = set()
>>>>     outstanding = set()
>>>>     for x in ...:
>>>>         outstanding.add(do_something_async(x))
>>>>         if len(outstanding) > N:
>>>>             results.add((yield any_completed(outstanding)))
>>>>     while outstanding:
>>>>         results.add((yield any_completed(outstanding)))
> 
> but that just raises the question of how you implement any_completed(), and I can't think of a way to do that with generators, without getting into the specifics of some Deferred-or-Future-like asynchronous result object.  You could implement such a function with such primitives, and here's what it looks like with Deferreds:
> 
>>>> def any_completed(setOfDeferreds):
>>>>     d = Deferred()
>>>>     called = []
>>>>     def fireme(result, whichDeferred):
>>>>         if not called:
>>>>             called.append(True)
>>>>             setOfDeferreds.remove(whichDeferred)
>>>>             d.callback(result)
>>>>         return result
>>>>     for subd in setOfDeferreds:
>>>>         subd.addBoth(fireme, subd)
>>>>     return d
> 
> Here's how you do the top-level task in Twisted, without generators, in the truly-parallel fashion (keep in mind this combines the functionality of 'any_completed' and 'parallel', so it's a bit shorter):
> 
>>>> def parallel(N):
>>>>     ds = DeferredSemaphore(N)
>>>>     l = []
>>>>     def release(result):
>>>>         ds.release()
>>>>         return result
>>>>     def after(sem, it):
>>>>         return do_something_async(it)
>>>>     for x in ...:
>>>>         l.append(ds.acquire().addCallback(after, x).addBoth(release))
>>>>     return gatherResults(l).addCallback(set)
> 
> Some informal benchmarking has shown this method to be considerably faster (on the order of 1/2 to 1/3 as much CPU time) than at least our own inlineCallbacks generator-scheduling method.  Take this with the usual fist-sized grain of salt that you do any 'informal' benchmarks, but the difference is significant enough that I do try to refactor into this style in my own code, and I have seen performance benefits from doing this on more specific benchmarks.
> 
> This is all untested, and that's far too many lines of code to expect to work without testing, but hopefully it gives a pretty good impression of the differences in flavor between the different styles.
> 
>> Yeah, please do explain why Twisted has so much machinery to handle exceptions?
> 
> There are a lot of different implied questions here, so I'll answer a few of those.
> 
> Why does twisted.python.failure exist?  The answer to that is that we wanted an object that represented an exception as raised at a particular point, associated with a particular stack, that could live on without necessarily capturing all the state in that stack.  If you're going to report failures asynchronously, you don't necessarily want to hold a reference to every single thing in a potentially giant stack while you're waiting to send it to some network endpoint.  Also, in 1.5.2 we had no way of chaining exceptions, and this code is that old.  Finally, even if you can chain exceptions, it's a serious performance hit to have to re-raise and re-catch the same exception 4 or 5 times in order to translate it or handle it at many different layers of the stack, so a Failure is intended to encapsulate that state such that it can just be returned, in performance-sensitive areas.  (This is sort of a weak point though, since the performance of Failure itself is so terrible, for unrelated reasons.)
> 
> Why is twisted.python.failure such a god damned mess?  The answer to that is ... uh, sorry.  Yes, it is.  We should clean it up.  It was written a long time ago and the equivalent module now could be _much_ shorter, simpler, and less of a performance problem.  It just never seems to be the highest priority.  Maybe after we're done porting to py3 :).  My one defense here is that it's still a slight improvement over the stdlib 'traceback' module ;-).
> 
> Why do Deferreds have an errback chain rather than just handing you an exception object in the callback chain?  Basically, this is for the same reason that Python has exceptions instead of just making you check return codes.  We wanted it to be easy to say:
> 
>>>> d = getPage("http://...")
>>>> def ok(page):
>>>>     doSomething(...)
>>>> d.addCallback(ok)
> 
> and know that the argument to 'ok' would always be what getPage promised (you don't need to typecheck it for exception-ness) and the default error behavior would be to simply bail out with a traceback, not to barrel through your success-path code wreaking havoc.
> 
>> ISTM that the main difference is that add_done_callback() isn't meant for callbacks that return a value.
> 
> 
> add_done_callback works fine with callbacks that return a value.  If it didn't, I'd be concerned, because then it would have the barrel-through-the-success-path flaw.  But, I assume the idiomatic asynchronous-code-using-Futures would look like this:
> 
>>>> f = some_future_thing(...)
>>>> def my_callback(future):
>>>>     result = future.result()
>>>>     do_something(result)
>>>> f.add_done_callback(my_callback)
> 
> This is one extra line of code as compared to the Twisted version, and chaining involves a bit more gymnastics (somehow creating more futures to return further up the stack, I guess, I haven't thought about it too hard), but it does allow you to handle exceptions with a simple 'except:', rather than calling some exception-handling methods, so I can see why some people would prefer it.
> 
>> Maybe it's possible to write a little framework that lets you create Futures using either threads, processes (both supported by PEP 3148) or generators. But I haven't tried it. And maybe the need to use 'yield' for everything that may block when using generators, but not when using threads or processes, will make this awkward.
> 
> You've already addressed the main point that I really wanted to mention here, but I'd like to emphasize it.  Blocking and not-blocking are fundamentally different programming styles, and if you sometimes allow blocking on asynchronous results, that means you are effectively always programming in the blocking-and-threaded style and not getting much benefit from the code which does choose to be politely non-blocking.
> 
> I was somewhat pleased with the changes made to the Futures PEP because you could use them as an asynchronous result, and have things that implemented the Future API but raised an exception if you tried to wait on them.  That would at least allow some layer of stdlib compatibility.  If you are disciplined and careful, this would let you write async code which used a common interoperability mechanism, and if you weren't careful, it would blow up when you tried to use it the wrong way.
> 
> But - and I am guessing that this is the main thrust of this discussion - I do think that having Deferred in the standard library would be much, much better if we can do that.
> 
>> So maybe we'll be stuck with at least two Future-like APIs: PEP 3148 and something else, generator-based.
> 
> Having something "generator-based" is, in my opinion, an abstraction inversion.  The things which you are yielding from these generators are asynchronous results.  There should be a specific type for asynchronous results which can be easily interacted with.  Generators are syntactic sugar for doing that interaction in a way which doesn't involve defining tons of little functions.  This is useful, and it makes the concept more accessible, so I don't say "just" syntactic sugar: but nevertheless, the generators need to be 'yield'ing something, and the type of thing that they're yielding is a Deferred-or-something-like-it.
> 
> I don't think that this is really two 'Future-like APIs'.  At least, they're not redundant, any more than having both socket.makefile() and socket.recv() is redundant.
> 
> If Future had a deferred() method rather than an add_done_callback() method, then it would always be very clear whether you had a synchronous-but-possibly-not-ready or a purely-asynchronous result.  Although it would be equally easy to just have a function that turned a Future into a Deferred by calling add_done_callback().  You can go from any arbitrary Future to a full-featured Deferred, but not the other way around.
> 
>> Or maybe PEP 3152.
> 
> 
> I don't like PEP 3152 aesthetically on many levels, but I can't deny that it would do the job.  'cocall', though, really?  It would be nice if it read like an actual word, i.e. "yield to" or "invoke" or even just "call" or something.
> 
> In another message, where Guido is replying to Antoine:
> 
>>> I think the main reason, though, that people find Deferreds inconvenient is that they force you to think in terms of asynchronicity (...)
>> 
>> Actually I think the main reason is historic: Twisted introduced callback-based asynchronous (thread-less) programming when there was no alternative in Python, and they invented both the mechanisms and the terminology as they were figuring it all out.  That is no mean feat. But with PEP 342 (generator-based coroutines) and especially PEP 380 (yield from) there *is* an alternative, and while Twisted has added APIs to support generators, it hasn't started to deprecate its other APIs, and its terminology becomes hard to follow for people (like me, frankly) who first learned this stuff through PEP 342.
> 
> I really have to go with Antoine on this one: people were confused about Deferreds long before PEP 342 came along :).  Given that Javascript environments have mostly adopted the Twisted terminology (oddly, Node.js doesn't, but Dojo and MochiKit both have pretty literal-minded Deferred translations), there are plenty of people who are familiar with the terminology but still get confused.
> 
> See the beginning of the message for why we're not deprecating our own APIs.
> 
> Once again, sorry for not compressing this down further!  If you got this far, you win a prize :).


From guido at python.org  Tue Sep 21 01:49:04 2010
From: guido at python.org (Guido van Rossum)
Date: Mon, 20 Sep 2010 16:49:04 -0700
Subject: [Python-ideas] [Python-Dev] Python needs a standard
 asynchronous return object
In-Reply-To: <4C97D50F.1000908@openvpn.net>
References: <4C97D50F.1000908@openvpn.net>
Message-ID: <AANLkTin3h9vi7u+-2Mwg4hXPXDFNZ8QfH-69-kfc9nCp@mail.gmail.com>

On Mon, Sep 20, 2010 at 2:41 PM, James Yonan <james at openvpn.net> wrote:
> I think that Glyph hit the nail on the head when he said that "you can go
> from any arbitrary Future to a full-featured Deferred, but not the other way
> around."

Where by "go from X to Y" you mean "take a program written using X and
change it to use Y", right?

> This is exactly my concern, and the reason why I think it's important for
> Python to standardize on an async result type that is sufficiently general
> that it can accommodate the different kinds of async semantics in common use
> in the Python world today.

I think I get your gist.

Unfortunately there are only a small number of people who know enough
about async semantics to write the PEP that is needed.

> If you don't think this is a problem, just Google for "twisted vs. tornado".
> While the debate is sometimes passionate and rude,

Is it ever distanced and polite? :-)

> it points to the
> fragmentation that has occurred in the Python async space due to the lack of
> direction from the standard library. And there's a real cost to this
> fragmentation -- it's not easy to build an application that uses different
> async frameworks when there's no standardized result object or reactor
> model.

But, circularly, the lack of direction from the standard library
exists because nobody has contributed an async framework to the
standard library since asyncore was added in, oh, 1999.

> My concern is that PEP 3148 was really designed for the purpose of thread
> and process pooling, and that the Future object is designed with the minimum
> functionality required to achieve this end. The problem is that the Future
> object starts to look like a stripped-down version of a Twisted Deferred.
> And that raises the question of why we are standardizing on the special case
> and not the general case.

Because we could reach agreement fairly quickly on PEP 3148. There are
some core contributors who know threads and processes inside out, and
after several rounds of comments (a lot, really) they were satisfied.

At this point it is probably best to forget about PEP 3148 if you want
to improve the async situation in the stdlib, and start thinking about
that async PEP instead.

> Wouldn't it be better to break this into two problems:
>
> * Develop a full-featured standard async result type and reactor model to
> facilitate interoperability of different async libraries. This would
> consist of a standard async result type and an abstract base class for a
> reactor model.

Unless you want to propose to include Twisted into the stdlib, this is
not going to be ready for inclusion into Python 3.2.

> * Let PEP 3148 focus on the problem of thread and process pooling and
> leverage on the above async result type.

But PEP 3148 *is* ready for inclusion in Python 3.2. So you've got the
ordering wrong. It doesn't make sense to hold up PEP 3148, waiting for
the perfect solution to appear. In fact, the changes that were made to
PEP 3148 at Glyph's suggestion are probably all you are going to get
regarding PEP 3148.

> The semantics that a general async type should support include:
>
> 1. Semantics that allow you to define a callback channel for results and,
> optionally, a separate channel for exceptions as well.
>
> 2. Semantics that offer the flexibility of working with async results at the
> callback level or at the generator level (having a separate channel for
> exceptions makes it easy for the generator decorator implementation (that
> facilitates "yield function_returning_async_object()") to dispatch
> exceptions into the caller).
>
> 3. Semantics that can easily be used to pass results and exceptions back
> from thread or process pools.
>
> 4. Semantics that allow for aggregate processing of parallel asynchronous
> results, such as "fire async result when all of the async results in an
> async set have fired" or "fire async result when the first result from an
> async set has fired."
>
> Deferreds presently support all of the above. My point here is not so much
> that Deferreds should be the standard, but that whatever standard is chosen,
> that the semantics be general enough that different async Python
> libraries/platforms can interoperate.

Do you want to champion a PEP? I hope you do -- it will be a long
march but rewarding, especially if you get the Tornado folks to
participate and contribute.

-- 
--Guido van Rossum (python.org/~guido)


From andrew at bemusement.org  Tue Sep 21 07:39:11 2010
From: andrew at bemusement.org (Andrew Bennetts)
Date: Tue, 21 Sep 2010 15:39:11 +1000
Subject: [Python-ideas] [Python-Dev] Python needs a standard
 asynchronous return object
In-Reply-To: <AANLkTin3h9vi7u+-2Mwg4hXPXDFNZ8QfH-69-kfc9nCp@mail.gmail.com>
References: <4C97D50F.1000908@openvpn.net>
	<AANLkTin3h9vi7u+-2Mwg4hXPXDFNZ8QfH-69-kfc9nCp@mail.gmail.com>
Message-ID: <20100921053911.GD18831@aihal.home.puzzling.org>

Guido van Rossum wrote:
[...]
> 
> Unless you want to propose to include Twisted into the stdlib, this is
> not going to be ready for inclusion into Python 3.2.

I don't think anyone has suggested "include Twisted".  What is being suggested
is "include twisted.internet.defer, or something about as useful."

Let's consider just how hard it would be to just add
twisted/internet/defer.py to the stdlib (possibly as 'deferred.py').  It's
already almost a standalone module, especially if pared back to just the
Deferred class and maybe one or two of the most useful helpers (e.g.
gatherResults, to take a list of Deferreds and turn them into a single Deferred
that fires when they have all fired).

The two most problematic dependencies would be:

 1) twisted.python.log, which for these purposes could be replaced with a call
    to a user-replaceable hook whenever an unhandled error occurs (similar to
    sys.excepthook; see the sketch below).
 2) twisted.python.failure... this one is harder.  As glyph said, it provides
    "an object that represent[s] an exception as raised at a particular point,
    associated with a particular stack".  But also, as he said, it's a mess and
    could use a clean up.  Cleaning it up or thinking of a simpler replacement
    is not insurmountable, but probably too ambitious for Python 3.2's schedule.
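
The hook in 1) might be as small as this (hypothetical names, modelled
on sys.excepthook; a sketch, not an existing Twisted or stdlib API):

    import sys
    import traceback

    def _default_unhandled_error(exc_info):
        sys.stderr.write("Unhandled error in Deferred:\n")
        traceback.print_exception(*exc_info, file=sys.stderr)

    # Library code would call unhandled_error_hook(sys.exc_info())
    # instead of twisted.python.log; applications may rebind it.
    unhandled_error_hook = _default_unhandled_error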

My point is that adding the Deferred abstraction to the stdlib is a *much*
smaller and more reasonable proposition than "include Twisted."

-Andrew.



From jnoller at gmail.com  Tue Sep 21 15:25:13 2010
From: jnoller at gmail.com (Jesse Noller)
Date: Tue, 21 Sep 2010 09:25:13 -0400
Subject: [Python-ideas] [Python-Dev] Python needs a standard
 asynchronous return object
In-Reply-To: <20100921053911.GD18831@aihal.home.puzzling.org>
References: <4C97D50F.1000908@openvpn.net>
	<AANLkTin3h9vi7u+-2Mwg4hXPXDFNZ8QfH-69-kfc9nCp@mail.gmail.com>
	<20100921053911.GD18831@aihal.home.puzzling.org>
Message-ID: <AANLkTikS1Wh+JFK7RgE=iVRLbSiEgdSntc0WQmHEZZXy@mail.gmail.com>

On Tue, Sep 21, 2010 at 1:39 AM, Andrew Bennetts <andrew at bemusement.org> wrote:
> [...]
>
> My point is that adding the Deferred abstraction to the stdlib is a *much*
> smaller and more reasonable proposition than "include Twisted."
>
> -Andrew.

No one was seriously proposing including Twisted wholesale. There has
been discussion, off and on *for years*, about including a
stripped-down deferred object; and yet no one has stepped up to *do
it*. It might be hilariously easy, it might be a 40-line module, but
it doesn't matter if no one steps up to write the PEP, commit the
code, and commit to maintaining it.

jesse


From ncoghlan at gmail.com  Tue Sep 21 15:40:28 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 21 Sep 2010 23:40:28 +1000
Subject: [Python-ideas] [Python-Dev] Python needs a standard
 asynchronous return object
In-Reply-To: <AANLkTikS1Wh+JFK7RgE=iVRLbSiEgdSntc0WQmHEZZXy@mail.gmail.com>
References: <4C97D50F.1000908@openvpn.net>
	<AANLkTin3h9vi7u+-2Mwg4hXPXDFNZ8QfH-69-kfc9nCp@mail.gmail.com>
	<20100921053911.GD18831@aihal.home.puzzling.org>
	<AANLkTikS1Wh+JFK7RgE=iVRLbSiEgdSntc0WQmHEZZXy@mail.gmail.com>
Message-ID: <AANLkTimqevMD8xtr1m5P5-uaVsQnhXcrmQkt-mR=5VBb@mail.gmail.com>

On Tue, Sep 21, 2010 at 11:25 PM, Jesse Noller <jnoller at gmail.com> wrote:
> There has
> been discussion, off and on *for years*, about including a
> stripped-down deferred object; and yet no one has stepped up to *do
> it*. It might be hilariously easy, it might be a 40-line module, but
> it doesn't matter if no one steps up to write the PEP, commit the
> code, and commit to maintaining it.

Indeed. Thread and process pools had similarly been talked about for
quite some time before Brian stepped up to actually do the work of
writing and championing PEP 3148.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From michael.s.gilbert at gmail.com  Tue Sep 21 20:44:52 2010
From: michael.s.gilbert at gmail.com (Michael Gilbert)
Date: Tue, 21 Sep 2010 14:44:52 -0400
Subject: [Python-ideas] Including elementary mathematical functions in the
 python data model
Message-ID: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com>

Hi,

It would be really nice if elementary mathematical functions such as
sine/cosine (via __sin__ and __cos__) were available as base parts of
the Python data model [0].  This would make it easier to write new math
classes, and it would eliminate the ugliness of things like self.exp().

This would also eliminate the need for separate math and cmath
libraries since those could be built into the default float and complex
types.  Of course if those libs were removed, that would be a potential
backwards compatibility issue.

It would also help new users who just want to do math and don't know
that they need to import separate modules just for elementary math
functionality.

I think full coverage of the elementary function set would be the goal
(i.e. exp, sqrt, ln, trig, and hyperbolic functions).  This would not
include special functions since that would be overkill, and they are
already handled well by scipy and numpy.
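
A sketch of how the protocol might look (the __sin__ hook and the
generic sin() here are hypothetical, not part of the actual data
model):

    import math

    def sin(x):
        # defer to a __sin__ hook when the type provides one
        impl = getattr(type(x), "__sin__", None)
        if impl is not None:
            return impl(x)
        return math.sin(x)

    class Angle:
        def __init__(self, radians):
            self.radians = radians
        def __sin__(self):
            return math.sin(self.radians)

    print(sin(0.0))         # falls back to math.sin
    print(sin(Angle(0.0)))  # dispatches to Angle.__sin__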

Anyway, just a thought.

Best wishes,
Mike

[0] http://docs.python.org/reference/datamodel.html


From ncoghlan at gmail.com  Tue Sep 21 23:53:09 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 22 Sep 2010 07:53:09 +1000
Subject: [Python-ideas] Including elementary mathematical functions in
 the python data model
In-Reply-To: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com>
References: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com>
Message-ID: <AANLkTim9+mRDHvfS+C9FFPhWP1djZ=vhZebP8Km2n91=@mail.gmail.com>

On Wed, Sep 22, 2010 at 4:44 AM, Michael Gilbert
<michael.s.gilbert at gmail.com> wrote:
> [...]

I think the basic problem here is that, by comparison to the basic
syntax-driven options, the additional functionality covered by the
math, cmath and decimal modules is much harder to implement both
correctly and efficiently. It's hard enough making good algorithms
that work on a single data type with a known representation, let alone
ones which work on arbitrary data types.

Also, needing exp, sqrt, ln, trig and hyperbolic functions is
*significantly* less common than the core mathematical options, so
telling people to do "from math import *" if they want to do a lot of
mathematical operations at the interactive prompt isn't much of a
hurdle.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From cs at zip.com.au  Thu Sep 23 00:31:36 2010
From: cs at zip.com.au (Cameron Simpson)
Date: Thu, 23 Sep 2010 08:31:36 +1000
Subject: [Python-ideas] Python needs a standard asynchronous return
	object
In-Reply-To: <4C97D50F.1000908@openvpn.net>
References: <4C97D50F.1000908@openvpn.net>
Message-ID: <20100922223136.GA23975@cskk.homeip.net>

On 20Sep2010 15:41, James Yonan <james at openvpn.net> wrote:
[...]
| * Develop a full-featured standard async result type and reactor
| model to facilitate interoperability of different async libraries.
| This would consist of a standard async result type and an abstract
| base class for a reactor model.
| 
| * Let PEP 3148 focus on the problem of thread and process pooling
| and leverage on the above async result type.
| 
| The semantics that a general async type should support include:
| 
| 1. Semantics that allow you to define a callback channel for results
| and, optionally, a separate channel for exceptions as well.
| 
| 2. Semantics that offer the flexibility of working with async
| results at the callback level or at the generator level (having a
| separate channel for exceptions makes it easy for the generator
| decorator implementation (that facilitates "yield
| function_returning_async_object()") to dispatch exceptions into the
| caller).
| 
| 3. Semantics that can easily be used to pass results and exceptions
| back from thread or process pools.
[...]

Just to address this particular aspect (return types and notification),
I have my own futures-like module, where the equivalent of a Future is
called a LateFunction.

There are only 3 basic types of return in my model:

  there's a .report() method in the main (Executor equivalent) class
  that yields LateFunctions as they complete.

  A LateFunction has two basic get-the-result methods. Having made a
  LateFunction:
    LF = Later.defer(func)

  You can either go:
    result = LF()
  This waits for func's completion and returns func's return value.
  If func raises an exception, this raises that exception.

  Or you can go:
    result, exc_info = LF.wait()
  which returns:
    result, None
  if func completed without exception and
    None, exc_info
  if an exception was raised, where exc_info is a 3-tuple as from
  sys.exc_info().

At any rate, when looking for completion you can either get
LateFunctions as they complete via .report(), or plain function
results (which may raise exceptions), or (result, exc_info) pairs
(results xor exceptions).

This makes implementing the separate streams (results vs exceptions)
model trivial if desired, while keeping the LateFunction interface
simple (few interface methods).
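
For example, routing results and exceptions to separate channels from
.wait() is only a few lines (illustrative; on_result/on_error stand
for whatever channels the caller has set up):

    result, exc_info = LF.wait()
    if exc_info is None:
        on_result(result)    # results channel
    else:
        on_error(exc_info)   # exceptions channel (sys.exc_info() 3-tuple)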

Yes, I know there's no timeout stuff in there :-(

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

By God, Mr. Chairman, at this moment I stand astonished at my own moderation!
        - Baron Robert Clive of Plassey


From tristanz at gmail.com  Thu Sep 23 06:41:19 2010
From: tristanz at gmail.com (Tristan Zajonc)
Date: Thu, 23 Sep 2010 00:41:19 -0400
Subject: [Python-ideas] Python needs a standard asynchronous return
	object
In-Reply-To: <20100922223136.GA23975@cskk.homeip.net>
References: <4C97D50F.1000908@openvpn.net>
	<20100922223136.GA23975@cskk.homeip.net>
Message-ID: <AANLkTimA4d9wTbrWD=+4ewKMMK1LPSnAcwaywgAnYKQG@mail.gmail.com>

I'm not an expert on this subject by any stretch, but have been
following the discussion with interest.

One of the more interesting ideas out of Microsoft in the last few
years is their Reactive Framework
(http://msdn.microsoft.com/en-us/devlabs/ee794896.aspx), which
implements IObserver and IObservable as the dual to IEnumerator and
IEnumerable.  This makes operators on events just as composable as
operators on enumerables.  It also comes after several other attempts
to formalize a standard async programming pattern.  The ideas seem
pretty generic, since they've released a javascript version of the
approach as well.

The basic interface is very simple, consisting of a subscribe method
on IObservable and on_next, on_completed, and on_error methods for
IObserver.  The power comes from the extension methods, similar to
itertools, defined in the Observable class (http://bit.ly/acBhbP).
These methods provide a huge range of composable functionality.
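
A minimal Python rendering of those interfaces, with one combinator to
show how the composition works (a sketch that mirrors the C# names;
not an existing Python library):

    class Observable:
        def __init__(self, subscribe):
            # subscribe(observer) wires up an observer that provides
            # on_next/on_error/on_completed methods
            self._subscribe = subscribe

        def subscribe(self, observer):
            return self._subscribe(observer)

        def filter(self, predicate):
            source = self
            class _Filtered:
                def __init__(self, observer):
                    self.observer = observer
                def on_next(self, value):
                    if predicate(value):
                        self.observer.on_next(value)
                def on_error(self, error):
                    self.observer.on_error(error)
                def on_completed(self):
                    self.observer.on_completed()
            return Observable(lambda obs: source.subscribe(_Filtered(obs)))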

For instance, using a chaining style, consider an async webclient
module that takes a bunch of urls:

responses = webclient.get(['http://www1.cnn.com', 'http://www2.cnn.com'])
responses.filter(lambda x: x.status == 200).first().do(lambda x: print(x.body))

The filter is nonblocking and returns another observable.  The first()
blocks and returns after the first document is received.  The do calls
a method. Multiple async streams can be composed together in all sorts
of ways.  For instance,

http = webclient.get(['http://www.cnn.com', 'http://www.nyt.com'])
https = webclient.get(['https://www.cnn.com', 'https://www.nyt.com'])
http.zip(https).filter(lambda x, y: x.status == 200 and y.status ==
200).start(lambda x, y: slow_save(x, y))

This never blocks.  It downloads both the https and http versions of
web pages, zips them into a new observable, filters sites with both
http and https, and then saves asynchronously the remaining sites.  I
personally find this easy to reason about, and much easier than
manually specifying a callback chain.  Errors and completed events
propagate through these chains intuitively. "Marble diagrams" help
with intuition here (http://bit.ly/cl7Oad).

All you need to do is implement the observable interface and you get
all the composability for free. Or you can just use any number of
simple methods to convert things to observables
(http://bit.ly/7VMnKv), such as observable.start(lambda: print("hi")).
 Or use decorators.  If the observable interface became standard, all
future async libraries would be composable, and there would also be a
growing collection of observable tools.

As somebody who is new to async programming, I quite quickly grasped
this reactive approach even though I was otherwise completely
unfamiliar with C#.   While it may be due to my lack of experience, I
still get confused when thinking about callback chains and error
channels.  For instance, I have no idea how to zip an async http call
and a mongodb call into a simple observable that returns a tuple when
both respond and then alerts the user.  This would be as simple as

webclient.get().zip(mongodb.get()).start(flash_completed_message)

or maybe it's more pythonic to write

obstools.start(obstools.zip(mongodb.get(), webclient.get),
flash_completed_message)

although I've never liked this inside-out style.

But perhaps I missed the point of this thread?

Tristan

On Wed, Sep 22, 2010 at 6:31 PM, Cameron Simpson <cs at zip.com.au> wrote:
> [...]


From tristanz at gmail.com  Thu Sep 23 07:04:34 2010
From: tristanz at gmail.com (Tristan Zajonc)
Date: Thu, 23 Sep 2010 01:04:34 -0400
Subject: [Python-ideas] Python needs a standard asynchronous return
	object
In-Reply-To: <AANLkTimA4d9wTbrWD=+4ewKMMK1LPSnAcwaywgAnYKQG@mail.gmail.com>
References: <4C97D50F.1000908@openvpn.net>
	<20100922223136.GA23975@cskk.homeip.net>
	<AANLkTimA4d9wTbrWD=+4ewKMMK1LPSnAcwaywgAnYKQG@mail.gmail.com>
Message-ID: <AANLkTi=E+w7VDS3BWB69k68YuHkXKwTmkHz69Lx5u8rc@mail.gmail.com>

I should note that it should be possible to convert twisted,
eventlet, monocle, and other existing async libraries to
observables pretty easily.  The Javascript Rx library, for instance,
already wraps the events from dojo, extjs, google maps, jquery, google
translate, microsoft translate, mootools, prototype, raphael,
virtualearth, and yui3, and keeps adding others to enable
composability between different event driven widgets/frameworks.

Tristan

On Thu, Sep 23, 2010 at 12:41 AM, Tristan Zajonc <tristanz at gmail.com> wrote:
> [...]


From ziade.tarek at gmail.com  Thu Sep 23 16:37:21 2010
From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Thu, 23 Sep 2010 16:37:21 +0200
Subject: [Python-ideas] ABC: what about the method arguments ?
Message-ID: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>

Hello,

ABC __subclasshook__ implementations will only check that the method
is present in the class. That's the case for example in
collections.Container. It will check that the __contains__ method is
present but that's it. It won't check that the method has only one
argument, e.g. __contains__(self, x).

The problem is that the implemented method could have a different list
of arguments and will eventually fail.

Using inspect, we could check in __subclasshook__ that the arguments
defined are the same as the ones defined in the abstract method --
the name and the ordering.

I can even think of a small function in ABC for that:
same_signature(method1, method2) => True or False
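
A sketch of that helper using inspect.getfullargspec (comparing only
names and ordering, as proposed; defaults and annotations are
ignored):

    import inspect

    def same_signature(method1, method2):
        spec1 = inspect.getfullargspec(method1)
        spec2 = inspect.getfullargspec(method2)
        return (spec1.args == spec2.args
                and spec1.varargs == spec2.varargs
                and spec1.varkw == spec2.varkw)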

Regards
Tarek

-- 
Tarek Ziadé | http://ziade.org


From guido at python.org  Thu Sep 23 16:53:37 2010
From: guido at python.org (Guido van Rossum)
Date: Thu, 23 Sep 2010 07:53:37 -0700
Subject: [Python-ideas] ABC: what about the method arguments ?
In-Reply-To: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
Message-ID: <AANLkTikVadkRd9Ou1DVduxuXV7PfiP4-TX-ZGoMRxRsw@mail.gmail.com>

That is not a new idea. So far I have always rejected it because I
worry about both false positives and false negatives. Trying to
enforce that the method *behaves* as it should (or even its return
type) is hopeless; there can be a variety of reasons to modify the
argument list while still conforming to (the intent of) the interface.
I also worry that it will slow everything down.

That said, if you want to provide a standard mechanism that can
*optionally* be turned on to check argument conformance, e.g. by using
a class or method decorator on the subclass, I would be fine with that
(as long as it runs purely at class-definition time; it shouldn't slow
down class instantiation or method calls). It will probably even find
some bugs. It will also surely have to be tuned to avoid certain
classes of false positives.
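
One possible shape for that opt-in mechanism -- a hypothetical class
decorator, run once at class-definition time, that compares argument
names against an ABC's abstract methods:

    import inspect

    def check_signatures(abc):
        def decorator(cls):
            for name in getattr(abc, "__abstractmethods__", ()):
                impl = getattr(cls, name, None)
                base = getattr(abc, name, None)
                if impl is None or base is None:
                    continue
                if (inspect.getfullargspec(impl).args
                        != inspect.getfullargspec(base).args):
                    raise TypeError("signature of %s.%s does not match %s"
                                    % (cls.__name__, name, abc.__name__))
            return cls
        return decorator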

--Guido

On Thu, Sep 23, 2010 at 7:37 AM, Tarek Ziad? <ziade.tarek at gmail.com> wrote:
> Hello,
>
> ABC __subclasshook__ implementations only check that the method is
> present in the class. That's the case, for example, in
> collections.Container: it checks that the __contains__ method is
> present, but that's it. It won't check that the method takes exactly
> one argument, e.g. __contains__(self, x).
>
> The problem is that the implemented method could have a different
> argument list and will eventually fail when called.
>
> Using inspect, we could check in __subclasshook__ that the arguments
> defined are the same as the ones defined in the abstract method --
> the names and the ordering.
>
> I can even think of a small function in ABC for that:
> same_signature(method1, method2) => True or False
>
> Regards
> Tarek
>
> --
> Tarek Ziadé | http://ziade.org
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>



-- 
--Guido van Rossum (python.org/~guido)


From daniel at stutzbachenterprises.com  Thu Sep 23 16:54:55 2010
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Thu, 23 Sep 2010 09:54:55 -0500
Subject: [Python-ideas] ABC: what about the method arguments ?
In-Reply-To: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
Message-ID: <AANLkTinn0jjpFe7n6mX_v_7UhZGb9B12Mt8ZpbMgP6-4@mail.gmail.com>

On Thu, Sep 23, 2010 at 9:37 AM, Tarek Ziadé <ziade.tarek at gmail.com> wrote:

> The problem is that the implemented method could have a different
> argument list and will eventually fail when called.


A slightly different argument list is okay if it is more permissive.  For
example, the collections.Sequence ABC defines an index method with one
parameter.  However, the list implementation's index method takes one
mandatory parameter plus two optional parameters.  I'm not sure how easy it
would be to detect a valid but more general signature.

You might be interested in the related Issue 9731 ("Add ABCMeta.has_methods
and tests that use it").

-- 
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com/>

From ziade.tarek at gmail.com  Thu Sep 23 17:01:29 2010
From: ziade.tarek at gmail.com (Tarek Ziadé)
Date: Thu, 23 Sep 2010 17:01:29 +0200
Subject: [Python-ideas] ABC: what about the method arguments ?
In-Reply-To: <AANLkTikVadkRd9Ou1DVduxuXV7PfiP4-TX-ZGoMRxRsw@mail.gmail.com>
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
	<AANLkTikVadkRd9Ou1DVduxuXV7PfiP4-TX-ZGoMRxRsw@mail.gmail.com>
Message-ID: <AANLkTi=m2egDVP+RisK3cpFzGG1CgT1OaHs+2UN26NVs@mail.gmail.com>

On Thu, Sep 23, 2010 at 4:53 PM, Guido van Rossum <guido at python.org> wrote:
> That is not a new idea. So far I have always rejected it because I
> worry about both false positives and false negatives. Trying to
> enforce that the method *behaves* as it should (or even its return
> type) is hopeless; there can be a variety of reasons to modify the
> argument list while still conforming to (the intent of) the interface.
> I also worry that it will slow everything down.

Right

>
> That said, if you want to provide a standard mechanism that can
> *optionally* be turned on to check argument conformance, e.g. by using
> a class or method decorator on the subclass, I would be fine with that
> (as long as it runs purely at class-definition time; it shouldn't slow
> down class instantiation or method calls). It will probably even find
> some bugs. It will also surely have to be tuned to avoid certain
> classes of false positives.

I'll experiment with this and come back :)

Regards
Tarek


From ziade.tarek at gmail.com  Thu Sep 23 17:08:03 2010
From: ziade.tarek at gmail.com (Tarek Ziadé)
Date: Thu, 23 Sep 2010 17:08:03 +0200
Subject: [Python-ideas] ABC: what about the method arguments ?
In-Reply-To: <AANLkTinn0jjpFe7n6mX_v_7UhZGb9B12Mt8ZpbMgP6-4@mail.gmail.com>
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
	<AANLkTinn0jjpFe7n6mX_v_7UhZGb9B12Mt8ZpbMgP6-4@mail.gmail.com>
Message-ID: <AANLkTikcN6S+FCOZJJSF7RHf8wfSbdhMSGyyaMn_nrBi@mail.gmail.com>

On Thu, Sep 23, 2010 at 4:54 PM, Daniel Stutzbach
<daniel at stutzbachenterprises.com> wrote:
> On Thu, Sep 23, 2010 at 9:37 AM, Tarek Ziadé <ziade.tarek at gmail.com> wrote:
>>
>> The problem is that the implemented method could have a different list
>> of arguments and will eventually fail.
>
> A slightly different argument list is okay if it is more permissive.  For
> example, the collections.Sequence ABC defines an index method with one
> parameter.  However, the list implementation's index method takes one
> mandatory parameter plus two optional parameters.  I'm not sure how easy it
> would be to detect a valid but more general signature.

Well, with inspect it's possible to see whether the extra parameters
have default values, so that calls without them still work.
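
Something along these lines, say (a rough sketch; compatible_signature
is a made-up name):

    import inspect

    def compatible_signature(abstract, concrete):
        # The concrete method may add parameters, as long as every
        # extra one has a default, so calls written against the
        # abstract signature keep working.
        a = inspect.getfullargspec(abstract)
        c = inspect.getfullargspec(concrete)
        if c.args[:len(a.args)] != a.args:
            return False
        extra = len(c.args) - len(a.args)
        return extra <= len(c.defaults or ())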

> You might be interested in the related Issue 9731 ("Add ABCMeta.has_methods
> and tests that use it").

Ah... interesting. has_methods could possibly have an option to check
the signatures too.

--will hack on that when I find some time--

> --
> Daniel Stutzbach, Ph.D.
> President, Stutzbach Enterprises, LLC
>



-- 
Tarek Ziadé | http://ziade.org


From solipsis at pitrou.net  Thu Sep 23 17:39:55 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 23 Sep 2010 17:39:55 +0200
Subject: [Python-ideas] ABC: what about the method arguments ?
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
Message-ID: <20100923173955.4fc0bb03@pitrou.net>

On Thu, 23 Sep 2010 16:37:21 +0200
Tarek Ziadé <ziade.tarek at gmail.com> wrote:
> 
> The problem is that the implemented method could have a different
> argument list and will eventually fail when called.
> 
> Using inspect, we could check in __subclasshook__ that the arguments
> defined are the same as the ones defined in the abstract method --
> the names and the ordering.

I don't think we should steer in the type checking direction.
After all, the Python philosophy of dynamicity (dynamism?) is
articulated around the idea that checking types "ahead of time" is
useless. IMO, ABCs should be used more as a convention for documenting
what capabilities a class claims to expose, than for type checking.

(also, you'll have a hard time checking methods with *args or **kwargs
parameters)

Regards

Antoine.




From ziade.tarek at gmail.com  Thu Sep 23 18:18:49 2010
From: ziade.tarek at gmail.com (Tarek Ziadé)
Date: Thu, 23 Sep 2010 18:18:49 +0200
Subject: [Python-ideas] ABC: what about the method arguments ?
In-Reply-To: <20100923173955.4fc0bb03@pitrou.net>
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
	<20100923173955.4fc0bb03@pitrou.net>
Message-ID: <AANLkTi=rEYCNrd8_hcUUi4wsdjOa00a=OPumhcF=joQ1@mail.gmail.com>

On Thu, Sep 23, 2010 at 5:39 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Thu, 23 Sep 2010 16:37:21 +0200
Tarek Ziadé <ziade.tarek at gmail.com> wrote:
>>
>> The problem is that the implemented method could have a different
>> argument list and will eventually fail when called.
>>
>> Using inspect, we could check in __subclasshook__ that the arguments
>> defined are the same as the ones defined in the abstract method --
>> the names and the ordering.
>
> I don't think we should steer in the type checking direction.
> After all, the Python philosophy of dynamicity (dynamism?) is
> articulated around the idea that checking types "ahead of time" is
> useless. IMO, ABCs should be used more as a convention for documenting
> what capabilities a class claims to expose, than for type checking.

I think it goes further than documentation at this point. ABCs are
present and used in the stdlib, not just in the docs.
So asking a class about its capabilities is a feature we provide for
third-party code.

Also, I'm not sure what you mean by "ahead of time", but ABCs can
be used with issubclass() to check that an object quacks like it
should.

This is not opposed to dynamicity.
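
For example (collections.Container here; it lives in collections.abc
in later versions):

    from collections import Container

    class Bag:
        def __contains__(self, x):
            return False

    # Container.__subclasshook__ only looks for __contains__, so Bag
    # is recognized even though it never inherits from Container:
    assert issubclass(Bag, Container)
    assert isinstance(Bag(), Container)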


>
> (also, you'll have a hard time checking methods with *args or **kwargs
> parameters)

True, but I don't expect an ABC to define abstract methods with vague
arguments. And if it does, there's no point checking them in that
case. So it should definitely be something optional.

Regards,
Tarek

>
> Regards
>
> Antoine.
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>



-- 
Tarek Ziad? | http://ziade.org


From solipsis at pitrou.net  Thu Sep 23 18:32:49 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 23 Sep 2010 18:32:49 +0200
Subject: [Python-ideas] ABC: what about the method arguments ?
In-Reply-To: <AANLkTi=rEYCNrd8_hcUUi4wsdjOa00a=OPumhcF=joQ1@mail.gmail.com>
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
	<20100923173955.4fc0bb03@pitrou.net>
	<AANLkTi=rEYCNrd8_hcUUi4wsdjOa00a=OPumhcF=joQ1@mail.gmail.com>
Message-ID: <1285259569.3178.9.camel@localhost.localdomain>

On Thursday, 23 September 2010 at 18:18 +0200, Tarek Ziadé wrote:
> >> Using inspect, we could check in __subclasshook__ that the arguments
> >> defined are the same as the ones defined in the abstract method --
> >> the names and the ordering.
> >
> > I don't think we should steer in the type checking direction.
> > After all, the Python philosophy of dynamicity (dynamism?) is
> > articulated around the idea that checking types "ahead of time" is
> > useless. IMO, ABCs should be used more as a convention for documenting
> > what capabilities a class claims to expose, than for type checking.
> 
> I think it goes further than documentation at this point. ABC is
> present and used in the stdlib, not the doc.
> So asking a class about its capabilities is a feature we provide for
> third-party code.

This feature already exists, as you mention, using issubclass() or
isinstance(). What you are asking for is a different feature: check that
a class has an appropriate implementation of the advertised
capabilities. Traditionally, this is best left to unit testing (or other
forms of test-based checking).

Do you have a use case where unit testing would not be appropriate for
this?

> > (also, you'll have a hard time checking methods with *args or **kwargs
> > parameters)
> 
> True, but I don't expect an ABC to define abstract methods with vague
> arguments.

It depends on the arguments. And the implementation could definitely use
*args or **kwargs arguments, especially if it acts as a proxy.

Regards

Antoine.




From ziade.tarek at gmail.com  Thu Sep 23 19:51:35 2010
From: ziade.tarek at gmail.com (Tarek Ziadé)
Date: Thu, 23 Sep 2010 19:51:35 +0200
Subject: [Python-ideas] ABC: what about the method arguments ?
In-Reply-To: <1285259569.3178.9.camel@localhost.localdomain>
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
	<20100923173955.4fc0bb03@pitrou.net>
	<AANLkTi=rEYCNrd8_hcUUi4wsdjOa00a=OPumhcF=joQ1@mail.gmail.com>
	<1285259569.3178.9.camel@localhost.localdomain>
Message-ID: <AANLkTintC7n26fQ5KAybne=-qAWDTnEEE+Adsp=Cxjtz@mail.gmail.com>

On Thu, Sep 23, 2010 at 6:32 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
...
> This feature already exists, as you mention, using issubclass() or
> isinstance(). What you are asking for is a different feature: check that
> a class has an appropriate implementation of the advertised
> capabilities. Traditionally, this is best left to unit testing (or other
> forms of test-based checking).
>
> Do you have an use case where unit testing would not be appropriate for
> this?

Why are you thinking about unit tests? Don't you ever use
issubclass/isinstance in your programs?

Checking signatures using ABCs when you create a plugin system is one
use case, for instance.

>
>> > (also, you'll have a hard time checking methods with *args or **kwargs
>> > parameters)
>>
>> True, but I don't expect an ABC to define abstract methods with vague
>> arguments.
>
> It depends on the arguments. And the implementation could definitely use
> *args or **kwargs arguments, especially if it acts as a proxy.

Sure, but ISTM that most of the time signatures are well defined, and
proxies live in an upper layer.

Regards
Tarek


From solipsis at pitrou.net  Thu Sep 23 20:01:33 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 23 Sep 2010 20:01:33 +0200
Subject: [Python-ideas] ABC: what about the method arguments ?
In-Reply-To: <AANLkTintC7n26fQ5KAybne=-qAWDTnEEE+Adsp=Cxjtz@mail.gmail.com>
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
	<20100923173955.4fc0bb03@pitrou.net>
	<AANLkTi=rEYCNrd8_hcUUi4wsdjOa00a=OPumhcF=joQ1@mail.gmail.com>
	<1285259569.3178.9.camel@localhost.localdomain>
	<AANLkTintC7n26fQ5KAybne=-qAWDTnEEE+Adsp=Cxjtz@mail.gmail.com>
Message-ID: <1285264893.3178.14.camel@localhost.localdomain>

On Thursday, 23 September 2010 at 19:51 +0200, Tarek Ziadé wrote:
> On Thu, Sep 23, 2010 at 6:32 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> ...
> > This feature already exists, as you mention, using issubclass() or
> > isinstance(). What you are asking for is a different feature: check that
> > a class has an appropriate implementation of the advertised
> > capabilities. Traditionally, this is best left to unit testing (or other
> > forms of test-based checking).
> >
> > Do you have a use case where unit testing would not be appropriate for
> > this?
> 
> Why are you thinking about unit tests? Don't you ever use
> issubclass/isinstance in your programs?

Sorry, you don't seem to be answering the question.
Why wouldn't the implementor of the class use unit tests to check that
his/her class implements the desired ABC?

> Checking signatures using ABCs when you create a plugin system is one
> use case, for instance.

Again, why do you want to check signatures? Do you not trust plugin
authors to write plugins?

Also, why do you think checking signatures is actually useful? It only
checks that the signature is right, not that the expected semantics are
observed. The argument for checking method signatures in advance is as
weak as the argument for checking types at compile time.

> > It depends on the arguments. And the implementation could definitely use
> > *args or **kwargs arguments, especially if it acts as a proxy.
> 
> Sure, but ISTM that most of the time signatures are well defined, and
> proxies live in an upper layer.

Not really. If I write a file object wrapper that proxies some methods
to another file object, I don't want to re-type all method signatures
(including default args) by hand.

Regards

Antoine.




From tjreedy at udel.edu  Thu Sep 23 20:39:01 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 23 Sep 2010 14:39:01 -0400
Subject: [Python-ideas] ABC: what about the method arguments ?
In-Reply-To: <AANLkTintC7n26fQ5KAybne=-qAWDTnEEE+Adsp=Cxjtz@mail.gmail.com>
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>	<20100923173955.4fc0bb03@pitrou.net>	<AANLkTi=rEYCNrd8_hcUUi4wsdjOa00a=OPumhcF=joQ1@mail.gmail.com>	<1285259569.3178.9.camel@localhost.localdomain>
	<AANLkTintC7n26fQ5KAybne=-qAWDTnEEE+Adsp=Cxjtz@mail.gmail.com>
Message-ID: <i7g6s6$nml$1@dough.gmane.org>

If I were writing a class intended to implement a particular ABC, I
would be happy to have an automated check function that might catch 
errors. 100% testing is hard to achieve.

-- 
Terry Jan Reedy



From solipsis at pitrou.net  Thu Sep 23 20:52:24 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 23 Sep 2010 20:52:24 +0200
Subject: [Python-ideas] ABC: what about the method arguments ?
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
	<20100923173955.4fc0bb03@pitrou.net>
	<AANLkTi=rEYCNrd8_hcUUi4wsdjOa00a=OPumhcF=joQ1@mail.gmail.com>
	<1285259569.3178.9.camel@localhost.localdomain>
	<AANLkTintC7n26fQ5KAybne=-qAWDTnEEE+Adsp=Cxjtz@mail.gmail.com>
	<i7g6s6$nml$1@dough.gmane.org>
Message-ID: <20100923205224.3fc27060@pitrou.net>

On Thu, 23 Sep 2010 14:39:01 -0400
Terry Reedy <tjreedy at udel.edu> wrote:
> If I were writing a class intended to implement a particular ABC, I
> would be happy to have an automated check function that might catch 
> errors. 100% testing is hard to achieve.

How would an automatic check function solve anything, if you don't test
that the class does what is expected?

Again, this is exactly the argument for compile-time type checking, and
it is routinely pointed out that it is mostly useless.





From ziade.tarek at gmail.com  Thu Sep 23 20:59:07 2010
From: ziade.tarek at gmail.com (Tarek Ziadé)
Date: Thu, 23 Sep 2010 20:59:07 +0200
Subject: [Python-ideas] ABC: what about the method arguments ?
In-Reply-To: <1285264893.3178.14.camel@localhost.localdomain>
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
	<20100923173955.4fc0bb03@pitrou.net>
	<AANLkTi=rEYCNrd8_hcUUi4wsdjOa00a=OPumhcF=joQ1@mail.gmail.com>
	<1285259569.3178.9.camel@localhost.localdomain>
	<AANLkTintC7n26fQ5KAybne=-qAWDTnEEE+Adsp=Cxjtz@mail.gmail.com>
	<1285264893.3178.14.camel@localhost.localdomain>
Message-ID: <AANLkTinW0zNMYRpwr64xY7hmKGboBpSk66eocHF9jisX@mail.gmail.com>

On Thu, Sep 23, 2010 at 8:01 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Thursday, 23 September 2010 at 19:51 +0200, Tarek Ziadé wrote:
>> On Thu, Sep 23, 2010 at 6:32 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> ...
>> > This feature already exists, as you mention, using issubclass() or
>> > isinstance(). What you are asking for is a different feature: check that
>> > a class has an appropriate implementation of the advertised
>> > capabilities. Traditionally, this is best left to unit testing (or other
>> > forms of test-based checking).
>> >
>> > Do you have a use case where unit testing would not be appropriate for
>> > this?
>>
>> Why are you thinking about unit tests? Don't you ever use
>> issubclass/isinstance in your programs?
>
> Sorry, you don't seem to be answering the question.
> Why wouldn't the implementor of the class use unit tests to check that
> his/her class implements the desired ABC?

That's fine indeed. Now, why wouldn't the implementor of an
application use ABCs to check that the third-party class he's about to
load into his app implements the desired ABC?


>
>> Checking signatures using ABCs when you create a plugin system is one
>> use case, for instance.
>
> Again, why do you want to check signatures? Do you not trust plugin
> authors to write plugins?
>
> Also, why do you think checking signatures is actually useful? It only
> checks that the signature is right, not that the expected semantics are
> observed. The argument for checking method signatures in advance is as
> weak as the argument for checking types at compile time.

Sorry, but it seems that you are now advocating against ABCs altogether.

Checking the methods, and optionally their arguments, is just a deeper
operation on something that already exists.

It's fine to use those checks only in your tests, but why do you object
to someone wanting to use them in their app?

This is completely orthogonal to the discussion, which is: extend a
method checker to also check arguments.

>
>> > It depends on the arguments. And the implementation could definitely use
>> > *args or **kwargs arguments, especially if it acts as a proxy.
>>
>> Sure, but ISTM that most of the time signatures are well defined, and
>> proxies live in an upper layer.
>
> Not really. If I write a file object wrapper that proxies some methods
> to another file object, I don't want to re-type all method signatures
> (including default args) by hand.

In that case I am curious to see why you would have file I/O methods
with extra *args/**kwargs. You should handle this kind of setup in
the constructor and keep the method signatures similar (and avoid the
extra re-typing, actually).

Regards
Tarek

>
> Regards
>
> Antoine.
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>



-- 
Tarek Ziadé | http://ziade.org


From ziade.tarek at gmail.com  Thu Sep 23 21:00:12 2010
From: ziade.tarek at gmail.com (Tarek Ziadé)
Date: Thu, 23 Sep 2010 21:00:12 +0200
Subject: [Python-ideas] ABC: what about the method arguments ?
In-Reply-To: <20100923205224.3fc27060@pitrou.net>
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
	<20100923173955.4fc0bb03@pitrou.net>
	<AANLkTi=rEYCNrd8_hcUUi4wsdjOa00a=OPumhcF=joQ1@mail.gmail.com>
	<1285259569.3178.9.camel@localhost.localdomain>
	<AANLkTintC7n26fQ5KAybne=-qAWDTnEEE+Adsp=Cxjtz@mail.gmail.com>
	<i7g6s6$nml$1@dough.gmane.org> <20100923205224.3fc27060@pitrou.net>
Message-ID: <AANLkTi=7dqEooqxFAfC6FB_sLsDyqH4vCx+PCUfPh-Mh@mail.gmail.com>

On Thu, Sep 23, 2010 at 8:52 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Thu, 23 Sep 2010 14:39:01 -0400
> Terry Reedy <tjreedy at udel.edu> wrote:
>> If I were writing a class intended to implement a particular ABC, I
>> would be happy to have an automated check function that might catch
>> errors. 100% testing is hard to achieve.
>
> How would an automatic check function solve anything, if you don't test
> that the class does what is expected?
>
> Again, this is exactly the argument for compile-time type checking, and
> it is routinely pointed out that it is mostly useless.

So are you in favor of removing every kind of type checking
mechanism from Python?

>
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>



-- 
Tarek Ziadé | http://ziade.org


From daniel at stutzbachenterprises.com  Thu Sep 23 21:03:52 2010
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Thu, 23 Sep 2010 14:03:52 -0500
Subject: [Python-ideas] ABC: what about the method arguments ?
In-Reply-To: <20100923205224.3fc27060@pitrou.net>
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
	<20100923173955.4fc0bb03@pitrou.net>
	<AANLkTi=rEYCNrd8_hcUUi4wsdjOa00a=OPumhcF=joQ1@mail.gmail.com>
	<1285259569.3178.9.camel@localhost.localdomain>
	<AANLkTintC7n26fQ5KAybne=-qAWDTnEEE+Adsp=Cxjtz@mail.gmail.com>
	<i7g6s6$nml$1@dough.gmane.org> <20100923205224.3fc27060@pitrou.net>
Message-ID: <AANLkTikEkgF3hTm3Fd=v2X_7gY-1T2+hhXD_4QheD4tt@mail.gmail.com>

On Thu, Sep 23, 2010 at 1:52 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> How would an automatic check function solve anything, if you don't test
> that the class does what is expected?
>

Automated checks are a good way to help ensure that your test coverage is
good.  If the automated check fails and all the other tests pass, it means
there's been an oversight in both functionality and tests.

This isn't a purely theoretical concern.  See Issues 9212 and 9213 for cases
where a class purported to support an ABC but wasn't actually supplying all
the methods.

-- 
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com/>

From solipsis at pitrou.net  Thu Sep 23 21:26:23 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 23 Sep 2010 21:26:23 +0200
Subject: [Python-ideas] ABC: what about the method arguments ?
In-Reply-To: <AANLkTinW0zNMYRpwr64xY7hmKGboBpSk66eocHF9jisX@mail.gmail.com>
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
	<20100923173955.4fc0bb03@pitrou.net>
	<AANLkTi=rEYCNrd8_hcUUi4wsdjOa00a=OPumhcF=joQ1@mail.gmail.com>
	<1285259569.3178.9.camel@localhost.localdomain>
	<AANLkTintC7n26fQ5KAybne=-qAWDTnEEE+Adsp=Cxjtz@mail.gmail.com>
	<1285264893.3178.14.camel@localhost.localdomain>
	<AANLkTinW0zNMYRpwr64xY7hmKGboBpSk66eocHF9jisX@mail.gmail.com>
Message-ID: <1285269983.3178.46.camel@localhost.localdomain>

On Thursday, 23 September 2010 at 20:59 +0200, Tarek Ziadé wrote:
> 
> That's fine indeed. Now, why wouldn't the implementor of an
> application use ABCs to check that the third-party class he's about to
> load into his app implements the desired ABC?

Why would he? What does it provide him exactly? A false sense of
security / robustness?

> > Also, why do you think checking signatures is actually useful? It only
> > checks that the signature is right, not that the expected semantics are
> > observed. The argument for checking method signature in advance is as
> > weak as the argument for checking types at compile time.
> 
> Sorry, but it seems that you are now advocating against ABCs altogether.

As I said, I believe ABCs are useful mainly for documentation purposes;
that is, for conveying /intent/.
Thinking that ABCs guarantee anything about quality or conformity of the
implementation sounds wrong to me.

(the other reason for using ABCs is to provide default implementations
of some methods, like the io ABCs do)

> This is completely orthogonal to the discussion, which is: extend a
> method checker to also check arguments.

It's not really orthogonal. I'm opposing the idea that programmatically
checking the conformity of method signatures is useful; I also think
it's *not* a good thing to advocate to Python programmers coming from
other languages.

> In that case I am curious to see why you would have file I/O methods
> with extra *args/**kwargs.

def seek(self, *args):
    # forward whatever arguments the caller passed, untouched
    return self.realfileobj.seek(*args)

> So are you in favor of removing every kind of type checking
> mechanism from Python?

"Type" checking is simply done when necessary. It is duck typing.
Even in the case of ABCs, method calls are still duck-typed. For
example, if you look at the io ABCs and concrete classes, a
BufferedReader won't check that the object you give it to wrap is
actually a RawIOBase.

Regards

Antoine.




From guido at python.org  Thu Sep 23 21:26:48 2010
From: guido at python.org (Guido van Rossum)
Date: Thu, 23 Sep 2010 12:26:48 -0700
Subject: [Python-ideas] ABC: what about the method arguments ?
In-Reply-To: <20100923205224.3fc27060@pitrou.net>
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
	<20100923173955.4fc0bb03@pitrou.net>
	<AANLkTi=rEYCNrd8_hcUUi4wsdjOa00a=OPumhcF=joQ1@mail.gmail.com>
	<1285259569.3178.9.camel@localhost.localdomain>
	<AANLkTintC7n26fQ5KAybne=-qAWDTnEEE+Adsp=Cxjtz@mail.gmail.com>
	<i7g6s6$nml$1@dough.gmane.org> <20100923205224.3fc27060@pitrou.net>
Message-ID: <AANLkTikQmJWVuvuf91-26uPW_mzx=XUu0RuuM3YphX8x@mail.gmail.com>

On Thu, Sep 23, 2010 at 11:52 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Thu, 23 Sep 2010 14:39:01 -0400
> Terry Reedy <tjreedy at udel.edu> wrote:
>> If I were writing a class intended to implement a particular ABC, I
>> would be happy to have an automated check function that might catch
>> errors. 100% testing is hard to achieve.
>
> How would an automatic check function solve anything, if you don't test
> that the class does what is expected?
>
> Again, this is exactly the argument for compile-time type checking, and
> it is routinely pointed out that it is mostly useless.

That may be the party line of dynamic-language diehards, but that
doesn't make it true. There are plenty of times when compile-time
checking can save the day, and typically, the larger a system, the
more useful it becomes. Antoine, can you back off your attempts to
prove that the proposed feature is useless and instead help design
the details of the feature (or if you can't or don't want to help
there, just stay out of the discussion)?

-- 
--Guido van Rossum (python.org/~guido)


From ncoghlan at gmail.com  Thu Sep 23 23:42:12 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 24 Sep 2010 07:42:12 +1000
Subject: [Python-ideas] ABC: what about the method arguments ?
In-Reply-To: <AANLkTi=rEYCNrd8_hcUUi4wsdjOa00a=OPumhcF=joQ1@mail.gmail.com>
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
	<20100923173955.4fc0bb03@pitrou.net>
	<AANLkTi=rEYCNrd8_hcUUi4wsdjOa00a=OPumhcF=joQ1@mail.gmail.com>
Message-ID: <AANLkTim-Gik+p8DSfE5c_A2cg-OYLc6z4NZsET2kLqzO@mail.gmail.com>

On Fri, Sep 24, 2010 at 2:18 AM, Tarek Ziadé <ziade.tarek at gmail.com> wrote:
> I think it goes further than documentation at this point. ABCs are
> present and used in the stdlib, not just in the docs.
> So asking a class about its capabilities is a feature we provide for
> third-party code.

Minor nit - we can only ask a fairly limited subset of questions along
these lines (i.e. does *this* class/instance implement *this* ABC?).
More interesting questions like "which ABCs does this class/instance
explicitly implement?" are currently impossible (see
http://bugs.python.org/issue5405).

Back on topic - I like Guido's approach. While we can debate the
merits of LBYL signature checking forever without reaching agreement
(for the record, my opinion is that static checks should be thought of
as a bunch of implicit unit tests that you get "for free"), providing
a way to explicitly request ABC signature checks in the abc module
probably isn't a bad idea. If nothing else, invoking that check can
become a recommended part of the unit test suite for classes that
claim to implement ABCs. Is getting the method signatures right
*sufficient* for ABC compliance? No. Is it *necessary*? Yes. It's the
latter point that makes this feature potentially worth standardising.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia


From tjreedy at udel.edu  Fri Sep 24 01:15:08 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 23 Sep 2010 19:15:08 -0400
Subject: [Python-ideas] ABC: what about the method arguments ?
In-Reply-To: <20100923205224.3fc27060@pitrou.net>
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>	<20100923173955.4fc0bb03@pitrou.net>	<AANLkTi=rEYCNrd8_hcUUi4wsdjOa00a=OPumhcF=joQ1@mail.gmail.com>	<1285259569.3178.9.camel@localhost.localdomain>	<AANLkTintC7n26fQ5KAybne=-qAWDTnEEE+Adsp=Cxjtz@mail.gmail.com>	<i7g6s6$nml$1@dough.gmane.org>
	<20100923205224.3fc27060@pitrou.net>
Message-ID: <i7gn1t$v7m$1@dough.gmane.org>

On 9/23/2010 2:52 PM, Antoine Pitrou wrote:
> On Thu, 23 Sep 2010 14:39:01 -0400
> Terry Reedy<tjreedy at udel.edu>  wrote:
>> If I were writing a class intended to implement a particular ABC, I
>> would be happy to have an automated check function that might catch
>> errors. 100% testing is hard to achieve.
>
> How would an automatic check function solve anything, if you don't test
> that the class does what is expected?

If all tests are written with calls by position, as is my habit and 
general preference, they will not catch argument name mismatches that 
would trip up someone who prefers call by keyword or any 
introspection-by-name process.
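
A hypothetical example of the kind of mismatch that slips through:

    class MyMapping:
        # Suppose the ABC documents get(key, default=None), but the
        # implementor renamed the first parameter:
        def get(self, k, default=None):
            return default

    m = MyMapping()
    m.get('a')        # positional call: every such test passes
    m.get(key='a')    # TypeError: unexpected keyword argument 'key'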

-- 
Terry Jan Reedy



From tjreedy at udel.edu  Fri Sep 24 02:24:13 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 23 Sep 2010 20:24:13 -0400
Subject: [Python-ideas] ABC: what about the method arguments ?
In-Reply-To: <AANLkTikQmJWVuvuf91-26uPW_mzx=XUu0RuuM3YphX8x@mail.gmail.com>
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>	<20100923173955.4fc0bb03@pitrou.net>	<AANLkTi=rEYCNrd8_hcUUi4wsdjOa00a=OPumhcF=joQ1@mail.gmail.com>	<1285259569.3178.9.camel@localhost.localdomain>	<AANLkTintC7n26fQ5KAybne=-qAWDTnEEE+Adsp=Cxjtz@mail.gmail.com>	<i7g6s6$nml$1@dough.gmane.org>
	<20100923205224.3fc27060@pitrou.net>
	<AANLkTikQmJWVuvuf91-26uPW_mzx=XUu0RuuM3YphX8x@mail.gmail.com>
Message-ID: <i7gr3g$cet$1@dough.gmane.org>

On 9/23/2010 3:26 PM, Guido van Rossum wrote:
> On Thu, Sep 23, 2010 at 11:52 AM, Antoine Pitrou<solipsis at pitrou.net>  wrote:
>> On Thu, 23 Sep 2010 14:39:01 -0400
>> Terry Reedy<tjreedy at udel.edu>  wrote:
>>> If I were writing a class intended to implement a particular ABC, I
>>> would be happy to have an automated check function that might catch
>>> errors. 100% testing is hard to achieve.
>>
>> How would an automatic check function solve anything, if you don't test
>> that the class does what is expected?
>>
>> Again, this is exactly the argument for compile-time type checking, and
>> it is routinely pointed out that it is mostly useless.
>
> That may be the party line of dynamic-language diehards, but that
> doesn't make it true. There are plenty of times when compile-time
> checking can save the day, and typically, the larger a system, the
> more useful it becomes.

Sometimes you surprise me with your non-dogmatic practicality. I do 
hope, though, that you continue to reject C-like braces {;-}.

 > Antoine, can you back off your attempts to
> prove that the proposed feature is useless and instead help designing
> the details of the feature (or if you can't or don't want to help
> there, just stay out of the discussion)?

Yes, let the cat scratch his itch and see what he produces.

Since unit tests have been brought up, I have an idea and a question.
Can this work? Split the current test suite for a concrete class that 
implements one of the ABCs into concrete-specific and ABC-general 
portions, with the abstract part parameterized by concrete class.

For instance, split test/test_dict.py into test_dict.py and 
test_Mapping.py, where the latter has all tests that test compliance 
with the Mapping ABC (or whatever it is called) and the former keeps all 
the dict-specific extension tests. Rewrite test_Mapping so it is not 
dict specific, so one could write something like

class MyMapping:
    "Implement exactly the Mapping ABC with no extras."
    ...

if __name__ == '__main__':
    from test import test_Mapping as tM
    tM.concrete = MyMapping
    tM.runtests()
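
On the test_Mapping side, a rough sketch of what such a parameterized
module could contain (all names here are hypothetical):

    # test_Mapping.py (sketch)
    import unittest

    concrete = None  # the implementation under test; set by the caller

    class MappingComplianceTests(unittest.TestCase):
        def test_len_of_empty_mapping(self):
            self.assertEqual(len(concrete()), 0)

        def test_missing_key_not_contained(self):
            self.assertFalse('missing' in concrete())

    def runtests():
        suite = unittest.TestLoader().loadTestsFromTestCase(
            MappingComplianceTests)
        unittest.TextTestRunner().run(suite)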

This is similar to but not the same as splitting tests into generic and 
CPython parts, the latter for reuse by other implementations of the 
interpreter. (For dicts, test_dict.py could still be so split, or a 
portion of it made conditional on the platform.) This idea is for reuse 
of tests by other implementations of ABCs, whatever interpreter 
implementation they run under.

The underlying question is whether ABCs are intended to be an integral 
part of Python 3 or just an optional extra tucked away in a corner 
(which is how many, including me, still tend to view them). If the 
former, then to me they should, if possible, be supported by a semantic 
validation test suite.

In a way, I am agreeing with Antoine's objection that signature 
validation is not enough, but with the opposite suggestion: extend, 
rather than reject, Tarek's idea of providing automated test tools that 
make writing and using ABCs easier.

-- 
Terry Jan Reedy



From digitalxero at gmail.com  Fri Sep 24 05:42:41 2010
From: digitalxero at gmail.com (Dj Gilcrease)
Date: Thu, 23 Sep 2010 23:42:41 -0400
Subject: [Python-ideas] ABC: what about the method arguments ?
In-Reply-To: <AANLkTintC7n26fQ5KAybne=-qAWDTnEEE+Adsp=Cxjtz@mail.gmail.com>
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
	<20100923173955.4fc0bb03@pitrou.net>
	<AANLkTi=rEYCNrd8_hcUUi4wsdjOa00a=OPumhcF=joQ1@mail.gmail.com>
	<1285259569.3178.9.camel@localhost.localdomain>
	<AANLkTintC7n26fQ5KAybne=-qAWDTnEEE+Adsp=Cxjtz@mail.gmail.com>
Message-ID: <AANLkTi=5Y5pPPsO71uGOb9wCc55Bo2t8dDvWAFMePq=Z@mail.gmail.com>

On Thu, Sep 23, 2010 at 1:51 PM, Tarek Ziadé <ziade.tarek at gmail.com> wrote:
> Why are you thinking about unit tests? Don't you ever use
> issubclass/isinstance in your programs?
>
> Checking signatures using ABCs when you create a plugin system is one
> use case, for instance.

This is something that I have implemented (before ABCs) in plugin
systems I use. When loading a plugin I validate that all methods exist
and that each method has the correct number of required arguments. I
generally don't check argument names, as my plugin systems all pass by
position instead of by keyword. If the signature I am checking contains
*args, it automatically passes the check. If the plugin fails the check,
I don't load it.
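
Something like this, roughly (passes_check is a made-up name):

    import inspect

    def passes_check(method, required_count):
        # Anything that takes *args passes automatically; otherwise
        # the number of mandatory (non-defaulted) arguments must match.
        spec = inspect.getfullargspec(method)
        if spec.varargs is not None:
            return True
        mandatory = len(spec.args) - len(spec.defaults or ())
        return mandatory == required_count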

On Thu, Sep 23, 2010 at 2:01 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Again, why do you want to check signatures? Do you not trust plugin
> authors to write plugins?

No, no I don't. I have had several plugin authors come to me
complaining that the plugin system is broken because it won't load
their plugin (even with a fairly detailed error message).


Dj Gilcrease
 ____
( |     \  o    ()   |  o  |`|
  |      |      /`\_/|      | |   ,__   ,_,   ,_,   __,    ,   ,_,
_|      | |    /      |  |   |/   /      /   |   |_/  /    |   / \_|_/
(/\___/  |/  /(__,/  |_/|__/\___/    |_/|__/\__/|_/\,/  |__/
         /|
         \|


From andrew at bemusement.org  Fri Sep 24 07:58:00 2010
From: andrew at bemusement.org (Andrew Bennetts)
Date: Fri, 24 Sep 2010 15:58:00 +1000
Subject: [Python-ideas] ABC: what about the method arguments ?
In-Reply-To: <i7gr3g$cet$1@dough.gmane.org>
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
	<20100923173955.4fc0bb03@pitrou.net>
	<AANLkTi=rEYCNrd8_hcUUi4wsdjOa00a=OPumhcF=joQ1@mail.gmail.com>
	<1285259569.3178.9.camel@localhost.localdomain>
	<AANLkTintC7n26fQ5KAybne=-qAWDTnEEE+Adsp=Cxjtz@mail.gmail.com>
	<i7g6s6$nml$1@dough.gmane.org> <20100923205224.3fc27060@pitrou.net>
	<AANLkTikQmJWVuvuf91-26uPW_mzx=XUu0RuuM3YphX8x@mail.gmail.com>
	<i7gr3g$cet$1@dough.gmane.org>
Message-ID: <20100924055800.GA2633@aihal.home.puzzling.org>

Terry Reedy wrote:
[...]
> Since unit tests have been brought up, I have an idea and a question.
> Can this work? Split the current test suite for a concrete class
> that implements one of the ABCs into concrete-specific and
> ABC-general portions, with the abstract part parameterized by
> concrete class.

FWIW, bzr's test suite has this facility, and bzr plugins that implement
various bzr interfaces will have tests for those interfaces
automatically applied.  (Being a Python 2.4+ project, bzr doesn't
actually use the ABCs feature, but we certainly use the concept of
"interface with many implemenations".)

E.g. if you define a new Transport (in bzr terms, a thing like FTP,
HTTP, etc) you probably want to make sure it complies with bzrlib's
expectations for Transports.  So you can include a get_test_permutations
function in your module that returns a list of (transport_class,
server_class) pairs.  [Unsurprisingly you need a test server to run
against, although for transports like LocalTransport (local filesystem
access) they can be very simple.]

It works very well, and is very useful both for bzrlib itself and
plugins.  We have "per-implementation" tests for: branch, bzrdir,
repository, interrepository, merger, transport, tree, workingtree,
uifactory, and more.  Look for bzrlib/tests/per_*.

It's not necessarily easy to write all those tests.  The more complex an
interface, the more likely it is you'll have many tests for that
interface that don't really apply to all implementations -- for instance
some Transports are read-only, or don't support list_dir, etc.  So tests
that involve those need to specifically check for that capability and
raise NotApplicable, and finding the exact right way to do that can be
tricky.  It's often easier to say "if isinstance(thing,
ParticularImplementation): ...", but that quickly erodes the
applicability of those tests for new implementations.

Also tricky is when the setup or even assertions for some tests needs to
vary considerably by implementation: how complicated is your
parameterisation interface going to have to be?

bzr has found it worthwhile, so I do encourage trying it.  I'd use
Robert Collins' http://launchpad.net/testscenarios library if I were
providing this infrastructure in a suite that doesn't already have this
approach; it's basically a distillation of the infrastructure developed
in bzrlib.tests.

-Andrew.


From g.brandl at gmx.net  Fri Sep 24 09:15:00 2010
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 24 Sep 2010 09:15:00 +0200
Subject: [Python-ideas] ABC: what about the method arguments ?
In-Reply-To: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
Message-ID: <i7hj68$k4l$1@dough.gmane.org>

On 23.09.2010 16:37, Tarek Ziadé wrote:
> Hello,
> 
> ABC __subclasshook__ implementations only check that the method is
> present in the class. That's the case, for example, in
> collections.Container: it checks that the __contains__ method is
> present, but that's it. It won't check that the method takes exactly
> one argument, e.g. __contains__(self, x).
> 
> The problem is that the implemented method could have a different
> argument list and will eventually fail when called.

I'm not concerned about this in the least.  Whoever implements a special
method with the wrong signature has more pressing problems than a false-
positive ABC subclass check.  And AFAIK, our ABCs only check for special
methods.

> Using inspect, we could check in __subclasshook__ that the arguments
> defined are the same as the ones defined in the abstract method --
> the names and the ordering.

"ordering"?

Georg


-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.



From daniel at stutzbachenterprises.com  Fri Sep 24 16:17:19 2010
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Fri, 24 Sep 2010 09:17:19 -0500
Subject: [Python-ideas] ABC: what about the method arguments ?
In-Reply-To: <i7gr3g$cet$1@dough.gmane.org>
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>
	<20100923173955.4fc0bb03@pitrou.net>
	<AANLkTi=rEYCNrd8_hcUUi4wsdjOa00a=OPumhcF=joQ1@mail.gmail.com>
	<1285259569.3178.9.camel@localhost.localdomain>
	<AANLkTintC7n26fQ5KAybne=-qAWDTnEEE+Adsp=Cxjtz@mail.gmail.com>
	<i7g6s6$nml$1@dough.gmane.org> <20100923205224.3fc27060@pitrou.net>
	<AANLkTikQmJWVuvuf91-26uPW_mzx=XUu0RuuM3YphX8x@mail.gmail.com>
	<i7gr3g$cet$1@dough.gmane.org>
Message-ID: <AANLkTikSw4-G6tFj_14CoW7PsmBRcxLOR-h4wbHy7E1P@mail.gmail.com>

On Thu, Sep 23, 2010 at 7:24 PM, Terry Reedy <tjreedy at udel.edu> wrote:

> Can this work? Split the current test suite for a concrete class that
> implements one of the ABCs into concrete-specific and ABC-general portions,
> with the abstract part parameterized by concrete class.
>
> For instance, split test/test_dict.py into test_dict.py and
> test_Mapping.py, where the latter has all tests that test compliance with
> the Mapping ABC (or whatever it is called) and the former keeps all the
> dict-specific extension tests. Rewrite test_Mapping so it is not dict
> specific, so one could write something like
>

As a heavy user of the ABCs in the collections module, that would be
awesome. :-)  It would make my life a lot easier when I'm writing tests to
go along with an ABC-derived class.  I have 8 such classes on PyPI
(heapdict.heapdict and blist.*), plus more in private repositories.

There is some code vaguely along those lines in the existing unit tests.
For example, Lib/test/seq_tests.py contains tests common to sequences.
However, that was written before collections.Sequence came along, and the
pre-2.6 definition of "sequence" only loosely correlates with a
collections.Sequence.

-- 
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com/>

From tjreedy at udel.edu  Fri Sep 24 18:20:49 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 24 Sep 2010 12:20:49 -0400
Subject: [Python-ideas] ABC: what about the method arguments ?
In-Reply-To: <AANLkTikSw4-G6tFj_14CoW7PsmBRcxLOR-h4wbHy7E1P@mail.gmail.com>
References: <AANLkTinTqnRq8aGUTT0uzZp=sVask+82We6FPmv5CQfb@mail.gmail.com>	<20100923173955.4fc0bb03@pitrou.net>	<AANLkTi=rEYCNrd8_hcUUi4wsdjOa00a=OPumhcF=joQ1@mail.gmail.com>	<1285259569.3178.9.camel@localhost.localdomain>	<AANLkTintC7n26fQ5KAybne=-qAWDTnEEE+Adsp=Cxjtz@mail.gmail.com>	<i7g6s6$nml$1@dough.gmane.org>
	<20100923205224.3fc27060@pitrou.net>	<AANLkTikQmJWVuvuf91-26uPW_mzx=XUu0RuuM3YphX8x@mail.gmail.com>	<i7gr3g$cet$1@dough.gmane.org>
	<AANLkTikSw4-G6tFj_14CoW7PsmBRcxLOR-h4wbHy7E1P@mail.gmail.com>
Message-ID: <i7ij53$1up$1@dough.gmane.org>

On 9/24/2010 10:17 AM, Daniel Stutzbach wrote:
> On Thu, Sep 23, 2010 at 7:24 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>
>     Can this work? Split the current test suite for a concrete class
>     that implements one of the ABCs into concrete-specific and
>     ABC-general portions, with the abstract part parameterized by
>     concrete class.
>
>     For instance, split test/test_dict.py into test_dict.py and
>     test_Mapping.py, where the latter has all tests that test compliance
>     with the Mapping ABC (or whatever it is called) and the former keeps
>     all the dict-specific extension tests. Rewrite test_Mapping so it is
>     not dict specific, so one could write something like

Reading the responses, I realized that I am already doing a simplified 
version of my suggestion for functions rather than classes. For didactic 
purposes, I am writing multiple implementations of multiple abstract 
functions. I embody a test for a particular function in an iterable of 
input-output pairs (where the 'output' can also be an exception class). 
I use that with a custom super test function that tests one or more 
callables against the pairs. It works great and it is easy to add 
another implementation or more pairs.
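
In outline, the harness looks something like this (names made up):

    def run_pairs(funcs, pairs):
        # Test one or more callables against (args, expected) pairs,
        # where 'expected' may also be an exception class.
        for func in funcs:
            for args, expected in pairs:
                if (isinstance(expected, type)
                        and issubclass(expected, Exception)):
                    try:
                        func(*args)
                    except expected:
                        continue
                    raise AssertionError("%s%r did not raise %s"
                                         % (func.__name__, args,
                                            expected.__name__))
                elif func(*args) != expected:
                    raise AssertionError("%s%r != %r"
                                         % (func.__name__, args,
                                            expected))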

> As a heavy user of the ABCs in the collections module, that would be
> awesome. :-)  It would make my life a lot easier when I'm writing tests
> to go along with an ABC-derived class.  I have 8 such classes on PyPi
> (heapdict.heapdict and blist.*), plus more in private repositories.
>
> There is some code vaguely along those lines in the existing unit tests.
>   For example, Lib/test/seq_tests.py contains tests common to sequences.
>   However, that was written before collections.Sequence came along and
> the pre-2.6 definition of "sequence" only loosely correlates with a
> collections.Sequence.

Well, pick one existing test file, revise and extend and perhaps split, 
start a tracker issue with proposed patch, get comments, and perhaps 
commit it. If you do, add terry.reedy as nosy.

-- 
Terry Jan Reedy



From greg.ewing at canterbury.ac.nz  Sat Sep 25 03:55:55 2010
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 25 Sep 2010 13:55:55 +1200
Subject: [Python-ideas] [Python-Dev] os.path function for "get the real filename"
In-Reply-To: <877hia4tte.fsf_-_@benfinney.id.au>
References: <4C9531A7.10405@simplistix.co.uk>
	<AANLkTim6m00hVqRT9LTfXz=gaEmMEdxrCvk7jpF-3Lch@mail.gmail.com>
	<4C9C79DA.7000506@simplistix.co.uk>
	<20100924121737.309071FA5C2@kimball.webabinitio.net>
	<AANLkTi==y+pDw7h4KiBf0mX+CBVxS9Fw-oUX16zJ8bpi@mail.gmail.com>
	<AANLkTinz1H+j_uVmH+uOgdSU=6Aw0tJvZhqQ-SQpDRdB@mail.gmail.com>
	<4C9D21E8.1080005@canterbury.ac.nz> <877hia4tte.fsf_-_@benfinney.id.au>
Message-ID: <4C9D56AB.2060602@canterbury.ac.nz>

Ben Finney wrote:

> Your heuristics seem to assume there will only ever be a maximum of one
> match, which is false. I present the following example:
> 
>     $ ls foo/
>         bAr.dat  BaR.dat  bar.DAT

There should perhaps be an extra step at the beginning:

0) Test whether the specified path refers to an existing
file. If not, raise an exception.

If that passes, and the file system is case-sensitive, then
there must be a directory entry that is an exact match, so
it will be returned by step 1.

If the file system is case-insensitive, then there can be
at most one entry that matches except for case, and it must
be the one we're looking for, so there is no need for the
extra test in step 2.

So the revised algorithm is:

0) Test whether the specified path refers to an existing
    file. If not, raise an exception.

1) Search the directory for an exact match, return it if found.

2) Search for a match ignoring case, and return one if found.

3) Otherwise, raise an exception.
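
In code, a rough sketch (the function name is made up):

    import os

    def getrealfilename(path):
        # 0) The path must refer to an existing file.
        if not os.path.exists(path):
            raise OSError("no such file: %r" % path)
        head, tail = os.path.split(path)
        entries = os.listdir(head or '.')
        # 1) Exact match first.
        if tail in entries:
            return os.path.join(head, tail)
        # 2) Fall back to a match ignoring case.
        for name in entries:
            if name.lower() == tail.lower():
                return os.path.join(head, name)
        # 3) Otherwise fail (shouldn't happen if step 0 passed).
        raise OSError("no directory entry matching %r" % path)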

There's also some prior art that might be worth looking at:
On Windows, Python checks to see whether the file name of an
imported module has the same case as the name being imported,
which is a similar problem in some ways.

> It seems to me this whole thing should be hashed out on "python-ideas".

Good point -- I've redirected the discussion there.

-- 
Greg



From greg.ewing at canterbury.ac.nz  Sat Sep 25 03:56:06 2010
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 25 Sep 2010 13:56:06 +1200
Subject: [Python-ideas] [Python-Dev] os.path.normcase rationale?
In-Reply-To: <AANLkTikEjnbxRj_4mgwF0mBFqDS3g2pNzMicb1tkGbAO@mail.gmail.com>
References: <4C9531A7.10405@simplistix.co.uk>
	<AANLkTim6m00hVqRT9LTfXz=gaEmMEdxrCvk7jpF-3Lch@mail.gmail.com>
	<4C9C79DA.7000506@simplistix.co.uk>
	<20100924121737.309071FA5C2@kimball.webabinitio.net>
	<AANLkTi==y+pDw7h4KiBf0mX+CBVxS9Fw-oUX16zJ8bpi@mail.gmail.com>
	<AANLkTinz1H+j_uVmH+uOgdSU=6Aw0tJvZhqQ-SQpDRdB@mail.gmail.com>
	<4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com>
	<AANLkTikEjnbxRj_4mgwF0mBFqDS3g2pNzMicb1tkGbAO@mail.gmail.com>
Message-ID: <4C9D56B6.9050908@canterbury.ac.nz>

Guido van Rossum wrote:

> Maybe the API could be called os.path.unnormpath(), since it is in a
> sense the opposite of normpath() (which removes case) ?

Cute, but not very intuitive. Something like actualpath()
might be better -- although that's somewhat arbitrarily
different from realpath().

-- 
Greg


From python at mrabarnett.plus.com  Sat Sep 25 04:14:51 2010
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 25 Sep 2010 03:14:51 +0100
Subject: [Python-ideas] [Python-Dev] os.path.normcase rationale?
In-Reply-To: <4C9D56B6.9050908@canterbury.ac.nz>
References: <4C9531A7.10405@simplistix.co.uk>	<AANLkTim6m00hVqRT9LTfXz=gaEmMEdxrCvk7jpF-3Lch@mail.gmail.com>	<4C9C79DA.7000506@simplistix.co.uk>	<20100924121737.309071FA5C2@kimball.webabinitio.net>	<AANLkTi==y+pDw7h4KiBf0mX+CBVxS9Fw-oUX16zJ8bpi@mail.gmail.com>	<AANLkTinz1H+j_uVmH+uOgdSU=6Aw0tJvZhqQ-SQpDRdB@mail.gmail.com>	<4C9D21E8.1080005@canterbury.ac.nz>
	<4C9D298A.3010407@g.nevcal.com>	<AANLkTikEjnbxRj_4mgwF0mBFqDS3g2pNzMicb1tkGbAO@mail.gmail.com>
	<4C9D56B6.9050908@canterbury.ac.nz>
Message-ID: <4C9D5B1B.3020709@mrabarnett.plus.com>

On 25/09/2010 02:56, Greg Ewing wrote:
> Guido van Rossum wrote:
>
>> Maybe the API could be called os.path.unnormpath(), since it is in a
>> sense the opposite of normpath() (which removes case) ?
>
> Cute, but not very intuitive. Something like actualpath()
> might be better -- although that's somewhat arbitrarily
> different from realpath().
>
'actualcase' perhaps? Does it need to end in 'path'?


From solipsis at pitrou.net  Sat Sep 25 12:11:42 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 25 Sep 2010 12:11:42 +0200
Subject: [Python-ideas] [Python-Dev] os.path.normcase rationale?
References: <4C9531A7.10405@simplistix.co.uk>
	<AANLkTim6m00hVqRT9LTfXz=gaEmMEdxrCvk7jpF-3Lch@mail.gmail.com>
	<4C9C79DA.7000506@simplistix.co.uk>
	<20100924121737.309071FA5C2@kimball.webabinitio.net>
	<AANLkTi==y+pDw7h4KiBf0mX+CBVxS9Fw-oUX16zJ8bpi@mail.gmail.com>
	<AANLkTinz1H+j_uVmH+uOgdSU=6Aw0tJvZhqQ-SQpDRdB@mail.gmail.com>
	<4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com>
	<AANLkTikEjnbxRj_4mgwF0mBFqDS3g2pNzMicb1tkGbAO@mail.gmail.com>
	<4C9D56B6.9050908@canterbury.ac.nz>
Message-ID: <20100925121142.74fe35e1@pitrou.net>

On Sat, 25 Sep 2010 13:56:06 +1200
Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
> 
> > Maybe the API could be called os.path.unnormpath(), since it is in a
> > sense the opposite of normpath() (which removes case) ?
> 
> Cute, but not very intuitive. Something like actualpath()
> might be better -- although that's somewhat arbitrarily
> different from realpath().

Again, why not simply improve realpath()?




From ben+python at benfinney.id.au  Sat Sep 25 16:00:57 2010
From: ben+python at benfinney.id.au (Ben Finney)
Date: Sun, 26 Sep 2010 00:00:57 +1000
Subject: [Python-ideas] 'os.path.foo' function to get the name of a filesystem entry (was: [Python-Dev] os.path.normcase rationale?)
References: <4C9531A7.10405@simplistix.co.uk>
	<AANLkTim6m00hVqRT9LTfXz=gaEmMEdxrCvk7jpF-3Lch@mail.gmail.com>
	<4C9C79DA.7000506@simplistix.co.uk>
	<20100924121737.309071FA5C2@kimball.webabinitio.net>
	<AANLkTi==y+pDw7h4KiBf0mX+CBVxS9Fw-oUX16zJ8bpi@mail.gmail.com>
	<AANLkTinz1H+j_uVmH+uOgdSU=6Aw0tJvZhqQ-SQpDRdB@mail.gmail.com>
	<4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com>
	<AANLkTikEjnbxRj_4mgwF0mBFqDS3g2pNzMicb1tkGbAO@mail.gmail.com>
	<4C9D56B6.9050908@canterbury.ac.nz>
	<20100925121142.74fe35e1@pitrou.net>
Message-ID: <87vd5t3oti.fsf_-_@benfinney.id.au>

Antoine Pitrou <solipsis at pitrou.net>
writes:

> Again, why not simply improve realpath()?

Because that already does what it says it does.

The behaviour being asked for is distinct from what "os.path.normcase"
and "os.path.realpath" are meant to do, so that behaviour belongs in a
different place from those two.

-- 
 \           "Value your freedom or you will lose it, teaches history. |
  `\     "Don't bother us with politics," respond those who don't want |
_o__)                              to learn." --Richard Stallman, 2002 |
Ben Finney



From solipsis at pitrou.net  Sat Sep 25 16:11:57 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 25 Sep 2010 16:11:57 +0200
Subject: [Python-ideas] reusing realpath()
References: <4C9531A7.10405@simplistix.co.uk>
	<AANLkTim6m00hVqRT9LTfXz=gaEmMEdxrCvk7jpF-3Lch@mail.gmail.com>
	<4C9C79DA.7000506@simplistix.co.uk>
	<20100924121737.309071FA5C2@kimball.webabinitio.net>
	<AANLkTi==y+pDw7h4KiBf0mX+CBVxS9Fw-oUX16zJ8bpi@mail.gmail.com>
	<AANLkTinz1H+j_uVmH+uOgdSU=6Aw0tJvZhqQ-SQpDRdB@mail.gmail.com>
	<4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com>
	<AANLkTikEjnbxRj_4mgwF0mBFqDS3g2pNzMicb1tkGbAO@mail.gmail.com>
	<4C9D56B6.9050908@canterbury.ac.nz>
	<20100925121142.74fe35e1@pitrou.net>
	<87vd5t3oti.fsf_-_@benfinney.id.au>
Message-ID: <20100925161157.17059398@pitrou.net>

On Sun, 26 Sep 2010 00:00:57 +1000
Ben Finney <ben+python at benfinney.id.au> wrote:
> Antoine Pitrou <solipsis at pitrou.net>
> writes:
> 
> > Again, why not simply improve realpath()?
> 
> Because that already does what it says it does.

So what? The behaviour of fetching the canonical name can be added to
the behaviour of resolving symlinks. It wouldn't be incompatible with
the current behaviour AFAICT. And it would be better than adding yet
another function to our ménagerie of path-normalizing functions.
We already have abspath(), normpath(), normcase(), realpath() -- all
with very descriptive names as you might notice. We don't need another
function.

Regards

Antoine.




From guido at python.org  Sat Sep 25 22:55:30 2010
From: guido at python.org (Guido van Rossum)
Date: Sat, 25 Sep 2010 13:55:30 -0700
Subject: [Python-ideas] reusing realpath()
In-Reply-To: <20100925161157.17059398@pitrou.net>
References: <4C9531A7.10405@simplistix.co.uk>
	<AANLkTim6m00hVqRT9LTfXz=gaEmMEdxrCvk7jpF-3Lch@mail.gmail.com>
	<4C9C79DA.7000506@simplistix.co.uk>
	<20100924121737.309071FA5C2@kimball.webabinitio.net>
	<AANLkTi==y+pDw7h4KiBf0mX+CBVxS9Fw-oUX16zJ8bpi@mail.gmail.com>
	<AANLkTinz1H+j_uVmH+uOgdSU=6Aw0tJvZhqQ-SQpDRdB@mail.gmail.com>
	<4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com>
	<AANLkTikEjnbxRj_4mgwF0mBFqDS3g2pNzMicb1tkGbAO@mail.gmail.com>
	<4C9D56B6.9050908@canterbury.ac.nz>
	<20100925121142.74fe35e1@pitrou.net>
	<87vd5t3oti.fsf_-_@benfinney.id.au>
	<20100925161157.17059398@pitrou.net>
Message-ID: <AANLkTi=SPocJ=C+d-yzHxVbtwV3TdZ=rEEqukmDwumHc@mail.gmail.com>

On Sat, Sep 25, 2010 at 7:11 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Sun, 26 Sep 2010 00:00:57 +1000
> Ben Finney <ben+python at benfinney.id.au> wrote:
>> Antoine Pitrou <solipsis at pitrou.net>
>> writes:
>>
>> > Again, why not simply improve realpath()?
>>
>> Because that already does what it says it does.
>
> So what? The behaviour of fetching the canonical name can be added to
> the behaviour of resolving symlinks. It wouldn't be incompatible with
> the current behaviour AFAICT. And it would be better than adding yet
> another function to our ménagerie of path-normalizing functions.
> We already have abspath(), normpath(), normcase(), realpath() -- all
> with very descriptive names as you might notice. We don't need another
> function.

There's no need to get all emotional or sarcastic about it. You might
have noticed the risks of sarcasm on this list recently.

Instead, it should be possible to analyze how realpath() is currently
used and see if changing it as desired is likely to break any code.

TBH, I am personally on the fence and would like to see an analysis
including the current and desired behavior in the following cases:

- Windows
- OS X
- Other Unixoid systems

Also take into account:

- Filesystems whose case behavior is the opposite of the platform
default (all three support such filesystems through system
configuration and/or mounting)
- Relative paths
- Paths containing symlinks

In any case it is much easier to design and implement the best
possible functionality if you don't also have to be backward
compatible with an existing function. I think it might be useful to
call this new API (let's call it "casefulpath" while we wait for a
better name to come to us :-) on a relative path without having the
answer turned into an absolute path -- if that's desired it's easy
enough to call abspath() or realpath() on the result.

-- 
--Guido van Rossum (python.org/~guido)


From solipsis at pitrou.net  Sat Sep 25 23:04:03 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 25 Sep 2010 23:04:03 +0200
Subject: [Python-ideas] reusing realpath()
In-Reply-To: <AANLkTi=SPocJ=C+d-yzHxVbtwV3TdZ=rEEqukmDwumHc@mail.gmail.com>
References: <4C9531A7.10405@simplistix.co.uk>
	<AANLkTim6m00hVqRT9LTfXz=gaEmMEdxrCvk7jpF-3Lch@mail.gmail.com>
	<4C9C79DA.7000506@simplistix.co.uk>
	<20100924121737.309071FA5C2@kimball.webabinitio.net>
	<AANLkTi==y+pDw7h4KiBf0mX+CBVxS9Fw-oUX16zJ8bpi@mail.gmail.com>
	<AANLkTinz1H+j_uVmH+uOgdSU=6Aw0tJvZhqQ-SQpDRdB@mail.gmail.com>
	<4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com>
	<AANLkTikEjnbxRj_4mgwF0mBFqDS3g2pNzMicb1tkGbAO@mail.gmail.com>
	<4C9D56B6.9050908@canterbury.ac.nz>
	<20100925121142.74fe35e1@pitrou.net>
	<87vd5t3oti.fsf_-_@benfinney.id.au>
	<20100925161157.17059398@pitrou.net>
	<AANLkTi=SPocJ=C+d-yzHxVbtwV3TdZ=rEEqukmDwumHc@mail.gmail.com>
Message-ID: <1285448643.17320.1.camel@localhost.localdomain>

On Saturday 25 September 2010 at 13:55 -0700, Guido van Rossum wrote:
> 
> There's no need to get all emotional or sarcastic about it. You might
> have noticed the risks of sarcasm on this list recently.

Ironic considering the naming of the language :)
Anyway:

> I think it might be useful to
> call this new API (let's call it "casefulpath" while we wait for a
> better name to come to us :-)

realcase() ?




From pjenvey at underboss.org  Sat Sep 25 23:57:42 2010
From: pjenvey at underboss.org (Philip Jenvey)
Date: Sat, 25 Sep 2010 14:57:42 -0700
Subject: [Python-ideas] reusing realpath()
In-Reply-To: <20100925161157.17059398@pitrou.net>
References: <4C9531A7.10405@simplistix.co.uk>
	<AANLkTim6m00hVqRT9LTfXz=gaEmMEdxrCvk7jpF-3Lch@mail.gmail.com>
	<4C9C79DA.7000506@simplistix.co.uk>
	<20100924121737.309071FA5C2@kimball.webabinitio.net>
	<AANLkTi==y+pDw7h4KiBf0mX+CBVxS9Fw-oUX16zJ8bpi@mail.gmail.com>
	<AANLkTinz1H+j_uVmH+uOgdSU=6Aw0tJvZhqQ-SQpDRdB@mail.gmail.com>
	<4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com>
	<AANLkTikEjnbxRj_4mgwF0mBFqDS3g2pNzMicb1tkGbAO@mail.gmail.com>
	<4C9D56B6.9050908@canterbury.ac.nz>
	<20100925121142.74fe35e1@pitrou.net>
	<87vd5t3oti.fsf_-_@benfinney.id.au>
	<20100925161157.17059398@pitrou.net>
Message-ID: <E10B9E3E-5848-40A1-AA39-FF55B06FF41C@underboss.org>


On Sep 25, 2010, at 7:11 AM, Antoine Pitrou wrote:

> On Sun, 26 Sep 2010 00:00:57 +1000
> Ben Finney <ben+python at benfinney.id.au> wrote:
>> Antoine Pitrou <solipsis at pitrou.net>
>> writes:
>> 
>>> Again, why not simply improve realpath()?
>> 
>> Because that already does what it says it does.
> 
> So what? The behaviour of fetching the canonical name can be added to
> the behaviour of resolving symlinks. It wouldn't be incompatible with
> the current behaviour AFAICT. And it would be better than adding yet
> another function to our ménagerie of path-normalizing functions.
> We already have abspath(), normpath(), normcase(), realpath() -- all
> with very descriptive names as you might notice. We don't need another
> function.

realpath's docs describe its result as "the canonical path of the specified filename, eliminating any symbolic links encountered in the path (if they are supported by the operating system)".

"Canonical" should describe the behavior we're after, with the correct case of the filename as it is actually stored on disk.

But isn't realpath modeled after POSIX realpath(3)? realpath(3) doesn't seem to clearly guarantee the original name as stored on disk either. However, realpath(3) on OS X 10.6 with case-insensitive HFS+ does return the original name as it was stored. Do any other platforms do this, and do we care about maintaining parity with realpath(3)?

--
Philip Jenvey

From greg.ewing at canterbury.ac.nz  Sun Sep 26 01:02:00 2010
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 26 Sep 2010 11:02:00 +1200
Subject: [Python-ideas] reusing realpath()
In-Reply-To: <20100925161157.17059398@pitrou.net>
References: <4C9531A7.10405@simplistix.co.uk>
	<AANLkTim6m00hVqRT9LTfXz=gaEmMEdxrCvk7jpF-3Lch@mail.gmail.com>
	<4C9C79DA.7000506@simplistix.co.uk>
	<20100924121737.309071FA5C2@kimball.webabinitio.net>
	<AANLkTi==y+pDw7h4KiBf0mX+CBVxS9Fw-oUX16zJ8bpi@mail.gmail.com>
	<AANLkTinz1H+j_uVmH+uOgdSU=6Aw0tJvZhqQ-SQpDRdB@mail.gmail.com>
	<4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com>
	<AANLkTikEjnbxRj_4mgwF0mBFqDS3g2pNzMicb1tkGbAO@mail.gmail.com>
	<4C9D56B6.9050908@canterbury.ac.nz>
	<20100925121142.74fe35e1@pitrou.net>
	<87vd5t3oti.fsf_-_@benfinney.id.au>
	<20100925161157.17059398@pitrou.net>
Message-ID: <4C9E7F68.9030308@canterbury.ac.nz>

Antoine Pitrou wrote:

> So what? The behaviour of fetching the canonical name can be added to
> the behaviour of resolving symlinks.

Finding the actual name (I wouldn't call it "canonical",
since that term could be ambiguous) requires reading the
contents of entire directories at each step, which could
be noticeably less efficient than what realpath() currently
does. Users who only want symlinks expanded might object
to that.
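
For instance, recovering the on-disk spelling of each component might
look roughly like the following (an illustrative, POSIX-only sketch;
actualcase() is a hypothetical helper, not an existing os.path
function):

    import os

    def actualcase(path):
        # Rebuild `path` one component at a time, spelling each
        # component the way it is actually stored on disk.
        result = os.sep if os.path.isabs(path) else ""
        for part in path.strip(os.sep).split(os.sep):
            parent = result or "."
            # This per-level directory scan is the expensive step.
            for entry in os.listdir(parent):
                if os.path.normcase(entry) == os.path.normcase(part):
                    part = entry
                    break
            result = os.path.join(result, part)
        return result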

An option could be added to realpath(), but then we're
into constant-parameter territory.

-- 
Greg


From ncoghlan at gmail.com  Sun Sep 26 10:17:49 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 26 Sep 2010 18:17:49 +1000
Subject: [Python-ideas] reusing realpath()
In-Reply-To: <4C9E7F68.9030308@canterbury.ac.nz>
References: <4C9531A7.10405@simplistix.co.uk>
	<AANLkTim6m00hVqRT9LTfXz=gaEmMEdxrCvk7jpF-3Lch@mail.gmail.com>
	<4C9C79DA.7000506@simplistix.co.uk>
	<20100924121737.309071FA5C2@kimball.webabinitio.net>
	<AANLkTi==y+pDw7h4KiBf0mX+CBVxS9Fw-oUX16zJ8bpi@mail.gmail.com>
	<AANLkTinz1H+j_uVmH+uOgdSU=6Aw0tJvZhqQ-SQpDRdB@mail.gmail.com>
	<4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com>
	<AANLkTikEjnbxRj_4mgwF0mBFqDS3g2pNzMicb1tkGbAO@mail.gmail.com>
	<4C9D56B6.9050908@canterbury.ac.nz>
	<20100925121142.74fe35e1@pitrou.net>
	<87vd5t3oti.fsf_-_@benfinney.id.au>
	<20100925161157.17059398@pitrou.net>
	<4C9E7F68.9030308@canterbury.ac.nz>
Message-ID: <AANLkTingcjvesxuzOaX-TwoVnVnm5GohGV0uQgvANN2Q@mail.gmail.com>

On Sun, Sep 26, 2010 at 9:02 AM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Antoine Pitrou wrote:
>
>> So what? The behaviour of fetching the canonical name can be added to
>> the behaviour of resolving symlinks.
>
> Finding the actual name (I wouldn't call it "canonical",
> since that term could be ambiguous) requires reading the
> contents of entire directories at each step, which could
> be noticeably less efficient than what realpath() currently
> does. Users who only want symlinks expanded might object
> to that.
>
> An option could be added to realpath(), but then we're
> into constant-parameter territory.

Constant parameter territory isn't *necessarily* a bad thing if the
number of parameters is sufficiently high. In particular, if you have
one basic command (say, "give me the canonical path for this
possibly-non-canonical path I already have") with a gazillion
different variants (*ahem*), then a single function with well-named
boolean parameters (to explain "this is what I really mean by
'canonical path'") is likely to be much easier for people to remember
than trying to create a concise-yet-meaningful mnemonic for each
variant.

So we shouldn't dismiss out of hand the idea of a keyword-only
"swiss-army" path normalisation function that can at least be queried
via help() if you forget the exact spelling for the various
parameters.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From dickinsm at gmail.com  Sun Sep 26 13:05:11 2010
From: dickinsm at gmail.com (Mark Dickinson)
Date: Sun, 26 Sep 2010 12:05:11 +0100
Subject: [Python-ideas] Including elementary mathematical functions in
 the python data model
In-Reply-To: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com>
References: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com>
Message-ID: <AANLkTikpj2LV2FGYOpz6eby+hO54paXUz1WxRrmy0t01@mail.gmail.com>

On Tue, Sep 21, 2010 at 7:44 PM, Michael Gilbert
<michael.s.gilbert at gmail.com> wrote:
> It would be really nice if elementary mathematical operations such as
> sin/cosine (via __sin__ and __cos__) were available as base parts of
> the python data model [0]. This would make it easier to write new math
> classes, and it would eliminate the ugliness of things like self.exp().
>
> This would also eliminate the need for separate math and cmath
> libraries since those could be built into the default float and complex
> types.

Hmm.  Are you proposing adding 'sin', 'cos', etc. as new builtins?  If
so, I think this is a nonstarter:  the number of Python builtins is
deliberately kept quite small, and adding all these functions (we
could argue about which ones, but it seems to me that you're talking
about around 18 new builtins---e.g., 6 trig and inverse trig, 6
hyperbolic and inverse hyperbolic, exp, expm1, log, log10, log1p,
sqrt) would enlarge it considerably.  For many users, those functions
would just be additional bloat in builtins, and there's a possibility of
confusion with existing variables with the same name ('log' seems like
a particular candidate for this; 'sin' less likely, but who knows ;-).

A less invasive proposal would be just to introduce __sin__, etc.
magic methods and have math.sin delegate to <type>.__sin__;  i.e.,
have math.sin work in exactly the same way that math.floor and
math.ceil currently work.  That would be quite nice for e.g., the
decimal module:  you'd be able to write something like:

from math import sqrt
root = (-b + sqrt(b*b - 4*a*c)) / (2*a)

to compute the root of a quadratic equation, and it would work
regardless of whether a, b, c were Decimal instances or floats.

I'm not sure how I feel about the entailed magic method explosion, though.
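
For concreteness, the delegation could look something like this sketch
(hypothetical; no __sin__ hook exists today):

    import math as _math

    def sin(x):
        # Mirror the math.floor()/math.ceil() pattern: look the hook
        # up on the type, falling back to the float implementation.
        try:
            hook = type(x).__sin__
        except AttributeError:
            return _math.sin(float(x))
        return hook(x)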

Mark


From ncoghlan at gmail.com  Sun Sep 26 14:07:50 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 26 Sep 2010 22:07:50 +1000
Subject: [Python-ideas] Including elementary mathematical functions in
 the python data model
In-Reply-To: <AANLkTikpj2LV2FGYOpz6eby+hO54paXUz1WxRrmy0t01@mail.gmail.com>
References: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com>
	<AANLkTikpj2LV2FGYOpz6eby+hO54paXUz1WxRrmy0t01@mail.gmail.com>
Message-ID: <AANLkTimDg9kapXLdvBbs7akT=M0nYJUycjrYCJUwh0k8@mail.gmail.com>

On Sun, Sep 26, 2010 at 9:05 PM, Mark Dickinson <dickinsm at gmail.com> wrote:
> A less invasive proposal would be just to introduce __sin__, etc.
> magic methods and have math.sin delegate to <type>.__sin__;  i.e.,
> have math.sin work in exactly the same way that math.floor and
> math.ceil currently work.  That would be quite nice for e.g., the
> decimal module:  you'd be able to write something like:
>
> from math import sqrt
> root = (-b + sqrt(b*b - 4*a*c)) / (2*a)
>
> to compute the root of a quadratic equation, and it would work
> regardless of whether a, b, c were Decimal instances or floats.
>
> I'm not sure how I feel about the entailed magic method explosion, though.

Couple that with the extra function call overhead (since these
wouldn't have real typeslots) and it still seems like a less than
stellar idea.

As another use case for solid, efficient generic function support
though... great idea :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From solipsis at pitrou.net  Sun Sep 26 14:25:29 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 26 Sep 2010 14:25:29 +0200
Subject: [Python-ideas] Including elementary mathematical functions in
 the python data model
References: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com>
	<AANLkTikpj2LV2FGYOpz6eby+hO54paXUz1WxRrmy0t01@mail.gmail.com>
	<AANLkTimDg9kapXLdvBbs7akT=M0nYJUycjrYCJUwh0k8@mail.gmail.com>
Message-ID: <20100926142529.79ffaabd@pitrou.net>

On Sun, 26 Sep 2010 22:07:50 +1000
Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Sun, Sep 26, 2010 at 9:05 PM, Mark Dickinson <dickinsm at gmail.com> wrote:
> > A less invasive proposal would be just to introduce __sin__, etc.
> > magic methods and have math.sin delegate to <type>.__sin__;  i.e.,
> > have math.sin work in exactly the same way that math.floor and
> > math.ceil currently work.  That would be quite nice for e.g., the
> > decimal module:  you'd be able to write something like:
> >
> > from math import sqrt
> > root = (-b + sqrt(b*b - 4*a*c)) / (2*a)
> >
> > to compute the root of a quadratic equation, and it would work
> > regardless of whether a, b, c were Decimal instances or floats.
> >
> > I'm not sure how I feel about the entailed magic method explosion, though.
> 
> Couple that with the extra function call overhead (since these
> wouldn't have real typeslots) and it still seems like a less than
> stellar idea.
> 
> As another use case for solid, efficient generic function support
> though... great idea :)

At the cost of even more execution overhead? :)

Regards

Antoine.




From ncoghlan at gmail.com  Sun Sep 26 14:34:06 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 26 Sep 2010 22:34:06 +1000
Subject: [Python-ideas] Including elementary mathematical functions in
 the python data model
In-Reply-To: <20100926142529.79ffaabd@pitrou.net>
References: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com>
	<AANLkTikpj2LV2FGYOpz6eby+hO54paXUz1WxRrmy0t01@mail.gmail.com>
	<AANLkTimDg9kapXLdvBbs7akT=M0nYJUycjrYCJUwh0k8@mail.gmail.com>
	<20100926142529.79ffaabd@pitrou.net>
Message-ID: <AANLkTi=uS+SjOYKBjkcXrecy5ZXNWnpRXyK8K_kkijVk@mail.gmail.com>

On Sun, Sep 26, 2010 at 10:25 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> Couple that with the extra function call overhead (since these
>> wouldn't have real typeslots) and it still seems like a less than
>> stellar idea.
>>
>> As another use case for solid, efficient generic function support
>> though... great idea :)
>
> At the cost of even more execution overhead? :)

I did put that "efficient" in there for a reason! Now, I'm not saying
anything about how *reasonable* that idea is, but I can dream ;)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From solipsis at pitrou.net  Sun Sep 26 14:38:46 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 26 Sep 2010 14:38:46 +0200
Subject: [Python-ideas] Including elementary mathematical functions in
 the python data model
References: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com>
	<AANLkTikpj2LV2FGYOpz6eby+hO54paXUz1WxRrmy0t01@mail.gmail.com>
	<AANLkTimDg9kapXLdvBbs7akT=M0nYJUycjrYCJUwh0k8@mail.gmail.com>
	<20100926142529.79ffaabd@pitrou.net>
	<AANLkTi=uS+SjOYKBjkcXrecy5ZXNWnpRXyK8K_kkijVk@mail.gmail.com>
Message-ID: <20100926143846.7021d807@pitrou.net>

On Sun, 26 Sep 2010 22:34:06 +1000
Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Sun, Sep 26, 2010 at 10:25 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> >> Couple that with the extra function call overhead (since these
> >> wouldn't have real typeslots) and it still seems like a less than
> >> stellar idea.
> >>
> >> As another use case for solid, efficient generic function support
> >> though... great idea :)
> >
> > At the cost of even more execution overhead? :)
> 
> I did put that "efficient" in there for a reason! Now, I'm not saying
> anything about how *reasonable* that idea is, but I can dream ;)

Well, I can't see how it could be less than the overhead involved in a
sqrt(x) -> x.__sqrt__() indirection anyway.

When I read Mark's example, I wondered why he didn't simply write
x**0.5 instead of sqrt(x), but it turns out it doesn't work on
decimals.

cheers

Antoine.




From masklinn at masklinn.net  Sun Sep 26 14:33:14 2010
From: masklinn at masklinn.net (Masklinn)
Date: Sun, 26 Sep 2010 14:33:14 +0200
Subject: [Python-ideas] Including elementary mathematical functions in
	the python data model
In-Reply-To: <AANLkTimDg9kapXLdvBbs7akT=M0nYJUycjrYCJUwh0k8@mail.gmail.com>
References: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com>
	<AANLkTikpj2LV2FGYOpz6eby+hO54paXUz1WxRrmy0t01@mail.gmail.com>
	<AANLkTimDg9kapXLdvBbs7akT=M0nYJUycjrYCJUwh0k8@mail.gmail.com>
Message-ID: <369EA2B0-54ED-4204-96F9-408C4B8CB5BE@masklinn.net>

On 2010-09-26, at 14:07 , Nick Coghlan wrote:
> On Sun, Sep 26, 2010 at 9:05 PM, Mark Dickinson <dickinsm at gmail.com> wrote:
>> A less invasive proposal would be just to introduce __sin__, etc.
>> magic methods and have math.sin delegate to <type>.__sin__;  i.e.,
>> have math.sin work in exactly the same way that math.floor and
>> math.ceil currently work.  That would be quite nice for e.g., the
>> decimal module:  you'd be able to write something like:
>> 
>> from math import sqrt
>> root = (-b + sqrt(b*b - 4*a*c)) / (2*a)
>> 
>> to compute the root of a quadratic equation, and it would work
>> regardless of whether a, b, c were Decimal instances or floats.
>> 
>> I'm not sure how I feel about the entailed magic method explosion, though.
> 
> Couple that with the extra function call overhead (since these
> wouldn't have real typeslots) and it still seems like a less than
> stellar idea.
> 
> As another use case for solid, efficient generic function support
> though... great idea :)
> 
> Cheers,
> Nick.

Couldn't that also be managed via ABCs for numerical types? Make sqrt et al. methods of those types, and ride off into the sunset, no? The existing `math` functions could check for the presence of those methods (or the input types being instances of the ABCs they need), and fall back on the current implementations if they don't match.
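
A rough sketch of that fallback (illustrative only; of the numeric
types, only Decimal currently has such a method):

    import math as _math

    def sqrt(x):
        if hasattr(type(x), "sqrt"):
            return x.sqrt()       # e.g. decimal.Decimal.sqrt()
        return _math.sqrt(x)      # current float implementation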

From jason.orendorff at gmail.com  Sun Sep 26 17:48:35 2010
From: jason.orendorff at gmail.com (Jason Orendorff)
Date: Sun, 26 Sep 2010 10:48:35 -0500
Subject: [Python-ideas] Including elementary mathematical functions in
 the python data model
In-Reply-To: <AANLkTimDg9kapXLdvBbs7akT=M0nYJUycjrYCJUwh0k8@mail.gmail.com>
References: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com>
	<AANLkTikpj2LV2FGYOpz6eby+hO54paXUz1WxRrmy0t01@mail.gmail.com>
	<AANLkTimDg9kapXLdvBbs7akT=M0nYJUycjrYCJUwh0k8@mail.gmail.com>
Message-ID: <AANLkTinpNheZX6ksJqOBMd2Y_whP8PeAopmaeUfhbxNb@mail.gmail.com>

On Sun, Sep 26, 2010 at 7:07 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Sun, Sep 26, 2010 at 9:05 PM, Mark Dickinson <dickinsm at gmail.com> wrote:
>> A less invasive proposal would be just to introduce __sin__, etc.
>> magic methods [...]
>>
>> I'm not sure how I feel about the entailed magic method explosion, though.
>
> Couple that with the extra function call overhead (since these
> wouldn't have real typeslots) and it still seems like a less than
> stellar idea.

This could certainly be implemented so as to be fast for floats and
flexible for everything else.

-j


From greg.ewing at canterbury.ac.nz  Mon Sep 27 00:21:47 2010
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 27 Sep 2010 10:21:47 +1200
Subject: [Python-ideas] reusing realpath()
In-Reply-To: <AANLkTingcjvesxuzOaX-TwoVnVnm5GohGV0uQgvANN2Q@mail.gmail.com>
References: <4C9531A7.10405@simplistix.co.uk>
	<AANLkTim6m00hVqRT9LTfXz=gaEmMEdxrCvk7jpF-3Lch@mail.gmail.com>
	<4C9C79DA.7000506@simplistix.co.uk>
	<20100924121737.309071FA5C2@kimball.webabinitio.net>
	<AANLkTi==y+pDw7h4KiBf0mX+CBVxS9Fw-oUX16zJ8bpi@mail.gmail.com>
	<AANLkTinz1H+j_uVmH+uOgdSU=6Aw0tJvZhqQ-SQpDRdB@mail.gmail.com>
	<4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com>
	<AANLkTikEjnbxRj_4mgwF0mBFqDS3g2pNzMicb1tkGbAO@mail.gmail.com>
	<4C9D56B6.9050908@canterbury.ac.nz>
	<20100925121142.74fe35e1@pitrou.net>
	<87vd5t3oti.fsf_-_@benfinney.id.au>
	<20100925161157.17059398@pitrou.net>
	<4C9E7F68.9030308@canterbury.ac.nz>
	<AANLkTingcjvesxuzOaX-TwoVnVnm5GohGV0uQgvANN2Q@mail.gmail.com>
Message-ID: <4C9FC77B.1000104@canterbury.ac.nz>

Nick Coghlan wrote:

> Constant parameter territory isn't *necessarily* a bad thing if the
> number of parameters is sufficiently high.

That's true, but the number of parameters wouldn't be
high in this case.

-- 
Greg


From greg.ewing at canterbury.ac.nz  Mon Sep 27 00:29:19 2010
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 27 Sep 2010 10:29:19 +1200
Subject: [Python-ideas] Including elementary mathematical functions in
 the python data model
In-Reply-To: <AANLkTimDg9kapXLdvBbs7akT=M0nYJUycjrYCJUwh0k8@mail.gmail.com>
References: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com>
	<AANLkTikpj2LV2FGYOpz6eby+hO54paXUz1WxRrmy0t01@mail.gmail.com>
	<AANLkTimDg9kapXLdvBbs7akT=M0nYJUycjrYCJUwh0k8@mail.gmail.com>
Message-ID: <4C9FC93F.9020708@canterbury.ac.nz>

Nick Coghlan wrote:

> Couple that with the extra function call overhead (since these
> wouldn't have real typeslots) and it still seems like a less than
> stellar idea.
> 
> As another use case for solid, efficient generic function support
> though... great idea :)

Could a generic function mechanism be made to have any
less overhead, though?

-- 
Greg


From ncoghlan at gmail.com  Mon Sep 27 14:15:27 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 27 Sep 2010 22:15:27 +1000
Subject: [Python-ideas] reusing realpath()
In-Reply-To: <4C9FC77B.1000104@canterbury.ac.nz>
References: <4C9531A7.10405@simplistix.co.uk>
	<AANLkTim6m00hVqRT9LTfXz=gaEmMEdxrCvk7jpF-3Lch@mail.gmail.com>
	<4C9C79DA.7000506@simplistix.co.uk>
	<20100924121737.309071FA5C2@kimball.webabinitio.net>
	<AANLkTi==y+pDw7h4KiBf0mX+CBVxS9Fw-oUX16zJ8bpi@mail.gmail.com>
	<AANLkTinz1H+j_uVmH+uOgdSU=6Aw0tJvZhqQ-SQpDRdB@mail.gmail.com>
	<4C9D21E8.1080005@canterbury.ac.nz> <4C9D298A.3010407@g.nevcal.com>
	<AANLkTikEjnbxRj_4mgwF0mBFqDS3g2pNzMicb1tkGbAO@mail.gmail.com>
	<4C9D56B6.9050908@canterbury.ac.nz>
	<20100925121142.74fe35e1@pitrou.net>
	<87vd5t3oti.fsf_-_@benfinney.id.au>
	<20100925161157.17059398@pitrou.net>
	<4C9E7F68.9030308@canterbury.ac.nz>
	<AANLkTingcjvesxuzOaX-TwoVnVnm5GohGV0uQgvANN2Q@mail.gmail.com>
	<4C9FC77B.1000104@canterbury.ac.nz>
Message-ID: <AANLkTimTjB24kJHjKX2XoPda_JNoPHF=r4G1+hR2iHkn@mail.gmail.com>

On Mon, Sep 27, 2010 at 8:21 AM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Nick Coghlan wrote:
>
>> Constant parameter territory isn't *necessarily* a bad thing if the
>> number of parameters is sufficiently high.
>
> That's true, but the number of parameters wouldn't be
> high in this case.

How high is high enough? Just in realpath, normpath, normcase we
already have 3 options, with the "match the existing case-preserving
filename if it exists" variant requested in this discussion making it
4. Supporting platform-appropriate Unicode normalisation would make it
5.

Note that I'm not saying the swiss-army function is necessarily the
right answer here, but remembering "use os.realpath to get canonical
filenames" and then having a bunch of flags to enable/disable various
aspects of the normalisation (defaulting to the current implementation
of only expanding symlinks) fits my brain more easily than remembering
the distinctions between the tasks that currently correspond to each
function name. If there really isn't a name that makes sense for the
new variant, then maybe adding some constant parameters to one of the
existing methods is the way to go.

realpath and normpath are the two most likely candidates to use as a
basis for such an approach. If realpath was used as a basis, then it
would gain keyword-only parameters along the lines of
"expand_links=True", "collapse=False", "lower_case=False",
"match_case=False". Setting both lower_case=True and match_case=True
would trigger ValueError, but the API with separate boolean flags is
easier to use than one with a single tri-state parameter for the case
conversion. If normpath was used as a basis instead, then symlink
expansion would remain a separate operation and normpath would gain
"collapse=True", "lower_case=False", "match_case=False" as
keyword-only parameters.
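
Spelled out, the realpath-based variant might look like this
(signature sketch only):

    def realpath(path, *, expand_links=True, collapse=False,
                 lower_case=False, match_case=False):
        if lower_case and match_case:
            raise ValueError("lower_case and match_case are "
                             "mutually exclusive")
        ...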

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From ncoghlan at gmail.com  Mon Sep 27 14:20:14 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 27 Sep 2010 22:20:14 +1000
Subject: [Python-ideas] Including elementary mathematical functions in
 the python data model
In-Reply-To: <4C9FC93F.9020708@canterbury.ac.nz>
References: <20100921144452.3cfd118b.michael.s.gilbert@gmail.com>
	<AANLkTikpj2LV2FGYOpz6eby+hO54paXUz1WxRrmy0t01@mail.gmail.com>
	<AANLkTimDg9kapXLdvBbs7akT=M0nYJUycjrYCJUwh0k8@mail.gmail.com>
	<4C9FC93F.9020708@canterbury.ac.nz>
Message-ID: <AANLkTi=Sskee-GZkL2g8ggtSF4KSmgrUk5SGXoyHNJX9@mail.gmail.com>

On Mon, Sep 27, 2010 at 8:29 AM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Nick Coghlan wrote:
>
>> Couple that with the extra function call overhead (since these
>> wouldn't have real typeslots) and it still seems like a less than
>> stellar idea.
>>
>> As another use case for solid, efficient generic function support
>> though... great idea :)
>
> Could a generic function mechanism be made to have any
> less overhead, though?

See my response to Antoine - probably not. Although, as has been
pointed out by others, by doing the check for PyFloat_CheckExact early
and running the fast path immediately if that check passes, you can
avoid most of the overhead in the common case, even when using
pseudo-typeslots. So performance impact likely isn't a major factor
here after all.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From denis.spir at gmail.com  Tue Sep 28 10:27:07 2010
From: denis.spir at gmail.com (spir)
Date: Tue, 28 Sep 2010 10:27:07 +0200
Subject: [Python-ideas] multiline string notation
Message-ID: <20100928102707.5b3467ac@o>

Hello,



multiline string

While recently studying a game scripting language (*) and designing a toy language of mine, I realised the following two facts, which may be relevant for Python as well:



-1- no need for a separate multiline string notation

A single string format can deal with text including newlines, without any syntactic or parsing (**) issue: a string notation simply ends at the second quote.
I have no idea why Python introduced the distinction (and would like to know); possibly for historical reasons? The only advantage of """...""" seems to be that this format allows literal quotes in strings; am I right on this?



-2- trimming of indentation

On my computer, calling the following function:
    def write():
        if True:
            print """To be or not to be,
            that is the question."""
results in the following output:
    |To be or not to be,
    |        that is the question.
This is certainly not the programmer's intent. To get what is expected, one should write instead:
    def write():
        if True:
            print """To be or not to be,
    that is the question."""
...which distorts the visual presentation of code by breaking correct indentation.
To have a multiline text written on multiple lines and preserve indentation, one needs to use more complicated forms like:
    def write():
        if True:
            print "To be or not to be,\n" + \
            "that is the question."
(Actually, the '+' can be omitted here, but this fact is not commonly known.)

My project uses a visual structure à la Python (and no curly braces). Indentation is removed by the parser from the significant part of the code, even inside strings (and also comments). This allows the programmer to preserve a clean source outline while having multiline text written as is. In other words, the following routine would work as you would guess (':' is the assignment sign):
    write : action
         if true
            terminal.write "To be or not to be,
            that is the question."

I imagine the Python parser replaces indentation by block-delimiting tokens (analogous in role to C braces). My language's parser thus has a preprocessing phase that would transform the above piece of code into:
    write : action
    {
    if true
    {
    terminal.write "To be or not to be,
    that is the question."
    }
    }
The preprocessing routine is actually easier than it would be with Python's rules, since one can trim indents systematically, without any exception for strings (and comments).



Thank you for reading,
Denis

(*) namely WML, scripting language of the game called Wesnoth
(**) This is true for 1-pass parsers (like PEG), as well as for 2-pass ones (with separate lexical phase).
-- -- -- -- -- -- --
vit esse estrany

spir.wikidot.com



From mwm-keyword-python.b4bdba at mired.org  Tue Sep 28 10:58:42 2010
From: mwm-keyword-python.b4bdba at mired.org (Mike Meyer)
Date: Tue, 28 Sep 2010 04:58:42 -0400
Subject: [Python-ideas] multiline string notation
In-Reply-To: <20100928102707.5b3467ac@o>
References: <20100928102707.5b3467ac@o>
Message-ID: <20100928045842.346bb9d0@bhuda.mired.org>

On Tue, 28 Sep 2010 10:27:07 +0200
spir <denis.spir at gmail.com> wrote:

> Hello,
> 
> 
> 
> multiline string
> 
> While recently studying a game scripting language (*) and designing a toy language of mine, I realised the following two facts, which may be relevant for Python as well:
> 
> 
> 
> -1- no need for a separate multiline string notation
> 
> A single string format can deal with text including newlines, without any syntactic or parsing (**) issue: a string notation simply ends at the second quote.
> I have no idea why Python introduced the distinction (and would like to know); possibly for historical reasons? The only advantage of """...""" seems to be that this format allows literal quotes in strings; am I right on this?

No, you're not. The ' form allows literal "'s, and vice versa. The
reason for the triple-quoted string is to allow simple multi-line
string literals.

The reason you want both single and multi-line string literals is so
the parser can properly flag the error line when you forget to
terminate the far more common single-line literal. Not as important
now that nearly everything does syntax coloring, but still a nice
feature.

> -2- trimming of indentation
> 
> On my computer, calling the following function:
>     def write():
>         if True:
>             print """To be or not to be,
>             that is the question."""
> results in the following output:
>     |To be or not to be,
>     |        that is the question.
> This is certainly not the programmer's intent. To get what is expected, one should write instead:
>     def write():
>         if True:
>             print """To be or not to be,
>     that is the question."""
> ...which distorts the visual presentation of code by breaking correct indentation.
> To have a multiline text written on multiple lines and preserve indentation, one needs to use more complicated forms like:
>     def write():
>         if True:
>             print "To be or not to be,\n" + \
>             "that is the question."
> (Actually, the '+' can be omitted here, but this fact is not commonly known.)

And in 3.x, where print is a function instead of a statement, it could
be (leaving off the optional "+"):

def write():
    if True:
        print("To be or not to be,\n"
              "that is the question.")

So -1 for this idea.

       <mike


-- 
Mike Meyer <mwm at mired.org>		http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org


From ncoghlan at gmail.com  Tue Sep 28 12:49:04 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 28 Sep 2010 20:49:04 +1000
Subject: [Python-ideas] multiline string notation
In-Reply-To: <20100928102707.5b3467ac@o>
References: <20100928102707.5b3467ac@o>
Message-ID: <AANLkTi=QBwgv4Fg7guhZeup=4hRk=W0NSq51YSf9jJqE@mail.gmail.com>

These two questions are ones where good arguments can be made in both
directions.

Having explicit notation for multi-line strings is primarily a benefit
for readability and error detection. The readability benefit is that
it flags to the reader that the next string literal may cover several
lines. As Mike noted, the error detection benefit is that the parser
can more readily detect a missing end-quote from a normal string
instead of inadvertently treating the entire rest of the file as part
of the string and giving a relatively useless error regarding EOF
while parsing a string.

Stripping leading whitespace even inside strings is potentially
convenient for the programmer, but breaks the tokenisation stream.
String literals are meant to be atomic. Having the parser digging
inside them to declare certain whitespace to not be part of the string
despite its presence in the source code is certainly a valid design
choice a language could make when defining its grammar, but would
actually be a fairly significant change for Python.

For Python, these two rules are a case of "status quo wins a
stalemate". Changing Python's behaviour in this area would be
difficult and time-consuming for negligible benefit, so it really
isn't worth doing.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From taleinat at gmail.com  Tue Sep 28 12:57:08 2010
From: taleinat at gmail.com (Tal Einat)
Date: Tue, 28 Sep 2010 12:57:08 +0200
Subject: [Python-ideas] multiline string notation
In-Reply-To: <20100928102707.5b3467ac@o>
References: <20100928102707.5b3467ac@o>
Message-ID: <AANLkTimri80Rq0-C_VcyeNj7vzYs0HK3PhfeFUz48ssM@mail.gmail.com>

>
> -2- trimming of indentation
>
> On my computer, calling the following function:
>    def write():
>        if True:
>            print """To be or not to be,
>            that is the question."""
> results in the following output:
>    |To be or not to be,
>    |        that is the question.
> This is certainly not the programmer's intent. To get what is expected, one
> should write instead:
>    def write():
>        if True:
>            print """To be or not to be,
>    that is the question."""
> ...which distorts the visual presentation of code by breaking correct
> indentation.
> To have a multiline text written on multiple lines and preserve
> indentation, one needs to use more complicated forms like:
>    def write():
>        if True:
>            print "To be or not to be,\n" + \
>            "that is the question."
> (Actually, the '+' can be omitted here, but this fact is not commonly
> known.)
>
>
Have you heard of textwrap.dedent()? I usually would write this as:

def write():
    if True:
        print textwrap.dedent("""\
            To be or not to be,
            that is the question.""")

- Tal

From solipsis at pitrou.net  Tue Sep 28 14:57:04 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 28 Sep 2010 14:57:04 +0200
Subject: [Python-ideas] Prefetching on buffered IO files
References: <20100928004119.3963a4ad@pitrou.net>
	<AANLkTin6UQ73yH3DFrP8s_Wswwq0qdODH=i+en8_qZyW@mail.gmail.com>
Message-ID: <20100928145704.2fb2e382@pitrou.net>


Hello,

(moved to python-ideas)

On Mon, 27 Sep 2010 17:39:45 -0700
Guido van Rossum <guido at python.org> wrote:
> On Mon, Sep 27, 2010 at 3:41 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> > While trying to solve #3873 (poor performance of pickle on file
> > objects, due to the overhead of calling read() with very small values),
> > it occurred to me that the prefetching facilities offered by
> > BufferedIOBase are not flexible and efficient enough.
> 
> I haven't read the whole bug but there seem to be lots of different
> smaller issues there, right?

The bug entry is quite old and at first the slowness had to do with the
pure Python IO layer. Now the remaining performance difference with
Python 2 is entirely caused by the following core issue:

> It seems that one (unfortunate)
> constraint is that reading pickles cannot use buffered I/O (at least
> not on a non-seekable file) because the API has been documented to
> leave the file positioned right after the last byte of the pickled
> data, right?

Right.

> > Indeed, if you use seek() and read(), 1) you limit yourself to seekable
> > files 2) performance can be hampered by very bad seek() performance
> > (this is true on GzipFile).
> 
> Ow... I've always assumed that seek() is essentially free, because
> that's how a typical OS kernel implements it. If seek() is bad on
> GzipFile, how hard would it be to fix this?

The worst case is backwards seeks. Forward seeks are implemented as a
simple read(), which makes them O(k) where k is the displacement. For
buffering applications where k is bounded by the buffer size, it is
O(1) (still with, of course, a non-trivial multiplier).

Backwards seeks are implemented as rewinding the whole file (seek(0))
and then reading again up to the requested position, which makes them
O(n) with n the absolute target position. When your requirement is to
rewind by a bounded number of bytes in order to undo some readahead,
this is rather catastrophic.
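
Schematically, the behaviour described above amounts to something like
this (a simplified sketch, not the exact gzip module code):

    def seek(self, offset):
        if offset < self.offset:    # backwards seek: start over
            self.rewind()           # seek(0) on the raw file
        count = offset - self.offset
        while count > 1024:         # then decompress forward -> O(n)
            self.read(1024)
            count -= 1024
        self.read(count)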

I don't know how the gzip algorithm works under the hood; my impression
is that optimizing backwards seeks would have us save checkpoints of
the decompressor state and restore them if needed. It doesn't sound like a
trivial improvement, and would involve tradeoffs w.r.t.
performance of sequential reads.

  (I haven't looked at BZ2File, which has a totally different -- and
  outdated -- implementation)

It's why I would favour the peek() (or peek()-like, as in the prefetch()
idea) approach anyway. Not only does it work on unseekable files, but
implementing peek() when you have an internal buffer is quite simple
(see GzipFile.peek here: http://bugs.python.org/issue9962).

peek() could also be added to BytesIO even though it claims to
implement RawIOBase rather than BufferedIOBase.
(but of course, when you have a BytesIO, you can simply feed its
getvalue() or getbuffer() directly to pickle.loads)

> How common is the use case where you need to read a gzipped pickle
> *and* you need to leave the unzipped stream positioned exactly at the
> end of the pickle?

I really don't know. But I don't think we can break the API for a
special case without potentially causing nasty surprises for the user.

Also, my intuition is that pickling directly from a stream is partly
meant for cases where you want to access data following the pickle
data in the stream.

> > If instead you use peek() and read(), the situation is better, but you
> > end up doing multiple copies of data; also, you must call read() to
> > advance the file pointer even though you don't care about the results.
> 
> Have you measured how bad the situation is if you do implement it this way?

It is actually quite good compared to the status quo (3x to 10x), and as
good as the seek/read solution for regular files (and, of course, much
better for gzipped files once GzipFile.peek is implemented):
http://bugs.python.org/issue3873#msg117483

So, for solving the unpickle performance issue, it is sufficient.
Chances are the bottleneck for further improvements would be in the
unpickling logic itself. It feels a bit clunky, though.

Direct timing shows that peek()+read() has a non-trivial cost compared
to read():

$ ./python -m timeit -s "f=open('Misc/HISTORY', 'rb')" "f.seek(0)" \
  "while f.read(4096): pass"
1000 loops, best of 3: 277 usec per loop
$ ./python -m timeit -s "f=open('Misc/HISTORY', 'rb')" "f.seek(0)" \
  "while f.read(4096): f.peek(4096)"
1000 loops, best of 3: 361 usec per loop

(that's on a C extension type where peek() is almost a single call to
PyBytes_FromStringAndSize)

> > So I would propose adding the following method to BufferedIOBase:
> >
> > prefetch(self, buffer, skip, minread)
> >
> > Skip `skip` bytes from the stream.  Then, try to read at
> > least `minread` bytes and write them into `buffer`. The file
> > pointer is advanced by at most `skip + minread`, or less if
> > the end of file was reached. The total number of bytes written
> > in `buffer` is returned, which can be more than `minread`
> > if additional bytes could be prefetched (but, of course,
> > cannot be more than `len(buffer)`).
> >
> > Arguments:
> > - `buffer`: a writable buffer (e.g. bytearray)
> > - `skip`: number of bytes to skip (must be >= 0)
> > - `minread`: number of bytes to read (must be >= 0 and <= len(buffer))
> 
> I like the idea of an API that combines seek and read into a mutable
> buffer. However the semantics of this call seem really weird: there is
> no direct relationship between where it leaves the stream position and
> how much data it reads into the buffer. can you explain how exactly
> this will help solve the gzipped pickle performance problem?

The general idea with buffering is that:
- you want to skip the previously prefetched bytes (through peek()
  or prefetch()) which have been consumed -> hence the `skip` argument
- you want to consume a known number of bytes from the stream (for
  example a 4-bytes little-endian integer) -> hence the `minread`
  argument
- you would like to prefetch some more bytes if cheaply possible, so as
  to avoid calling read() or prefetch() too much; but you don't know
  yet if you will consume those bytes, so the file pointer shouldn't be
  advanced for them

If you don't prefetch more than the minimum needed amount of bytes, you
don't solve the performance problem at all (unpickling needs many tiny
reads). If you advance the file pointer after the whole prefetched data
(even though it might not be entirely consumed), you need to seek()
back at the end: it doesn't work on unseekable files, and is very slow
on some seekable file types.

So, the proposal is like a combination of forward seek() + read() +
peek() in a single call. With the advantages that:
- it works on non-seekable files (things like SocketIO)
- it allows the caller to operate in its own buffer (this is nice in C)
- it returns the data naturally concatenated, so you don't have to do
  it yourself if needed
- it gives more guarantees than peek() as to the min and max number of
  bytes returned; peek(), as it is not allowed to advance the file
  pointer, can return as little as 1 byte (even if you ask for 4096,
  and even if EOF isn't reached)

I also find it interesting that implementing a single primitive would be
enough for creating custom buffered types (by deriving other methods
from it), but the aesthetics of this can be controversial.
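
For reference, the intended semantics could be expressed on top of an
existing BufferedReader roughly as follows (an illustrative sketch
only; a real implementation would work on the internal buffer
directly):

    def prefetch(f, buffer, skip, minread):
        f.read(skip)                   # drop already-consumed bytes
        data = f.read(minread)         # advance by at most `minread`
        room = len(buffer) - len(data)
        extra = f.peek(room)[:room]    # lookahead, pointer unchanged
        buffer[:len(data)] = data
        buffer[len(data):len(data) + len(extra)] = extra
        return len(data) + len(extra)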

Regards

Antoine.




From guido at python.org  Tue Sep 28 16:08:08 2010
From: guido at python.org (Guido van Rossum)
Date: Tue, 28 Sep 2010 07:08:08 -0700
Subject: [Python-ideas] Prefetching on buffered IO files
In-Reply-To: <20100928145704.2fb2e382@pitrou.net>
References: <20100928004119.3963a4ad@pitrou.net>
	<AANLkTin6UQ73yH3DFrP8s_Wswwq0qdODH=i+en8_qZyW@mail.gmail.com>
	<20100928145704.2fb2e382@pitrou.net>
Message-ID: <AANLkTi=fggmgyr3yXaFrD5f0cbFprejnGZA1vonLA-Z7@mail.gmail.com>

On Tue, Sep 28, 2010 at 5:57 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Mon, 27 Sep 2010 17:39:45 -0700
> Guido van Rossum <guido at python.org> wrote:
>> On Mon, Sep 27, 2010 at 3:41 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> > While trying to solve #3873 (poor performance of pickle on file
>> > objects, due to the overhead of calling read() with very small values),
>> > it occurred to me that the prefetching facilities offered by
>> > BufferedIOBase are not flexible and efficient enough.
>>
>> I haven't read the whole bug but there seem to be lots of different
>> smaller issues there, right?
>
> The bug entry is quite old and at first the slowness had to do with the
> pure Python IO layer. Now the remaining performance difference with
> Python 2 is entirely caused by the following core issue:
>
>> It seems that one (unfortunate)
>> constraint is that reading pickles cannot use buffered I/O (at least
>> not on a non-seekable file) because the API has been documented to
>> leave the file positioned right after the last byte of the pickled
>> data, right?
>
> Right.
>
>> > Indeed, if you use seek() and read(), 1) you limit yourself to seekable
>> > files 2) performance can be hampered by very bad seek() performance
>> > (this is true on GzipFile).
>>
>> Ow... I've always assumed that seek() is essentially free, because
>> that's how a typical OS kernel implements it. If seek() is bad on
>> GzipFile, how hard would it be to fix this?
>
> The worst case is backwards seeks. Forward seeks are implemented as a
> simple read(), which makes them O(k) where k is the displacement. For
> buffering applications where k is bounded by the buffer size, it is
> O(1) (still with, of course, a non-trivial multiplier).
>
> Backwards seeks are implemented as rewinding the whole file (seek(0))
> and then reading again up to the requested position, which makes them
> O(n) with n the absolute target position. When your requirement is to
> rewind by a bounded number of bytes in order to undo some readahead,
> this is rather catastrophic.
>
> I don't know how the gzip algorithm works under the hood; my impression
> is that optimizing backwards seeks would have us save checkpoints of
> the decompressor state and restore them if needed. It doesn't sound like a
> trivial improvement, and would involve tradeoffs w.r.t.
> performance of sequential reads.
>
>   (I haven't looked at BZ2File, which has a totally different -- and
>   outdated -- implementation)
>
> It's why I would favour the peek() (or peek()-like, as in the prefetch()
> idea) approach anyway. Not only does it work on unseekable files, but
> implementing peek() when you have an internal buffer is quite simple
> (see GzipFile.peek here: http://bugs.python.org/issue9962).
>
> peek() could also be added to BytesIO even though it claims to
> implement RawIOBase rather than BufferedIOBase.
> (but of course, when you have a BytesIO, you can simply feed its
> getvalue() or getbuffer() directly to pickle.loads)
>
>> How common is the use case where you need to read a gzipped pickle
>> *and* you need to leave the unzipped stream positioned exactly at the
>> end of the pickle?
>
> I really don't know. But I don't think we can break the API for a
> special case without potentially causing nasty surprises for the user.
>
> Also, my intuition is that pickling directly from a stream is partly
> meant for cases where you want to access data following the pickle
> data in the stream.
>
>> > If instead you use peek() and read(), the situation is better, but you
>> > end up doing multiple copies of data; also, you must call read() to
>> > advance the file pointer even though you don't care about the results.
>>
>> Have you measured how bad the situation is if you do implement it this way?
>
> It is actually quite good compared to the status quo (3x to 10x), and as
> good as the seek/read solution for regular files (and, of course, much
> better for gzipped files once GzipFile.peek is implemented):
> http://bugs.python.org/issue3873#msg117483
>
> So, for solving the unpickle performance issue, it is sufficient.
> Chances are the bottleneck for further improvements would be in the
> unpickling logic itself. It feels a bit clunky, though.
>
> Direct timing shows that peek()+read() has a non-trivial cost compared
> to read():
>
> $ ./python -m timeit -s "f=open('Misc/HISTORY', 'rb')" "f.seek(0)" \
> ?"while f.read(4096): pass"
> 1000 loops, best of 3: 277 usec per loop
> $ ./python -m timeit -s "f=open('Misc/HISTORY', 'rb')" "f.seek(0)" \
> ?"while f.read(4096): f.peek(4096)"
> 1000 loops, best of 3: 361 usec per loop
>
> (that's on a C extension type where peek() is almost a single call to
> PyBytes_FromStringAndSize)
>
>> > So I would propose adding the following method to BufferedIOBase:
>> >
>> > prefetch(self, buffer, skip, minread)
>> >
> >> > Skip `skip` bytes from the stream.  Then, try to read at
>> > least `minread` bytes and write them into `buffer`. The file
>> > pointer is advanced by at most `skip + minread`, or less if
>> > the end of file was reached. The total number of bytes written
>> > in `buffer` is returned, which can be more than `minread`
>> > if additional bytes could be prefetched (but, of course,
>> > cannot be more than `len(buffer)`).
>> >
>> > Arguments:
>> > - `buffer`: a writable buffer (e.g. bytearray)
>> > - `skip`: number of bytes to skip (must be >= 0)
>> > - `minread`: number of bytes to read (must be >= 0 and <= len(buffer))
>>
>> I like the idea of an API that combines seek and read into a mutable
>> buffer. However the semantics of this call seem really weird: there is
>> no direct relationship between where it leaves the stream position and
>> how much data it reads into the buffer. can you explain how exactly
>> this will help solve the gzipped pickle performance problem?
>
> The general idea with buffering is that:
> - you want to skip the previously prefetched bytes (through peek()
>   or prefetch()) which have been consumed -> hence the `skip` argument
> - you want to consume a known number of bytes from the stream (for
>   example a 4-bytes little-endian integer) -> hence the `minread`
>   argument
> - you would like to prefetch some more bytes if cheaply possible, so as
>   to avoid calling read() or prefetch() too much; but you don't know
>   yet if you will consume those bytes, so the file pointer shouldn't be
>   advanced for them
>
> If you don't prefetch more than the minimum needed amount of bytes, you
> don't solve the performance problem at all (unpickling needs many tiny
> reads). If you advance the file pointer after the whole prefetched data
> (even though it might not be entirely consumed), you need to seek()
> back at the end: it doesn't work on unseekable files, and is very slow
> on some seekable file types.
>
> So, the proposal is like a combination of forward seek() + read() +
> peek() in a single call. With the advantages that:
> - it works on non-seekable files (things like SocketIO)
> - it allows the caller to operate in its own buffer (this is nice in C)
> - it returns the data naturally concatenated, so you don't have to do
>   it yourself if needed
> - it gives more guarantees than peek() as to the min and max number of
>   bytes returned; peek(), as it is not allowed to advance the file
>   pointer, can return as little as 1 byte (even if you ask for 4096,
>   and even if EOF isn't reached)
>
> I also find it interesting that implementing a single primitive would be
> enough for creating custom buffered types (by deriving other methods
> from it), but the aesthetics of this can be controversial.

Thanks for the long explanation. I have some further questions:

It seems this won't make any difference for a truly unbuffered stream,
right? A truly unbuffered stream would not have a buffer where it
could save the bytes that were prefetched past the stream position, so
it wouldn't return any optional extra bytes, so there would be no
speedup. And for a buffered stream, it would be much simpler to just
read ahead in large chunks and seek back once you've found the end.
(Actually for a buffered stream I suppose that many short read() and
small seek() calls aren't actually slow since most of the time they
work within the buffer.) So it seems the API is specifically designed
to improve the situation with GzipFile since it maintains the fiction
of an unbuffered file but in fact has some internal buffer space. I
wonder if it wouldn't be better to add an extra buffer to GzipFile so
small seek() and read() calls can be made more efficient?

In fact, this makes me curious as to the use that unpickling can make
of the prefetch() call -- I suppose you had to implement some kind of
layer on top of prefetch() that behaves more like a plain unbuffered
file?

I want to push back on this more, primarily because a new primitive
I/O operation has high costs: it can never be removed, it has to be
added to every stream implementation, developers need to learn to use
the new operation, and so on. A local change that only affects
GzipFile doesn't have any of these problems.

Also, if you can believe the multi-core crowd, a very different
possible future development might be to run the gunzip algorithm and
the unpickle algorithm in parallel, on separate cores. Truly such a
solution would require totally *different* new I/O primitives, which
might have a higher chance of being reusable outside the context of
pickle.

-- 
--Guido van Rossum (python.org/~guido)


From daniel at stutzbachenterprises.com  Tue Sep 28 16:26:30 2010
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Tue, 28 Sep 2010 09:26:30 -0500
Subject: [Python-ideas] [Python-Dev] Prefetching on buffered IO files
In-Reply-To: <20100928004119.3963a4ad@pitrou.net>
References: <20100928004119.3963a4ad@pitrou.net>
Message-ID: <AANLkTi=754pvRhwejtD0uVrb9=0+uD1JQR3JSfdjYC4-@mail.gmail.com>

On Mon, Sep 27, 2010 at 5:41 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> While trying to solve #3873 (poor performance of pickle on file
> objects, due to the overhead of calling read() with very small values),
>

After looking over the relevant code, it looks to me like the overhead of
calling the read() method compared to calling fread() in Python 2 is the
overhead of calling PyObject_Call along with the construction of argument
tuples and deconstruction of the return value.  I don't think the extra
interface would benefit code written in Python as much.  Even if  Python
code gets the data into a buffer more easily, it's going to pay those costs
to manipulate the buffered data.  It would mostly help modules written in C,
such as pickle, which right now are heavily bottlenecked getting the data
into a buffer.

Comparing the C code for Python 2's cPickle and Python 3's pickle, I see
that Python 2 has paths for unpickling from a FILE *, cStringIO, and
"other".  Python effectively only has a code path for "other", so it's not
surprising that it's slower.  In the worst case, I am sure that if we
re-added specialized code paths that we could make it just as fast as Python
2, although that would make the code messy.

Some ideas:
- Use readinto() instead of read(), to avoid extra
allocations/deallocations (see the sketch after this list)
- But first, fix bufferediobase_readinto() so it doesn't work by calling the
read() method and/or follow up on the TODO in buffered_readinto()
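
The readinto() pattern from the first idea looks like this (a sketch;
the file name and consume() are placeholders):

    buf = bytearray(4096)
    with open("data.bin", "rb") as f:
        while True:
            n = f.readinto(buf)     # refill the same buffer each time
            if not n:
                break
            consume(buf[:n])        # hypothetical consumer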

If you want a new API, I think a new C API for I/O objects with C-friendly
arguments would be better than a new Python-level API.

In a nutshell, if you feel the need to make a buffer around BufferedReader,
then I agree there's a problem, but I don't think helping you make a buffer
around BufferedReader is the right solution. ;-)

-- 
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com/>

From solipsis at pitrou.net  Tue Sep 28 16:32:49 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 28 Sep 2010 16:32:49 +0200
Subject: [Python-ideas] Prefetching on buffered IO files
In-Reply-To: <AANLkTi=fggmgyr3yXaFrD5f0cbFprejnGZA1vonLA-Z7@mail.gmail.com>
References: <20100928004119.3963a4ad@pitrou.net>
	<AANLkTin6UQ73yH3DFrP8s_Wswwq0qdODH=i+en8_qZyW@mail.gmail.com>
	<20100928145704.2fb2e382@pitrou.net>
	<AANLkTi=fggmgyr3yXaFrD5f0cbFprejnGZA1vonLA-Z7@mail.gmail.com>
Message-ID: <1285684369.3141.22.camel@localhost.localdomain>

On Tuesday 28 September 2010 at 07:08 -0700, Guido van Rossum wrote:
> 
> Thanks for the long explanation. I have some further questions:
> 
> It seems this won't make any difference for a truly unbuffered stream,
> right? A truly unbuffered stream would not have a buffer where it
> could save the bytes that were prefetched past the stream position, so
> it wouldn't return any optional extra bytes, so there would be no
> speedup.

Indeed. But you can trivially wrap an unbuffered stream inside a
BufferedReader, and get peek() even when the raw stream is unseekable.

> And for a buffered stream, it would be much simpler to just
> read ahead in large chunks and seek back once you've found the end.

Well, no: that only works if your stream is seekable and seek() is fast
enough.  It wouldn't work on SocketIO, for example (even wrapped inside a
BufferedReader, since BufferedReader will refuse to seek() if seekable()
returns False).

> I
> wonder if it wouldn't be better to add an extra buffer to GzipFile so
> small seek() and read() calls can be made more efficient?

The problem is that, since the buffer of the unpickler and the buffer of
the GzipFile are not aware of each other, the unpickler could easily ask
to seek() backwards past the current GzipFile buffer, and fall back on
the slow algorithm.

The "extra buffer" can trivially consist in wrapping the GzipFile inside
a BufferedReader (which is actually recommended if you want e.g. very
fast readlines()), but it doesn't solve the above issue.
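
Concretely, that wrapping is just (my sketch; the filename is a
placeholder):

    import gzip, io

    gz = gzip.GzipFile("data.txt.gz", "rb")
    f = io.BufferedReader(gz)    # buffers on top of GzipFile's own buffer
    for line in f:               # readline()s are now served from the
        pass                     # BufferedReader buffer, not from gzip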

> In fact, this makes me curious as to the use that unpickling can make
> of the prefetch() call -- I suppose you had to implement some kind of
> layer on top of prefetch() that behaves more like a plain unbuffered
> file?

I didn't implement prefetch() at all; it would be premature :)
But, if the stream had prefetch(), the unpickling would be simplified: I
would only have to call prefetch() once when refilling the buffer,
rather than two read()'s followed by a peek().

(I could try to coalesce the two reads, but it would complicate the code
a bit more...)
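
For the curious, the read()+peek() refill is shaped roughly like this (a
simplified sketch, not the actual _pickle.c logic; the names are mine):

    def refill(f, required, recommended):
        data = f.read(required)        # blocking: these bytes are mandatory
        if len(data) < required:
            raise EOFError("pickle data was truncated")
        extra = f.peek(recommended)    # opportunistic look-ahead; does NOT
                                       # consume the bytes it returns
        # The bookkeeping burden: `extra` is still in the stream, so the
        # caller must remember how much of it was used and skip those
        # bytes on the next refill.  A single prefetch(required,
        # recommended) call with consuming semantics would remove exactly
        # this bookkeeping.
        return data, extra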

> I want to push back on this more, primarily because a new primitive
> I/O operation has high costs: it can never be removed, it has to be
> added to every stream implementation, developers need to learn to use
> the new operation, and so on.

I agree with this (except that most developers don't really need to
learn to use it: common uses of readable files are content with read()
and readline(), and need neither peek() nor prefetch()). I don't intend
to push this for 3.2; I'm throwing the idea around with a hypothetical
3.3 landing if it seems useful.

> Also, if you can believe the multi-core crowd, a very different
> possible future development might be to run the gunzip algorithm and
> the unpickle algorithm in parallel, on separate cores. Truly such a
> solution would require totally *different* new I/O primitives, which
> might have a higher chance of being reusable outside the context of
> pickle.

Well, it's a bit of a pie-in-the-sky perspective :)
Furthermore, such a solution wouldn't improve CPU efficiency, so if your
workload is already able to utilize all CPU cores (which it can easily
do if you are in a VM, or have multiple busy daemons), it doesn't buy
you anything.

Regards

Antoine.




From solipsis at pitrou.net  Tue Sep 28 17:06:44 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 28 Sep 2010 17:06:44 +0200
Subject: [Python-ideas] Prefetching on buffered IO files
In-Reply-To: <AANLkTi=754pvRhwejtD0uVrb9=0+uD1JQR3JSfdjYC4-@mail.gmail.com>
References: <20100928004119.3963a4ad@pitrou.net>
	<AANLkTi=754pvRhwejtD0uVrb9=0+uD1JQR3JSfdjYC4-@mail.gmail.com>
Message-ID: <1285686404.3141.56.camel@localhost.localdomain>


> I don't think the extra interface would benefit code written in Python
> as much.  Even if Python code gets the data into a buffer more
> easily, it's going to pay those costs to manipulate the buffered data.
> It would mostly help modules written in C, such as pickle, which right
> now are heavily bottlenecked getting the data into a buffer.

Right. It would, however, benefit /file objects/ written in Python
(since the cost of calling a peek() written in pure Python is certainly
significant compared to the cost of the actual peeking operation).

> - But first, fix bufferediobase_readinto() so it doesn't work by
> calling the read() method and/or follow up on the TODO in
> buffered_readinto()

Patches welcome :)

> Comparing the C code for Python 2's cPickle and Python 3's pickle, I
> see that Python 2 has paths for unpickling from a FILE *, cStringIO,
> and "other".  Python 3 effectively has only the code path for "other",
> so it's not surprising that it's slower.  In the worst case, I am sure
> that if we re-added specialized code paths we could make it just as
> fast as Python 2, although that would make the code messy.

It would be very ugly, IMO. And it would still be slower than the clean
solution, which is to have a buffer size big enough that the overhead of
making a read() method call is dwarfed by the processing cost of the
data (that's how TextIOWrapper works).
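
The pattern is simply this (my sketch; consume() stands in for the real
per-chunk processing):

    CHUNK = 64 * 1024

    def process(f, consume=len):
        while True:
            block = f.read(CHUNK)     # one method call per 64 KiB...
            if not block:
                break
            consume(block)            # ...amortized over the whole block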

(for the record, with the read()+peek() patch, unpickle is already
faster than Python 2, but that's comparing apples to oranges because
Python 3 got other unpickle optimizations in the meantime)

> If you want a new API, I think a new C API for I/O objects with
> C-friendly arguments would be better than a new Python-level API.

I really think we should keep a unified API. A low-level C API would be
difficult to get right, would make implementations more complicated, and
consumers would have to keep fallback code for objects not implementing
the C API, which would complicate things on their side too.

Conversely, one purpose of my prefetch() proposal, besides optimizing
some workloads, is to *simplify* writing of buffered IO code.

> In a nutshell, if you feel the need to make a buffer around
> BufferedReader, then I agree there's a problem, but I don't think
> helping you make a buffer around BufferedReader is the right
> solution. ;-)

In a layered approach, it's hard not to end up with multiple levels of
buffering (think TextIOWrapper + BufferedReader + OS page-level
caching) :) I agree that shared buffers sound more efficient but, again,
I fear they would be a lot of work to get right. If you look at the
BufferedReader code, it's already non-trivial, and bugs in this area can
be really painful.

Regards

Antoine.




From daniel at stutzbachenterprises.com  Tue Sep 28 17:19:51 2010
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Tue, 28 Sep 2010 10:19:51 -0500
Subject: [Python-ideas] Prefetching on buffered IO files
In-Reply-To: <1285686404.3141.56.camel@localhost.localdomain>
References: <20100928004119.3963a4ad@pitrou.net>
	<AANLkTi=754pvRhwejtD0uVrb9=0+uD1JQR3JSfdjYC4-@mail.gmail.com>
	<1285686404.3141.56.camel@localhost.localdomain>
Message-ID: <AANLkTikfCMxm-DkhEQitUDwxkh-N7HWg=XpR8QozpDNn@mail.gmail.com>

On Tue, Sep 28, 2010 at 10:06 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> > - But first, fix bufferediobase_readinto() so it doesn't work by
> > calling the read() method and/or follow up on the TODO in
> > buffered_readinto()
>
> Patches welcome :)


I'm not likely to get to it soon, but I've opened Issue 9971 to at least
keep track of it.

-- 
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com/>

From guido at python.org  Tue Sep 28 18:44:38 2010
From: guido at python.org (Guido van Rossum)
Date: Tue, 28 Sep 2010 09:44:38 -0700
Subject: [Python-ideas] Prefetching on buffered IO files
In-Reply-To: <1285684369.3141.22.camel@localhost.localdomain>
References: <20100928004119.3963a4ad@pitrou.net>
	<AANLkTin6UQ73yH3DFrP8s_Wswwq0qdODH=i+en8_qZyW@mail.gmail.com>
	<20100928145704.2fb2e382@pitrou.net>
	<AANLkTi=fggmgyr3yXaFrD5f0cbFprejnGZA1vonLA-Z7@mail.gmail.com>
	<1285684369.3141.22.camel@localhost.localdomain>
Message-ID: <AANLkTikB+=zPUdYkPJeCq917-muL4iitfDwf+W_KeTU6@mail.gmail.com>

On Tue, Sep 28, 2010 at 7:32 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
[Guido]
>> wonder if it wouldn't be better to add an extra buffer to GzipFile so
>> small seek() and read() calls can be made more efficient?
>
> The problem is that, since the buffer of the unpickler and the buffer of
> the GzipFile are not aware of each other, the unpickler could easily ask
> to seek() backwards past the current GzipFile buffer, and fall back on
> the slow algorithm.

But AFAICT unpickle doesn't use seek()?

[...]
> But, if the stream had prefetch(), the unpickling would be simplified: I
> would only have to call prefetch() once when refilling the buffer,
> rather than two read()'s followed by a peek().
>
> (I could try to coalesce the two reads, but it would complicate the code
> a bit more...)

Where exactly would the peek be used? (I must be confused because I
can't find either peek or seek in _pickle.c.)

It still seems to me that the "right" way to solve this would be to
insert a transparent extra buffer somewhere, probably in the GzipFile
code, and work on reducing the call overhead.

>> I want to push back on this more, primarily because a new primitive
>> I/O operation has high costs: it can never be removed, it has to be
>> added to every stream implementation, developers need to learn to use
>> the new operation, and so on.
>
> I agree with this (except that most developers don't really need to
> learn to use it: common uses of readable files are content with read()
> and readline(), and need neither peek() nor prefetch()). I don't intend
> to push this for 3.2; I'm throwing the idea around with a hypothetical
> 3.3 landing if it seems useful.

So far it seems more awkward than useful.

>> Also, if you can believe the multi-core crowd, a very different
>> possible future development might be to run the gunzip algorithm and
>> the unpickle algorithm in parallel, on separate cores. Truly such a
>> solution would require totally *different* new I/O primitives, which
>> might have a higher chance of being reusable outside the context of
>> pickle.
>
> Well, it's a bit of a pie-in-the-sky perspective :)
> Furthermore, such a solution wouldn't improve CPU efficiency, so if your
> workload is already able to utilize all CPU cores (which it can easily
> do if you are in a VM, or have multiple busy daemons), it doesn't buy
> you anything.

Agreed it's pie in the sky... Though the interface between the two
CPUs might actually be designed to be faster than the current buffered
I/O. I have (mostly :-) fond memories of async I/O on a mainframe I
used in the '70s which worked this way.

-- 
--Guido van Rossum (python.org/~guido)


From solipsis at pitrou.net  Tue Sep 28 22:33:39 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 28 Sep 2010 22:33:39 +0200
Subject: [Python-ideas] Prefetching on buffered IO files
References: <20100928004119.3963a4ad@pitrou.net>
	<AANLkTin6UQ73yH3DFrP8s_Wswwq0qdODH=i+en8_qZyW@mail.gmail.com>
	<20100928145704.2fb2e382@pitrou.net>
	<AANLkTi=fggmgyr3yXaFrD5f0cbFprejnGZA1vonLA-Z7@mail.gmail.com>
	<1285684369.3141.22.camel@localhost.localdomain>
	<AANLkTikB+=zPUdYkPJeCq917-muL4iitfDwf+W_KeTU6@mail.gmail.com>
Message-ID: <20100928223339.3f621915@pitrou.net>

On Tue, 28 Sep 2010 09:44:38 -0700
Guido van Rossum <guido at python.org> wrote:
> 
> But AFAICT unpickle doesn't use seek()?
> 
> [...]
> > But, if the stream had prefetch(), the unpickling would be simplified: I
> > would only have to call prefetch() once when refilling the buffer,
> > rather than two read()'s followed by a peek().
> >
> > (I could try to coalesce the two reads, but it would complicate the code
> > a bit more...)
> 
> Where exactly would the peek be used? (I must be confused because I
> can't find either peek or seek in _pickle.c.)

peek/seek are not used currently (in SVN). Each of them is used in
one of the two prefetching approaches proposed to solve the unpickling
performance problem.

(the first approach uses seek() and read(), the second approach uses
read() and peek(); as already explained, I tend to consider the second
approach much better, and the prefetch() proposal comes in part from the
experience gathered on that approach)

> It still seems to me that the "right" way to solve this would be to
> insert a transparent extra buffer somewhere, probably in the GzipFile
> code, and work on reducing the call overhead.

No, because if you don't have any buffering on the unpickling side
(rather than on the GzipFile or BufferedReader side), then you still
have the method call overhead no matter what. And this overhead is
rather big when you're reading data byte by byte, or word by word
(which unpickling very frequently does).

(for the record, GzipFile already has an internal buffer. But calling
GzipFile.read() still has a large overhead compared to reading
data directly from a prefetch buffer inside the unpickler object)
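
A quick way to feel that overhead (my micro-benchmark sketch; absolute
numbers will vary):

    import io, timeit

    data = b"x" * (10 ** 6)

    def one_byte_at_a_time():
        f = io.BytesIO(data)
        while f.read(1):              # one method call per byte
            pass

    def from_local_buffer():
        buf = io.BytesIO(data).read() # one bulk read into a local buffer
        i, n = 0, len(buf)
        while i < n:                  # pure index arithmetic, no calls
            i += 1

    print(timeit.timeit(one_byte_at_a_time, number=3))
    print(timeit.timeit(from_local_buffer, number=3))

Even with BytesIO, the cheapest possible "file", the per-call version
loses noticeably.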

Regards

Antoine.