From jimjjewett at gmail.com  Thu Feb  1 00:06:26 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Wed, 31 Jan 2007 18:06:26 -0500
Subject: [Python-3000] Poll: Lazy Unicode Strings For Py3k
In-Reply-To: <20070131133737.5A61.JCARLSON@uci.edu>
References: <20070131122426.5A5C.JCARLSON@uci.edu>
	<fb6fbf560701311314s6cb16ba4l263c156de5c331f2@mail.gmail.com>
	<20070131133737.5A61.JCARLSON@uci.edu>
Message-ID: <fb6fbf560701311506j49fccb16k4c4a435c1446be34@mail.gmail.com>

On 1/31/07, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> "Jim Jewett" <jimjjewett at gmail.com> wrote:
> > On 1/31/07, Josiah Carlson <jcarlson at uci.edu> wrote:

> > > Do you remember my "string view" post from last September/October or so?
> > > It implemented almost all of the string API exactly as the string API
> > > did, except that rather than returning strings, it returned views.

> > So there would be places where you couldn't safely use it, even though
> > it had all the required functionality.

> Almost certainly, but the point is that you could get back to what you
> wanted via str(obj), unicode(obj), etc., which would incur (in the worst
> case) the overhead you saved before, or raise a MemoryError exception
> (unless it's Linux, in which case you will likely segfault).

> > How would you feel if it also

> > (1)  Claimed to be a subclass of str (though it might not actually
> > inherit anything)
> > (2)  Implemented the rest of the methods by delegation.  (Call str on
> > itself, switch its "real" object to the new string, and delegate to
> > that.)

> I'm not terribly concerned about the implementation details of an object
> I don't need to use.  As long as it works, it is fine.  I am concerned
> about the implementation details of objects I will use.

The reason to ask for these is that then you could use it anywhere a
str could be used (unless they explicitly did CheckExact).  Since the
object itself would be in charge of creating a "normal" str when
needed, you wouldn't have to do it pre-emptively before passing it to
a library.

> I believe the base type included with Python should allocate the memory
> on creation.  Why?  Because the implementation is simple, and I believe
> that a base type implementation should be as simple as possible.

Do you think it should happen to do that as an implementation detail,
or that it should *promise* to do so, and bind all string-alikes to
the same promise?

-jJ

From jcarlson at uci.edu  Thu Feb  1 01:18:17 2007
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 31 Jan 2007 16:18:17 -0800
Subject: [Python-3000] Poll: Lazy Unicode Strings For Py3k
In-Reply-To: <fb6fbf560701311506j49fccb16k4c4a435c1446be34@mail.gmail.com>
References: <20070131133737.5A61.JCARLSON@uci.edu>
	<fb6fbf560701311506j49fccb16k4c4a435c1446be34@mail.gmail.com>
Message-ID: <20070131160422.5A6D.JCARLSON@uci.edu>


"Jim Jewett" <jimjjewett at gmail.com> wrote:
> 
> On 1/31/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> >
> > "Jim Jewett" <jimjjewett at gmail.com> wrote:
> > > On 1/31/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> 
> > > > Do you remember my "string view" post from last September/October or so?
> > > > It implemented almost all of the string API exactly as the string API
> > > > did, except that rather than returning strings, it returned views.
> 
> > > So there would be places where you couldn't safely use it, even though
> > > it had all the required functionality.
> 
> > Almost certainly, but the point is that you could get back to what you
> > wanted via str(obj), unicode(obj), etc., which would incur (in the worst
> > case) the overhead you saved before, or raise a MemoryError exception
> > (unless it's Linux, in which case you will likely segfault).
> 
> > > How would you feel if it also
> 
> > > (1)  Claimed to be a subclass of str (though it might not actually
> > > inherit anything)
> > > (2)  Implemented the rest of the methods by delegation.  (Call str on
> > > itself, switch its "real" object to the new string, and delegate to
> > > that.)
> 
> > I'm not terribly concerned about the implementation details of an object
> > I don't need to use.  As long as it works, it is fine.  I am concerned
> > about the implementation details of objects I will use.
> 
> The reason to ask for these is that then you could use it anywhere a
> str could be used (unless they explicitly did CheckExact).  Since the
> object itself would be in charge of creating a "normal" str when
> needed, you wouldn't have to do it pre-emptively before passing it to
> a library.

Certainly, but a well-behaved C extension or library should be using the
single segment buffer interface anyways, to allow for the passing of
array.array, numpy.array, buffer(...), etc.  For views, this works great,
you just return a pointer and length into the original object.  For
concatenation objects, one needs to render the string, but that is
expected.

For reference, I have written quite a bit of code that expects
string-like things to be passed to C extensions, and by using the buffer
interface, I have been able to use str, array and mmap instances
interchangeably depending on what I want as a result, or what I'm using
as temporary memory.
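
A minimal sketch of the interchangeability described above, written against
modern Python 3, where memoryview is the Python-level face of the buffer
interface. The checksum() helper is hypothetical and exists only for
illustration; it is not code from the extensions mentioned in this message.

```python
import array
import mmap


def checksum(data):
    """Sum the bytes of any object that exposes a contiguous byte buffer."""
    # memoryview() borrows the object's memory without copying it.
    with memoryview(data) as view:
        return sum(view) % 256


payload = b"abc"

as_bytes = checksum(payload)
as_bytearray = checksum(bytearray(payload))
as_array = checksum(array.array("B", payload))

# An anonymous memory map stands in for a file-backed one here.
with mmap.mmap(-1, len(payload)) as mapped:
    mapped.write(payload)
    as_mmap = checksum(mapped)
```

The same function body serves all four container types because it only
relies on the buffer, not on the concrete type.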


> > I believe the base type included with Python should allocate the memory
> > on creation.  Why?  Because the implementation is simple, and I believe
> > that a base type implementation should be as simple as possible.
> 
> Do you think it should happen to do that as an implementation detail,
> or that it should *promise* to do so, and bind all string-alikes to
> the same promise?

What subtypes do are their own business.  I only have an opinion for the
str and unicode types allocating memory on creation.  Aside from being
simpler, it keeps the base types from delaying a MemoryError in low
memory conditions.  An implementation of string views (or concatenations)
that is a subclass of the string type, which defers creation until
necessary (or never), is perfectly reasonable. Whether it is in the
stdlib or 3rd party, I don't care.  (I swear, this is at least the
second or third time I've said this)
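
For readers following along, a rough sketch of a string-alike that defers
rendering until it is needed, with the remaining methods handled by
delegation as Jim describes. LazyConcat is a hypothetical name, the class is
deliberately not a real str subclass, and this is written for modern
Python 3 rather than taken from any of the patches under discussion.

```python
class LazyConcat:
    """Hypothetical string-alike that defers concatenation until needed."""

    def __init__(self, *parts):
        self._parts = list(parts)
        self._rendered = None  # no joined string has been allocated yet

    def __add__(self, other):
        # Concatenation only records the operands; nothing is copied.
        return LazyConcat(self, other)

    def __str__(self):
        # Render once, then keep the real string as the only part.
        if self._rendered is None:
            self._rendered = "".join(str(part) for part in self._parts)
            self._parts = [self._rendered]
        return self._rendered

    def __getattr__(self, name):
        # Only reached for attributes not defined above: delegate to the
        # rendered str. Private names are refused to avoid recursion.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(str(self), name)


lazy = LazyConcat("lazy ", "unicode ") + "strings"
was_deferred = lazy._rendered is None
shouted = lazy.upper()  # first delegated call forces the rendering
```

A MemoryError, if any, would surface at the first str() call rather than at
creation, which is exactly the delay the base types would avoid.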


 - Josiah


From ntoronto at cs.byu.edu  Thu Feb  1 03:53:49 2007
From: ntoronto at cs.byu.edu (Neil Toronto)
Date: Wed, 31 Jan 2007 19:53:49 -0700
Subject: [Python-3000] Poll: Lazy Unicode Strings For Py3k
In-Reply-To: <1cb725390701311318n21f57a7et95aef7d41dd8f130@mail.gmail.com>
References: <45C07234.8070808@hastings.org>
	<1cb725390701311318n21f57a7et95aef7d41dd8f130@mail.gmail.com>
Message-ID: <45C1563D.4070902@cs.byu.edu>

Paul Prescod wrote:
> String concatenation is a known issue in Python programming and
> workarounds for it are common obfuscations in a language otherwise
> famous for being clean. So I vote +1 on it. I abstain on slicing.
>   

Seconded: +1 on concatenation, no opinion on the rest. It'd be great to 
retire the ''.join(my_big_list_of_strings) idiom.
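
For context, a small sketch of the idiom in question next to the naive loop
it usually replaces. The variable names are illustrative only, and no timing
claims are made here.

```python
pieces = [str(i) for i in range(5)]

# Naive form: each += may build a brand-new string object.
looped = ""
for piece in pieces:
    looped += piece

# The workaround idiom: collect the pieces, then join them once.
joined = "".join(pieces)
```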

Neil


From aahz at pythoncraft.com  Thu Feb  1 04:02:20 2007
From: aahz at pythoncraft.com (Aahz)
Date: Wed, 31 Jan 2007 19:02:20 -0800
Subject: [Python-3000] Poll: Lazy Unicode Strings For Py3k
In-Reply-To: <45C07234.8070808@hastings.org>
References: <45C07234.8070808@hastings.org>
Message-ID: <20070201030220.GA9206@panix.com>

On Wed, Jan 31, 2007, Larry Hastings wrote:
>
> I'd like to start a (hopefully final) round of discussion on the "lazy
> strings" series of patches.  What follows is a summary on the current
> state of the patches, followed by five poll questions.

While I don't have an opinion about the patch itself, I do have an
opinion about other people's opinions.  ;-)  That is, my opinion is that
unless you get a +1 from at least one of Fredrik, MvL, or MAL (and no -1
from any of them), this patch should be abandoned.  (The exact set of
developers doesn't matter, though you should be focused on people with
commits in unicodeobject.c, and I'd recommend that Fredrik or MvL be on
that list regardless.)
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"I disrespectfully agree."  --SJM

From larry at hastings.org  Thu Feb  1 10:51:01 2007
From: larry at hastings.org (Larry Hastings)
Date: Thu, 01 Feb 2007 01:51:01 -0800
Subject: [Python-3000] Poll: Lazy Unicode Strings For Py3k
In-Reply-To: <20070201030220.GA9206@panix.com>
References: <45C07234.8070808@hastings.org> <20070201030220.GA9206@panix.com>
Message-ID: <45C1B805.5030306@hastings.org>

Aahz wrote:
> While I don't have an opinion about the patch itself, I do have an
> opinion about other people's opinions.  ;-)  That is, my opinion is that
> unless you get a +1 from at least one of Fredrik, MvL, or MAL (and no -1
> from any of them), this patch should be abandoned.  (The exact set of
> developers doesn't matter, though you should be focused on people with
> commits in unicodeobject.c, and I'd recommend that Fredrik or MvL be on
> that list regardless.)
>   

I should focus how?  With offers of cash rewards?

I'm happy to field questions from anybody, on the list or via email.  
I'm sure all those folks are as aware of this thread as they need to 
be.  Beyond that I don't see how I can affect if or when they render a vote.

Not-that-cash-rewards-are-out-of-the-question-ly,


/larry/

From tomerfiliba at gmail.com  Thu Feb  1 12:43:03 2007
From: tomerfiliba at gmail.com (tomer filiba)
Date: Thu, 1 Feb 2007 13:43:03 +0200
Subject: [Python-3000] the types module
Message-ID: <1d85506f0702010343j26ddb0eeub63dafad8a83cf78@mail.gmail.com>

i've had some difficulty with code that attempts to locate a type
by its __module__ and __name__, something like:
    getattr(sys.modules[t.__module__], t.__name__)

the trouble is, all builtin types claim to belong to the __builtin__ module.

for example:
    >>> types.FunctionType
    <type 'function'>
    >>> types.FunctionType.__name__
    'function'
    >>> types.FunctionType.__module__
    '__builtin__'

but --
    >>> __builtin__.function
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'module' object has no attribute 'function'

most, but not all, of the types are exposed in __builtin__... this required
me to create an artificial mapping in which "__builtin__.function" is mapped
to types.FunctionType, and then use this mapping instead of sys.modules,
which adds more special cases on my part.
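
a minimal sketch of that lookup and the special-case mapping, assuming modern
Python 3 names (the module is called builtins there, not __builtin__).
OVERRIDES, locate() and find_type() are hypothetical names used only for
illustration:

```python
import builtins
import sys
import types

# Hypothetical special-case table for types that are not exposed under
# the module they claim to belong to.
OVERRIDES = {("builtins", "function"): types.FunctionType}


def locate(module_name, type_name):
    """Find a type from its __module__ and __name__ strings."""
    try:
        return OVERRIDES[(module_name, type_name)]
    except KeyError:
        return getattr(sys.modules[module_name], type_name)


def find_type(t):
    return locate(t.__module__, t.__name__)


claimed_module = types.FunctionType.__module__
exposed = hasattr(builtins, "function")
found = find_type(types.FunctionType)
```

without the OVERRIDES entry, the getattr() call would raise AttributeError
for the function type, as in the session above.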

on the other hand, the exceptions module works differently. all builtin
exceptions are defined in the exceptions module, but are exposed
through __builtin__:
    >>> EOFError.__module__
    'exceptions'
    >>> exceptions.EOFError
    <type 'exceptions.EOFError'>
    >>> __builtin__.EOFError
    <type 'exceptions.EOFError'>

so i thought why not do the same with all builtin types? currently the
types module (types.py) exposes some type objects (not all), and uses
witchcraft to obtain them:
    try:
        raise TypeError
    except TypeError:
        tb = sys.exc_info()[2]
        TracebackType = type(tb)
        FrameType = type(tb.tb_frame)

instead, let's make it a builtin module, in which all types will be defined;
the useful types (int, str, ...) would be exposed into __builtin__ (just as
the exceptions module does), while the less useful will be kept unexposed.

this would make FunctionType.__module__ == "types", rather than
"__builtin__", which would allow me to fetch it by name from sys.modules.


-tomer

From aahz at pythoncraft.com  Thu Feb  1 15:01:28 2007
From: aahz at pythoncraft.com (Aahz)
Date: Thu, 1 Feb 2007 06:01:28 -0800
Subject: [Python-3000] Poll: Lazy Unicode Strings For Py3k
In-Reply-To: <45C1B805.5030306@hastings.org>
References: <45C07234.8070808@hastings.org> <20070201030220.GA9206@panix.com>
	<45C1B805.5030306@hastings.org>
Message-ID: <20070201140128.GA24639@panix.com>

On Thu, Feb 01, 2007, Larry Hastings wrote:
> Aahz wrote:
>>
>> While I don't have an opinion about the patch itself, I do have an
>> opinion about other people's opinions.  ;-)  That is, my opinion is that
>> unless you get a +1 from at least one of Fredrik, MvL, or MAL (and no -1
>> from any of them), this patch should be abandoned.  (The exact set of
>> developers doesn't matter, though you should be focused on people with
>> commits in unicodeobject.c, and I'd recommend that Fredrik or MvL be on
>> that list regardless.)
> 
> I should focus how?  With offers of cash rewards?
> 
> I'm happy to field questions from anybody, on the list or via email.
> I'm sure all those folks are as aware of this thread as they need to
> be.  Beyond that I don't see how I can affect if or when they render a
> vote.

Maybe they are and maybe they aren't -- people don't always pay full
attention to mailing lists.  MvL at least has a standing offer to review
patches if you review five other patches.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"I disrespectfully agree."  --SJM

From brett at python.org  Thu Feb  1 20:12:20 2007
From: brett at python.org (Brett Cannon)
Date: Thu, 1 Feb 2007 11:12:20 -0800
Subject: [Python-3000] the types module
In-Reply-To: <1d85506f0702010343j26ddb0eeub63dafad8a83cf78@mail.gmail.com>
References: <1d85506f0702010343j26ddb0eeub63dafad8a83cf78@mail.gmail.com>
Message-ID: <bbaeab100702011112m5d5c55cdl58fee6e5d7c2f8b1@mail.gmail.com>

On 2/1/07, tomer filiba <tomerfiliba at gmail.com> wrote:
> i've had some difficulty with code that attempts to locate a type
> by its __module__ and __name__, something like:
>     getattr(sys.modules[t.__module__], t.__name__)
>
> the trouble is, all builtin types claim to belong to the __builtin__ module.
> for example:
>     >>> types.FunctionType
>     <type 'function'>
>     >>> types.FunctionType.__name__
>     'function'
>     >>> types.FunctionType.__module__
>     '__builtin__'
>
> but --
>     >>> __builtin__.function
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     AttributeError: 'module' object has no attribute 'function'
>
> most, but not all, of the types are exposed in __builtin__... this required
> me to create an artificial mapping in which "__builtin__.function" is mapped
> to types.FunctionType, and then use this mapping instead of sys.modules,
> which adds more special cases on my part.
>

This has come up before on python-dev, IIRC.  Double-check the archives.

-Brett

From rhamph at gmail.com  Fri Feb  2 05:18:11 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Thu, 1 Feb 2007 21:18:11 -0700
Subject: [Python-3000] reference leak when pressing Enter at interpreter
	prompt
In-Reply-To: <epphqf$oro$1@sea.gmane.org>
References: <bbaeab100701291540k1e06fce3qa1553257f65dc54a@mail.gmail.com>
	<epphqf$oro$1@sea.gmane.org>
Message-ID: <aac2c7cb0702012018j36c6ed32wd1c559b4dad014d8@mail.gmail.com>

On 1/31/07, Georg Brandl <g.brandl at gmx.net> wrote:
> Brett Cannon schrieb:
> > Seems two references are leaking every time you press Enter at the
> > interpreter prompt in a debug build.  Anyone have an inkling of who
> > introduced it?
>
> If anyone wants to look into it:
> It was rev. 53421, the merging of the long-int-unification branch.

long_richcompare doesn't Py_DECREF a and b allocated by CONVERT_BINOP.
This can be reproduced in 53421 (and presumably earlier) by entering
"1L == 2L" at the interpreter prompt.  There might be another function
or two with the same bug.

-- 
Adam Olsen, aka Rhamphoryncus

From brett at python.org  Fri Feb  2 07:07:07 2007
From: brett at python.org (Brett Cannon)
Date: Thu, 1 Feb 2007 22:07:07 -0800
Subject: [Python-3000] reference leak when pressing Enter at interpreter
	prompt
In-Reply-To: <aac2c7cb0702012018j36c6ed32wd1c559b4dad014d8@mail.gmail.com>
References: <bbaeab100701291540k1e06fce3qa1553257f65dc54a@mail.gmail.com>
	<epphqf$oro$1@sea.gmane.org>
	<aac2c7cb0702012018j36c6ed32wd1c559b4dad014d8@mail.gmail.com>
Message-ID: <bbaeab100702012207y4a5671dfvfd889d3173b3149d@mail.gmail.com>

On 2/1/07, Adam Olsen <rhamph at gmail.com> wrote:
> On 1/31/07, Georg Brandl <g.brandl at gmx.net> wrote:
> > Brett Cannon schrieb:
> > > Seems two references are leaking every time you press Enter at the
> > > interpreter prompt in a debug build.  Anyone have an inkling of who
> > > introduced it?
> >
> > If anyone wants to look into it:
> > It was rev. 53421, the merging of the long-int-unification branch.
>
> long_richcompare doesn't Py_DECREF a and b allocated by CONVERT_BINOP.
> This can be reproduced in 53421 (and presumably earlier) by entering
> "1L == 2L" at the interpreter prompt.  There might be another function
> or two with the same bug.
>

Thanks for the debugging, Adam.  I personally don't have time right
now to dig in to verify and patch, but hopefully someone does.  Else I
will try to get to it at some point between now and the end of PyCon.

-Brett

From martin at v.loewis.de  Tue Feb  6 22:06:20 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 06 Feb 2007 22:06:20 +0100
Subject: [Python-3000] reference leak when pressing Enter at interpreter
 prompt
In-Reply-To: <bbaeab100702012207y4a5671dfvfd889d3173b3149d@mail.gmail.com>
References: <bbaeab100701291540k1e06fce3qa1553257f65dc54a@mail.gmail.com>	<epphqf$oro$1@sea.gmane.org>	<aac2c7cb0702012018j36c6ed32wd1c559b4dad014d8@mail.gmail.com>
	<bbaeab100702012207y4a5671dfvfd889d3173b3149d@mail.gmail.com>
Message-ID: <45C8EDCC.5060000@v.loewis.de>

Brett Cannon schrieb:
> Thanks for the debugging, Adam.  I personally don't have time right
> now to dig in to verify and patch, but hopefully someone does.  Else I
> will try to get to it at some point between now and the end of PyCon.

I just fixed this and a few related bugs.

Regards,
Martin


From collinw at gmail.com  Fri Feb  9 15:55:23 2007
From: collinw at gmail.com (Collin Winter)
Date: Fri, 9 Feb 2007 08:55:23 -0600
Subject: [Python-3000] Pre-peps on raise and except changes (was:
	Warning for 2.6 and greater)
In-Reply-To: <43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<ep6brv$p3c$1@sea.gmane.org>
	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>
Message-ID: <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>

The raise- and except-related PEPs from this discussion have been
committed as PEP 3109 and PEP 3110, respectively.

Thanks, everyone!

Collin Winter

From g.brandl at gmx.net  Fri Feb  9 19:41:53 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 09 Feb 2007 19:41:53 +0100
Subject: [Python-3000] Pre-peps on raise and except changes (was:
 Warning for 2.6 and greater)
In-Reply-To: <43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>	<ep6brv$p3c$1@sea.gmane.org>	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
Message-ID: <eqif9h$mf0$1@sea.gmane.org>

Collin Winter schrieb:
> The raise- and except-related PEPs from this discussion have been
> committed as PEP 3109 and PEP 3110, respectively.

One question: will there be an exception keyword argument to set the
traceback, to simplify

e = Error(V)
e.__traceback__ = tb
raise e

to

raise Error(V, traceback=tb)

I remember this being proposed, but could not find it in the PEPs.

Georg


From guido at python.org  Fri Feb  9 21:09:55 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 9 Feb 2007 12:09:55 -0800
Subject: [Python-3000] Pre-peps on raise and except changes (was:
	Warning for 2.6 and greater)
In-Reply-To: <eqif9h$mf0$1@sea.gmane.org>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<ep6brv$p3c$1@sea.gmane.org>
	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
	<eqif9h$mf0$1@sea.gmane.org>
Message-ID: <ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>

I agree that this API is better. If it's not in PEP 344 it should be added.

On 2/9/07, Georg Brandl <g.brandl at gmx.net> wrote:
> Collin Winter schrieb:
> > The raise- and except-related PEPs from this discussion have been
> > committed as PEP 3109 and PEP 3110, respectively.
>
> One question: will there be an exception keyword argument to set the
> traceback, to simplify
>
> e = Error(V)
> e.__traceback__ = tb
> raise e
>
> to
>
> raise Error(V, traceback=tb)
>
> I remember this being proposed, but could not find it in the PEPs.
>
> Georg
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From collinw at gmail.com  Fri Feb  9 23:51:22 2007
From: collinw at gmail.com (Collin Winter)
Date: Fri, 9 Feb 2007 16:51:22 -0600
Subject: [Python-3000] Pre-peps on raise and except changes (was:
	Warning for 2.6 and greater)
In-Reply-To: <ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<ep6brv$p3c$1@sea.gmane.org>
	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
	<eqif9h$mf0$1@sea.gmane.org>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
Message-ID: <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>

> On 2/9/07, Georg Brandl <g.brandl at gmx.net> wrote:
> > One question: will there be an exception keyword argument to set the
> > traceback, to simplify
> >
> > e = Error(V)
> > e.__traceback__ = tb
> > raise e
> >
> > to
> >
> > raise Error(V, traceback=tb)
> >
> > I remember this being proposed, but could not find it in the PEPs.

I believe the original proposal was something like

raise E(V).with_traceback(T)

My preference would be a method (as opposed to a keyword argument).
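
A short sketch comparing the two spellings, assuming the method form as it
exists on exceptions in current Python 3 (BaseException.with_traceback).
Error and capture_traceback() are hypothetical names for illustration, and
the keyword-argument form is not shown because it is only a proposal in this
thread.

```python
import sys


class Error(Exception):
    pass


def capture_traceback():
    """Return a traceback object from a deliberately failing operation."""
    try:
        1 / 0
    except ZeroDivisionError:
        return sys.exc_info()[2]


tb = capture_traceback()

# Three-step spelling: construct, assign the attribute, (later) raise.
e = Error("V")
e.__traceback__ = tb

# Method spelling: with_traceback() sets the attribute and returns the
# exception itself, so it can sit directly in a raise statement.
chained = Error("V")
returned = chained.with_traceback(tb)
```

Because the method returns the exception, "raise Error(V).with_traceback(tb)"
works as a single statement without touching the constructor signature.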

On 2/9/07, Guido van Rossum <guido at python.org> wrote:
> I agree that this API is better. If it's not in PEP 344 it should be added.

Should this be added to PEP 344 or 3109? That is, do you want to see
it before Python 3?

Collin Winter

From guido at python.org  Fri Feb  9 23:57:13 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 9 Feb 2007 14:57:13 -0800
Subject: [Python-3000] Pre-peps on raise and except changes (was:
	Warning for 2.6 and greater)
In-Reply-To: <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<ep6brv$p3c$1@sea.gmane.org>
	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
	<eqif9h$mf0$1@sea.gmane.org>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
Message-ID: <ca471dc20702091457p51e2aa87jf48e21ead6872501@mail.gmail.com>

On 2/9/07, Collin Winter <collinw at gmail.com> wrote:
> > On 2/9/07, Georg Brandl <g.brandl at gmx.net> wrote:
> > > One question: will there be an exception keyword argument to set the
> > > traceback, to simplify
> > >
> > > e = Error(V)
> > > e.__traceback__ = tb
> > > raise e
> > >
> > > to
> > >
> > > raise Error(V, traceback=tb)
> > >
> > > I remember this being proposed, but could not find it in the PEPs.
>
> I believe the original proposal was something like
>
> raise E(V).with_traceback(T)
>
> My preference would be a method (as opposed to a keyword argument).

Fair enough; that way the signature of user-provided exceptions
doesn't need to be messed with.

> On 2/9/07, Guido van Rossum <guido at python.org> wrote:
> > I agree that this API is better. If it's not in PEP 344 it should be added.
>
> Should this be added to PEP 344 or 3109? That is, do you want to see
> it before Python 3?

I think storing the traceback in the exception is a 3.0 feature, since
it depends on the effective 'del e' at the end of the except clause
for avoiding most cycles.
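
A small illustration of the effective 'del e' mentioned above, assuming the
semantics that Python 3 ended up with (the target of "except ... as" is
unbound when the clause ends). handler() is a hypothetical name.

```python
def handler():
    try:
        raise ValueError("boom")
    except ValueError as e:
        saved = e  # an explicit second reference survives the clause
    # The name e has been unbound here, as if by 'del e'.
    return "e" in locals(), saved


still_bound, saved_exc = handler()
has_traceback = saved_exc.__traceback__ is not None
```

Keeping a reference such as saved alive inside the frame is what recreates
the frame/traceback cycle that the implicit deletion is meant to avoid.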

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From pje at telecommunity.com  Sat Feb 10 00:54:47 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 09 Feb 2007 18:54:47 -0500
Subject: [Python-3000] Pre-peps on raise and except changes (was:
 Warning for 2.6 and greater)
In-Reply-To: <ca471dc20702091457p51e2aa87jf48e21ead6872501@mail.gmail.co
 m>
References: <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
	<43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<ep6brv$p3c$1@sea.gmane.org>
	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
	<eqif9h$mf0$1@sea.gmane.org>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
Message-ID: <5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com>

At 02:57 PM 2/9/2007 -0800, Guido van Rossum wrote:
> > On 2/9/07, Guido van Rossum <guido at python.org> wrote:
> > > I agree that this API is better. If it's not in PEP 344 it should be
> > > added.
> >
> > Should this be added to PEP 344 or 3109? That is, do you want to see
> > it before Python 3?
>
>I think storing the traceback in the exception is a 3.0 feature, since
>it depends on the effective 'del e' at the end of the except clause
>for avoiding most cycles.

We would then have to have a Python 3.0 API to fetch the traceback, 
otherwise there's no way to write code that works in both 2.6 and 3.0 and 
gets a traceback.  Did we decide to keep sys.exc_info()?  If so, then that 
would presumably work.



From collinw at gmail.com  Sat Feb 10 01:09:45 2007
From: collinw at gmail.com (Collin Winter)
Date: Fri, 9 Feb 2007 18:09:45 -0600
Subject: [Python-3000] Pre-peps on raise and except changes (was:
	Warning for 2.6 and greater)
In-Reply-To: <5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<ep6brv$p3c$1@sea.gmane.org>
	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
	<eqif9h$mf0$1@sea.gmane.org>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
	<5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com>
Message-ID: <43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com>

On 2/9/07, Phillip J. Eby <pje at telecommunity.com> wrote:
> At 02:57 PM 2/9/2007 -0800, Guido van Rossum wrote:
> > > On 2/9/07, Guido van Rossum <guido at python.org> wrote:
> > > > I agree that this API is better. If it's not in PEP 344 it should be
> > > > added.
> > >
> > > Should this be added to PEP 344 or 3109? That is, do you want to see
> > > it before Python 3?
> >
> >I think storing the traceback in the exception is a 3.0 feature, since
> >it depends on the effective 'del e' at the end of the except clause
> >for avoiding most cycles.
>
> We would then have to have a Python 3.0 API to fetch the traceback,
> otherwise there's no way to write code that works in both 2.6 and 3.0 and
> gets a traceback.  Did we decide to keep sys.exc_info()?  If so, then that
> would presumably work.

sys.exc_info() will be kept, while the sys.exc_{type,value,traceback}
attributes will be dropped.
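
A brief sketch of how the two ways of reaching the traceback relate, assuming
the behaviour of current Python 3, where both sys.exc_info() and the
__traceback__ attribute are available inside an except clause:

```python
import sys

try:
    raise KeyError("missing")
except KeyError as exc:
    exc_type, exc_value, exc_tb = sys.exc_info()
    # Inside the handler both routes lead to the same objects.
    same_value = exc_value is exc
    same_tb = exc_tb is exc.__traceback__

# The old module-level attributes are absent in Python 3.
legacy_attr_present = hasattr(sys, "exc_traceback")
```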

As an aside, should sys.exc_clear() be added to the to-drop list? Is
there still a need for it given Python 3's exception cleanup
semantics?

Collin Winter

From greg.ewing at canterbury.ac.nz  Sat Feb 10 01:33:39 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 10 Feb 2007 13:33:39 +1300
Subject: [Python-3000] Pre-peps on raise and except changes
In-Reply-To: <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<ep6brv$p3c$1@sea.gmane.org>
	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
	<eqif9h$mf0$1@sea.gmane.org>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
Message-ID: <45CD12E3.9050803@canterbury.ac.nz>

Collin Winter wrote:

> I believe the original proposal was something like
> 
> raise E(V).with_traceback(T)

Does this mean you're not intending to have any syntactic
variant of the raise statement that includes a traceback
in 3.0? Or is this just so that forward-compatible code
can be written in 2.6?

If you wanted a distinctive syntax, it could be something
like

   raise e with t

--
Greg

From guido at python.org  Sat Feb 10 01:41:22 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 9 Feb 2007 16:41:22 -0800
Subject: [Python-3000] Pre-peps on raise and except changes
In-Reply-To: <45CD12E3.9050803@canterbury.ac.nz>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<ep6brv$p3c$1@sea.gmane.org>
	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
	<eqif9h$mf0$1@sea.gmane.org>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
	<45CD12E3.9050803@canterbury.ac.nz>
Message-ID: <ca471dc20702091641p763c87a4p13b3acbf28907c30@mail.gmail.com>

On 2/9/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Collin Winter wrote:
>
> > I believe the original proposal was something like
> >
> > raise E(V).with_traceback(T)
>
> Does this mean you're not intending to have any syntactic
> variant of the raise statement that includes a traceback
> in 3.0? Or is this just so that forward-compatible code
> can be written in 2.6?
>
> If you wanted a distinctive syntax, it could be something
> like
>
>    raise e with t

I can see uses for endowing an exception object with a traceback
without raising it (yet), so we'd still need the method; since we have
the method I'm not sure that we need syntax; I don't expect this to be
needed a lot. (Isn't there also a proposal for automatic exception
chaining? That might mean we'll need this even less.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From pje at telecommunity.com  Sat Feb 10 01:50:58 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 09 Feb 2007 19:50:58 -0500
Subject: [Python-3000] Pre-peps on raise and except changes
In-Reply-To: <45CD12E3.9050803@canterbury.ac.nz>
References: <43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
	<43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<ep6brv$p3c$1@sea.gmane.org>
	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
	<eqif9h$mf0$1@sea.gmane.org>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
Message-ID: <5.1.1.6.0.20070209194256.03894be0@sparrow.telecommunity.com>

At 01:33 PM 2/10/2007 +1300, Greg Ewing wrote:
>Collin Winter wrote:
>
> > I believe the original proposal was something like
> >
> > raise E(V).with_traceback(T)
>
>Does this mean you're not intending to have any syntactic
>variant of the raise statement that includes a traceback
>in 3.0?

That *is* the variant.  ;)


>Or is this just so that forward-compatible code
>can be written in 2.6?

Actually, forward compatible code would be easier with something syntactic, 
like your 'raise e with t' idea.  It would allow the implementation to be 
different in 2.6 and 3.0, while using the same syntax.  (In 2.6 it could 
use the existing machinery, while in 3.0 it could call the 
.with_traceback() method.)

Hm.  Actually, that's not necessary.  We could include .with_traceback(T) 
in 2.6, and just have old-style except: clauses delete the traceback from 
the returned objects.  New-style except: clauses would work just as they 
would in 3.0.

To summarize, in 2.6 we could support .with_traceback() and create 
exception instances with traceback attributes, but the old-style except: 
clauses could discard them to prevent cycles.  Raising an exception 
instance with a __traceback__ attribute would get some special handling so 
that it's equivalent to 3-argument raise in today's Python.  Likewise, 
generator.throw() would need the same special handling in 2.6.  Meanwhile, 
sys.exc_info() still lives in both versions.

To write 3.0-compatible code, you just use the 3.0 spellings of raise, 
throw(), and except.  Sounds like a plan!
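
For anyone following along, here is a minimal sketch of the 3.0 spelling
summarized above, i.e. what the old three-argument 'raise E, V, T' turns
into.  It assumes with_traceback() behaves as proposed in this thread
(attach the traceback to the instance and return that instance).  The names
WrappedError, capture_traceback and reraise_with are hypothetical and exist
only for this illustration.

```python
import sys
import traceback


class WrappedError(Exception):
    """Hypothetical exception type, used only for this illustration."""


def capture_traceback():
    """Raise and catch an error, returning the traceback that was recorded."""
    try:
        1 / 0
    except ZeroDivisionError:
        return sys.exc_info()[2]


def reraise_with(tb):
    # 3.0 spelling of the old three-argument form: raise E, V, T
    raise WrappedError("division failed").with_traceback(tb)


saved_tb = capture_traceback()
try:
    reraise_with(saved_tb)
except WrappedError as exc:
    # Collect the function names recorded in the traceback while the
    # exception is still bound to a name.
    frames = [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
```

The resulting traceback should contain both the frame that originally
failed (capture_traceback) and the frame that re-raised (reraise_with),
which is the point of carrying the traceback along.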


From guido at python.org  Sat Feb 10 02:03:14 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 9 Feb 2007 17:03:14 -0800
Subject: [Python-3000] Pre-peps on raise and except changes (was:
	Warning for 2.6 and greater)
In-Reply-To: <43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<ep6brv$p3c$1@sea.gmane.org>
	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
	<eqif9h$mf0$1@sea.gmane.org>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
	<5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com>
	<43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com>
Message-ID: <ca471dc20702091703r56fd2e62g90e01a3a4719b8e6@mail.gmail.com>

On 2/9/07, Collin Winter <collinw at gmail.com> wrote:
> sys.exc_info() will be kept, while the sys.exc_{type,value,traceback}
> attributes will be dropped.

I understand why, but that doesn't make me uncomfortable with keeping
it. Maybe in "3.0 compatibility mode" 2.6 could attach tracebacks to
exception objects so we could be weaned off it in 2.6?

> As an aside, should sys.exc_clear() be added to the to-drop list? Is
> there still a need for it given Python 3's exception cleanup
> semantics?

I don't think so -- AFAIK the same use case is handled well enough by
the cleanup semantics of the except clause.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From collinw at gmail.com  Sat Feb 10 02:08:28 2007
From: collinw at gmail.com (Collin Winter)
Date: Fri, 9 Feb 2007 19:08:28 -0600
Subject: [Python-3000] Pre-peps on raise and except changes
In-Reply-To: <ca471dc20702091641p763c87a4p13b3acbf28907c30@mail.gmail.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<ep6brv$p3c$1@sea.gmane.org>
	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
	<eqif9h$mf0$1@sea.gmane.org>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
	<45CD12E3.9050803@canterbury.ac.nz>
	<ca471dc20702091641p763c87a4p13b3acbf28907c30@mail.gmail.com>
Message-ID: <43aa6ff70702091708v42d93ae5rb8a957088e955709@mail.gmail.com>

On 2/9/07, Guido van Rossum <guido at python.org> wrote:
> On 2/9/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> > Collin Winter wrote:
> >
> > > I believe the original proposal was something like
> > >
> > > raise E(V).with_traceback(T)
> >
> > Does this mean you're not intending to have any syntactic
> > variant of the raise statement that includes a traceback
> > in 3.0? Or is this just so that forward-compatible code
> > can be written in 2.6?
> >
> > If you wanted a distinctive syntax, it could be something
> > like
> >
> >    raise e with t
>
> I can see uses for endowing an exception object with a traceback
> without raising it (yet), so we'd still need the method; since we have
> the method I'm not sure that we need syntax; I don't expect this to be
> needed a lot. (Isn't there also a proposal for automatic exception
> chaining? That might mean we'll need this even less.)

The current 3-argument form of "raise" is used incredibly rarely
(compared to other raise forms), so I don't see a need for this kind
of syntactic support. Also, adding a "with" clause like that means we
have to hash out whether it goes in front of "from" (in "raise ...
from ...") or after it, etc, etc, and that's just begging for
100+-post bikeshedding threads.

Collin Winter

From guido at python.org  Sat Feb 10 02:09:45 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 9 Feb 2007 17:09:45 -0800
Subject: [Python-3000] Pre-peps on raise and except changes (was:
	Warning for 2.6 and greater)
In-Reply-To: <ca471dc20702091703r56fd2e62g90e01a3a4719b8e6@mail.gmail.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<ep6brv$p3c$1@sea.gmane.org>
	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
	<eqif9h$mf0$1@sea.gmane.org>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
	<5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com>
	<43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com>
	<ca471dc20702091703r56fd2e62g90e01a3a4719b8e6@mail.gmail.com>
Message-ID: <ca471dc20702091709o3ef01811o89e74d21f17a356a@mail.gmail.com>

On 2/9/07, Guido van Rossum <guido at python.org> wrote:
> On 2/9/07, Collin Winter <collinw at gmail.com> wrote:
> > sys.exc_info() will be kept, while the sys.exc_{type,value,traceback}
> > attributes will be dropped.
>
> I understand why, but that doesn't make me uncomfortable with keeping

(of course I meant "doesn't make me *comfortable*")

> it. Maybe in "3.0 compatibility mode" 2.6 could attach tracebacks to
> exception objects so we could be weened off it in 2.6?
>
> > As an aside, should sys.exc_clear() be added to the to-drop list? Is
> > there still a need for it given Python 3's exception cleanup
> > semantics?
>
> I don't think so -- AFAIK the same use case is handled well enough by
> the cleanup semantics of the except clause.
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sat Feb 10 02:14:47 2007
From: guido at python.org (Guido van Rossum)
Date: Fri, 9 Feb 2007 17:14:47 -0800
Subject: [Python-3000] Pre-peps on raise and except changes
In-Reply-To: <5.1.1.6.0.20070209194256.03894be0@sparrow.telecommunity.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<ep6brv$p3c$1@sea.gmane.org>
	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
	<eqif9h$mf0$1@sea.gmane.org>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
	<45CD12E3.9050803@canterbury.ac.nz>
	<5.1.1.6.0.20070209194256.03894be0@sparrow.telecommunity.com>
Message-ID: <ca471dc20702091714r2cec9ac3r4b0db0f4d6f6e75a@mail.gmail.com>

On 2/9/07, Phillip J. Eby <pje at telecommunity.com> wrote:
> At 01:33 PM 2/10/2007 +1300, Greg Ewing wrote:
> >Collin Winter wrote:
> >
> > > I believe the original proposal was something like
> > >
> > > raise E(V).with_traceback(T)
> >
> >Does this mean you're not intending to have any syntactic
> >variant of the raise statement that includes a traceback
> >in 3.0?
>
> That *is* the variant.  ;)
>
>
> >Or is this just so that forward-compatible code
> >can be written in 2.6?
>
> Actually, forward compatible code would be easier with something syntactic,
> like your 'raise e with t' idea.  It would allow the implementation to be
> different in 2.6 and 3.0, while using the same syntax.  (In 2.6 it could
> use the existing machinery, while in 3.0 it could call the
> .with_traceback() method.
>
> Hm.  Actually, that's not necessary.  We could include .with_traceback(T)
> in 2.6, and just have old-style except: clauses delete the traceback from
> the returned objects.  New-style except: clauses would work just as they
> would in 3.0.
>
> To summarize, in 2.6 we could support .with_traceback() and create
> exception instances with traceback attributes, but the old-style except:
> clauses could discard them to prevent cycles.  Raising an exception
> instance with a __traceback__ attribute would get some special handling so
> that it's equivalent to 3-argument raise in today's Python.  Likewise,
> generator.throw() would need the same special handling in 2.6.  Meanwhile,
> sys.exc_info() still lives in both versions.
>
> To write 3.0-compatible code, you just use the 3.0 spellings of raise,
> throw(), and except.  Sounds like a plan!

Can't see anything wrong with this either. Collin, do you have enough
to update your PEPs?

I wonder if we should try to keep PEP 344 up to date, or if we should
just do this in the Py3k PEPs; I'm okay with adding some notes about
2.6 to Py3k PEPs so I guess the latter would work.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From collinw at gmail.com  Sat Feb 10 02:27:14 2007
From: collinw at gmail.com (Collin Winter)
Date: Fri, 9 Feb 2007 19:27:14 -0600
Subject: [Python-3000] Pre-peps on raise and except changes (was:
	Warning for 2.6 and greater)
In-Reply-To: <ca471dc20702091703r56fd2e62g90e01a3a4719b8e6@mail.gmail.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<ep6brv$p3c$1@sea.gmane.org>
	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
	<eqif9h$mf0$1@sea.gmane.org>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
	<5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com>
	<43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com>
	<ca471dc20702091703r56fd2e62g90e01a3a4719b8e6@mail.gmail.com>
Message-ID: <43aa6ff70702091727g51ea4cccpff5ac1843f7c04f7@mail.gmail.com>

On 2/9/07, Guido van Rossum <guido at python.org> wrote:
> On 2/9/07, Collin Winter <collinw at gmail.com> wrote:
> > sys.exc_info() will be kept, while the sys.exc_{type,value,traceback}
> > attributes will be dropped.
>
> I understand why, but that doesn't make me comfortable with keeping
> it. Maybe in "3.0 compatibility mode" 2.6 could attach tracebacks to
> exception objects so we could be weened off it in 2.6?

That would imply that 2.6's "3.0 compatibility mode" would also
activate the cleanup semantics for "except" clauses. Switching that
kind of deep, subtle functionality on or off based on a command-line
switch makes me uncomfortable. There would also have to be a way of
distinguishing .pyc files produced by 2.6 versus those produced by 2.6
in 3.0-mode (since the cleanup semantics are implemented by emitting
extra bytecode for the implicit inner try/finally block).
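
To make the "implicit inner try/finally" concrete, here is a rough sketch of
the cleanup semantics being discussed, written against the Python 3 behavior
as I understand it: the name bound by "except ... as" is unbound when the
clause ends.  The function names are hypothetical, and the second function
is only a hand-written approximation of what the compiler emits, not the
actual bytecode.

```python
def handler_with_cleanup():
    """Show that the 'as' name is gone once the except clause finishes."""
    try:
        raise ValueError("boom")
    except ValueError as err:
        message = str(err)
    try:
        err  # the name was unbound when the except clause ended
    except NameError:
        return message, True
    return message, False


def explicit_cleanup():
    """Hand-written approximation of the implicit inner try/finally."""
    try:
        raise ValueError("boom")
    except ValueError as err:
        try:
            message = str(err)
        finally:
            # Break the exception -> traceback -> frame -> exception cycle.
            err = None
            del err
    return message
```

Because the extra try/finally is part of the compiled code for the except
clause, toggling it at run time is not straightforward, which is the concern
raised above about .pyc files.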

> > As an aside, should sys.exc_clear() be added to the to-drop list? Is
> > there still a need for it given Python 3's exception cleanup
> > semantics?
>
> I don't think so -- AFAIK the same use case is handled well enough by
> the cleanup semantics of the except clause.

I've added sys.exc_clear()'s demise to PEP 3100.

Collin Winter

From collinw at gmail.com  Sat Feb 10 02:35:36 2007
From: collinw at gmail.com (Collin Winter)
Date: Fri, 9 Feb 2007 19:35:36 -0600
Subject: [Python-3000] Pre-peps on raise and except changes
In-Reply-To: <ca471dc20702091714r2cec9ac3r4b0db0f4d6f6e75a@mail.gmail.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<ep6brv$p3c$1@sea.gmane.org>
	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
	<eqif9h$mf0$1@sea.gmane.org>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
	<45CD12E3.9050803@canterbury.ac.nz>
	<5.1.1.6.0.20070209194256.03894be0@sparrow.telecommunity.com>
	<ca471dc20702091714r2cec9ac3r4b0db0f4d6f6e75a@mail.gmail.com>
Message-ID: <43aa6ff70702091735p2f281678mecaf416cc71ca361@mail.gmail.com>

On 2/9/07, Guido van Rossum <guido at python.org> wrote:
> On 2/9/07, Phillip J. Eby <pje at telecommunity.com> wrote:
> > At 01:33 PM 2/10/2007 +1300, Greg Ewing wrote:
> > >Collin Winter wrote:
> > >
> > > > I believe the original proposal was something like
> > > >
> > > > raise E(V).with_traceback(T)
> > >
> > >Does this mean you're not intending to have any syntactic
> > >variant of the raise statement that includes a traceback
> > >in 3.0?
> >
> > That *is* the variant.  ;)
> >
> >
> > >Or is this just so that forward-compatible code
> > >can be written in 2.6?
> >
> > Actually, forward compatible code would be easier with something syntactic,
> > like your 'raise e with t' idea.  It would allow the implementation to be
> > different in 2.6 and 3.0, while using the same syntax.  (In 2.6 it could
> > use the existing machinery, while in 3.0 it could call the
> > .with_traceback() method.
> >
> > Hm.  Actually, that's not necessary.  We could include .with_traceback(T)
> > in 2.6, and just have old-style except: clauses delete the traceback from
> > the returned objects.  New-style except: clauses would work just as they
> > would in 3.0.
> >
> > To summarize, in 2.6 we could support .with_traceback() and create
> > exception instances with traceback attributes, but the old-style except:
> > clauses could discard them to prevent cycles.  Raising an exception
> > instance with a __traceback__ attribute would get some special handling so
> > that it's equivalent to 3-argument raise in today's Python.  Likewise,
> > generator.throw() would need the same special handling in 2.6.  Meanwhile,
> > sys.exc_info() still lives in both versions.
> >
> > To write 3.0-compatible code, you just use the 3.0 spellings of raise,
> > throw(), and except.  Sounds like a plan!
>
> Can't see anything wrong with this either. Collin, do you have enough
> to update your PEPs?

I think so. I've already got language ready for the section on using
BaseException.with_traceback() in the 2->3 raise translations, and
I'll work up additional language for the transition plan sometime this
weekend.

> I wonder if we should try to keep PEP 344 up to date, or if we should
> just do this in the Py3k PEPs; I'm okay with adding some notes about
> 2.6 to Py3k PEPs so I guess the latter would work.

If with_traceback() is going to be added in 2.6, I think at least that
much should go in PEP 344. The rest falls under "transitioning to
3.0", so it should probably go in PEP 3109.

Collin Winter

From pje at telecommunity.com  Sat Feb 10 04:44:22 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 09 Feb 2007 22:44:22 -0500
Subject: [Python-3000] Pre-peps on raise and except changes (was:
 Warning for 2.6 and greater)
In-Reply-To: <ca471dc20702091703r56fd2e62g90e01a3a4719b8e6@mail.gmail.com>
References: <43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com>
	<43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<ep6brv$p3c$1@sea.gmane.org>
	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
	<eqif9h$mf0$1@sea.gmane.org>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
	<5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com>
	<43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com>
Message-ID: <5.1.1.6.0.20070209223738.044ed8d0@sparrow.telecommunity.com>

At 05:03 PM 2/9/2007 -0800, Guido van Rossum wrote:
>On 2/9/07, Collin Winter <collinw at gmail.com> wrote:
> > sys.exc_info() will be kept, while the sys.exc_{type,value,traceback}
> > attributes will be dropped.
>
>I understand why, but that doesn't make me uncomfortable with keeping
>it. Maybe in "3.0 compatibility mode" 2.6 could attach tracebacks to
>exception objects so we could be weened off it in 2.6?

I notice that neither PEP addresses PEP 343 compatibility.  Do we plan to 
make __exit__() only get one argument?  Right now the protocol demands all 
three.  I suppose we could pass one argument in 3.0, and if you want to 
support 2.6 you would have to add default arguments.  Such code would be 
ugly as sin, but workable.

I'm not 100% certain we *can't* ditch sys.exc_info(), but if we do, we 
still need *some* way to get the "current exception" and have it include a 
traceback, that will also work in 2.6.  I don't believe there's any 
proposal for such an API currently outstanding.

WSGI still uses sys.exc_info tuples, but we could always add a 
wsgiref.exc_info() that gets the current exception and turns it into such a 
tuple.  ;-)
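
A sketch of what such a helper might look like, assuming exception
instances carry a __traceback__ attribute as proposed.  Note that
wsgiref.exc_info() does not exist; the function name as_exc_info below is
hypothetical and only illustrates the idea.

```python
import sys


def as_exc_info(exc):
    """Turn an exception instance into a sys.exc_info()-style tuple.

    Hypothetical helper; assumes the instance carries __traceback__.
    """
    return (type(exc), exc, exc.__traceback__)


try:
    raise LookupError("no such key")
except LookupError as caught:
    info = as_exc_info(caught)
    # While the handler is active, the tuple built from the instance
    # should agree with what sys.exc_info() reports.
    matches = info == sys.exc_info()
```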

Anyway, I suggest we either decide to deal with that sort of ugliness, or 
decide to live with sys.exc_info(), and then get on with whichever of those 
two choices you decide to make.  :)


From collinw at gmail.com  Sat Feb 10 05:52:15 2007
From: collinw at gmail.com (Collin Winter)
Date: Fri, 9 Feb 2007 22:52:15 -0600
Subject: [Python-3000] Pre-peps on raise and except changes (was:
	Warning for 2.6 and greater)
In-Reply-To: <5.1.1.6.0.20070209223738.044ed8d0@sparrow.telecommunity.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<ep6brv$p3c$1@sea.gmane.org>
	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
	<eqif9h$mf0$1@sea.gmane.org>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
	<5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com>
	<43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com>
	<5.1.1.6.0.20070209223738.044ed8d0@sparrow.telecommunity.com>
Message-ID: <43aa6ff70702092052o56a83545je2c7d570ddfdbb1b@mail.gmail.com>

On 2/9/07, Phillip J. Eby <pje at telecommunity.com> wrote:
> At 05:03 PM 2/9/2007 -0800, Guido van Rossum wrote:
> >On 2/9/07, Collin Winter <collinw at gmail.com> wrote:
> > > sys.exc_info() will be kept, while the sys.exc_{type,value,traceback}
> > > attributes will be dropped.
> >
> >I understand why, but that doesn't make me uncomfortable with keeping
> >it. Maybe in "3.0 compatibility mode" 2.6 could attach tracebacks to
> >exception objects so we could be weened off it in 2.6?
>
> I notice that neither PEP addresses PEP 343 compatibility.  Do we plan to
> make __exit__() only get one argument?  Right now the protocol demands all
> three.  I suppose we could pass one argument in 3.0, and if you want to
> support 2.6 you would have to add default arguments.  Such code would be
> ugly as sin, but workable.

Couldn't __exit__() be passed (type(e), e, e.__traceback__) instead of
*sys.exc_info()? That is, the source translation given in PEP 343
becomes

        mgr = (EXPR)
        exit = mgr.__exit__  # Not calling it yet
        value = mgr.__enter__()
        exc = True
        try:
            try:
                VAR = value  # Only if "as VAR" is present
                BLOCK
            except Exception as e:
                # The exceptional case is handled here
                exc = False
                if not exit(type(e), e, e.__traceback__):
                    raise
                # The exception is swallowed if exit() returns true
        finally:
            # The normal and non-local-goto cases are handled here
            if exc:
                exit(None, None, None)
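
As a small illustration of that translation from the context manager's
side, here is a sketch in which __exit__() still takes three arguments but
they are all derivable from the exception instance.  Recorder is a
hypothetical class written only for this example.

```python
class Recorder:
    """Hypothetical context manager that records what __exit__ receives."""

    def __init__(self):
        self.seen = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        self.seen = (exc_type, exc_value, tb)
        # Swallow KeyError only; anything else propagates.
        return exc_type is KeyError


rec = Recorder()
with rec:
    raise KeyError("missing")

exc_type, exc_value, tb = rec.seen
# The first and third arguments carry no information beyond the instance.
consistent = exc_type is type(exc_value) and tb is exc_value.__traceback__
```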

Collin Winter

From ncoghlan at gmail.com  Sat Feb 10 09:08:09 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 10 Feb 2007 18:08:09 +1000
Subject: [Python-3000] Pre-peps on raise and except changes
In-Reply-To: <43aa6ff70702091735p2f281678mecaf416cc71ca361@mail.gmail.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>	<ep6brv$p3c$1@sea.gmane.org>	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>	<eqif9h$mf0$1@sea.gmane.org>	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>	<45CD12E3.9050803@canterbury.ac.nz>	<5.1.1.6.0.20070209194256.03894be0@sparrow.telecommunity.com>	<ca471dc20702091714r2cec9ac3r4b0db0f4d6f6e75a@mail.gmail.com>
	<43aa6ff70702091735p2f281678mecaf416cc71ca361@mail.gmail.com>
Message-ID: <45CD7D69.4090709@gmail.com>

Collin Winter wrote:
> I think so. I've already got language ready for the section on using
> BaseException.with_traceback() in the 2->3 raise translations, and
> I'll work up additional language for the transition plan sometime this
> weekend.

If with_traceback() is an instance method, does it mutate the existing 
exception or create a new one?

To avoid any confusion, perhaps it should instead be a class method 
equivalent to the following:

   @classmethod
   def with_traceback(*args, **kwds):
      cls = args[0]
      tb = args[1]
      args = args[2:]
      exc = cls(*args, **kwds)
      exc.__traceback__ = tb
      return exc

Usage would look like:

   raise E.with_traceback(T, V)
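
To put the two options side by side, here is a minimal sketch.  The first
part assumes the instance-method form mutates the exception and returns it;
the second part is a simplified variant of the class method above.  The
names current_traceback, E and new_with_traceback are hypothetical.

```python
import sys


def current_traceback():
    """Produce a real traceback object to experiment with."""
    try:
        raise RuntimeError("probe")
    except RuntimeError:
        return sys.exc_info()[2]


tb = current_traceback()

# Instance-method form: the existing exception is mutated and returned.
original = ValueError("bad value")
returned = original.with_traceback(tb)
same_object = returned is original
attached = original.__traceback__ is tb


class E(Exception):
    @classmethod
    def new_with_traceback(cls, tb, *args, **kwds):
        # Class-method form: always builds a fresh instance.
        exc = cls(*args, **kwds)
        exc.__traceback__ = tb
        return exc


fresh = E.new_with_traceback(tb, "bad value")
```

The class-method form cannot attach a traceback to an exception that
already exists, which may matter for code that catches an exception and
wants to re-raise it later.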


Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From pje at telecommunity.com  Sat Feb 10 18:02:17 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat, 10 Feb 2007 12:02:17 -0500
Subject: [Python-3000] Pre-peps on raise and except changes (was:
 Warning for 2.6 and greater)
In-Reply-To: <43aa6ff70702092052o56a83545je2c7d570ddfdbb1b@mail.gmail.com>
References: <5.1.1.6.0.20070209223738.044ed8d0@sparrow.telecommunity.com>
	<43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<ep6brv$p3c$1@sea.gmane.org>
	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
	<eqif9h$mf0$1@sea.gmane.org>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
	<5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com>
	<43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com>
	<5.1.1.6.0.20070209223738.044ed8d0@sparrow.telecommunity.com>
Message-ID: <5.1.1.6.0.20070210120058.0271e800@sparrow.telecommunity.com>

At 10:52 PM 2/9/2007 -0600, Collin Winter wrote:
>On 2/9/07, Phillip J. Eby <pje at telecommunity.com> wrote:
>>At 05:03 PM 2/9/2007 -0800, Guido van Rossum wrote:
>> >On 2/9/07, Collin Winter <collinw at gmail.com> wrote:
>> > > sys.exc_info() will be kept, while the sys.exc_{type,value,traceback}
>> > > attributes will be dropped.
>> >
>> >I understand why, but that doesn't make me uncomfortable with keeping
>> >it. Maybe in "3.0 compatibility mode" 2.6 could attach tracebacks to
>> >exception objects so we could be weened off it in 2.6?
>>
>>I notice that neither PEP addresses PEP 343 compatibility.  Do we plan to
>>make __exit__() only get one argument?  Right now the protocol demands all
>>three.  I suppose we could pass one argument in 3.0, and if you want to
>>support 2.6 you would have to add default arguments.  Such code would be
>>ugly as sin, but workable.
>
>Couldn't __exit__() be passed (type(e), e, e.__traceback__) instead of
>*sys.exc_info()?

Sure, but *why*?  After all, we're changing gen.throw() in the same way.

My thought is, 2.6 would pass all three arguments, 3.0 just one.


From guido at python.org  Sat Feb 10 18:09:00 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 10 Feb 2007 09:09:00 -0800
Subject: [Python-3000] Pre-peps on raise and except changes
In-Reply-To: <45CD7D69.4090709@gmail.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
	<eqif9h$mf0$1@sea.gmane.org>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
	<45CD12E3.9050803@canterbury.ac.nz>
	<5.1.1.6.0.20070209194256.03894be0@sparrow.telecommunity.com>
	<ca471dc20702091714r2cec9ac3r4b0db0f4d6f6e75a@mail.gmail.com>
	<43aa6ff70702091735p2f281678mecaf416cc71ca361@mail.gmail.com>
	<45CD7D69.4090709@gmail.com>
Message-ID: <ca471dc20702100909s5c8a068cjb188fe3362f9dfa8@mail.gmail.com>

Why don't you want it to mutate the instance?

On 2/10/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Collin Winter wrote:
> > I think so. I've already got language ready for the section on using
> > BaseException.with_traceback() in the 2->3 raise translations, and
> > I'll work up additional language for the transition plan sometime this
> > weekend.
>
> If with_traceback() is an instance method, does it mutate the existing
> exception or create a new one?
>
> To avoid any confusion, perhaps it should instead be a class method
> equivalent to the following:
>
>    @classmethod
>    def with_traceback(*args, **kwds):
>       cls = args[0]
>       tb = args[1]
>       args = args[2:]
>       exc = cls(*args, **kwds)
>       exc.__traceback__ = tb
>       return exc
>
> Usage would look like:
>
>    raise E.with_traceback(T, V)
>
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> ---------------------------------------------------------------
>              http://www.boredomandlaziness.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sat Feb 10 18:09:45 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 10 Feb 2007 09:09:45 -0800
Subject: [Python-3000] Pre-peps on raise and except changes (was:
	Warning for 2.6 and greater)
In-Reply-To: <5.1.1.6.0.20070210120058.0271e800@sparrow.telecommunity.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
	<eqif9h$mf0$1@sea.gmane.org>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
	<5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com>
	<43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com>
	<5.1.1.6.0.20070209223738.044ed8d0@sparrow.telecommunity.com>
	<5.1.1.6.0.20070210120058.0271e800@sparrow.telecommunity.com>
Message-ID: <ca471dc20702100909s4717bf29h1c369fda895eeb6b@mail.gmail.com>

WFM.

On 2/10/07, Phillip J. Eby <pje at telecommunity.com> wrote:
> At 10:52 PM 2/9/2007 -0600, Collin Winter wrote:
> >On 2/9/07, Phillip J. Eby <pje at telecommunity.com> wrote:
> >>At 05:03 PM 2/9/2007 -0800, Guido van Rossum wrote:
> >> >On 2/9/07, Collin Winter <collinw at gmail.com> wrote:
> >> > > sys.exc_info() will be kept, while the sys.exc_{type,value,traceback}
> >> > > attributes will be dropped.
> >> >
> >> >I understand why, but that doesn't make me uncomfortable with keeping
> >> >it. Maybe in "3.0 compatibility mode" 2.6 could attach tracebacks to
> >> >exception objects so we could be weened off it in 2.6?
> >>
> >>I notice that neither PEP addresses PEP 343 compatibility.  Do we plan to
> >>make __exit__() only get one argument?  Right now the protocol demands all
> >>three.  I suppose we could pass one argument in 3.0, and if you want to
> >>support 2.6 you would have to add default arguments.  Such code would be
> >>ugly as sin, but workable.
> >
> >Couldn't __exit__() be passed (type(e), e, e.__traceback__) instead of
> >*sys.exc_info()?
>
> Sure, but *why*?  After all, we're changing gen.throw() in the same way.
>
> My thought is, 2.6 would pass all three arguments, 3.0 just one.
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From collinw at gmail.com  Sat Feb 10 23:31:48 2007
From: collinw at gmail.com (Collin Winter)
Date: Sat, 10 Feb 2007 16:31:48 -0600
Subject: [Python-3000] Pre-peps on raise and except changes
In-Reply-To: <45CD7D69.4090709@gmail.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
	<eqif9h$mf0$1@sea.gmane.org>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
	<45CD12E3.9050803@canterbury.ac.nz>
	<5.1.1.6.0.20070209194256.03894be0@sparrow.telecommunity.com>
	<ca471dc20702091714r2cec9ac3r4b0db0f4d6f6e75a@mail.gmail.com>
	<43aa6ff70702091735p2f281678mecaf416cc71ca361@mail.gmail.com>
	<45CD7D69.4090709@gmail.com>
Message-ID: <43aa6ff70702101431j555dd7f0v383d06e08d389529@mail.gmail.com>

On 2/10/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Collin Winter wrote:
> > I think so. I've already got language ready for the section on using
> > BaseException.with_traceback() in the 2->3 raise translations, and
> > I'll work up additional language for the transition plan sometime this
> > weekend.
>
> If with_traceback() is an instance method, does it mutate the existing
> exception or create a new one?

I say it mutates the instance.

> To avoid any confusion, perhaps it should instead be a class method
> equivalent to the following:
>
[snip]
>
> Usage would look like:
>
>    raise E.with_traceback(T, V)

What confusion do you foresee?

Collin Winter

From brett at python.org  Sun Feb 11 00:07:59 2007
From: brett at python.org (Brett Cannon)
Date: Sat, 10 Feb 2007 15:07:59 -0800
Subject: [Python-3000] Pre-peps on raise and except changes (was:
	Warning for 2.6 and greater)
In-Reply-To: <ca471dc20702100909s4717bf29h1c369fda895eeb6b@mail.gmail.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
	<eqif9h$mf0$1@sea.gmane.org>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
	<5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com>
	<43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com>
	<5.1.1.6.0.20070209223738.044ed8d0@sparrow.telecommunity.com>
	<5.1.1.6.0.20070210120058.0271e800@sparrow.telecommunity.com>
	<ca471dc20702100909s4717bf29h1c369fda895eeb6b@mail.gmail.com>
Message-ID: <bbaeab100702101507t36416760s140df4d89fda952b@mail.gmail.com>

On 2/10/07, Guido van Rossum <guido at python.org> wrote:
> WFM.
>

Wow, I think that is the shortest way you can OK an idea, Guido,
without just leaving off the period.  =)

And for what it's worth, I'm +1 on adding default args and passing a
single argument in Py3K and all three in 2.6 as well.

-Brett

From collinw at gmail.com  Sun Feb 11 00:14:33 2007
From: collinw at gmail.com (Collin Winter)
Date: Sat, 10 Feb 2007 17:14:33 -0600
Subject: [Python-3000] Pre-peps on raise and except changes (was:
	Warning for 2.6 and greater)
In-Reply-To: <5.1.1.6.0.20070210120058.0271e800@sparrow.telecommunity.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>
	<eqif9h$mf0$1@sea.gmane.org>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
	<5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com>
	<43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com>
	<5.1.1.6.0.20070209223738.044ed8d0@sparrow.telecommunity.com>
	<5.1.1.6.0.20070210120058.0271e800@sparrow.telecommunity.com>
Message-ID: <43aa6ff70702101514r478086aenf43dfb33c2e56458@mail.gmail.com>

On 2/10/07, Phillip J. Eby <pje at telecommunity.com> wrote:
> At 10:52 PM 2/9/2007 -0600, Collin Winter wrote:
> >On 2/9/07, Phillip J. Eby <pje at telecommunity.com> wrote:
> >>At 05:03 PM 2/9/2007 -0800, Guido van Rossum wrote:
> >> >On 2/9/07, Collin Winter <collinw at gmail.com> wrote:
> >> > > sys.exc_info() will be kept, while the sys.exc_{type,value,traceback}
> >> > > attributes will be dropped.
> >> >
> >> >I understand why, but that doesn't make me uncomfortable with keeping
> >> >it. Maybe in "3.0 compatibility mode" 2.6 could attach tracebacks to
> >> >exception objects so we could be weaned off it in 2.6?
> >>
> >>I notice that neither PEP addresses PEP 343 compatibility.  Do we plan to
> >>make __exit__() only get one argument?  Right now the protocol demands all
> >>three.  I suppose we could pass one argument in 3.0, and if you want to
> >>support 2.6 you would have to add default arguments.  Such code would be
> >>ugly as sin, but workable.
> >
> >Couldn't __exit__() be passed (type(e), e, e.__traceback__) instead of
> >*sys.exc_info()?
>
> Sure, but *why*?  After all, we're changing gen.throw() in the same way.
>
> My thought is, 2.6 would pass all three arguments, 3.0 just one.

My only concern was that keeping the three-argument signature means
one less thing to change when transitioning to 3.0. Anyone really
concerned about their context managers working in 2.6 and 3.0 could
just use a decorator to ensure compatibility, though, so count me in.
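
A minimal sketch of what such a compatibility decorator might look like. It assumes the one-argument __exit__(exc) form proposed in this thread, which is a proposal rather than a shipped protocol; the names accepts_one_or_three and Suppress are made up for illustration.

```python
import functools


def accepts_one_or_three(exit_method):
    """Wrap a one-argument __exit__(self, exc) so that it also accepts the
    three-argument (exc_type, exc_value, traceback) call."""
    @functools.wraps(exit_method)
    def wrapper(self, *args):
        if len(args) == 3:
            # Three-argument protocol: the exception instance is the second item.
            exc = args[1]
        elif len(args) == 1:
            # Hypothetical one-argument protocol discussed in this thread.
            exc = args[0]
        else:
            raise TypeError(
                "__exit__ expects 1 or 3 arguments, got %d" % len(args))
        return exit_method(self, exc)
    return wrapper


class Suppress:
    """Toy context manager that swallows KeyError and nothing else."""

    def __enter__(self):
        return self

    @accepts_one_or_three
    def __exit__(self, exc):
        return isinstance(exc, KeyError)


# The with statement passes three arguments here; the wrapper adapts them.
with Suppress():
    raise KeyError("swallowed")
```

The wrapper only normalizes the call signature; whatever the manager does with the exception object stays in one place.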

Collin Winter

From ncoghlan at gmail.com  Sun Feb 11 01:31:26 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 11 Feb 2007 10:31:26 +1000
Subject: [Python-3000] Pre-peps on raise and except changes
 (was:	Warning for 2.6 and greater)
In-Reply-To: <43aa6ff70702101514r478086aenf43dfb33c2e56458@mail.gmail.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>	<43aa6ff70701241322o6967cb27s4bf27279caf354cd@mail.gmail.com>	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>	<eqif9h$mf0$1@sea.gmane.org>	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>	<5.1.1.6.0.20070209185225.029f5c50@sparrow.telecommunity.com>	<43aa6ff70702091609h3a8a8fc8w875bad847190e4d7@mail.gmail.com>	<5.1.1.6.0.20070209223738.044ed8d0@sparrow.telecommunity.com>	<5.1.1.6.0.20070210120058.0271e800@sparrow.telecommunity.com>
	<43aa6ff70702101514r478086aenf43dfb33c2e56458@mail.gmail.com>
Message-ID: <45CE63DE.60400@gmail.com>

Collin Winter wrote:
> On 2/10/07, Phillip J. Eby <pje at telecommunity.com> wrote:
>> My thought is, 2.6 would pass all three arguments, 3.0 just one.
> 
> My only concern was that keeping the three-argument signature means
> one less thing to change when transitioning to 3.0. Anyone really
> concerned about their context managers working in 2.6 and 3.0 could
> just use a decorator to ensure compatibility, though, so count me in.

A lot of context managers will also adjust automatically when 
contextlib.contextmanager is updated to handle the change.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Sun Feb 11 01:35:36 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 11 Feb 2007 10:35:36 +1000
Subject: [Python-3000] Pre-peps on raise and except changes
In-Reply-To: <ca471dc20702100909s5c8a068cjb188fe3362f9dfa8@mail.gmail.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>	
	<43aa6ff70702090655n7e5f77d2v50b20e82a8e91b4d@mail.gmail.com>	
	<eqif9h$mf0$1@sea.gmane.org>	
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>	
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>	
	<45CD12E3.9050803@canterbury.ac.nz>	
	<5.1.1.6.0.20070209194256.03894be0@sparrow.telecommunity.com>	
	<ca471dc20702091714r2cec9ac3r4b0db0f4d6f6e75a@mail.gmail.com>	
	<43aa6ff70702091735p2f281678mecaf416cc71ca361@mail.gmail.com>	
	<45CD7D69.4090709@gmail.com>
	<ca471dc20702100909s5c8a068cjb188fe3362f9dfa8@mail.gmail.com>
Message-ID: <45CE64D8.1030704@gmail.com>

Guido van Rossum wrote:
> Why don't you want it to mutate the instance?

The recent repeat of the API discussion about list.sort() & 
list.reverse() (mutate instance & return None) vs sorted() and 
reversed() (return new instance).

I'm trying to see why mutating & returning self would be OK here, when 
it's not OK for a list to do the same thing.

An alternate constructor as a class method ducks the question entirely.
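
For reference, the convention being compared against can be shown with the list API itself. This is only an illustration of the existing built-in behaviour, not of anything proposed for exceptions.

```python
numbers = [3, 1, 2]

# sorted() leaves its argument alone and returns a new list.
ordered = sorted(numbers)
assert ordered == [1, 2, 3]
assert numbers == [3, 1, 2]

# list.sort() mutates in place and returns None, so it cannot be chained.
result = numbers.sort()
assert result is None
assert numbers == [1, 2, 3]

# reversed() returns a new iterator; list.reverse() mutates and returns None.
backwards = list(reversed(numbers))
assert backwards == [3, 2, 1]
assert numbers.reverse() is None
assert numbers == [3, 2, 1]
```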

Cheers,
Nick.

> 
> On 2/10/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> Collin Winter wrote:
>> > I think so. I've already got language ready for the section on using
>> > BaseException.with_traceback() in the 2->3 raise translations, and
>> > I'll work up additional language for the transition plan sometime this
>> > weekend.
>>
>> If with_traceback() is an instance method, does it mutate the existing
>> exception or create a new one?
>>
>> To avoid any confusion, perhaps it should instead be a class method
>> equivalent to the following:
>>
>>    @classmethod
>>    def with_traceback(*args, **kwds):
>>       cls = args[0]
>>       tb = args[1]
>>       args = args[2:]
>>       exc = cls(*args, **kwds)
>>       exc.__traceback__ = tb
>>       return exc
>>
>> Usage would look like:
>>
>>    raise E.with_traceback(T, V)
>>
>>
>> Cheers,
>> Nick.


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From guido at python.org  Sun Feb 11 05:08:26 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 10 Feb 2007 20:08:26 -0800
Subject: [Python-3000] Pre-peps on raise and except changes
In-Reply-To: <45CE64D8.1030704@gmail.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
	<45CD12E3.9050803@canterbury.ac.nz>
	<5.1.1.6.0.20070209194256.03894be0@sparrow.telecommunity.com>
	<ca471dc20702091714r2cec9ac3r4b0db0f4d6f6e75a@mail.gmail.com>
	<43aa6ff70702091735p2f281678mecaf416cc71ca361@mail.gmail.com>
	<45CD7D69.4090709@gmail.com>
	<ca471dc20702100909s5c8a068cjb188fe3362f9dfa8@mail.gmail.com>
	<45CE64D8.1030704@gmail.com>
Message-ID: <ca471dc20702102008t2c00754bt24ac4612be31cbfe@mail.gmail.com>

Somehow it seems that exceptions keep getting permission to violate
the rules... (E.g. the insistence on a fixed base class is also
considered unpythonic in other contexts.) Maybe it's because they're
"exceptions" ? :-)

Anyway, I believe there's a use case for re-raising an existing
exception with an added traceback. After all the __traceback__
attribute is mutable. Returning the mutated object is acceptable here
because the *dominant* use case is creating and raising an exception
in one go:

  raise FooException(<args>).with_traceback(<tb>)
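
A short sketch of that pattern as it behaves in released Python 3, where BaseException.with_traceback() sets __traceback__ and returns the same instance. FooException and capture_traceback are placeholder names for illustration.

```python
import sys


class FooException(Exception):
    pass


def capture_traceback():
    """Return the traceback of an exception raised and caught here."""
    try:
        1 / 0
    except ZeroDivisionError:
        return sys.exc_info()[2]


tb = capture_traceback()

# with_traceback() mutates the instance and hands the same object back.
exc = FooException("wrapped failure")
returned = exc.with_traceback(tb)
assert returned is exc
assert exc.__traceback__ is tb

# The dominant use: create and raise in one go.
try:
    raise FooException("one go").with_traceback(tb)
except FooException as caught:
    # The captured traceback is kept as the tail of the new one.
    assert caught.__traceback__ is not None
```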

--Guido

On 2/10/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Guido van Rossum wrote:
> > Why don't you want it to mutate the instance?
>
> The recent repeat of the API discussion about list.sort() &
> list.reverse() (mutate instance & return None) vs sorted() and
> reversed() (return new instance).
>
> I'm trying to see why mutating & returning self would be OK here, when
> it's not OK for a list to do the same thing.
>
> An alternate constructor as a class method ducks the question entirely.
>
> Cheers,
> Nick.
>
> >
> > On 2/10/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> >> Collin Winter wrote:
> >> > I think so. I've already got language ready for the section on using
> >> > BaseException.with_traceback() in the 2->3 raise translations, and
> >> > I'll work up additional language for the transition plan sometime this
> >> > weekend.
> >>
> >> If with_traceback() is an instance method, does it mutate the existing
> >> exception or create a new one?
> >>
> >> To avoid any confusion, perhaps it should instead be a class method
> >> equivalent to the following:
> >>
> >>    @classmethod
> >>    def with_traceback(*args, **kwds):
> >>       cls = args[0]
> >>       tb = args[1]
> >>       args = args[2:]
> >>       exc = cls(*args, **kwds)
> >>       exc.__traceback__ = tb
> >>       return exc
> >>
> >> Usage would look like:
> >>
> >>    raise E.with_traceback(T, V)
> >>
> >>
> >> Cheers,
> >> Nick.
> >>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sun Feb 11 07:26:28 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 10 Feb 2007 22:26:28 -0800
Subject: [Python-3000] how should we handle changes to the C API?
In-Reply-To: <bbaeab100701290951o5e97b97cm41ac3db30a15e9fc@mail.gmail.com>
References: <bbaeab100701282108r6b01e0c5ge4dba673c5894fc7@mail.gmail.com>
	<45BD94CB.6060107@canterbury.ac.nz>
	<bbaeab100701290951o5e97b97cm41ac3db30a15e9fc@mail.gmail.com>
Message-ID: <ca471dc20702102226i6f38a1c1k9221661f97023bfb@mail.gmail.com>

On 1/29/07, Brett Cannon <brett at python.org> wrote:
> I was more generally wondering what the plan was for transitioning any
> C API changes (if we were even going to do that level of transition).

It's too early for much of a plan IMO. I'm not making radical changes
(yet) but I'm mercilessly deleting APIs as they become obsolete. I
expect that we need to wait until we've implemented the new I/O
library and the str/unicode unification before we can say much about
what to do about C APIs.

But there's one thing we can do: not change existing APIs in
incompatible ways. If you delete an API, code that uses it gets a
compile-time error, and that should make it relatively simple to fix
(assuming there's a replacement). But if you change the signature it's
more questionable, and if you change the semantics (e.g. returning a
different kind of PyObject*) it's painful.

So let's commit to not changing signatures or semantics, but delete
obsolete APIs in favor of new ones (with a different name). I guess
this means some of the new names will be ugly. So what, it's C. :-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From brett at python.org  Sun Feb 11 18:50:34 2007
From: brett at python.org (Brett Cannon)
Date: Sun, 11 Feb 2007 09:50:34 -0800
Subject: [Python-3000] how should we handle changes to the C API?
In-Reply-To: <ca471dc20702102226i6f38a1c1k9221661f97023bfb@mail.gmail.com>
References: <bbaeab100701282108r6b01e0c5ge4dba673c5894fc7@mail.gmail.com>
	<45BD94CB.6060107@canterbury.ac.nz>
	<bbaeab100701290951o5e97b97cm41ac3db30a15e9fc@mail.gmail.com>
	<ca471dc20702102226i6f38a1c1k9221661f97023bfb@mail.gmail.com>
Message-ID: <bbaeab100702110950m55c26163y940e7023bc9e2c49@mail.gmail.com>

On 2/10/07, Guido van Rossum <guido at python.org> wrote:
> On 1/29/07, Brett Cannon <brett at python.org> wrote:
> > I was more generally wondering what the plan was for transitioning any
> > C API changes (if we were even going to do that level of transition).
>
> It's too early for much of a plan IMO. I'm not making radical changes
> (yet) but I'm mercilessly deleting APIs as they become obsolete. I
> expect that we need to wait until we've implemented the new I/O
> library and the str/unicode unification before we can say much about
> what to do about C APIs.
>

OK, fair enough.  I know Neal has some ideas on this so I can let him
sweat some of the details when it comes time.  =)

> But there's one thing we can do: not change existing APIs in
> incompatible ways. If you delete an API, code that uses it gets a
> compile-time error, and that should make it relatively simple to fix
> (assuming there's a replacement). But if you change the signature it's
> more questionable, and if you change the semantics (e.g. returning a
> different kind of PyObject*) it's painful.
>
> So let's commit to not changing signatures or semantics, but delete
> obsolete APIs in favor of new ones (with a different name). I guess
> this means some of the new names will be ugly. So what, it's C. :-)

Thank goodness for documentation and the C API index then.  =)

Then I will probably try to come up with a reasonable name for
something to replace PyErr_GivenExceptionMatches(), add it to 2.6, and
delete PyErr_GivenExceptionMatches() in 3.0.

-Brett

From martin at v.loewis.de  Sun Feb 11 19:18:35 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 11 Feb 2007 19:18:35 +0100
Subject: [Python-3000] the types module
In-Reply-To: <bbaeab100702011112m5d5c55cdl58fee6e5d7c2f8b1@mail.gmail.com>
References: <1d85506f0702010343j26ddb0eeub63dafad8a83cf78@mail.gmail.com>
	<bbaeab100702011112m5d5c55cdl58fee6e5d7c2f8b1@mail.gmail.com>
Message-ID: <45CF5DFB.7090609@v.loewis.de>

Brett Cannon schrieb:
> This has come up before on python-dev, IIRC.  Double-check the archives.

More specifically, see PEP 294. It claims the types module will be
removed in Python 3000.

Regards,
Martin


From martin at v.loewis.de  Sun Feb 11 19:20:12 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 11 Feb 2007 19:20:12 +0100
Subject: [Python-3000] how should we handle changes to the C API?
In-Reply-To: <bbaeab100701282108r6b01e0c5ge4dba673c5894fc7@mail.gmail.com>
References: <bbaeab100701282108r6b01e0c5ge4dba673c5894fc7@mail.gmail.com>
Message-ID: <45CF5E5C.1050703@v.loewis.de>

Brett Cannon schrieb:
> My specific need is that PyErr_GivenExceptionMatches() does not have
> an exception return value.  This sucks for me in 2.6 for deprecating
> catching string exceptions, but it sucks more in 3.0 since only
> subclasses of BaseException can be raised.  But not allowing -1 to
> represent that an error occurred is a pain for anyone who wants to
> properly use the function.

I don't understand what exceptional value you are talking about.
If the given object cannot be an exception, it clearly doesn't
match, so the outcome should be zero (not an error).

Regards,
Martin


From brett at python.org  Sun Feb 11 20:30:24 2007
From: brett at python.org (Brett Cannon)
Date: Sun, 11 Feb 2007 11:30:24 -0800
Subject: [Python-3000] how should we handle changes to the C API?
In-Reply-To: <45CF5E5C.1050703@v.loewis.de>
References: <bbaeab100701282108r6b01e0c5ge4dba673c5894fc7@mail.gmail.com>
	<45CF5E5C.1050703@v.loewis.de>
Message-ID: <bbaeab100702111130w5f831fbbo35cb2c5fa904f7bd@mail.gmail.com>

On 2/11/07, "Martin v. L?wis" <martin at v.loewis.de> wrote:
> Brett Cannon schrieb:
> > My specific need is that PyErr_GivenExceptionMatches() does not have
> > an exception return value.  This sucks for me in 2.6 for deprecating
> > catching string exceptions, but it sucks more in 3.0 since only
> > subclasses of BaseException can be raised.  But not allowing -1 to
> > represent that an error occurred is a pain for anyone who wants to
> > properly use the function.
>
> I don't understand what exceptional value you are talking about.
> If the given object cannot be an exception, it clearly doesn't
> match, so the outcome should be zero (not an error).
>

Right, but I wanted to be able to raise a warning.  If that warning is
supposed to be treated as an exception the caller needs to let that
propagate.  Right now PyErr_GivenExceptionMatches() can in no way let
the caller know that fact; the caller needs to use PyErr_Occurred()
after the call.  I checked and no one does that in the core or in 3rd
party libraries from a Google Code search I did.
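
The underlying problem can be sketched at the Python level. This is only an analogy, assuming a made-up helper exception_matches(); it is not the C function, and for brevity it ignores tuples of classes. A check that warns can end up raising once warnings are configured as errors, which a plain true/false result has no way to report.

```python
import warnings


def exception_matches(given, candidate):
    """Hypothetical Python analogue of a match check that also warns."""
    if not (isinstance(candidate, type)
            and issubclass(candidate, BaseException)):
        warnings.warn(
            "except clause value is not an exception class: %r" % (candidate,),
            DeprecationWarning,
            stacklevel=2,
        )
        return False
    return isinstance(given, candidate)


err = ValueError("boom")

# With warnings silenced, the caller only ever sees a boolean.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    quiet_result = exception_matches(err, "not a class")

# With warnings escalated to errors, the warning itself propagates.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        exception_matches(err, "not a class")
        escalated = False
    except DeprecationWarning:
        escalated = True
```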

-Brett

From collinw at gmail.com  Mon Feb 12 01:43:24 2007
From: collinw at gmail.com (Collin Winter)
Date: Sun, 11 Feb 2007 18:43:24 -0600
Subject: [Python-3000] the types module
In-Reply-To: <45CF5DFB.7090609@v.loewis.de>
References: <1d85506f0702010343j26ddb0eeub63dafad8a83cf78@mail.gmail.com>
	<bbaeab100702011112m5d5c55cdl58fee6e5d7c2f8b1@mail.gmail.com>
	<45CF5DFB.7090609@v.loewis.de>
Message-ID: <43aa6ff70702111643w57d8c8f1kd104c82d34d95551@mail.gmail.com>

On 2/11/07, "Martin v. L?wis" <martin at v.loewis.de> wrote:
> Brett Cannon schrieb:
> > This has come up before on python-dev, IIRC.  Double-check the archives.
>
> More specifically, see PEP 294. It claims the types module will be
> removed in Python 3000.

Is removing the types module still a goal? It's not mentioned in
either PEP 3100 or 3108.

Collin Winter

From guido at python.org  Mon Feb 12 03:26:01 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 11 Feb 2007 18:26:01 -0800
Subject: [Python-3000] the types module
In-Reply-To: <43aa6ff70702111643w57d8c8f1kd104c82d34d95551@mail.gmail.com>
References: <1d85506f0702010343j26ddb0eeub63dafad8a83cf78@mail.gmail.com>
	<bbaeab100702011112m5d5c55cdl58fee6e5d7c2f8b1@mail.gmail.com>
	<45CF5DFB.7090609@v.loewis.de>
	<43aa6ff70702111643w57d8c8f1kd104c82d34d95551@mail.gmail.com>
Message-ID: <ca471dc20702111826g52d528c4m1caa5b0d9241c6b6@mail.gmail.com>

Well, I would surely love to see it replaced by something more reasonable.

Collecting type objects together just on the basis that they are all
built-in type objects was a bad idea.

I still hope to do something about Bill Janssen's ABC proposal. But
that will have to wait until after PyCon.

--Guido

On 2/11/07, Collin Winter <collinw at gmail.com> wrote:
> On 2/11/07, "Martin v. L?wis" <martin at v.loewis.de> wrote:
> > Brett Cannon schrieb:
> > > This has come up before on python-dev, IIRC.  Double-check the archives.
> >
> > More specifically, see PEP 294. It claims the types module will be
> > removed in Python 3000.
>
> Is removing the types module still a goal? It's not mentioned in
> either PEP 3100 or 3108.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Mon Feb 12 07:55:39 2007
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Mon, 12 Feb 2007 07:55:39 +0100
Subject: [Python-3000] how should we handle changes to the C API?
In-Reply-To: <bbaeab100702111130w5f831fbbo35cb2c5fa904f7bd@mail.gmail.com>
References: <bbaeab100701282108r6b01e0c5ge4dba673c5894fc7@mail.gmail.com>	
	<45CF5E5C.1050703@v.loewis.de>
	<bbaeab100702111130w5f831fbbo35cb2c5fa904f7bd@mail.gmail.com>
Message-ID: <45D00F6B.7020207@v.loewis.de>

Brett Cannon schrieb:
> Right, but I wanted to be able to raise a warning.  If that warning is
> supposed to be treated as an exception the caller needs to let that
> propagate.  Right now PyErr_GivenExceptionMatches() can in no way let
> the caller know that fact

I'm unclear why you want to warn in PyErr_GivenExceptionMatches:
shouldn't you rather warn when the exception is raised?

Regards,
Martin

From ncoghlan at gmail.com  Mon Feb 12 10:35:29 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 12 Feb 2007 19:35:29 +1000
Subject: [Python-3000] Pre-peps on raise and except changes
In-Reply-To: <ca471dc20702102008t2c00754bt24ac4612be31cbfe@mail.gmail.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>	
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>	
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>	
	<45CD12E3.9050803@canterbury.ac.nz>	
	<5.1.1.6.0.20070209194256.03894be0@sparrow.telecommunity.com>	
	<ca471dc20702091714r2cec9ac3r4b0db0f4d6f6e75a@mail.gmail.com>	
	<43aa6ff70702091735p2f281678mecaf416cc71ca361@mail.gmail.com>	
	<45CD7D69.4090709@gmail.com>	
	<ca471dc20702100909s5c8a068cjb188fe3362f9dfa8@mail.gmail.com>	
	<45CE64D8.1030704@gmail.com>
	<ca471dc20702102008t2c00754bt24ac4612be31cbfe@mail.gmail.com>
Message-ID: <45D034E1.2090506@gmail.com>

Guido van Rossum wrote:
> Somehow it seems that exceptions keep getting permission to violate
> the rules... (E.g. the insistence on a fixed base class is also
> considered unpythonic in other contexts.) Maybe it's because they're
> "exceptions" ? :-)
> 
> Anyway, I believe there's a use case for re-raising an existing
> exception with an added traceback. After all the __traceback__
> attribute is mutable. Returning the mutated object is acceptable here
> because the *dominant* use case is creating and raising an exception
> in one go:
> 
>  raise FooException(<args>).with_traceback(<tb>)

Works for me.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From brett at python.org  Mon Feb 12 23:11:05 2007
From: brett at python.org (Brett Cannon)
Date: Mon, 12 Feb 2007 14:11:05 -0800
Subject: [Python-3000] how should we handle changes to the C API?
In-Reply-To: <45D00F6B.7020207@v.loewis.de>
References: <bbaeab100701282108r6b01e0c5ge4dba673c5894fc7@mail.gmail.com>
	<45CF5E5C.1050703@v.loewis.de>
	<bbaeab100702111130w5f831fbbo35cb2c5fa904f7bd@mail.gmail.com>
	<45D00F6B.7020207@v.loewis.de>
Message-ID: <bbaeab100702121411x55a26907x8f15fd384ffd4df2@mail.gmail.com>

On 2/11/07, "Martin v. L?wis" <martin at v.loewis.de> wrote:
> Brett Cannon schrieb:
> > Right, but I wanted to be able to raise a warning.  If that warning is
> > supposed to be treated as an exception the caller needs to let that
> > propagate.  Right now PyErr_GivenExceptionMatches() can in no way let
> > the caller know that fact
>
> I'm unclear why you want to warn in PyErr_GivenExceptionMatches:
> shouldn't you rather warn when the exception is raised?
>

Guido wants both so that you don't end up with useless values in the
'except' clause.  So yes, things are checked at the time of raising an
exception, but that does not prevent someone from putting something in
an 'except' clause that is useless.

-Brett

From guido at python.org  Mon Feb 12 23:55:21 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 12 Feb 2007 14:55:21 -0800
Subject: [Python-3000] how should we handle changes to the C API?
In-Reply-To: <bbaeab100702121411x55a26907x8f15fd384ffd4df2@mail.gmail.com>
References: <bbaeab100701282108r6b01e0c5ge4dba673c5894fc7@mail.gmail.com>
	<45CF5E5C.1050703@v.loewis.de>
	<bbaeab100702111130w5f831fbbo35cb2c5fa904f7bd@mail.gmail.com>
	<45D00F6B.7020207@v.loewis.de>
	<bbaeab100702121411x55a26907x8f15fd384ffd4df2@mail.gmail.com>
Message-ID: <ca471dc20702121455r166b41t981f6e4821bbf1b9@mail.gmail.com>

But I only want the latter in Py3k, and I don't mind using a different
API there, even potentially a separate check after evaluating 'E' but
before checking whether it matches.

I think it's fine not to catch this in 2.6; after all it's a bug
anyway so we're not expecting many occurrences. I don't think the 3.0
mode in 2.6 needs to catch existing bugs; it only needs to catch code
that *works* in 2.6 but won't in 3.0.
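
For comparison, this is roughly how the check surfaces in released Python 3: a value in an 'except' clause that is not a BaseException subclass is rejected with a TypeError at the point where an exception is matched against it. The helper name try_to_catch is made up for the example.

```python
def try_to_catch(candidate):
    """Raise ValueError and try to catch it with an arbitrary candidate."""
    try:
        try:
            raise ValueError("boom")
        except candidate:
            return "caught"
    except TypeError as exc:
        # Raised by the interpreter when the candidate is not an
        # exception class (or a tuple of exception classes).
        return "rejected: %s" % exc


assert try_to_catch(ValueError) == "caught"
assert try_to_catch(Exception) == "caught"
assert try_to_catch("ValueError").startswith("rejected")
```

Note that the check only fires when an exception actually reaches the clause; a useless 'except' value on a path that never raises goes unnoticed.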

On 2/12/07, Brett Cannon <brett at python.org> wrote:
> On 2/11/07, "Martin v. L?wis" <martin at v.loewis.de> wrote:
> > Brett Cannon schrieb:
> > > Right, but I wanted to be able to raise a warning.  If that warning is
> > > supposed to be treated as an exception the caller needs to let that
> > > propagate.  Right now PyErr_GivenExceptionMatches() can in no way let
> > > the caller know that fact
> >
> > I'm unclear why you want to warn in PyErr_GivenExceptionMatches:
> > shouldn't you rather warn when the exception is raised?
> >
>
> Guido wants both so that you don't end up with useless values in the
> 'except' clause.  So yes, things are checked at the time of raising an
> exception, but that does not prevent someone from putting something in
> an 'except' clause that is useless.
>
> -Brett


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Tue Feb 13 00:08:09 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 13 Feb 2007 12:08:09 +1300
Subject: [Python-3000] Pre-peps on raise and except changes
In-Reply-To: <45D034E1.2090506@gmail.com>
References: <43aa6ff70701221445s5edb4b2do8c8cffdebc759c7@mail.gmail.com>
	<ca471dc20702091209r497083e9x50eb5477e9e8bf8e@mail.gmail.com>
	<43aa6ff70702091451t2b4db67by8f949dda52bdc43d@mail.gmail.com>
	<45CD12E3.9050803@canterbury.ac.nz>
	<5.1.1.6.0.20070209194256.03894be0@sparrow.telecommunity.com>
	<ca471dc20702091714r2cec9ac3r4b0db0f4d6f6e75a@mail.gmail.com>
	<43aa6ff70702091735p2f281678mecaf416cc71ca361@mail.gmail.com>
	<45CD7D69.4090709@gmail.com>
	<ca471dc20702100909s5c8a068cjb188fe3362f9dfa8@mail.gmail.com>
	<45CE64D8.1030704@gmail.com>
	<ca471dc20702102008t2c00754bt24ac4612be31cbfe@mail.gmail.com>
	<45D034E1.2090506@gmail.com>
Message-ID: <45D0F359.2040802@canterbury.ac.nz>

Nick Coghlan wrote:
> Guido van Rossum wrote:
 > Someone else wrote:
 >
> > raise FooException(<args>).with_traceback(<tb>)
> 
> Works for me.

I don't like that somehow -- it looks too clever.
Also it violates the general principle of mutating
methods not returning things. I know Guido said
he's willing to waive that rule for exceptions,
but it still bothers me.

--
Greg

From martin at v.loewis.de  Tue Feb 13 06:52:29 2007
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Tue, 13 Feb 2007 06:52:29 +0100
Subject: [Python-3000] how should we handle changes to the C API?
In-Reply-To: <bbaeab100702121411x55a26907x8f15fd384ffd4df2@mail.gmail.com>
References: <bbaeab100701282108r6b01e0c5ge4dba673c5894fc7@mail.gmail.com>	
	<45CF5E5C.1050703@v.loewis.de>	
	<bbaeab100702111130w5f831fbbo35cb2c5fa904f7bd@mail.gmail.com>	
	<45D00F6B.7020207@v.loewis.de>
	<bbaeab100702121411x55a26907x8f15fd384ffd4df2@mail.gmail.com>
Message-ID: <45D1521D.7000903@v.loewis.de>

Brett Cannon schrieb:
>> I'm unclear why you want to warn in PyErr_GivenExceptionMatches:
>> shouldn't you rather warn when the exception is raised?
>>
> 
> Guido wants both so that you don't end up with useless values in the
> 'except' clause.  So yes, things are checked at the time of raising an
> exception, but that does not prevent someone from putting something in
> an 'except' clause that is useless.

Ok: but why does this check need to happen in PyErr_GivenExceptionMatches?

The deprecation of string exceptions already happens in cmp_outcome;
if you check for bad base exceptions there also, you would find them
all, no? So I still don't see a need to modify GivenExceptionMatches.

Regards,
Martin

From brett at python.org  Tue Feb 13 21:46:59 2007
From: brett at python.org (Brett Cannon)
Date: Tue, 13 Feb 2007 12:46:59 -0800
Subject: [Python-3000] how should we handle changes to the C API?
In-Reply-To: <45D1521D.7000903@v.loewis.de>
References: <bbaeab100701282108r6b01e0c5ge4dba673c5894fc7@mail.gmail.com>
	<45CF5E5C.1050703@v.loewis.de>
	<bbaeab100702111130w5f831fbbo35cb2c5fa904f7bd@mail.gmail.com>
	<45D00F6B.7020207@v.loewis.de>
	<bbaeab100702121411x55a26907x8f15fd384ffd4df2@mail.gmail.com>
	<45D1521D.7000903@v.loewis.de>
Message-ID: <bbaeab100702131246i5badab39h8a78fdb1bb20749e@mail.gmail.com>

On 2/12/07, "Martin v. L?wis" <martin at v.loewis.de> wrote:
> Brett Cannon schrieb:
> >> I'm unclear why you want to warn in PyErr_GivenExceptionMatches:
> >> shouldn't you rather warn when the exception is raised?
> >>
> >
> > Guido wants both so that you don't end up with useless values in the
> > 'except' clause.  So yes, things are checked at the time of raising an
> > exception, but that does not prevent someone from putting something in
> > an 'except' clause that is useless.
>
> Ok: but why does this check need to happen in PyErr_GivenExceptionMatches?
>

It doesn't need to, it just would have been convenient and consistent.
It seems odd that C code can compare an exception against other
objects that an 'except' clause won't.

> The deprecation of string exceptions already happens in cmp_outcome;
> if you check for bad base exceptions there also, you would find them
> all, no?

It wouldn't be checked in both places, just PyErr_GivenExceptionMatches().

-Brett

From cvrebert at gmail.com  Wed Feb 14 04:25:55 2007
From: cvrebert at gmail.com (Chris Rebert)
Date: Tue, 13 Feb 2007 19:25:55 -0800
Subject: [Python-3000] pre-PEP: Default Argument Expressions
Message-ID: <45D28143.9010502@gmail.com>

Requesting comments on the following pre-PEP. pybench runs both with and 
without the patch applied would also be appreciated.
- Chris R


Title: Default Argument Expressions
Author: Christopher Rebert <cvrebertatgmaildotcom>
Status: Draft
Type: Standards Track
Requires: 3000
Python-Version: 3.0

Abstract

     This PEP proposes new semantics for default arguments to remove
     boilerplate code associated with non-constant default argument values,
     allowing them to be expressed more clearly and succinctly.  Specifically,
     all default argument expressions are re-evaluated at each call as opposed
     to just once at definition-time as they are now.


Motivation

     Currently, to write functions using non-constant default arguments, one
     must use the idiom:

         def foo(non_const=None):
             if non_const is None:
                 non_const = some_expr
             #rest of function

     or equivalent code.  Naive programmers desiring mutable default arguments
     often make the mistake of writing the following:

         def foo(mutable=some_expr_producing_mutable):
             #rest of function

     However, this does not work as intended, as 'some_expr_producing_mutable'
     is evaluated only *once* at definition-time, rather than once per call at
     call-time.  This results in all calls to 'foo' using the same default
     value, which can result in unintended consequences.  This necessitates the
     previously mentioned idiom.  This unintuitive behavior is such a frequent
     stumbling block for newbies that it is present in at least 3 lists of
     Python's deficiencies [0] [1] [2].  Python's tutorial even mentions the
     issue explicitly [3].
     There are currently few, if any, known good uses of the current behavior
     of mutable default arguments.  The most common one is to preserve function
     state between calls.  However, as one of the lists [2] comments, this
     purpose is much better served by decorators, classes, or (though less
     preferred) global variables.
     Therefore, since the current semantics aren't useful for non-constant
     default values and an idiom is necessary to work around this deficiency,
     why not change the semantics so that people can write what they mean more
     directly, without the tedious boilerplate? Removing this idiom would help
     make code more readable and self-documenting.
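
A concrete illustration of the pitfall and of the workaround idiom under the current (definition-time) semantics. The function names append_shared and append_fresh are invented for this sketch.

```python
def append_shared(item, bucket=[]):
    # The list is created once, when 'def' runs, and reused by every call
    # that omits 'bucket'.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # The usual idiom: use None as a sentinel and build the default per call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared(1)
second = append_shared(2)
assert first is second          # same list object both times
assert second == [1, 2]         # state leaked between calls

assert append_fresh(1) == [1]
assert append_fresh(2) == [2]   # each call gets its own list
```

Under the semantics proposed here, append_shared would presumably behave like append_fresh without the sentinel check.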


Rationale

     The discussion referenced herein is based on two threads [4] [5] on the
     python-ideas mailing list.
     Originally, it was proposed that all default argument values be
     deep-copied from the original (evaluated at definition-time) at each
     invocation of the function where the default value was required.  However,
     this doesn't take into account default values that are not literals, e.g.
     function calls, subscripts, attribute accesses.  Thus, the new idea was to
     re-evaluate the default arguments at each call where they were needed.
     There was some concern over the possible performance hit this could cause,
     and whether there should be new syntax so that code could use the existing
     semantics for performance reasons.  Some of the proposed syntaxes were:

         def foo(bar=<baz>):
             #code

         def foo(bar=new baz):
             #code

         def foo(bar=fresh baz):
             #code

         def foo(bar=separate baz):
             #code

         def foo(bar=another baz):
             #code

         def foo(bar=unique baz):
             #code

         def foo(bar or baz):
             #code

     where the keyword (or angle brackets) would indicate that the
     default value 'baz' of parameter 'bar' should use the new semantics.
     Other parameters would continue to use the old semantics.

     Alternately, the new semantics could be the default, with the old
     semantics accessible using:

         def foo(bar=once baz):
             #code

     Where 'once' indicates the old default argument semantics. A similar idea
     is mentioned in PEP 3103 [6] under "Option 4".  However, having two sets
     of semantics could be confusing, and leaving in the old semantics might be
     considered premature optimization.  So this PEP proposes having just one
     set of semantics.  Refactorings to deal with the possible performance hit
     from the new semantics are discussed later.

     A more radical proposed solution was to restrict default arguments to
     being hash()-able values, thus theoretically restricting default arguments
     to immutable values only.  While this would solve the newbie-confusion
     issue, it does not suggest a better way to specify that a default value
     should be recomputed at every function call.
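
     The word "theoretically" matters here, because hashability and
     immutability are not the same property.  A small sketch (Python 3;
     the class and function names are made up for illustration) of a
     default value that would pass a hash() check yet still carries state
     between calls:

```python
class Counter:
    """Instances are hashable by default (identity hash) yet mutable."""

    def __init__(self):
        self.count = 0


shared = Counter()
hash(shared)   # succeeds: user-defined classes inherit object.__hash__


def bump(counter=shared):
    # The default is hashable, but it is still one shared, mutable object.
    counter.count += 1
    return counter.count


print(bump(), bump())   # 1 2

try:
    hash([])
    list_is_hashable = True
except TypeError:
    # Built-in mutable containers are the case the restriction would catch.
    list_is_hashable = False
```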

     Throughout the discussion, several decorators were shown as alternatives
     to the aforementioned idiom.  These do allow the programmer to express
     their intent more clearly, at the cost of some extra complexity.  Also, no
     one decorator could be applied to all situations.  The programmer would
     have to figure out which one to use each time.  This PEP's proposed
     solution would make these decorators unnecessary and allow a more general
     solution to the issue than these decorators.  The question was also raised
     as to whether the problem this PEP seeks to solve is significant enough to
     warrant a language change.  The statistics in the Compatibility Issues
     section should help demonstrate the necessity of the changes that this PEP
     proposes.
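
     For illustration only, here is one decorator of the kind discussed,
     following the deep-copy idea mentioned earlier.  It is a sketch in
     Python 3, not one of the decorators actually posted in the threads,
     and the name fresh_defaults is hypothetical.  It copies each default
     value per call, so it helps with list or dict literals but cannot
     re-run a default such as a function call, which is the kind of gap
     that keeps any single decorator from fitting every situation:

```python
import copy
import functools
import inspect


def fresh_defaults(func):
    """Hypothetical decorator: deep-copy each default value on every call."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, param in sig.parameters.items():
            # Fill in only the parameters the caller left out.
            if name not in bound.arguments and param.default is not param.empty:
                bound.arguments[name] = copy.deepcopy(param.default)
        return func(*bound.args, **bound.kwargs)

    return wrapper


@fresh_defaults
def collect(item, bucket=[]):
    bucket.append(item)
    return bucket


print(collect(1), collect(2))   # [1] [2]
```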

     The next question was exactly how default variable expressions should be
     scoped.  By way of demonstration:

         a = 42
         def foo(b=a):
             a = 3.14

     Now, does the variable 'a' in the default expression for 'b' refer to the
     lexical variable 'a', or the local variable 'a'?  If it refers to a local
     variable, then this code is basically equivalent to:

         a = 42
         def foo(b=None):
             if b is None:
                 b = a
             a = 3.14

     in which case, 'a' is being referenced before it's been assigned to in the
     function, causing an UnboundLocalError.  The alternative is to have Python
     treat 'a' within the function's body differently from the 'a' in the
     default expression.  In this case, the code would behave as if it were:

         a = 42
         def foo(b=None):
             if b is None:
                 b = __a
             a = 3.14

     where __a indicates Python 'magically' treating it as a lexical variable
     that is distinct from the local variable 'a'.  This would increase
     backward-compatibility, allowing you to use a lexical variable with the
     same name as a local variable as a default expression, which is more
     similar to Python's current behavior.  However, this would complicate the
     semantics of default expressions.  For simplicity's sake, this PEP
     endorses treating variables in default expressions as normal function
     variables.  Suggestions for dealing with the incompatibilities this would
     introduce are discussed later.
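
     The consequence of the endorsed choice can be checked today by
     running the "basically equivalent" expansion from above.  A sketch
     in Python 3 (the return statement is added only so the result can be
     inspected):

```python
a = 42


def foo(b=None):
    if b is None:
        b = a      # 'a' is local here because of the assignment below
    a = 3.14
    return b


try:
    foo()
    raised = False
except UnboundLocalError as exc:
    raised = True
    print("UnboundLocalError:", exc)

# When the caller supplies b, the default branch is skipped and no
# error occurs.
print(foo(1))   # 1
```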


Specification

     The current semantics for default arguments are replaced by the following
     semantics:
         - Whenever a function is called, and the caller does not provide a
         value for a parameter with a default expression, the parameter's
         default expression is evaluated in the function's scope.  The
         resulting value is then assigned to a local variable in the
         function's scope with the same name as the parameter.
         - The default argument expressions are evaluated before the body
         of the function.
         - The evaluation of default argument expressions proceeds in the
         same order as that of the parameter list in the function's definition.
         - Variables in a default expression are treated like normal
         function variables (i.e. global/lexical variables unless assigned to
         in the function).
     Given these semantics, it makes more sense to refer to default argument
     expressions rather than default argument values, as the expression is
     re-evaluated at each call, rather than just once at definition-time.
     Therefore, we shall do so hereafter.
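
     Parts of these semantics can be approximated in today's Python.  The
     sketch below (Python 3) uses a hypothetical decorator, late_defaults,
     that takes zero-argument callables standing in for default
     expressions.  It evaluates them in parameter order and only when the
     caller omitted the argument.  It does not reproduce evaluation in
     the function's own scope, so a default cannot refer to another
     parameter:

```python
import functools
import inspect


def late_defaults(**exprs):
    """Hypothetical decorator: each keyword maps a parameter name to a
    zero-argument callable standing in for its default expression."""

    def decorate(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind_partial(*args, **kwargs)
            # Walk parameters in definition order, evaluating an
            # expression only when the caller did not supply a value.
            for name in sig.parameters:
                if name not in bound.arguments and name in exprs:
                    bound.arguments[name] = exprs[name]()
            return func(*bound.args, **bound.kwargs)

        return wrapper

    return decorate


calls = []


def tick(label):
    calls.append(label)
    return label


@late_defaults(g=lambda: tick("g"), h=lambda: tick("h"))
def frob(g, h):
    return (g, h)


frob()              # evaluates g's expression, then h's
frob(h="given")     # evaluates only g's expression
print(calls)        # ['g', 'h', 'g']
```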

     Demonstrative examples:
         #default argument expressions can refer to
         #variables in the enclosing scope...
         CONST = "hi"
         def foo(a=CONST):
             print a

         >>> foo()
         hi
         >>> CONST="bye"
         >>> foo()
         bye

         #...or even other arguments
         def ncopies(container, n=len(container)):
             return [container for i in range(n)]

         >>> ncopies([1, 2], 5)
         [[1, 2], [1, 2], [1, 2], [1, 2], [1, 2]]
         >>> ncopies([1, 2, 3])
         [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
         >>> #ncopies grabbed n from [1, 2, 3]'s length (3)

         #default argument expressions are arbitrary expressions
         def my_sum(lst):
             cur_sum = lst[0]
             for i in lst[1:]: cur_sum += i
             return cur_sum

         def bar(b=my_sum((["b"] * (2 * 3))[:4])):
             print b

         >>> bar()
         bbbb

         #default argument expressions are re-evaluated at every call...
         from random import randint
         def baz(c=randint(1,3)):
             print c

         >>> baz()
         2
         >>> baz()
         3

         #...but only when they're required
         def silly():
             print "spam"
             return 42

         def qux(d=silly()):
             pass

         >>> qux()
         spam
         >>> qux(17)
         >>> qux(d=17)
         >>> qux(*[17])
         >>> qux(**{'d':17})
         >>> #no output since silly() never called
         >>> #because d's value was specified in the calls

         #default argument expressions are evaluated in calling sequence order
         count = 0
         def next():
             global count
             count += 1
             return count - 1

         def frobnicate(g=next(), h=next(), i=next()):
             print g, h, i

         >>> frobnicate()
         0 1 2
         >>> #g, h, and i's default argument expressions are evaluated
         >>> #in the same order as in the parameter definition

         #variables in default expressions refer to lexical/global variables...
         j = "holy grail"
         def frenchy(k=j):
             print k
         #...unless assigned to in the function (or its parameters)
         def arthur(j="swallow", m=j):
             print m

         >>> frenchy()
         holy grail
         >>> arthur()
         swallow


Compatibility Issues

     This change in semantics breaks code which uses mutable default argument
     expressions and depends on those expressions being evaluated only once.
     It also will break code that assigns new incompatible values in a parent
     scope to variables used in default expressions.  Code relying on such
     behavior can be refactored from:

         def foo(bar=mutable):
             #code

     to

         state = mutable
         def foo(bar=state):
             #code

     or

         class Baz(object):
             state = mutable

             @classmethod
             def foo(cls, bar=cls.state):
                 #code

     or

         from functools import wraps

         def stateify(states):
             def _wrap(func):
                 @wraps(func)
                 def _wrapper(*args, **kwds):
                     new_kwargs = states.copy()
                     new_kwargs.update(kwds)
                     return func(*args, **new_kwargs)
                 return _wrapper
             return _wrap

         @stateify({'bar' : mutable})
         def foo(bar):
             #code

     Code such as the following (which was also mentioned in the Rationale):

         b = 42 #outer b
         def foo(a=b): #ERROR: refers to local b, not outer b!
             b = 7 #local b

     which has default values that refer to variables in enclosing scopes and
     contains assignments to local variables of the same names will also be
     incompatible, as the 'b' in the default argument refers to the local 'b'
     rather than the outer 'b', resulting in an UnboundLocalError because the
     local variable 'b' has not been assigned to at the time "a"'s default
     expression is evaluated.  Such code will need to rename the affected
     variables.

     The changes in this PEP are backwards-compatible with all code whose
     default argument values are immutable, including code using the idiom
     mentioned in the 'Motivation' section.  However, such values will now be
     recomputed for each call for which they are required.  This may cause
     performance degradation.  If such recomputation is significantly
     expensive, the same refactoring mentioned above can be used.

     A survey of the standard library for Python v2.5, produced via a
     script [7], gave the following statistics for the standard library
     (608 files, test suites were excluded):

         total number of non-None immutable default arguments: 1585 (41.5%)
         total number of mutable default arguments: 186 (4.9%)
         total number of default arguments with a value of None: 1813 (47.4%)
         total number of default arguments with unknown mutability: 238 (6.2%)
         total number of comparisons to None: 940

     Note: The number of comparisons to None refers to *all* such comparisons,
     not necessarily just those used in the idiom mentioned in the Motivation
     section.
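
     The attached script [7] is not reproduced here.  As a rough idea of
     how such a survey could be done, the following Python 3 sketch uses
     the standard ast module.  The category rules are assumptions made
     for this example and are not necessarily those used to produce the
     figures above:

```python
import ast

MUTABLE_NODES = (ast.List, ast.Dict, ast.Set,
                 ast.ListComp, ast.DictComp, ast.SetComp)


def classify_default(node):
    """Roughly classify a default-argument AST node."""
    if isinstance(node, ast.Constant):
        return "none" if node.value is None else "immutable"
    if isinstance(node, ast.UnaryOp):
        # e.g. -1 is parsed as a unary minus applied to the constant 1
        return classify_default(node.operand)
    if isinstance(node, ast.Tuple):
        kinds = {classify_default(elt) for elt in node.elts}
        return "immutable" if kinds <= {"none", "immutable"} else "unknown"
    if isinstance(node, MUTABLE_NODES):
        return "mutable"
    return "unknown"   # names, calls, attribute accesses, ...


def survey(source):
    """Count default arguments in `source` by rough mutability class."""
    counts = {"none": 0, "immutable": 0, "mutable": 0, "unknown": 0}
    func_nodes = (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, func_nodes):
            defaults = list(node.args.defaults)
            defaults += [d for d in node.args.kw_defaults if d is not None]
            for default in defaults:
                counts[classify_default(default)] += 1
    return counts


sample = '''
def f(a=None, b=1, c=[], d=len): pass
def g(e=(1, "x"), f={}): pass
'''
print(survey(sample))
```

     A default such as d=len lands in "unknown", which matches the later
     observation that many unknown defaults are functions, classes, or
     constants.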

     Looking more closely at the script's output, it appears that Tix.py and
     Tkinter.py are the primary users of mutable default arguments in the
     standard library.

     Similarly, examination of the unknown default arguments reveals that a
     significant fraction are functions, classes, or constants, which should,
     for the most part, not be functionally affected by this proposal.

     Assuming the standard library is indicative of Python code in general, the
     change in semantics will have comparatively little impact on the correct
     operation of Python programs.

     Running pybench with modifications to simulate the proposed semantics [8]
     shows that Python function/method calls using default arguments run about
     4.4%-6.5% slower versus the current semantics.  However, as the simulation
     of the proposed semantics is crude, this should be considered an upper
     bound for any performance decreases this proposal might cause.

     In relation to Python 3.0, this PEP's proposal is compatible with those of
     PEP 3102 [9] and PEP 3107 [10], though it does not depend on the
     acceptance of either of those PEPs.


Reference Implementation

     All code of the form:

         def foo(bar=some_expr, baz=other_expr):
             #body

     Should be compiled as if it had read (in pseudo-Python):

         def foo(bar=_undefined, baz=_undefined):
             if bar is _undefined:
                 bar = some_expr
             if baz is _undefined:
                 baz = other_expr
             #body

     where '_undefined' is the value given to a parameter when the caller
     didn't specify a value for it.  This is not intended to be a literal
     translation, but rather a demonstration as to how Python's
     argument-handling machinery should act.  Specifically, there should be no
     Python-level value corresponding to _undefined, nor should a literal
     translation such as that shown necessarily be used.
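
     Although the proposal wants no Python-level _undefined value, user
     code today can approximate the pseudo-Python with a private sentinel
     object.  The sketch below (Python 3) is such an approximation, with
     [] and repr(bar) standing in for some_expr and other_expr.  Unlike
     the None idiom, it still lets callers pass None explicitly:

```python
_undefined = object()   # private sentinel, distinct from None


def foo(bar=_undefined, baz=_undefined):
    if bar is _undefined:
        bar = []            # stands in for some_expr
    if baz is _undefined:
        baz = repr(bar)     # stands in for other_expr; may refer to bar
    return bar, baz


print(foo())        # ([], '[]')
print(foo(None))    # (None, 'None')
```

     Each call that omits bar gets a new list, which is the per-call
     evaluation the Specification describes.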


References

     [0] 10 Python pitfalls
         http://zephyrfalcon.org/labs/python_pitfalls.html

     [1] Python Gotchas
         http://www.ferg.org/projects/python_gotchas.html#contents_item_6

     [2] When Pythons Attack
         http://www.onlamp.com/pub/a/python/2004/02/05/learn_python.html?page=2

     [3] 4. More Control Flow Tools
         http://docs.python.org/tut/node6.html#SECTION006710000000000000000

     [4] [Python-ideas] fixing mutable default argument values
         http://mail.python.org/pipermail/python-ideas/2007-January/000073.html

     [5] [Python-ideas] proto-PEP: Fixing Non-constant Default Arguments
         http://mail.python.org/pipermail/python-ideas/2007-January/000121.html

     [6] A Switch/Case Statement
         http://www.python.org/dev/peps/pep-3103/

     [7] Script to generate default argument statistics
         See attachment.

     [8] Patch to pybench/Calls.py
         See attachment.

     [9] Keyword-Only Arguments
         http://www.python.org/dev/peps/pep-3102/

     [10] Function Annotations
         http://www.python.org/dev/peps/pep-3107/


Copyright

     This document has been placed in the public domain.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: defargs.diff
Type: text/x-patch
Size: 794 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070213/12125965/attachment.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: new_find.py
Type: text/x-python
Size: 4245 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070213/12125965/attachment.py 

From mike.klaas at gmail.com  Wed Feb 14 05:44:08 2007
From: mike.klaas at gmail.com (Mike Klaas)
Date: Tue, 13 Feb 2007 20:44:08 -0800
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <45D28143.9010502@gmail.com>
References: <45D28143.9010502@gmail.com>
Message-ID: <3d2ce8cb0702132044r423bbfb5l6a73eb1203081e92@mail.gmail.com>

On 2/13/07, Chris Rebert <cvrebert at gmail.com> wrote:
>      This PEP proposes new semantics for default arguments to remove
>      boilerplate code associated with non-constant default argument values,
>      allowing them to be expressed more clearly and succinctly.  Specifically,
>      all default argument expressions are re-evaluated at each call as opposed
>      to just once at definition-time as they are now.

Seems like a huge barrel of worms.  The binding semantics are not only
a problem for mutable arguments, as you state in your pep:

In [2]: def a():
   ...:     g = 1
   ...:     def b():
   ...:         print g
   ...:     g = 2
   ...:     return b
   ...:
In [4]: a()()
2

In [5]: def a():
   ...:     g = 1
   ...:     def b(g=g):
   ...:         print g
   ...:     g = 2
   ...:     return b
In [6]: a()()
1

While creating closures and define-time local bindings is certainly not as
common as a "regular" function definition, it is an important part of
Python when programming in a semi-functional style.  Imagine that "def
b" is in a for loop.  Your presented alternatives either don't work or
go to rather extreme effort to duplicate this simple and useful
functionality.
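
A sketch of the for-loop case, written for Python 3 with made-up function
names, showing what the define-time binding buys under the current
semantics:

```python
def make_late():
    funcs = []
    for i in range(3):
        def b():
            return i          # i is looked up when b() runs
        funcs.append(b)
    return funcs


def make_bound():
    funcs = []
    for i in range(3):
        def b(i=i):           # default evaluated now, once per iteration
            return i
        funcs.append(b)
    return funcs


print([f() for f in make_late()])    # [2, 2, 2]
print([f() for f in make_bound()])   # [0, 1, 2]
```

If defaults were re-evaluated at call time, the second version would
presumably behave like the first.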

I agree that newbies stumble over mutable default arguments.  I did.
If we could improve that learning process, I would be all for it.
However, besides this being a significant change in semantics, two
main stumbling blocks in my mind are:

1. Scoping.  Scoping issues are not minor consequences of changes to
default argument behaviour, but are integral.  I think that you'd have
to come up with a more obvious way to accomplish all the various
current behaviours of def args before changing their semantics.  This
is probably a larger project than the original proposal.

2. Performance.  The speed of python is influenced greatly by the
performance of function dispatch.  This may not show up in pystone.

-Mike

From anthony at interlink.com.au  Wed Feb 14 06:55:45 2007
From: anthony at interlink.com.au (Anthony Baxter)
Date: Wed, 14 Feb 2007 16:55:45 +1100
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <3d2ce8cb0702132044r423bbfb5l6a73eb1203081e92@mail.gmail.com>
References: <45D28143.9010502@gmail.com>
	<3d2ce8cb0702132044r423bbfb5l6a73eb1203081e92@mail.gmail.com>
Message-ID: <200702141655.46596.anthony@interlink.com.au>

On Wednesday 14 February 2007 15:44, Mike Klaas wrote:
> 2. Performance.  The speed of python is influenced greatly by the
> performance of function dispatch.  This may not show up in
> pystone.

pystone is an utterly useless benchmark. It should not be used, 
ever. The pre-PEP references pybench, which does a much better job 
of showing this sort of thing.


From jcarlson at uci.edu  Wed Feb 14 08:27:39 2007
From: jcarlson at uci.edu (Josiah Carlson)
Date: Tue, 13 Feb 2007 23:27:39 -0800
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <45D28143.9010502@gmail.com>
References: <45D28143.9010502@gmail.com>
Message-ID: <20070213231036.AD24.JCARLSON@uci.edu>


Chris Rebert <cvrebert at gmail.com> wrote:
> Requesting comments on the following pre-PEP. pybench runs both with and 
> without the patch applied would also be appreciated.
> - Chris R

One Glyph Lefkowitz posted today [1] in response to dynamic attribute
access the following, which is surely applicable here.

> I also strongly dislike every syntax that has thus far been proposed,
> but even if I loved them, there is just no motivating use-case.  New
> syntax is not going to make dynamic attribute access easier to
> understand, and it *is* going to cause even more version-compatibility
> headaches.
> 
> I really, really wish that every feature proposal for Python had to meet
> some burden of proof, or submit a cost/benefit analysis.  Who is this
> going to help?  How much is this going to help them?  "Who is this going
> to hurt" is easy, but should also be included for completeness -
> everyone who wants to be able to deploy new code on old Pythons.
> 
> I suspect this would kill 90% of "hey wouldn't this syntax be neat"
> proposals on day zero, and the ones that survived would be a lot more
> interesting to talk about.

Replace "dynamic attribute access" with "default argument expressions". 
With that said, please provide:
1a) Proof as to what is to be gained over an explicit if statement or
conditional expression.
or
1b) A cost/benefit analysis of the time it would take to "fix" the
standard library and/or user code with any of the provided new
syntax/semantics.
2) Who is this going to help (and do we care)?
3) How much is this going to help them?
4) Who is this going to hurt (in addition to everyone who wants to run
new code in older Pythons)?


As stated by most respondents to the original threads, a conditional
statement is generally preferable (which answers #1a).  It is really
only going to help new users of Python (as seasoned users don't have the
issue, and generally don't seem to mind using an additional line to
"solve" the "problem") (which answers #2).  It isn't going to help very
many people terribly much - 2 line addition and 1 line modification *in
the worst case*, if you include a new None-like sentinel (which answers
#3).  Further, it's going to hurt everyone who is used to the 'execute
once' default argument semantics currently in place (which answers #4).

Using Glyph's requirements, we see that the syntax is just not
worthwhile, as stated by most people in the original thread.


 - Josiah

[1] http://mail.python.org/pipermail/python-dev/2007-February/071061.html

From martin at v.loewis.de  Wed Feb 14 09:17:04 2007
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Wed, 14 Feb 2007 09:17:04 +0100
Subject: [Python-3000] how should we handle changes to the C API?
In-Reply-To: <bbaeab100702131246i5badab39h8a78fdb1bb20749e@mail.gmail.com>
References: <bbaeab100701282108r6b01e0c5ge4dba673c5894fc7@mail.gmail.com>	
	<45CF5E5C.1050703@v.loewis.de>	
	<bbaeab100702111130w5f831fbbo35cb2c5fa904f7bd@mail.gmail.com>	
	<45D00F6B.7020207@v.loewis.de>	
	<bbaeab100702121411x55a26907x8f15fd384ffd4df2@mail.gmail.com>	
	<45D1521D.7000903@v.loewis.de>
	<bbaeab100702131246i5badab39h8a78fdb1bb20749e@mail.gmail.com>
Message-ID: <45D2C580.6090406@v.loewis.de>

Brett Cannon schrieb:
> It doesn't need to, it just would have been convenient and consistent.
>  It seems odd that C code can compare an exception against other
> objects that an 'except' clause won't.

If you look at the C code, you find that there are very few callers
to GivenExceptionMatches (even if you also count ExceptionMatches
callers), and they either pass a PyExc_ object (which will automatically
be permitted), or one of their own exceptions. If you were to remove
PyErr_GivenExceptionMatches, and replace it with something else
where
a) people have to change the functions in their code, and
b) have to check the return value for errors (which they can
    statically determine to never happen)
I think the authors would be unhappy about this gratuitous change.

>> The deprecation of string exceptions already happens in cmp_outcome;
>> if you check for bad base exceptions there also, you would find them
>> all, no?
> 
> It wouldn't be checked in both places, just PyErr_GivenExceptionMatches().

Please don't.

Martin


From eopadoan at altavix.com  Wed Feb 14 14:24:32 2007
From: eopadoan at altavix.com (Eduardo "EdCrypt" O. Padoan)
Date: Wed, 14 Feb 2007 11:24:32 -0200
Subject: [Python-3000] fixing test_dict
Message-ID: <dea92f560702140524x1919315bx76b98559ffa7b0bf@mail.gmail.com>

If someone is already working on this, please ignore this mail: I just
picked it because it was easy enough to do while I wait for some other
code to run at work.
I've created two patches to p3yk. They are two alternatives to fix the
broken test_dict.py:
test_dict_1.patch uses the same approach as test_dictviews.py:
transform the dict_view into a set.
test_dict_2.patch is an alternative: I'm not sure if the .items(),
.values() and .keys() should be covered two times (test_dict.py and
test_dictviews.py), so this solves the problem by removing these tests
from test_dict.py.
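
The patches themselves are attachments and are not shown here.  As a
sketch of the first approach only, assuming the tests were failing because
they compared dict views against lists, converting a view to a set looks
like this in Python 3:

```python
d = {1: 2, 3: 4}

# A view is not a list, so comparing it with a list literal is False.
print(d.keys() == [1, 3])                    # False

# Converting the view to a set gives an order-independent comparison.
print(set(d.keys()) == {1, 3})               # True
print(set(d.items()) == {(1, 2), (3, 4)})    # True
print(set(d.values()) == {2, 4})             # True
```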

-- 
EduardoOPadoan (eopadoan->altavix::com)
Bookmarks: http://del.icio.us/edcrypt
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_dict_1.patch
Type: text/x-patch
Size: 1029 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070214/3fafcdd1/attachment.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_dict_2.patch
Type: text/x-patch
Size: 1018 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070214/3fafcdd1/attachment-0001.bin 

From guido at python.org  Wed Feb 14 18:49:27 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 14 Feb 2007 09:49:27 -0800
Subject: [Python-3000] fixing test_dict
In-Reply-To: <dea92f560702140524x1919315bx76b98559ffa7b0bf@mail.gmail.com>
References: <dea92f560702140524x1919315bx76b98559ffa7b0bf@mail.gmail.com>
Message-ID: <ca471dc20702140949k7ad11a44w862d5f1e7afcc307@mail.gmail.com>

Thanks! I decided to use your first approach; one can never have too
many unit tests! :-)

On 2/14/07, Eduardo EdCrypt O. Padoan <eopadoan at altavix.com> wrote:
> If someone is already working on this, please ignore this mail: I just
> picked it because it was easy enough to do while I wait for some other
> code to run at work.
> I've created two patches to p3yk. They are two alternatives to fix the
> broken test_dict.py:
> test_dict_1.patch uses the same approach as test_dictviews.py:
> transform the dict_view into a set.
> test_dict_2.patch is an alternative: I'm not sure if the .items(),
> .values() and .keys() should be covered two times (test_dict.py and
> test_dictviews.py), so this solves the problem by removing these tests
> from test_dict.py.
>
> --
> EduardoOPadoan (eopadoan->altavix::com)
> Bookmarks: http://del.icio.us/edcrypt
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From brett at python.org  Wed Feb 14 19:53:57 2007
From: brett at python.org (Brett Cannon)
Date: Wed, 14 Feb 2007 10:53:57 -0800
Subject: [Python-3000] how should we handle changes to the C API?
In-Reply-To: <45D2C580.6090406@v.loewis.de>
References: <bbaeab100701282108r6b01e0c5ge4dba673c5894fc7@mail.gmail.com>
	<45CF5E5C.1050703@v.loewis.de>
	<bbaeab100702111130w5f831fbbo35cb2c5fa904f7bd@mail.gmail.com>
	<45D00F6B.7020207@v.loewis.de>
	<bbaeab100702121411x55a26907x8f15fd384ffd4df2@mail.gmail.com>
	<45D1521D.7000903@v.loewis.de>
	<bbaeab100702131246i5badab39h8a78fdb1bb20749e@mail.gmail.com>
	<45D2C580.6090406@v.loewis.de>
Message-ID: <bbaeab100702141053y163cd6b4id0ba68e53d49c919@mail.gmail.com>

On 2/14/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Brett Cannon schrieb:
> > It doesn't need to, it just would have been convenient and consistent.
> >  It seems odd that C code can compare an exception against other
> > objects that an 'except' clause won't.
>
> If you look at the C code, you find that there are very few callers
> to GivenExceptionMatches (even if you also count ExceptionMatches
> callers), and they either pass a PyExc_ object (which will automatically
> be permitted), or one of their own exceptions. If you were to remove
> PyErr_GivenExceptionMatches, and replace it with something else
> where
> a) people have to change the functions in their code, and

Which is why this was a Py3K question.

> b) have to check the return value for errors (which they can
>     statically determine to never happen)
> I think the authors would be unhappy about this gratuitous change.
>

Well, I happen to not think it is gratuitous, but I think we are just
going to agree to disagree on this one.  =)

> >> The deprecation of string exceptions already happens in cmp_outcome;
> >> if you check for bad base exceptions there also, you would find them
> >> all, no?
> >
> > It wouldn't be checked in both places, just PyErr_GivenExceptionMatches().
>
> Please don't.

I'm not.  At this point I am not going to bother to touch anything and
just continue forward with how I did things in 2.6.

-Brett

From bjourne at gmail.com  Thu Feb 15 01:36:18 2007
From: bjourne at gmail.com (=?ISO-8859-1?Q?BJ=F6rn_Lindqvist?=)
Date: Thu, 15 Feb 2007 01:36:18 +0100
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <20070213231036.AD24.JCARLSON@uci.edu>
References: <45D28143.9010502@gmail.com> <20070213231036.AD24.JCARLSON@uci.edu>
Message-ID: <740c3aec0702141636s381b57d8k465e020a2a04d6a2@mail.gmail.com>

On 2/14/07, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> Chris Rebert <cvrebert at gmail.com> wrote:
> > Requesting comments on the following pre-PEP. pybench runs both with and
> > without the patch applied would also be appreciated.
> > - Chris R
>
> One Glyph Lefkowitz posted today [1] in response to dynamic attribute
> access the following, which is surely applicable here.

To be fair, the two ideas are fairly different. Dynamic attribute
access was about adding new syntax which makes the language more
complex. This idea is more about fine-tuning existing syntax; it does
not add to the language, it just makes it different.

> > I also strongly dislike every syntax that has thus far been proposed,
> > but even if I loved them, there is just no motivating use-case.  New
> > syntax is not going to make dynamic attribute access easier to
> > understand, and it *is* going to cause even more version-compatibility
> > headaches.
> >
> > I really, really wish that every feature proposal for Python had to meet
> > some burden of proof, or submit a cost/benefit analysis.  Who is this
> > going to help?  How much is this going to help them?  "Who is this going
> > to hurt" is easy, but should also be included for completeness -
> > everyone who wants to be able to deploy new code on old Pythons.
> >
> > I suspect this would kill 90% of "hey wouldn't this syntax be neat"
> > proposals on day zero, and the ones that survived would be a lot more
> > interesting to talk about.
>
> Replace "dynamic attribute access" with "default argument expressions".
> With that said, please provide:

> 1a) Proof as to what is to be gained over an explicit if statement or
> conditional expression.

Two fewer lines of code? It is hard to grep for it, but I bet there are
a few hundred occurrences of the following in the standard library:

    def something(x = None):
        if x is None:
            x = [1, 2, 3]       # <- default

If you remember, constructs like this were one of the big motivations
behind the ternary operator. So now you write the above like this:

    def something(x = None):
        x = [1, 2, 3] if x is None else x

If I remember correctly, the discussion about the ternary operator
was sparked by Raymond Hettinger finding a bug in some code that
erroneously used the and-or-ternary-trick.

But also, often the choice is not between "explicit if statement"
and this. It is between having an obscure and hard to find bug and the
new semantic. I have many, MANY times written bugged code like this:

    def something(x = None):
        if not x:
            x = 42      # <- Oh noe!

or:

    def something(x = []):
        x += ["foobar"]     # <- Even worse!

I guess bugs like these could be explained by stupidity, laziness or
some combination of both. :) Or they could, if other programmers
experience them, be a sign of a deficiency in the language.
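
To make the first of these bugs concrete, here is a small sketch in
Python 3 (the function names are invented for illustration) of how the
truthiness test swallows a legitimate falsy argument, next to the
"is None" form that does not:

```python
def scale_buggy(value, factor=None):
    if not factor:            # also true for 0, 0.0, "", [] ...
        factor = 42
    return value * factor


def scale_fixed(value, factor=None):
    if factor is None:        # only the sentinel triggers the default
        factor = 42
    return value * factor


print(scale_buggy(2, 0))   # 84: the explicit 0 was silently replaced
print(scale_fixed(2, 0))   # 0
```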

> or
> 1b) A cost/benefit analysis of the time it would take to "fix" the
> standard library and/or user code with any of the provided new
> syntax/semantics.

I naively think that Python's test suite would discover most of the
problems. If not, fix the test suite. :) This idea is for py3k, so one
would guess that the allowed cost is higher.

> 2) Who is this going to help (and do we care)?

Me, newbies, lazy programmers or programmers with not enough attention
to details. The last group is fairly big, I think.

> 3) How much is this going to help them?

I think a lot. Especially newbies. As said in the PEP, Python's current
default argument evaluation is mentioned in three different lists of
Python's deficiencies.

> 4) Who is this going to hurt (in addition to everyone who wants to run
> new code in older Pythons)?

Everyone that is accustomed to the old behavior. Every book author
whose books become deprecated. On the other hand, the more changes to
the language the more books they can write. :)

I agree that the cost probably is "huge," but so is the benefit,
IMHO. If Python was created today, I bet that default arguments would
be reevaluated at each invocation of the callable.

> Using Glyph's requirements, we see that the syntax is just not
> worthwhile, as stated by most people in the original thread.

Maybe, but it certainly would make some code look much nicer. From
cookielib.py:

    def is_expired(self, now=None):
        if now is None: now = time.time()
        if (self.expires is not None) and (self.expires <= now):
            return True
        return False

With new semantics:

    def is_expired(self, now = time.time()):
        if (self.expires is not None) and (self.expires <= now):
            return True
        return False
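
Note that the second form is only correct under the proposed semantics.
A sketch in Python 3 (stand-alone functions with invented names, not
cookielib code) of what the same signature does today:

```python
import time


def stamp_frozen(now=time.time()):
    # Under the current semantics, time.time() ran once, when the def
    # statement was executed, so every call sees the same timestamp.
    return now


def stamp_fresh(now=None):
    # The existing idiom: compute the timestamp on each call.
    if now is None:
        now = time.time()
    return now


first = stamp_frozen()
time.sleep(0.01)
second = stamp_frozen()
print(first == second)   # True
```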

-- 
mvh Björn

From greg.ewing at canterbury.ac.nz  Thu Feb 15 01:54:29 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 15 Feb 2007 13:54:29 +1300
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <740c3aec0702141636s381b57d8k465e020a2a04d6a2@mail.gmail.com>
References: <45D28143.9010502@gmail.com> <20070213231036.AD24.JCARLSON@uci.edu>
	<740c3aec0702141636s381b57d8k465e020a2a04d6a2@mail.gmail.com>
Message-ID: <45D3AF45.7070402@canterbury.ac.nz>

BJörn Lindqvist wrote:
> I have many, MANY times written bugged code like this:
> 
>     def something(x = None):
>         if not x:
>             x = 42      # <- Oh noe!

You can get exactly the same bug in many other contexts
besides default arguments. It's something you need to
be on the alert for generally, and if you are, you are
no more likely to encounter it here than anywhere else.

>     def something(x = []):
>         x += ["foobar"]     # <- Even worse!

I'm skeptical that people really write functions that
do things like that. It smells wrong: Is the function
intended to mutate an argument that's passed in? If
not, then it shouldn't be touching the argument, in
which case it doesn't matter if the default value is
evaluated only once. If so, and no argument is passed,
it would be more efficient to just skip the code that
does the mutation, rather than create a new list,
mutate it, and then throw it away.

> Me, newbies, lazy programmers or programmers with not enough attention
> to details.

Anyone who can't pay attention to details is going to
have much bigger problems with programming than just
dealing with default arguments.

--
Greg

From jcarlson at uci.edu  Thu Feb 15 02:10:21 2007
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 14 Feb 2007 17:10:21 -0800
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <740c3aec0702141636s381b57d8k465e020a2a04d6a2@mail.gmail.com>
References: <20070213231036.AD24.JCARLSON@uci.edu>
	<740c3aec0702141636s381b57d8k465e020a2a04d6a2@mail.gmail.com>
Message-ID: <20070214165418.AD39.JCARLSON@uci.edu>


"BJörn Lindqvist" <bjourne at GMAIL.COM> wrote:
> On 2/14/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> > Chris Rebert <cvrebert at gmail.com> wrote:
> > > Requesting comments on the following pre-PEP. pybench runs both with and
> > > without the patch applied would also be appreciated.
> > > - Chris R
> >
> > One Glyph Lefkowitz posted today [1] in response to dynamic attribute
> > access the following, which is surely applicable here.
> 
> To be fair, the two ideas are fairly different. Dynamic attribute
> access was about adding new syntax which makes the language more
> complex. This idea is more about fine-tuning existing syntax; it does
> not add to the language, it just makes it different.

There are about a dozen different syntax proposals in the pre-PEP to
determine whether something is executed at compilation or during call. 
Re-read it.

[snip]
> > 1a) Proof as to what is to be gained over an explicit if statement or
> > conditional expression.
> 
> Two fewer lines of code? It is hard to grep for it, but I bet there are
> a few hundred occurrences of the following in the standard library:
> 
>     def something(x = None):
>         if x is None:
>             x = [1, 2, 3]       # <- default

If some 500+ examples of dynamic attribute access in the Python standard
library weren't sufficient, then the 'few hundred' surely isn't,
especially without actual counts.  Yes, coming up with good counts is
hard, but that's one of the requirements Glyph pointed out.  If no one
is willing to go through and see what it would fix, then it's obviously
not worth it.


> If you remember, constructs like this were one of the big
> motivations behind the ternary operator. So now you write the above like this:
> 
>     def something(x = None):
>         x = [1, 2, 3] if x is None else x

That is certainly an *application* of the ternary operator, but it
can be used *anywhere* a decision is made to choose a value, not merely
in the function signature.

[snip]
> > or
> > 1b) A cost/benefit analysis of the time it would take to "fix" the
> > standard library and/or user code with any of the provided new
> > syntax/semantics.
> 
> I naively think that Python's test suite would discover most of the
> problems. If not, fix the test suite. :) This idea is for py3k, so one
> would guess that the allowed cost is higher.

The cost of syntax changes is allowed to be higher, *but only if their
benefits actually outweigh their costs*.  So far, all you or really
anyone else has shown in the default argument expressions discussion is
that:
1) a few lines
2) on occasion
3) written by new or sloppy Python developers

will be:
1a) less buggy
1b) or not buggy
2) at most 2 lines shorter

Bite the bullet.  Spend the two lines.

 - Josiah


From guido at python.org  Thu Feb 15 02:14:35 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 14 Feb 2007 17:14:35 -0800
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <20070214165418.AD39.JCARLSON@uci.edu>
References: <20070213231036.AD24.JCARLSON@uci.edu>
	<740c3aec0702141636s381b57d8k465e020a2a04d6a2@mail.gmail.com>
	<20070214165418.AD39.JCARLSON@uci.edu>
Message-ID: <ca471dc20702141714p12635169kc7e4208097286ba0@mail.gmail.com>

Nobody has asked me yet, but I'm not going to support this PEP. It's
too big a departure from existing semantics. Next are we going to turn
class variables initialized with expressions into automatic instance
variable initializers implicitly executed in the __init__ code?
Newbies are just as likely to run into the aliasing problem there as
in the argument default case.
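
The class-variable aliasing referred to here can be sketched as follows
(Python 3; the class and attribute names are invented for illustration):

```python
class Basket:
    items = []                      # class attribute: one list shared by all instances

    def add(self, item):
        self.items.append(item)     # mutates the shared class-level list

a = Basket()
b = Basket()
a.add("apple")
print(b.items)                      # ['apple']
print(a.items is b.items)           # True
```

The expression [] is evaluated once, when the class body runs, just as a
default argument expression is evaluated once when the def runs.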

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From cvrebert at gmail.com  Thu Feb 15 02:35:04 2007
From: cvrebert at gmail.com (Chris Rebert)
Date: Wed, 14 Feb 2007 17:35:04 -0800
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <3d2ce8cb0702132044r423bbfb5l6a73eb1203081e92@mail.gmail.com>
References: <45D28143.9010502@gmail.com>
	<3d2ce8cb0702132044r423bbfb5l6a73eb1203081e92@mail.gmail.com>
Message-ID: <45D3B8C8.8080502@gmail.com>

Mike Klaas wrote:
> On 2/13/07, Chris Rebert <cvrebert at gmail.com> wrote:
>>      This PEP proposes new semantics for default arguments to remove
>>      boilerplate code associated with non-constant default argument
>>      values, allowing them to be expressed more clearly and succinctly.
>>      Specifically, all default argument expressions are re-evaluated at
>>      each call as opposed to just once at definition-time as they are now.
> 
> Seems like a huge barrel of worms.  The binding semantics are not only
> a problem for mutable arguments, as you state in your pep:
> 
> In [2]: def a():
>   ...:     g = 1
>   ...:     def b():
>   ...:         print g
>   ...:     g = 2
>   ...:     return b
>   ...:
> In [4]: a()()
> 2
> 
> In [5]: def a():
>   ...:     g = 1
>   ...:     def b(g=g):
>   ...:         print g
>   ...:     g = 2
>   ...:     return b
> In [6]: a()()
> 1
> 
> While creating closures and define-time local bindings is certainly not as
> common as a "regular" function definition, it is an important part of
> python when programming in a semi-functional style.  Imagine that "def
> b" is in a for loop.  Your presented alternatives either don't work or
> go to rather extreme effort to duplicate this simple and useful
> functionality.

The refactorings mentioned in the PEP were specifically for mutable
arguments. I didn't consider the case you mentioned. The first snippet
you give would be unaffected by the PEP's changes. As for the second case:

while whatever:
     #code
     g = 1
     def b(g=g):
         print g
     g = 2
     b() #=> 1
     #code

It could be modified like so:

while whatever:
     #code
     g = 1
     retro_g = g
     def b(g=retro_g):
         print g
     g = 2
     b() #=> 1
     #code
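
For reference, the for-loop case Mike alludes to looks roughly like this
under the current semantics (a Python 3 sketch; b_late and b_early are
hypothetical names):

```python
late = []
early = []
for g in range(3):
    def b_late():
        return g            # g is looked up when b_late is called

    def b_early(g=g):
        return g            # g was bound when this def statement ran

    late.append(b_late)
    early.append(b_early)

print([f() for f in late])   # [2, 2, 2]
print([f() for f in early])  # [0, 1, 2]
```

The g=g default is what captures a separate value per iteration; that is
the behavior any refactoring would have to reproduce.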

> I agree that newbies stumble over mutable default arguments.  I did.
> If we could improve that learning process, I would be all for it.
> However, besides this being a significant change in semantics, two
> main stumbling blocks in my mind are:
> 
> 1. Scoping.  Scoping issues are not minor consequences of changes to
> default argument behavior, but are integral.  I think that you'd have
> to come up with a more obvious way to accomplish all the various
> current behaviors of def args before changing their semantics.  This
> is probably a larger project than the original proposal.

Well, as the PEP mentions, new syntax could be added to access the old
semantics, or alternatively, to enable the new semantics, though I'd
prefer to avoid adding syntax. However, finding refactorings for various
uses of the current semantics is very relevant to the PEP. I'll be sure
to add your case and any others mentioned to the PEP.

> 2. Performance.  The speed of python is influenced greatly by the
> performance of function dispatch.  This may not show up in pystone.

Clarification: as Anthony Baxter mentioned, I used pybench, not pystone. 
However, if someone recommends a better benchmark to measure the
performance impact of the proposed change, I'd be all for it.

- Chris Rebert


From cvrebert at gmail.com  Thu Feb 15 03:20:31 2007
From: cvrebert at gmail.com (Chris Rebert)
Date: Wed, 14 Feb 2007 18:20:31 -0800
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <20070214165418.AD39.JCARLSON@uci.edu>
References: <20070213231036.AD24.JCARLSON@uci.edu>
	<740c3aec0702141636s381b57d8k465e020a2a04d6a2@mail.gmail.com>
	<20070214165418.AD39.JCARLSON@uci.edu>
Message-ID: <45D3C36F.2050501@gmail.com>

Josiah Carlson wrote:
> "BJörn Lindqvist" <bjourne at GMAIL.COM> wrote:
>> On 2/14/07, Josiah Carlson <jcarlson at uci.edu> wrote:
>>> Chris Rebert <cvrebert at gmail.com> wrote:
>>>> Requesting comments on the following pre-PEP. pybench runs both with and
>>>> without the patch applied would also be appreciated.
>>>> - Chris R
>>> One Glyph Lefkowitz posted today [1] in response to dynamic attribute
>>> access the following, which is surely applicable here.
>> To be fair, the two ideas are fairly different. Dynamic attribute
>> access was about adding new syntax which makes the language more
>> complex. This idea is more about fine-tuning existing syntax; it does
>> not add to the language, it just makes it different.
> 
> There are about a dozen different syntax proposals in the pre-PEP to
> determine whether something is executed at compilation or during call. 
> Re-read it.

Those syntaxes were only raised during discussion of changing default 
argument semantics. If you read the PEP, it doesn't endorse any of them. 
I'm against adding new syntax. However, that could always change based 
on community feedback.

> [snip]
>>> 1a) Proof as to what is to be gained over an explicit if statement or
>>> conditional expression.
>> Two fewer lines of code? It is hard to grep for it, but I bet there are
>> a few hundred occurrences of the following in the standard library:
>>
>>     def something(x = None):
>>         if x is None:
>>             x = [1, 2, 3]       # <- default
> 
> If some 500+ examples of dynamic attribute access in the Python standard
> library weren't sufficient, then the 'few hundred' surely isn't,
> especially without actual counts.  Yes, coming up with good counts is
> hard, but that's one of the requirements Glyph pointed out.  If no one
> is willing to go through and see what it would fix, then it's obviously
> not worth it.

Under "Compatibility Issues" in the PEP, I mention that my 
statistics-generating script found in the standard library (among other 
things):
     total number of default arguments with a value of None: 1813
     (47.4% of all default arguments)
     total number of comparisons to None: 940
Yes, these aren't specific counts of uses of the 'x=None...if x is None: 
x=whatever' idiom, but you can't get much closer without looking over 
the files manually.

> [snip]
>>> or
>>> 1b) A cost/benefit analysis of the time it would take to "fix" the
>>> standard library and/or user code with any of the provided new
>>> syntax/semantics.
[snip]

I'm going to respond to this in the original email that asked these 
questions.
- Chris Rebert

From guido at python.org  Thu Feb 15 05:51:13 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 14 Feb 2007 20:51:13 -0800
Subject: [Python-3000] UserDict revamp
Message-ID: <ca471dc20702142051j7a7e6621q56ad5d3503720225@mail.gmail.com>

I tried to fix a few more unit tests tonight that had started failing
after the introduction of dict views. Looking over UserDict.py, it's
clear that this module needs more work -- while I banged it into
submission with minimal effort, it would really make a lot more sense
to redesign UserDict and MixinDict so they are more like dict, even if
this means that their users will have to be fixed, too.

Perhaps the most egregious example is MixinDict, which currently
assumes that keys() is a primitive operation returning a list, and
builds __iter__() out of that. Obviously a better approach is to turn
this around. (I'd have thought that ever since 2.2 this would have
been the better design, but perhaps it was too late then already.)

Is someone interested in looking at a redesign and cleanup of these
classes? I suppose that they also need a Python implementation of
dictionary views -- some of this can be lifted straight out of PEP
3106, fortunately.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Thu Feb 15 06:29:36 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 15 Feb 2007 18:29:36 +1300
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <45D28143.9010502@gmail.com>
References: <45D28143.9010502@gmail.com>
Message-ID: <45D3EFC0.5030401@canterbury.ac.nz>

I just noticed that my Thunderbird marked the posting
with this PEP in it as spam. Not sure what that says
about the proposal...

--
Greg

From eopadoan at altavix.com  Thu Feb 15 13:24:42 2007
From: eopadoan at altavix.com (Eduardo "EdCrypt" O. Padoan)
Date: Thu, 15 Feb 2007 10:24:42 -0200
Subject: [Python-3000] UserDict revamp
In-Reply-To: <ca471dc20702142051j7a7e6621q56ad5d3503720225@mail.gmail.com>
References: <ca471dc20702142051j7a7e6621q56ad5d3503720225@mail.gmail.com>
Message-ID: <dea92f560702150424k68d20b6bh790808191b80cef2@mail.gmail.com>

Oops, sending to the whole list.

On 2/15/07, Guido van Rossum <guido at python.org> wrote:
> I tried to fix a few more unit tests tonight that had started failing
> after the introduction of dict views. Looking over UserDict.py, it's
> clear that this module needs more work -- while I banged it into
> submission with minimal effort, it would really make a lot more sense
> to redesign UserDict and MixinDict so they are more like dict, even if
> this means that their users will have to be fixed, too.
>
> Perhaps the most egregious example is MixinDict, which currently
> assumes that keys() is a primitive operation returning a list, and
> builds __iter__() out of that. Obviously a better approach is to turn
> this around. (I'd have thought that ever since 2.2 this would have
> been the better design, but perhaps it was too late then already.)

s/MixinDict/DictMixin ? :)

> Is someone interested in looking at a redesign and cleanup of these
> classes? I suppose that they also need a Python implementation of
> dictionary views -- some of this can be lifted straight out of PEP
> 3106, fortunately.
>

I would love to spend my weekend looking into this. I have already read
PEP 3106 and I think I understand it.
It is carnival, and I'm no fan of samba music.

--
EduardoOPadoan (eopadoan->altavix::com)
Bookmarks: http://del.icio.us/edcrypt

From steven.bethard at gmail.com  Thu Feb 15 16:44:24 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Thu, 15 Feb 2007 08:44:24 -0700
Subject: [Python-3000] UserDict revamp
In-Reply-To: <dea92f560702150424k68d20b6bh790808191b80cef2@mail.gmail.com>
References: <ca471dc20702142051j7a7e6621q56ad5d3503720225@mail.gmail.com>
	<dea92f560702150424k68d20b6bh790808191b80cef2@mail.gmail.com>
Message-ID: <d11dcfba0702150744w2343e6f5xa4c743dadccc8e71@mail.gmail.com>

On 2/15/07, Guido van Rossum <guido at python.org> wrote:
> Perhaps the most egregious example is MixinDict, which currently
> assumes that keys() is a primitive operation returning a list, and
> builds __iter__() out of that. Obviously a better approach is to turn
> this around. (I'd have thought that ever since 2.2 this would have
> been the better design, but perhaps it was too late then already.)

I asked the same thing back in early 2005:

    http://mail.python.org/pipermail/python-list/2005-January/300042.html

Glad to hear I wasn't too out of my mind. ;-)

On 2/15/07, Eduardo EdCrypt O. Padoan <eopadoan at altavix.com> wrote:
> I would love to spend my weekend looking into this. I already read the
> PEP 3106 and I think I understand it.

Let me know if you need any help with this.

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From jimjjewett at gmail.com  Thu Feb 15 17:48:02 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Thu, 15 Feb 2007 11:48:02 -0500
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <45D28143.9010502@gmail.com>
References: <45D28143.9010502@gmail.com>
Message-ID: <fb6fbf560702150848v64428ea0w2ce97a5de0354ba4@mail.gmail.com>

On 2/13/07, Chris Rebert <cvrebert at gmail.com> wrote:
>      There are currently few, if any, known good uses of the current
>      behavior of mutable default arguments.

Then are there *any* good use cases for the proposed semantics?

Here are the use cases that I can remember seeing for mutable default arguments.

(1)  Not really (treated as) mutable.  ==> Doesn't care about the
mutability semantics.

    >>> def f(extra_settings={}) ...

usually doesn't modify or even store extra_settings; it just wants an
empty (and perhaps iterable) mapping.  (Sometimes, it doesn't even
need that, and is really just providing type information.)

(2)  Storing state between calls. ==> Keep the current semantics

We disagree on how useful this is, and how easily it can be replaced,
but agree that the use exists.

We also agree that it smells bad -- but the problem isn't mutable
arguments.  The problem is that the state variable really isn't
(intended as) a parameter at all, and it feels wrong to pretend that
it is.

In theory, we could fix this with something like C's static

    >>> def f():
    ...         once state_var={}

but in practice, there is some value in leaving it accessible, because of

(2a)  A test harness may wish to pass in its own state_var to get
extra information, or to avoid cluttering the production logs.

(3)  Collecting results ==> the code is buggy, don't encourage it.

    >>> def squares(data, results=[]):
    ...         for e in data:
    ...                 results.append(e*e)

should instead be written as

    >>> def squares(data):
    ...         results = []
    ...         for e in data:
    ...                 results.append(e*e)
    ...         return results

to make it clear that results is a newly constructed container.

(3b)  Adding stuff to a container ==> ???

I think this is the real motivation; I've done it myself.  But I
realized later that it was bad code.

For example, I may want a filter to return a list of candidates for
further processing.

    >>> def still_valid(data):
    ...         results = []
    ...         for e in data:
    ...                 if good_enough(e):
    ...                         results.append(e)
    ...         return results

Hey, and maybe I have some other candidates already ...

    >>> candidates.extend(still_valid(data))

hmm ... but what if I don't usually have any previous candidates?
Couldn't I use a default argument?

    >>> def still_valid(data, results=[]) ...

And now I have buggy code.

The right answer isn't to force re-evaluation of []; it is to be clear
on when your functions will have side effects.  If you don't want call
sites littered with

    >>> candidates = []
    >>> candidates.extend(still_valid(data))

then write a helper, such as

    >>> def extra_candidates(data, known_candidates):
    ...         known_candidates.extend(still_valid(data))

-jJ

From guido at python.org  Thu Feb 15 18:15:56 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 15 Feb 2007 09:15:56 -0800
Subject: [Python-3000] UserDict revamp
In-Reply-To: <d11dcfba0702150744w2343e6f5xa4c743dadccc8e71@mail.gmail.com>
References: <ca471dc20702142051j7a7e6621q56ad5d3503720225@mail.gmail.com>
	<dea92f560702150424k68d20b6bh790808191b80cef2@mail.gmail.com>
	<d11dcfba0702150744w2343e6f5xa4c743dadccc8e71@mail.gmail.com>
Message-ID: <ca471dc20702150915n10c2871an4e4a483959f437f@mail.gmail.com>

On 2/15/07, Steven Bethard <steven.bethard at gmail.com> wrote:
> On 2/15/07, Guido van Rossum <guido at python.org> wrote:
> > Perhaps the most egregious example is MixinDict, which currently
> > assumes that keys() is a primitive operation returning a list, and
> > builds __iter__() out of that. Obviously a better approach is to turn
> > this around. (I'd have thought that ever since 2.2 this would have
> > been the better design, but perhaps it was too late then already.)
>
> I asked the same thing back in early 2005:
>
>     http://mail.python.org/pipermail/python-list/2005-January/300042.html
>
> Glad to hear I wasn't too out of my mind. ;-)

Reading that post, I think that __len__ should also be part of the
primitive operations, at least optionally. The dict view code to
compare two views (or a view and a set; always excluding the values
view which is not a set) for equality makes good use of this since it
knows that if the lengths are unequal the objects cannot be equal.
Determining equality without knowing the length would double the
cost of the operation because you'd end up having to iterate over each
side, checking that all its elements are contained in the other side.
With a length check, you only have to iterate over one side, and only
if the lengths are equal.
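
A rough sketch of that comparison (Python 3; views_equal is a made-up
helper name, not the actual dict view code):

```python
def views_equal(a, b):
    """Compare two set-like containers that support len(), iteration and `in`."""
    if len(a) != len(b):
        return False                 # unequal sizes: no iteration needed at all
    # Sizes match and neither side holds duplicates, so checking that every
    # element of one side is in the other is sufficient.
    return all(elem in b for elem in a)

print(views_equal({1: "a", 2: "b"}.keys(), {1, 2}))   # True
print(views_equal({1: "a"}.keys(), {1, 2}))           # False, decided by len() alone
```

Without len(), the function would need a second pass checking each
element of b against a.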

Another distinction I'd like to make is between mutable and immutable
mappings. But maybe this is outside the realm of a *dict* mixin, and
belongs in the (more speculative) discussion on abstract base classes.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From python at rcn.com  Thu Feb 15 18:38:16 2007
From: python at rcn.com (Raymond Hettinger)
Date: Thu, 15 Feb 2007 09:38:16 -0800
Subject: [Python-3000] UserDict revamp
References: <ca471dc20702142051j7a7e6621q56ad5d3503720225@mail.gmail.com><dea92f560702150424k68d20b6bh790808191b80cef2@mail.gmail.com><d11dcfba0702150744w2343e6f5xa4c743dadccc8e71@mail.gmail.com>
	<ca471dc20702150915n10c2871an4e4a483959f437f@mail.gmail.com>
Message-ID: <00fd01c75128$16a67d60$ea146b0a@RaymondLaptop1>

Since I contributed DictMixin and have been responsible for its maintenance,
if no one minds, I would like to be the one to migrate it to Py3.0.


Raymond


----- Original Message ----- 
From: "Guido van Rossum" <guido at python.org>
To: "Steven Bethard" <steven.bethard at gmail.com>
Cc: "Python 3000" <python-3000 at python.org>; "Eduardo EdCrypt O. Padoan" 
<eopadoan at altavix.com>
Sent: Thursday, February 15, 2007 9:15 AM
Subject: Re: [Python-3000] UserDict revamp


> On 2/15/07, Steven Bethard <steven.bethard at gmail.com> wrote:
>> On 2/15/07, Guido van Rossum <guido at python.org> wrote:
>> > Perhaps the most egregious example is MixinDict, which currently
>> > assumes that keys() is a primitive operation returning a list, and
>> > builds __iter__() out of that. Obviously a better approach is to turn
>> > this around. (I'd have thought that ever since 2.2 this would have
>> > been the better design, but perhaps it was too late then already.)
>>
>> I asked the same thing back in early 2005:
>>
>>     http://mail.python.org/pipermail/python-list/2005-January/300042.html
>>
>> Glad to hear I wasn't too out of my mind. ;-)
>
> Reading that post, I think that __len__ should also be part of the
> primitive operations, at least optionally. The dict view code to
> compare two views (or a view and a set; always excluding the values
> view which is not a set) for equality makes good use of this since it
> knows that if the lengths are unequal the objects cannot be equal. In
> order to determine equality without knowing the length would double the
> cost of the operation because you'd end up having to iterate over each
> side, checking that all its elements are contained in the other side.
> With a length check, you only have to iterate over one side, and only
> if the lengths are equal.
>
> Another distinction I'd like to make is between mutable and immutable
> mappings. But maybe this is outside the realm of a *dict* mixin, and
> belongs in the (more speculative) discussion on abstract base classes.
>
> -- 
> --Guido van Rossum (home page: http://www.python.org/~guido/)

From bjourne at gmail.com  Thu Feb 15 20:08:20 2007
From: bjourne at gmail.com (BJörn Lindqvist)
Date: Thu, 15 Feb 2007 20:08:20 +0100
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <fb6fbf560702150848v64428ea0w2ce97a5de0354ba4@mail.gmail.com>
References: <45D28143.9010502@gmail.com>
	<fb6fbf560702150848v64428ea0w2ce97a5de0354ba4@mail.gmail.com>
Message-ID: <740c3aec0702151108y2232290dqd104bb5609f7bab4@mail.gmail.com>

On 2/15/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 2/13/07, Chris Rebert <cvrebert at gmail.com> wrote:
> >      There are currently few, if any, known good uses of the current
> >      behavior of mutable default arguments.
>
> Then are there *any* good use cases for the proposed semantics?

Note that the PEP says _currently_; with the change in semantics the
number of use cases increases drastically. See below.

> Here are the use cases that I can remember seeing for mutable
> default arguments.
>
> (1)  Not really (treated as) mutable.  ==> Doesn't care about the
> mutability semantics.
>
>     >>> def f(extra_settings={}) ...
>
> usually doesn't modify or even store extra_settings; it just wants an
> empty (and perhaps iterable) mapping.  (Sometimes, it doesn't even
> need that, and is really just providing type information.)

That is dangerous code. Sooner or later someone will modify the
extra_settings dict. For me, that is the main attraction of the PEP,
it removes that source of bugs (along with the annoying "if blaha is
None:" thingy).

    class Vector:
        def __init__(self, x, y, z):
            self.x = x
            self.y = y
            self.z = z

    class Ray:
        def __init__(self, direction, origin = Vector(0, 0, 0)):
            self.direction = direction
            self.origin = origin

    ray1 = Ray(Vector(0, 0, -1))
    ray2 = Ray(Vector(0, 0, 1))
    ray3 = Ray(Vector(-1, 0, 0), Vector(2, 3, 4))

The above code looks quite nice, but is wrong.
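
To spell out what goes wrong, here is the same code in compressed form
(Python 3), followed by a mutation of one ray's origin:

```python
class Vector:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

class Ray:
    # Vector(0, 0, 0) is built once, when this def statement is executed.
    def __init__(self, direction, origin=Vector(0, 0, 0)):
        self.direction = direction
        self.origin = origin

ray1 = Ray(Vector(0, 0, -1))
ray2 = Ray(Vector(0, 0, 1))

ray1.origin.x = 5                      # meant to move ray1 only
print(ray2.origin.x)                   # 5
print(ray1.origin is ray2.origin)      # True: both share the default Vector
```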

Not that it matters much, Guido has already rejected the PEP. But the
use cases do exist and there is a problem with how default argument
values are evaluated. Hopefully someone can invent a fix even if this
PEP wasn't it.

-- 
mvh Björn

From python at rcn.com  Fri Feb 16 01:48:51 2007
From: python at rcn.com (Raymond Hettinger)
Date: Thu, 15 Feb 2007 16:48:51 -0800
Subject: [Python-3000] Py3.0 Library Ideas
References: <ca471dc20702142051j7a7e6621q56ad5d3503720225@mail.gmail.com><dea92f560702150424k68d20b6bh790808191b80cef2@mail.gmail.com><d11dcfba0702150744w2343e6f5xa4c743dadccc8e71@mail.gmail.com><ca471dc20702150915n10c2871an4e4a483959f437f@mail.gmail.com>
	<00fd01c75128$16a67d60$ea146b0a@RaymondLaptop1>
Message-ID: <012901c75164$3b1e00f0$ea146b0a@RaymondLaptop1>

* Remove the unreliable empty() and full() methods from Queue.py

* Remove jumpahead() from the random API.  It is somewhat uncommon for PRNGs to 
have a closed form solution that jumps ahead N steps.

* Make the primitive for random be something generating random bytes rather than 
random floats.  Currently to get a random integer, a generator like the Mersenne 
twister generates two blocks of 4 bytes, which are then turned into a C double 
and then random.py module converts the float back into an integer in the desired 
range.  The long-->float-->long dance could be abbreviated.  This would also 
make it easier to substitute in other generators without making them responsible 
for the long-->float step.

* Get rid of Cookie.SerialCookie and Cookie.SmartCookie

* Modify the heapq.heapreplace() API to compare the new value to the top of the 
heap.  This has come up more than once.  When using a heap for a priority queue, 
sometimes there is a need to revise the priority of an entry in the middle of 
the heap.  This can be done with heapreplace substituting the new priority/task 
pair and then running a _siftup operation to restore the heap condition.
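
As an illustration of the random-bytes item above, here is one way an
integer in a range could be built directly from bytes, with no float in
between. This is only a sketch: randbelow is a hypothetical helper, and
os.urandom stands in for whatever byte-generating primitive the random
module would actually expose.

```python
import os

def randbelow(n, randbytes=os.urandom):
    """Return a uniformly distributed integer in [0, n) built from random bytes."""
    if n <= 0:
        raise ValueError("n must be positive")
    k = (n - 1).bit_length()            # bits needed to represent n - 1
    nbytes = (k + 7) // 8 or 1          # always draw at least one byte
    while True:
        # Keep only the top k bits of the drawn bytes.
        r = int.from_bytes(randbytes(nbytes), "big") >> (nbytes * 8 - k)
        if r < n:                       # rejection sampling keeps the result uniform
            return r

print(randbelow(10))                    # some integer from 0 to 9
```

Any other generator could be substituted by passing a different
randbytes callable that takes a byte count and returns that many bytes.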


Raymond 

From cvrebert at gmail.com  Fri Feb 16 06:37:27 2007
From: cvrebert at gmail.com (Chris Rebert)
Date: Thu, 15 Feb 2007 21:37:27 -0800
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <740c3aec0702151108y2232290dqd104bb5609f7bab4@mail.gmail.com>
References: <45D28143.9010502@gmail.com>	
	<fb6fbf560702150848v64428ea0w2ce97a5de0354ba4@mail.gmail.com>
	<740c3aec0702151108y2232290dqd104bb5609f7bab4@mail.gmail.com>
Message-ID: <45D54317.4020204@gmail.com>

Okay, in light of Guido's comments, alternate idea:

We require all default values to be hash()-able, thus reasonably 
ensuring their immutability. This doesn't deal with the 'x=None...' 
dance, but at least it might stop dangerous code from being written. Or 
if anyone else has ideas, that's great too. Anything to stop the abuses 
of mutable default arguments.

- Chris Rebert


BJörn Lindqvist wrote:
> On 2/15/07, Jim Jewett <jimjjewett at gmail.com> wrote:
>> On 2/13/07, Chris Rebert <cvrebert at gmail.com> wrote:
>> >      There are currently few, if any, known good uses of the current
>> >      behavior of mutable default arguments.
>>
>> Then are there *any* good use cases for the proposed semantics?
> 
> Note that the PEP says _currently_; with the change in semantics the
> number of use cases increases drastically. See below.
> 
>> Here are the use cases that I can remember seeing for mutable
>> default arguments.
>>
>> (1)  Not really (treated as) mutable.  ==> Doesn't care about the
>> mutability semantics.
>>
>>     >>> def f(extra_settings={}) ...
>>
>> usually doesn't modify or even store extra_settings; it just wants an
>> empty (and perhaps iterable) mapping.  (Sometimes, it doesn't even
>> need that, and is really just providing type information.)
> 
> That is dangerous code. Sooner or later someone will modify the
> extra_settings dict. For me, that is the main attraction of the PEP,
> it removes that source of bugs (along with the annoying "if blaha is
> None:" thingy).
> 
>    class Vector:
>        def __init__(self, x, y, z):
>            self.x = x
>            self.y = y
>            self.z = z
> 
>    class Ray:
>        def __init__(self, direction, origin = Vector(0, 0, 0)):
>            self.direction = direction
>            self.origin = origin
> 
>    ray1 = Ray(Vector(0, 0, -1))
>    ray2 = Ray(Vector(0, 0, 1))
>    ray3 = Ray(Vector(-1, 0, 0), Vector(2, 3, 4))
> 
> The above code looks quite nice, but is wrong.
> 
> Not that it matters much, Guido has already rejected the PEP. But the
> use cases do exist and there is a problem with how default argument
> values are evaluated. Hopefully someone can invent a fix even if this
> PEP wasn't it.
> 

From ferringb at gmail.com  Fri Feb 16 08:01:04 2007
From: ferringb at gmail.com (Brian Harring)
Date: Thu, 15 Feb 2007 23:01:04 -0800
Subject: [Python-3000] pre-PEP: Default Argument Expressions
In-Reply-To: <45D54317.4020204@gmail.com>
References: <45D28143.9010502@gmail.com>
	<fb6fbf560702150848v64428ea0w2ce97a5de0354ba4@mail.gmail.com>
	<740c3aec0702151108y2232290dqd104bb5609f7bab4@mail.gmail.com>
	<45D54317.4020204@gmail.com>
Message-ID: <20070216070104.GA22681@seldon>

On Thu, Feb 15, 2007 at 09:37:27PM -0800, Chris Rebert wrote:
> Okay, in light of Guido's comments, alternate idea:
> 
> We require all default values to be hash()-able, thus reasonably 
> ensuring their immutability.

Offhand, that's a pretty arbitrary restriction- default __hash__ 
for objects is their address.  Majority of objects *are* mutable also, 
so about all you've managed to block is usage of [] and {}, or objects 
that specifically castrate their __hash__.
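
To illustrate the point about hash() (a sketch in current Python 3; the Settings class and the is_hashable helper are made up for this example): an instance of a plain class is hashable by default because the default hash is identity-based, yet it can be mutated freely, so a hash()-ability check would accept it as a default value.

```python
class Settings:
    """Plain class: hashable by default (identity-based), yet fully mutable."""
    def __init__(self):
        self.values = {}


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


default = Settings()
hash_before = hash(default)       # works: the default hash does not look at state
default.values["debug"] = True    # mutating the object leaves the hash unchanged
hash_after = hash(default)

# Only the built-in mutable containers are rejected by such a check.
rejected = [is_hashable([]), is_hashable({})]
accepted = is_hashable(default)
```

So the check would block [] and {} but let through most user-defined mutable objects.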

> but at least it might stop dangerous code from being written.
<snip>
> Anything to stop the abuses of mutable default arguments.

You may not have usage for mutable default args, but others may- 
namely memoization.  Store the cache in the default arg.

Upshot of it, the cache isn't sitting out in the global namespace; you 
can achieve the same with a memoization object/descriptor, but those 
approaches break down since the key calculation can only be 
args/kwargs based, rather than generating a key in a simpler way.  
Further, if there *are* kwargs involved, the memoizer has to know the 
default args for the target, and slip those in every time which gets 
fairly ugly.
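
A minimal sketch of the memoization idiom described above, written for current Python 3 (the function word_count and its normalisation rule are invented for illustration). The cache lives in the default argument, so it is not in the global namespace, and the function computes its own key rather than relying on the raw args/kwargs.

```python
def word_count(text, _cache={}):
    # _cache is created once, when the def statement runs, and is shared
    # by every call that does not pass its own mapping.
    key = text.strip().lower()          # key derived by the function itself
    if key not in _cache:
        _cache[key] = len(key.split())  # the "expensive" computation
    return _cache[key]


first = word_count("Hello world")
second = word_count("  hello WORLD ")   # same key, served from the cache
cache_size = len(word_count.__defaults__[0])
```

A generic memoizing decorator would see two different argument strings here and store two entries.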

Personally, I'm -1 on suggestions thus far- further, -1 on trying to 
block mutables from default args.

Would suggest creating a tool to scan for potential issues rather than 
trying to strip mutable default args from the language.

~harring

From eopadoan at altavix.com  Fri Feb 16 12:30:40 2007
From: eopadoan at altavix.com (Eduardo "EdCrypt" O. Padoan)
Date: Fri, 16 Feb 2007 09:30:40 -0200
Subject: [Python-3000] [Python-Dev]  UserDict revamp
In-Reply-To: <002701c75145$75ece080$ea146b0a@RaymondLaptop1>
References: <ca471dc20702142051j7a7e6621q56ad5d3503720225@mail.gmail.com>
	<dea92f560702150424k68d20b6bh790808191b80cef2@mail.gmail.com>
	<d11dcfba0702150744w2343e6f5xa4c743dadccc8e71@mail.gmail.com>
	<ca471dc20702150915n10c2871an4e4a483959f437f@mail.gmail.com>
	<00fd01c75128$16a67d60$ea146b0a@RaymondLaptop1>
	<d11dcfba0702151305t75885df4i51320a3b13db2bd2@mail.gmail.com>
	<002701c75145$75ece080$ea146b0a@RaymondLaptop1>
Message-ID: <dea92f560702160330j106ed8e5xa43145d968429282@mail.gmail.com>

[Steve]
> No complaints here.  Not that you need my permission of course. ;-)

Same here, obviously.

[Raymond]
> Thanks, I had already started working on this one.
> Of course, everyone is welcome to contribute.

Ok, you can count on that.

--
EduardoOPadoan (eopadoan->altavix::com)
Bookmarks: http://del.icio.us/edcrypt

From eopadoan at altavix.com  Sat Feb 17 04:57:16 2007
From: eopadoan at altavix.com (Eduardo "EdCrypt" O. Padoan)
Date: Sat, 17 Feb 2007 01:57:16 -0200
Subject: [Python-3000] PEPs 3xxx status
Message-ID: <dea92f560702161957u605313d4re7d51ee5f6e6b3e0@mail.gmail.com>

All the 3xxx PEPs except 3100 and the "meta" ones are marked as drafts.
While I understand that many have some open issues, do even the
implemented ones (3102, 3105, 3106, 3107, 3110) still run the risk of
being withdrawn?

-- 
EduardoOPadoan (eopadoan->altavix::com)
Bookmarks: http://del.icio.us/edcrypt

From talin at acm.org  Sat Feb 17 06:39:01 2007
From: talin at acm.org (Talin)
Date: Fri, 16 Feb 2007 21:39:01 -0800
Subject: [Python-3000] PEPs 3xxx status
In-Reply-To: <dea92f560702161957u605313d4re7d51ee5f6e6b3e0@mail.gmail.com>
References: <dea92f560702161957u605313d4re7d51ee5f6e6b3e0@mail.gmail.com>
Message-ID: <45D694F5.50109@acm.org>

Eduardo "EdCrypt" O. Padoan wrote:
> All the 3xxx PEPs except 3100 and the "meta" ones are marked  a draft.
> While I understand that many has some open issues, even the
> implemented ones (3102, 3105, 3106, 3107, 3110) still run the risk of
> being withdrawn?

I don't know about the others, however I want to speak to the issue of 
3101 and 3102, since I wrote them - the main reason that those PEPs 
haven't been accepted is that there's no sample implementation to 
evaluate. (At least, I'm not aware of any implementation of them, unless 
someone did it while I wasn't looking :)

As I stated early on in this process, I don't really enjoy working on 
the innards of Python as much as I enjoy working *in* Python - and Guido 
seemed to find this acceptable when I asked him about it. In addition, 
I've been very busy lately, as my absence from this list illustrates.

I have written a Python implementation of 3101 that can be used as a 
model, but the actual implementation needs to be in C, since it's 
anticipated that it will be a built-in function. Some of the number 
formatting operations are best done in C anyway.

Several people have put forward tentative offers to implement these two 
PEPs, however there's been no follow up that I know of.

I should also note that these PEPs should really be targeted at the 2.x 
series, since there's nothing fundamentally "3000-ish" about them, so 
the 31xx numbering is kind of a misnomer. There's no backwards 
compatibility impact in either case.

-- Talin

From eopadoan at altavix.com  Sat Feb 17 14:07:59 2007
From: eopadoan at altavix.com (Eduardo "EdCrypt" O. Padoan)
Date: Sat, 17 Feb 2007 11:07:59 -0200
Subject: [Python-3000] PEPs 3xxx status
In-Reply-To: <45D694F5.50109@acm.org>
References: <dea92f560702161957u605313d4re7d51ee5f6e6b3e0@mail.gmail.com>
	<45D694F5.50109@acm.org>
Message-ID: <dea92f560702170507u30cb09e7o17e99d2242b89b1e@mail.gmail.com>

On 2/17/07, Talin <talin at acm.org> wrote:
> I don't know about the others, however I want to speak to the issue of
> 3101 and 3102, since I wrote them - the main reason that those PEPs
> haven't been accepted is that there's no sample implementation to
> evaluate. (At least, I'm not aware of any implementation of them, unless
> someone did it while I wasn't looking :)

At least for 3102, yes, someone did:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1549670&group_id=5470

-- 
EduardoOPadoan (eopadoan->altavix::com)
Bookmarks: http://del.icio.us/edcrypt

From guido at python.org  Sat Feb 17 19:02:37 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 17 Feb 2007 10:02:37 -0800
Subject: [Python-3000] PEPs 3xxx status
In-Reply-To: <dea92f560702170507u30cb09e7o17e99d2242b89b1e@mail.gmail.com>
References: <dea92f560702161957u605313d4re7d51ee5f6e6b3e0@mail.gmail.com>
	<45D694F5.50109@acm.org>
	<dea92f560702170507u30cb09e7o17e99d2242b89b1e@mail.gmail.com>
Message-ID: <ca471dc20702171002u117c0c65k74d52aa64741e631@mail.gmail.com>

And it's in the p3yk branch, too.

The main reason these are all still drafts is that I expect that
implementing them may cause a certain amount of redesign, and in some
cases the spec isn't entirely clear. The "real" acceptance status (in
my head) is all over the map -- 3102 is obviously accepted, 3101
likely, 3103 unlikely, 3104 possibly, 3108 is too early to tell, and
the rest (3105, 06, 07, 09, 10) are accepted.

--Guido

On 2/17/07, Eduardo EdCrypt O. Padoan <eopadoan at altavix.com> wrote:
> On 2/17/07, Talin <talin at acm.org> wrote:
> > I don't know about the others, however I want to speak to the issue of
> > 3101 and 3102, since I wrote them - the main reason that those PEPs
> > haven't been accepted is that there's no sample implementation to
> > evaluate. (At least, I'm not aware of any implementation of them, unless
> > someone did it while I wasn't looking :)
>
> At least for 3102, yes, someone did:
> https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1549670&group_id=5470
>
> --
> EduardoOPadoan (eopadoan->altavix::com)
> Bookmarks: http://del.icio.us/edcrypt
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From g.brandl at gmx.net  Sat Feb 17 19:42:52 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 17 Feb 2007 19:42:52 +0100
Subject: [Python-3000] PEPs 3xxx status
In-Reply-To: <ca471dc20702171002u117c0c65k74d52aa64741e631@mail.gmail.com>
References: <dea92f560702161957u605313d4re7d51ee5f6e6b3e0@mail.gmail.com>	<45D694F5.50109@acm.org>	<dea92f560702170507u30cb09e7o17e99d2242b89b1e@mail.gmail.com>
	<ca471dc20702171002u117c0c65k74d52aa64741e631@mail.gmail.com>
Message-ID: <er7ibd$fk9$1@sea.gmane.org>

Guido van Rossum schrieb:
> And it's in the p3yk branch, too.
> 
> The main reason these are all still drafts is that I expect that
> implementing them may cause a certain amount of redesign, and in some
> cases the spec isn't entirely clear. The "real" acceptance status (in
> my head) is all over the map -- 3102 is obviously accepted, 3101
> likely, 3103 unlikely, 3104 possibly, 3108 is too early to tell, and
> the rest (3105, 06, 07, 09, 10) are accepted.

I updated the PEP index to reflect that.

Georg


From jimjjewett at gmail.com  Sun Feb 18 04:18:53 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sat, 17 Feb 2007 22:18:53 -0500
Subject: [Python-3000] immutable classes [was: pre-PEP: Default Argument
	Expressions]
Message-ID: <fb6fbf560702171918y33d79991j727b2ccdfeef8513@mail.gmail.com>

I have added python-ideas to the Cc list, and suggest removing
python-3000 from additional replies.

BJörn Lindqvist gave an example explaining why he might want to
re-evaluate mutable default arguments.  It still looks like buggy
code, but it isn't the error I was expecting -- and I think it comes
from the difficulty of declaring something immutable.

On 2/15/07, BJörn Lindqvist <bjourne at gmail.com> wrote:
> On 2/15/07, Jim Jewett <jimjjewett at gmail.com> wrote:

> > Then are there *any* good use cases for [non-persistent mutable defaults]

> > (1)  Not really (treated as) mutable.  ==> Doesn't care

> >     >>> def f(extra_settings={}) ...

> > usually doesn't modify or even store extra_settings;  ...

> That is dangerous code. Sooner or later someone will modify the
> extra_settings dict.

How?

    >>> f.func_defaults[0]['key']=value

may be misguided, but it probably isn't an accident.

BJörn's example does store the mutable directly, but it makes a bit
more sense because it looks like a complex object rather than just a
mapping.

>     class Vector:
>         def __init__(self, x, y, z):
>             self.x = x
>             self.y = y
>             self.z = z

>     class Ray:
>         def __init__(self, direction, origin = Vector(0, 0, 0)):
>             self.direction = direction
>             self.origin = origin
>
>     ray1 = Ray(Vector(0, 0, -1))
>     ray2 = Ray(Vector(0, 0, 1))
>     ray3 = Ray(Vector(-1, 0, 0), Vector(2, 3, 4))

> The above code looks quite nice, but is wrong.

Why is vector mutable?

Is the real problem that it is too hard to declare objects or
attributes immutable?
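
For concreteness, a small sketch (current Python 3) of how the quoted Ray example goes wrong once anyone mutates an origin. The classes are the quoted ones, condensed; the mutation at the end is an assumed misuse added for illustration.

```python
class Vector:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


class Ray:
    # The default Vector is built once, when the def statement executes.
    def __init__(self, direction, origin=Vector(0, 0, 0)):
        self.direction = direction
        self.origin = origin


ray1 = Ray(Vector(0, 0, -1))
ray2 = Ray(Vector(0, 0, 1))

shared = ray1.origin is ray2.origin   # both rays hold the same default object
ray1.origin.x = 5                     # intended to move ray1 only
moved_too = ray2.origin.x             # ray2 has moved as well
```

With an immutable Vector the assignment would fail immediately instead of silently changing every ray that relied on the default.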

My solution is below, but I'll grant that it isn't as straightforward
as I would have liked.  Is this something that could be solved with a
recipe, or a factory to make immutable classes?

>>> class Vector3D(tuple):
...         def __new__(cls, x, y, z):
...             return super(Vector3D, cls).__new__(cls, (x, y, z))
...         x=property(lambda self: self[0])
...         y=property(lambda self: self[1])
...         z=property(lambda self: self[2])
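
As for the factory idea, one possible shape is sketched below in current Python 3. The name immutable_record and its signature are hypothetical, not an existing API; it simply generalises the tuple-plus-properties pattern above to any list of field names.

```python
from operator import itemgetter


def immutable_record(name, fields):
    """Hypothetical factory: a tuple subclass with read-only named fields."""
    def __new__(cls, *values):
        if len(values) != len(fields):
            raise TypeError("%s expects %d values" % (name, len(fields)))
        return tuple.__new__(cls, values)

    # Empty __slots__ keeps instances from growing a writable __dict__.
    namespace = {"__slots__": (), "__new__": __new__}
    for index, field in enumerate(fields):
        # A property with no setter: reading works, assignment fails.
        namespace[field] = property(itemgetter(index))
    return type(name, (tuple,), namespace)


Vector3D = immutable_record("Vector3D", ["x", "y", "z"])
v = Vector3D(1, 2, 3)
coords = (v.x, v.y, v.z)
```

Because instances are still tuples, they hash and compare by value, and an attempt such as v.x = 9 raises AttributeError.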

-jJ

From andre.roberge at gmail.com  Mon Feb 19 19:32:05 2007
From: andre.roberge at gmail.com (Andre Roberge)
Date: Mon, 19 Feb 2007 14:32:05 -0400
Subject: [Python-3000] Pre-PEP: Simple input built-in in Python 3000
In-Reply-To: <ca471dc20612221347k7c30071drb0654df98ccd51aa@mail.gmail.com>
References: <7528bcdd0612220545u147f07a4gb476dd43733dfe46@mail.gmail.com>
	<ca471dc20612221347k7c30071drb0654df98ccd51aa@mail.gmail.com>
Message-ID: <7528bcdd0702191032n4347e8c8p6987553deb9be445@mail.gmail.com>

Any possibility that (some of) the following can be done before Pycon?
Respectfully yours,
André Roberge

On 12/23/06, Guido van Rossum <guido at python.org> wrote:
[http://mail.python.org/pipermail/python-3000/2006-December/005257.html]
> BTW, can someone clean up and check in the proto-PEP and start working
> on an implementation or patch? Should be really simple. I'd like to
> see a patch for the refactoring tool (sandbox/2to3) as well.

This was as a follow up to:
http://mail.python.org/pipermail/python-3000/2006-December/005249.html

On 12/22/06, Guido van Rossum <guido at python.org> wrote:
> I like the exact proposal made here better than any of the
> alternatives mentioned so far.
>
> - Against naming it readline(): the "real" readline doesn't strip the
> \n and returns an empty string for EOF instead of raising EOFError; I
> believe the latter is more helpful for true beginners' code.
>
> - Against naming it ask() and renaming print() to say(): I find those
> rather silly names that belong in toy or AI languages. Changing print
> from statement to function maintains Pythonicity; renaming it say()
> does not.
>
> - I don't expect there will be much potential confusion with the 2.x
> input(); that function is used extremely rarely. It will be trivial to
> add rules to the refactoring tool (sandbox/2to3/) that replace input()
> with eval(input()) and replace raw_input() with input().
>
> --Guido
>
> On 12/22/06, Andre Roberge <andre.roberge at gmail.com> wrote:
> > A few months ago, there was an active discussion on edu-sig regarding
> > the proposed fate of raw_input().  The text below is an attempt at
> > summarizing the discussion in the form of a tentative PEP.
> > It is respectfully submitted for your consideration.
> >
> > If it is to be considered, in some form, as an official PEP, I have
> > absolutely no objection to a regular python-dev contributor taking over
> > the ownership/authorship.
> >
> > André Roberge
> >
> > -----------------------------------------------------------------
> > PEP: XXX
> > Title: Simple input built-in in Python 3000
> > Version: $Revision: 0.2 $
> > Last-Modified: $Date: 2006/12/22 10:00:00 $
> > Author: André Roberge <andre.roberge at gmail.com>
> > Status: Draft
> > Type: Standards Track
> > Content-Type: text/x-rst
> > Created: 13-Sep-2006
> > Python-Version: 3.0
> > Post-History:
> >
> > Abstract
> > ========
> >
> > Input and output are core features of computer programs.  Currently,
> > Python provides a simple means of output through the print keyword
> > and two simple means of interactive input through the input()
> > and raw_input() built-in functions.
> >
> > Python 3.0 will introduce various incompatible changes with previous
> > Python versions[1].  Among the proposed changes, print will become a
> > built-in function, print(), while input() and raw_input() would be
> > removed completely from the built-in namespace, requiring importing some
> > module to provide even the most basic input capability.
> >
> > This PEP proposes that Python 3.0 retains some simple interactive user
> > input capability, equivalent to raw_input(), within the built-in namespace.
> >
> > Motivation
> > ==========
> >
> > With its easy readability and its support for many programming styles
> > (e.g. procedural, object-oriented, etc.) among others, Python is perhaps
> > the best computer language to use in introductory programming classes.
> > Simple programs often need to provide information to the user (output)
> > and to obtain information from the user (interactive input).
> > Any computer language intended to be used in an educational setting should
> > provide straightforward methods for both output and interactive input.
> >
> > The current proposals for Python 3.0 [1] include a simple output pathway
> > via a built-in function named print(), but a more complicated method for
> > input [e.g. via sys.stdin.readline()], one that requires importing an external
> > module.  Current versions of Python (pre-3.0) include raw_input() as a
> > built-in function.  With the availability of such a function, programs that
> > require simple input/output can be written from day one, without requiring
> > discussions of importing modules, streams, etc.
> >
> > Rationale
> > =========
> >
> > Current built-in functions, like input() and raw_input(), are found to be
> > extremely useful in traditional teaching settings. (For more details,
> > see [2] and the discussion that followed.)
> > While the BDFL has clearly stated [3] that input() was not to be kept in
> > Python 3000, he has also stated that he was not against revising the
> > decision of killing raw_input().
> >
> > raw_input() provides a simple means to ask a question and obtain a response
> > from a user.  The proposed plans for Python 3.0 would require the
> > replacement of the single statement
> >
> > name = raw_input("What is your name?")
> >
> > by the more complicated
> >
> > import sys
> > print("What is your name?")
> > name = sys.stdin.readline()
> >
> > However, from the point of view of many Python beginners and educators, the
> > use of sys.stdin.readline() presents the following problems:
> >
> > 1. Compared to the name "raw_input", the name "sys.stdin.readline()"
> > is clunky and inelegant.
> >
> > 2. The names "sys" and "stdin" have no meaning for most beginners,
> > who are mainly interested in *what* the function does, and not *where*
> > in the package structure it is located.  The lack of meaning also makes
> > it difficult to remember:
> > is it "sys.stdin.readline()", or "stdin.sys.readline()"?
> > To a programming novice, there is not any obvious reason to prefer
> > one over the other. In contrast, functions with simple and direct names
> > like print, input, raw_input, and open are easier to remember.
> >
> > 3. The use of "." notation is unmotivated and confusing to many beginners.
> > For example, it may lead some beginners to think "."  is a standard
> > character that could be used in any identifier.
> >
> > 4. There is an asymmetry with the print function: why is print not called
> > sys.stdout.print()?
> >
> >
> > Specification
> > =============
> >
> > The built-in input function should be totally equivalent to the existing
> > raw_input() function.
> >
> > Open issues
> > ===========
> >
> > With input() effectively removed from the language, the name raw_input()
> > makes much less sense and alternatives should be considered.  The
> > various possibilities mentioned in various forums include:
> >
> > ask()
> > ask_user()
> > get_string()
> > input()  # rejected by BDFL
> > prompt()
> > read()
> > user_input()
> > get_response()
> >
> > While it has been rejected by the BDFL, it has been suggested that the most
> > direct solution would be to rename "raw_input" to "input" in Python 3000.
> > The main objection is that Python 2.x already has a function named "input",
> > and, even though it is not going to be included in Python 3000,
> > having a built-in function with the same name but different semantics may
> > confuse programmers migrating from 2.x to 3000.  Certainly, this is no
> > problem for beginners, and the scope of the problem is unclear for more
> > experienced programmers, since raw_input(), while popular with many, is
> > not in universal use.  In this instance, the good it does for beginners
> > could be seen to outweigh the harm it does to experienced programmers -
> > although it could cause confusion for people reading older books or
> > tutorials.
> >
> >
> > References
> > ==========
> >
> > .. [1] PEP 3100, Miscellaneous Python 3.0 Plans, Kuchling, Cannon
> > (http://www.python.org/dev/peps/pep-3100/)
> > .. [2] The fate of raw_input() in Python 3000
> > (http://mail.python.org/pipermail/edu-sig/2006-September/006967.html)
> > .. [3] Educational aspects of Python 3000
> > (http://mail.python.org/pipermail/python-3000/2006-September/003589.html)
> >
> >
> > Copyright
> > =========
> >
> > This document has been placed in the public domain.
> >
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

From jan.kanis at phil.uu.nl  Mon Feb 19 23:55:45 2007
From: jan.kanis at phil.uu.nl (Jan Kanis)
Date: Mon, 19 Feb 2007 23:55:45 +0100
Subject: [Python-3000] [Python-ideas] immutable classes [was: pre-PEP:
	Default Argument Expressions]
In-Reply-To: <fb6fbf560702171918y33d79991j727b2ccdfeef8513@mail.gmail.com>
References: <fb6fbf560702171918y33d79991j727b2ccdfeef8513@mail.gmail.com>
Message-ID: <op.tn0py8iid64u53@jan-lenovo>

On the 'using old semantics when you really want to' part, that's very  
well possible with a decorator under the proposed semantics:

def caching(**cachevars):
	def inner(func):
		def wrapper(**argdict):
			# fill in any cached variable the caller did not supply
			for var in cachevars:
				if var not in argdict:
					argdict[var] = cachevars[var]
			return func(**argdict)
		return wrapper
	return inner

@caching(cache={})
def foo(arg, cache):
	result = bar(arg)
	# store the freshly computed result in the persistent cache
	cache[arg] = result
	return result

This implementation of caching doesn't handle positional args, but it can  
be made to. One such decorator would still be a net win of several hundred  
lines of code in the standard lib.
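
Handling positional arguments, as mentioned above, could look roughly like this. It is only a sketch: it relies on inspect.signature from current Python 3, which did not exist when this thread was written, and the function square is made up for illustration.

```python
import functools
import inspect


def caching(**cachevars):
    """Supply persistent values for the named parameters when the caller omits them."""
    def inner(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Work out which parameters the caller filled, positionally or by name.
            bound = sig.bind_partial(*args, **kwargs)
            for var, value in cachevars.items():
                if var not in bound.arguments:
                    kwargs[var] = value   # same object on every call
            return func(*args, **kwargs)
        return wrapper
    return inner


@caching(cache={})
def square(x, cache):
    if x not in cache:
        cache[x] = x * x
    return cache[x]


shared_result = square(3)      # uses the persistent cache
own_cache = {}
own_result = square(4, own_cache)   # caller-supplied cache wins
```

The caller can still pass cache explicitly, positionally or by keyword, and the decorator leaves it alone.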


Of course, IMHO, the real fix to this is to 1) have default expressions be  
evaluated at calltime, and 2) have _all_ lexical variables be bound at  
definition time and 3) make them immutable. Then something like

lst = []
for i in range(10):
	lst.append(lambda: i*i)

would work. That would be a real win for functional programming. (good  
thing)
Unfortunately Guido's decided not to support (1), and (2) has been  
proposed some time ago and didn't make it. In both cases because it would  
be too big a departure from how Python currently works. (3) is quite  
impossible in a language like python. <mode=dreaming> I just hope if  
python were designed today it would have done these. </mode>
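
For reference, a short sketch (current Python 3) of the closure behaviour being discussed: a lambda that reads the loop variable sees its final value, and the usual workaround is to freeze the value with a default argument, which is evaluated at definition time. The function name build is just for this example.

```python
def build():
    late, frozen = [], []
    for i in range(10):
        late.append(lambda: i * i)         # looks i up when called
        frozen.append(lambda i=i: i * i)   # default captured when defined
    return [f() for f in late], [f() for f in frozen]


late_results, frozen_results = build()
```

Here late_results is ten copies of 81, while frozen_results holds the squares 0 through 81.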

- Jan

From raymond.hettinger at verizon.net  Tue Feb 20 09:55:57 2007
From: raymond.hettinger at verizon.net (Raymond Hettinger)
Date: Tue, 20 Feb 2007 00:55:57 -0800
Subject: [Python-3000] Thoughts on dictionary views
Message-ID: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>

The Java concept of dictionary views seems to have caught on here while I wasn't 
looking.  At the risk of covering some old ground, I would like to re-open the 
question.  Here are a few thoughts on the subject to kick off the discussion:

* Maintaining a live (self-updating) view is a bit tricky from an implementation 
point-of-view.  While it is clearly doable for dictionaries, it is not clear 
that it is a good idea for a general mapping API which can be wrapped around 
dbms, shelves, elementtrees, b-trees, and other wrascally rabbits.  I doubt that 
the underlying structures of other mapping types support the observer pattern 
necessary to keep views updated -- this is doubly true if the underlying data is 
on disk and can be updated by other processes, threads, etc.

* One of the purported benefits is to provide set-like behavior without the 
expense of copying to a new set object.  FWIW, I've updated the set 
implementation to be more interoperable with dictionaries so that the conversion 
costs are negligible (about the same as a dict resize operation -- one pass, no 
calls to PyObject_Hash, insertion into a presized, sparse table with very few 
collisions).

* A dict is also one of Python's most basic APIs (along with lists).  Ideally, 
we should keep those two APIs as simple as possible (getting rid of setdefault() 
and unneeded methods is a step in the right direction).  IMO, the views will be 
the hardest part of the API to explain and interact with when learning the 
language -- to learn about dicts and lists, you already have to learn about 
mutability and hashability -- it doesn't help this situation if you then need to 
learn about self-updating views that can be deleted, have modified values, but 
cannot be added, and that have their own set-like operations but aren't really 
sets . . .

* ISTM that views offer three benefits:  re-iterability, set behavior, and 
self-updates.  IMO, the first is not commonly needed and is trivially served by 
writing list(mydict.items()) or somesuch.  The second is best served by an 
explicit conversion to a set or frozenset type -- those two types have been 
enormously successful in that they seem to offer a near zero learning curve --  
people seem to intuitively know how to use them right out of the box.  As long 
as that conversion is fast, I think the explicit conversion is the way to go --  
it is the way you would do it with any other Python type where you wanted set 
behavior.  Adding a handful of set methods to dict views would only complicate 
an otherwise simple situation and introduce unnecessary complexity (i.e. what 
should isinstance(d.d_keys, set) return?).  The third benefit (self-updates) is 
more interesting and does not have a direct analog with existing python tools, 
so the question is how valuable is self-updating behavior and are there 
compelling use cases that warrant a more complex API?
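
To make the three behaviours concrete, here is a small sketch using dict views as they ended up working in released Python 3, which may differ in detail from the branch under discussion. It contrasts a live view with the explicit list and set conversions mentioned above.

```python
d = {"a": 1, "b": 2}

keys_view = d.keys()           # live view: follows later changes to d
keys_snapshot = set(d)         # explicit conversion: an independent set
items_list = list(d.items())   # re-iterable snapshot of the pairs

d["c"] = 3
del d["a"]

view_now = sorted(keys_view)          # reflects the updated dict
snapshot_now = sorted(keys_snapshot)  # still the old keys
common = keys_view & {"b", "z"}       # set-like operation on the view
view_is_set = isinstance(keys_view, set)
```

The view supports set operations and returns a real set from them, but it is not itself an instance of set.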

My recommendation is to take a more conservative route.  Let's make dicts as 
simple as possible and then introduce a new collections module entry with the 
views bells and whistles.  If the collections version proves itself as 
enormously popular, useful, understandable, and without a good equivalent, then 
it can ask for a promotion.  The collections module is a nice place to put in 
alternate datatypes that meet the more demanding needs of advanced users who 
know exactly what they want/need in terms of special behaviors or performance. 
And, if we take the collections module route, there is no reason that it cannot 
be put into Py2.6 where people will either flock to it or ignore it, with either 
result providing us with good guidance for Py3.0.

my-two-cents,


Raymond 


From ncoghlan at gmail.com  Tue Feb 20 14:24:35 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 20 Feb 2007 23:24:35 +1000
Subject: [Python-3000] Thoughts on dictionary views
In-Reply-To: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>
References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>
Message-ID: <45DAF693.1000004@gmail.com>

Raymond Hettinger wrote:
> The Java concept of dictionary views seems to have caught-on here while I wasn't 
> looking.  At the risk of covering some old ground, I would like to re-open the 
> question.  Here are a few thoughts on the subject to kick-off the discussion:
> 
> * Maintaining a live (self-updating) view is a bit tricky from an implementation 
> point-of-view.  While it is clearly doable for dictionaries, it is not clear 
> that it is a good idea for a general mapping API which can be wrapped around 
> dbms, shelves, elementtrees, b-trees, and other wrascally rabbits.  I doubt that 
> the underlying structures of other mapping types support the observer pattern 
> necessary to keep views updated -- this is doubly true if the underlying data is 
> on disk and can be updated by other processes, threads, etc.

FWIW, the py3k trunk is still somewhat broken from the implementation of 
this change. Without any test resources enabled, I still get more than 
half a dozen failures which appear at a glance to be related to 
unexpected mutation of dictionaries (test_anydbm, test_dumbdbm, 
test_mutants, test_compile, test_iter, test_iterlen, test_minidom, 
test_os, test_importhooks, test_unittest)

> 
> * One of the purported benefits is to provide set-like behavior without the 
> expense of copying to a new set object.  FWIW, I've updated the set 
> implementation to be more interoperable with dictionaries so that the conversion 
> costs are negligible (about the same as a dict resize operation -- one pass, no 
> calls to PyObject_Hash, insertion into a presized, sparse table with very few 
> collisions).

The speed costs may become negligible, but I believe the main concern 
here is memory consumption (minimising memory usage is certainly the 
only reason I've ever made sure to use the dict.iter* methods).

However, the string discussion has given me another view on that front, 
too... (more on that below)

> My recommendation is to take a more conservative route.  Let's make dicts as 
> simple as possible and then introduce a new collections module entry with the 
> views bells and whistles.  If the collections version proves itself as 
> enormously popular, useful, understandable, and without a good equivalent, then 
> it can ask for a promotion.  The collections module is nice place to put in 
> alternate datatypes that meet the more demanding needs of advanced users who 
> know exactly what they want/need in terms of special behaviors or performance. 
> And, if we take the collections module route, there is no reason that it cannot 
> be put into Py2.6 where people will either flock to it or ignore it, with either 
> result providing us with good guidance for Py3.0.

One of the things that's been suggested for working with strings in Py3k 
is a stringview type - a wrapper type around a string where slicing (and 
similar operations, like partition()) creates objects with a reference & 
offset into the original string rather than actually copying data 
around. Standard strings would be entirely unaffected. The intended 
usage is that a function would accept a string-like object as an 
argument, wrap the stringview around it, then convert the result back to a 
normal string before passing it back to the caller.
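
A rough sketch of that idea in current Python 3. The StringView class here is hypothetical and much smaller than the proposal discussed on the list: slicing records a base string plus offsets instead of copying, and str() produces a normal string at the end.

```python
class StringView:
    """Hypothetical sketch: a slice of a string stored as (base, start, stop)."""

    def __init__(self, base, start=0, stop=None):
        self._base = base
        self._start = start
        self._stop = len(base) if stop is None else stop

    def __len__(self):
        return self._stop - self._start

    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.indices(len(self))
            if step != 1:
                raise ValueError("only contiguous slices in this sketch")
            stop = max(stop, start)
            # No characters are copied: only the offsets change.
            return StringView(self._base, self._start + start, self._start + stop)
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("StringView index out of range")
        return self._base[self._start + index]

    def __str__(self):
        # The one place where data is actually copied out.
        return self._base[self._start:self._stop]


def first_word(text):
    """Accept a normal string, work on views, return a normal string."""
    view = StringView(text)
    end = 0
    while end < len(view) and view[end] != " ":
        end += 1
    return str(view[:end])


word = first_word("hello world")
```

Callers of first_word never see a view, which matches the usage pattern described above.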

Couldn't something similar serve as a replacement for the iter*() 
methods on dictionaries? That is, rather than copying the data into a 
different data structure, instead provide a group of wrapper classes, 
each of which operate on the wrapped mapping in a different way?

The wrapper classes needed would be:
   - set API exposing keys of the original mapping
   - multiset API exposing values of the underlying mapping
   - keyed set API exposing (key, value) pairs of the original mapping
   - mapping API that uses the above for keys(), values() & items(), but
     otherwise delegates operations to the original mapping

All except the last already exist in the Py3k branch (as the role of the 
last suggested wrapper type is currently being handled by the dict data 
type itself)
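
As an illustration of the first wrapper in that list, a minimal sketch in current Python 3. KeySet is a hypothetical name, and collections.abc is a modern convenience that was not available when this was written. The wrapper exposes a read-only set API over the keys of whatever mapping it wraps, without copying them.

```python
from collections.abc import Set


class KeySet(Set):
    """Hypothetical wrapper: read-only set API over the keys of a mapping."""

    def __init__(self, mapping):
        self._mapping = mapping

    def __contains__(self, key):
        return key in self._mapping

    def __iter__(self):
        return iter(self._mapping)

    def __len__(self):
        return len(self._mapping)

    @classmethod
    def _from_iterable(cls, iterable):
        # Results of &, |, - and ^ are independent concrete sets.
        return frozenset(iterable)


stock = {"apples": 3, "pears": 0}
names = KeySet(stock)
stock["plums"] = 7                        # the wrapper sees the new key
overlap = names & {"pears", "kiwis"}      # mixin method supplied by Set
```

Only three methods are written by hand; comparisons and the set operators come from the abstract base class.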

Similar to Raymond's suggestion of a new concrete container type (rather 
than a wrapper type), this could be included in the standard library for 
Python 2.6, making it significantly easier to write forward compatible code.

Given such a new dict wrapper type (or standalone container type, if 
Raymond's approach is taken), then it would also be possible to change 
the basic mapping API to define different return types for the 3 methods 
that currently return lists:
   - keys() would be changed to return a set
   - values() would be changed to return a multiset (non-hash based)
   - items() would be changed to return a keyed set (ala sort keys)

The difference from the current Py3k branch is that these would still 
involve copying the data from the original dictionary to a new concrete 
container object which is then returned.

The more I've seen of these discussions (the original dict method one, 
as well as the string concatenation/slicing one), the more leery I 
become of including any view type behaviour in the basic data types.

One factor in this is that I've been getting back into C++ coding 
lately, and keep getting reminded of the various cases where the C++ 
standard defaults to behaviours that are faster (use less memory, 
whatever) when they're valid, but silently do the wrong thing when 
they're inappropriate (the default assignment operators and copy 
constructors that lead to significant memory double-free problems are a 
nice example, as is the fact that methods are non-virtual by default). 
So rather than spend the time to figure out whether or not the default 
behaviour is safe for each case, it becomes quicker and easier to just 
stick in the boilerplate to tell the compiler "don't do the default 
thing, it is probably wrong", thus completely invalidating the supposed 
performance improvement that was meant to be provided by the default 
behaviour (and requiring a programmer to put in a comment saying so when 
the default behaviour really is what they want).

Typically, Python doesn't work that way - it defaults to 'safe' 
behaviour, but provides sufficient flexibility to permit optimisation 
when it is necessary (e.g., with the size of the data sets the NumPy 
folks sling around, view-based behaviour is essential, and Python lets 
them do it that way).

My apologies for rambling a bit - I can't currently give a succinct 
explanation for why the current direction feels wrong, but I felt it was 
worth supporting Raymond on this point.

Regards,
Nick.


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Tue Feb 20 14:28:23 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 20 Feb 2007 23:28:23 +1000
Subject: [Python-3000] [Python-ideas] immutable classes [was: pre-PEP:
 Default Argument Expressions]
In-Reply-To: <op.tn0py8iid64u53@jan-lenovo>
References: <fb6fbf560702171918y33d79991j727b2ccdfeef8513@mail.gmail.com>
	<op.tn0py8iid64u53@jan-lenovo>
Message-ID: <45DAF777.8070708@gmail.com>

Jan Kanis wrote:
> <mode=dreaming> I just hope if
> python were designed today it would have done these. </mode>

If Python had done these, it wouldn't be Python ;)

There are many, many programming language design decisions which have 
good arguments on each side (and some which seem obviously correct may 
involve hidden costs which aren't appreciated until after it is too late 
to change them). That's one of the major reasons why there are so many 
different programming languages out there.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Tue Feb 20 14:59:58 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 20 Feb 2007 23:59:58 +1000
Subject: [Python-3000] Pre-PEP: Simple input built-in in Python 3000
In-Reply-To: <7528bcdd0702191032n4347e8c8p6987553deb9be445@mail.gmail.com>
References: <7528bcdd0612220545u147f07a4gb476dd43733dfe46@mail.gmail.com>	<ca471dc20612221347k7c30071drb0654df98ccd51aa@mail.gmail.com>
	<7528bcdd0702191032n4347e8c8p6987553deb9be445@mail.gmail.com>
Message-ID: <45DAFEDE.4030109@gmail.com>

Andre Roberge wrote:
> Any possibility that (some of) the following can be done before Pycon?
> Respectfully yours,
> André Roberge

I've added the PEP as 3111. I made a few small modifications (and 
committed it directly as Accepted) based on Guido's comments in this thread.

The actual change still needs to be made, though.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From jason.orendorff at gmail.com  Tue Feb 20 15:42:20 2007
From: jason.orendorff at gmail.com (Jason Orendorff)
Date: Tue, 20 Feb 2007 09:42:20 -0500
Subject: [Python-3000] Thoughts on dictionary views
In-Reply-To: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>
References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>
Message-ID: <bb8868b90702200642n24b4e072j57578b269e0661a1@mail.gmail.com>

On 2/20/07, Raymond Hettinger <raymond.hettinger at verizon.net> wrote:
> * A dict is also one of Python's most basic APIs (along with lists).  Ideally,
> we should keep those two APIs as simple as possible (getting rid of setdefault()
> and unneeded methods is a step in the right direction).  IMO, the views will be
> the hardest part of the API to explain and interact with when learning the
> language [...]

I agree.  Views will make dicts harder to learn for newcomers and
trickier to use even for experts.

The current non-aliasing behavior is a feature.  Seen that way, the
switch to views seems like a broken optimization.

> * ISTM that views offer three benefits:  re-iterability, set behavior, and
> self-updates. [...]

I think the benefit the team really liked was #4, "delete iterkeys(),
itervalues(), and iteritems() from the mapping API".  But this now
seems like false economy to me.

-j

From steven.bethard at gmail.com  Tue Feb 20 16:08:12 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Tue, 20 Feb 2007 08:08:12 -0700
Subject: [Python-3000] Thoughts on dictionary views
In-Reply-To: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>
References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>
Message-ID: <d11dcfba0702200708of12a6d3s82ec3fb1e65d1076@mail.gmail.com>

On 2/20/07, Raymond Hettinger <raymond.hettinger at verizon.net> wrote:
> * ISTM that views offer three benefits:  re-iterability, set behavior, and
> self-updates.  IMO, the first is not commonly needed and is trivially served by
> writing list(mydict.items()) or somesuch.  The second is best served by an
> explicit conversion to a set or frozenset type
[snip]
> My recommendation is to take a more conservative route.  Let's make dicts as
> simple as possible and then introduce a new collections module entry with the
> views bells and whistles.

Just to clarify, you're suggesting that we still change .keys(),
.values() and .items() to iterators, right?

If so, +1.  I was also starting to get a bit nervous about the new
complexity of dict().  Putting the view-like behavior into the
collections module makes good sense.

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From p.f.moore at gmail.com  Tue Feb 20 16:13:57 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 20 Feb 2007 15:13:57 +0000
Subject: [Python-3000] Thoughts on dictionary views
In-Reply-To: <d11dcfba0702200708of12a6d3s82ec3fb1e65d1076@mail.gmail.com>
References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>
	<d11dcfba0702200708of12a6d3s82ec3fb1e65d1076@mail.gmail.com>
Message-ID: <79990c6b0702200713s69c90510o617595344fd17af@mail.gmail.com>

On 20/02/07, Steven Bethard <steven.bethard at gmail.com> wrote:
> On 2/20/07, Raymond Hettinger <raymond.hettinger at verizon.net> wrote:
> > My recommendation is to take a more conservative route.  Let's make dicts as
> > simple as possible and then introduce a new collections module entry with the
> > views bells and whistles.
>
> Just to clarify, you're suggesting that we still change .keys(),
> .values() and .items() to iterators, right?
>
> If so, +1.  I was also starting to get a bit nervous about the new
> complexity of dict().  Putting the view-like behavior into the
> collections module makes good sense.

I'm also +1. (I have similar concerns over the "new IO" proposals I've
seen, but there's nothing concrete there yet, so I'll save that
argument for another day...)

Paul

From aahz at pythoncraft.com  Tue Feb 20 16:42:04 2007
From: aahz at pythoncraft.com (Aahz)
Date: Tue, 20 Feb 2007 07:42:04 -0800
Subject: [Python-3000] Thoughts on dictionary views
In-Reply-To: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>
References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>
Message-ID: <20070220154204.GB20369@panix.com>

On Tue, Feb 20, 2007, Raymond Hettinger wrote:
>
> My recommendation is to take a more conservative route.  Let's make   
> dicts as simple as possible and then introduce a new collections      
> module entry with the views bells and whistles.  If the collections   
> version proves itself as enormously popular, useful, understandable,  
> and without a good equivalent, then it can ask for a promotion.       

+1, and thank you for cogently writing up the unease that I was feeling
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

"I disrespectfully agree."  --SJM

From guido at python.org  Tue Feb 20 16:51:07 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 20 Feb 2007 07:51:07 -0800
Subject: [Python-3000] Pre-PEP: Simple input built-in in Python 3000
In-Reply-To: <45DAFEDE.4030109@gmail.com>
References: <7528bcdd0612220545u147f07a4gb476dd43733dfe46@mail.gmail.com>
	<ca471dc20612221347k7c30071drb0654df98ccd51aa@mail.gmail.com>
	<7528bcdd0702191032n4347e8c8p6987553deb9be445@mail.gmail.com>
	<45DAFEDE.4030109@gmail.com>
Message-ID: <ca471dc20702200751i30371e37yc5a4c5d187c7f8b@mail.gmail.com>

Why do you want this *before* PyCon? It would be much easier to do
this as part of the Py3k sprint.

On 2/20/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Andre Roberge wrote:
> > Any possibility that (some of) the following can be done before Pycon?
> > Respectfully yours,
> > André Roberge
>
> I've added the PEP as 3111. I made a few small modifications (and
> committed it directly as Accepted) based on Guido's comments in this thread.
>
> The actual change still needs to be made, though.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> ---------------------------------------------------------------
>              http://www.boredomandlaziness.org
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Feb 20 18:09:16 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 20 Feb 2007 09:09:16 -0800
Subject: [Python-3000] Pre-PEP: Simple input built-in in Python 3000
In-Reply-To: <7528bcdd0702200901r62f8cc4fu7ea7f1e59725e4b6@mail.gmail.com>
References: <7528bcdd0612220545u147f07a4gb476dd43733dfe46@mail.gmail.com>
	<ca471dc20612221347k7c30071drb0654df98ccd51aa@mail.gmail.com>
	<7528bcdd0702191032n4347e8c8p6987553deb9be445@mail.gmail.com>
	<45DAFEDE.4030109@gmail.com>
	<ca471dc20702200751i30371e37yc5a4c5d187c7f8b@mail.gmail.com>
	<7528bcdd0702200901r62f8cc4fu7ea7f1e59725e4b6@mail.gmail.com>
Message-ID: <ca471dc20702200909x22d0184eg3f56d6e9c8bacd26@mail.gmail.com>

Consider the PEP accepted.

Regarding the conversion, please do use the sandbox/2to3 framework.
Write me if you have trouble understanding the many examples already
in fixes/.

On 2/20/07, Andre Roberge <andre.roberge at gmail.com> wrote:
> On 2/20/07, Guido van Rossum <guido at python.org> wrote:
> > Why do you want this *before* PyCon? It would be much easier to do
> > this as part of the Py3k sprint.
> >
>
> My main interest was to have, prior to Pycon, the PEP recorded as
> such; it had been close to 2 months since the last post on this issue
> on the list.
>
> As for the actual work, I'd be willing to volunteer to write the
> required code (with test cases) that could be used to do the conversion
> input(...)  ->  eval(input(...))
> raw_input(...)  ->  input(...)
>
> Unfortunately, I will not be participating in any sprints.
>
> André
>
>
>
> > On 2/20/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> > > Andre Roberge wrote:
> > > > Any possibility that (some of) the following can be done before Pycon?
> > > > Respectfully yours,
> > > > André Roberge
> > >
> > > I've added the PEP as 3111. I made a few small modifications (and
> > > committed it directly as Accepted) based on Guido's comments in this thread.
> > >
> > > The actual change still needs to be made, though.
> > >
> > > Cheers,
> > > Nick.
> > >
> > > --
> > > Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> > > ---------------------------------------------------------------
> > >              http://www.boredomandlaziness.org
> > >
> >
> >
> > --
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
> >
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From andre.roberge at gmail.com  Tue Feb 20 17:58:11 2007
From: andre.roberge at gmail.com (Andre Roberge)
Date: Tue, 20 Feb 2007 12:58:11 -0400
Subject: [Python-3000] Pre-PEP: Simple input built-in in Python 3000
In-Reply-To: <ca471dc20702200751i30371e37yc5a4c5d187c7f8b@mail.gmail.com>
References: <7528bcdd0612220545u147f07a4gb476dd43733dfe46@mail.gmail.com>
	<ca471dc20612221347k7c30071drb0654df98ccd51aa@mail.gmail.com>
	<7528bcdd0702191032n4347e8c8p6987553deb9be445@mail.gmail.com>
	<45DAFEDE.4030109@gmail.com>
	<ca471dc20702200751i30371e37yc5a4c5d187c7f8b@mail.gmail.com>
Message-ID: <7528bcdd0702200858j74653284x1d368920c9b34e5e@mail.gmail.com>

On 2/20/07, Guido van Rossum <guido at python.org> wrote:
> Why do you want this *before* PyCon? It would be much easier to do
> this as part of the Py3k sprint.
>

My main interest was to have, prior to Pycon, the PEP recorded as
such; it had been close to 2 months since the last post on this issue
on the list.

As for the actual work, if no regular developer is interested, I'd be
willing to volunteer to write the required code (with test cases) that
could be used to do the conversion
input(...)  ->  eval(input(...))
raw_input(...)  ->  input(...)
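
To make the two rewrites above concrete, here is a minimal sketch using the standard ast module. It is not the sandbox/2to3 fixer and says nothing about how that framework is structured; the names InputRewriter and convert are made up for illustration. It assumes Python 3.9 or later (for ast.unparse) and source that already parses as Python 3 apart from the two built-ins:

```python
import ast


class InputRewriter(ast.NodeTransformer):
    """Rewrite Python 2 style input()/raw_input() calls (illustrative only)."""

    def visit_Call(self, node):
        # Rewrite nested calls first. Nodes returned below are not visited
        # again, so a renamed raw_input() is never wrapped in eval().
        self.generic_visit(node)
        if isinstance(node.func, ast.Name):
            if node.func.id == "raw_input":
                # raw_input(...) -> input(...)
                node.func.id = "input"
            elif node.func.id == "input":
                # input(...) -> eval(input(...))
                return ast.Call(
                    func=ast.Name(id="eval", ctx=ast.Load()),
                    args=[node],
                    keywords=[],
                )
        return node


def convert(source):
    """Return source text with both rewrites applied."""
    tree = InputRewriter().visit(ast.parse(source))
    return ast.unparse(tree)
```

Note that ast.unparse drops comments and normalises formatting, so this only shows the shape of the transformation, not something to run over real code.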

Unfortunately, I will not be participating in any sprints.

André



> On 2/20/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> > Andre Roberge wrote:
> > > Any possibility that (some of) the following can be done before Pycon?
> > > Respectfully yours,
> > > André Roberge
> >
> > I've added the PEP as 3111. I made a few small modifications (and
> > committed it directly as Accepted) based on Guido's comments in this thread.
> >
> > The actual change still needs to be made, though.
> >
> > Cheers,
> > Nick.
> >
> > --
> > Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> > ---------------------------------------------------------------
> >              http://www.boredomandlaziness.org
> >
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

From andre.roberge at gmail.com  Tue Feb 20 18:01:38 2007
From: andre.roberge at gmail.com (Andre Roberge)
Date: Tue, 20 Feb 2007 13:01:38 -0400
Subject: [Python-3000] Pre-PEP: Simple input built-in in Python 3000
In-Reply-To: <ca471dc20702200751i30371e37yc5a4c5d187c7f8b@mail.gmail.com>
References: <7528bcdd0612220545u147f07a4gb476dd43733dfe46@mail.gmail.com>
	<ca471dc20612221347k7c30071drb0654df98ccd51aa@mail.gmail.com>
	<7528bcdd0702191032n4347e8c8p6987553deb9be445@mail.gmail.com>
	<45DAFEDE.4030109@gmail.com>
	<ca471dc20702200751i30371e37yc5a4c5d187c7f8b@mail.gmail.com>
Message-ID: <7528bcdd0702200901r62f8cc4fu7ea7f1e59725e4b6@mail.gmail.com>

On 2/20/07, Guido van Rossum <guido at python.org> wrote:
> Why do you want this *before* PyCon? It would be much easier to do
> this as part of the Py3k sprint.
>

My main interest was to have, prior to Pycon, the PEP recorded as
such; it had been close to 2 months since the last post on this issue
on the list.

As for the actual work, I'd be willing to volunteer to write the
required code (with test cases) that could be used to do the conversion
input(...)  ->  eval(input(...))
raw_input(...)  ->  input(...)

Unfortunately, I will not be participating in any sprints.

André



> On 2/20/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> > Andre Roberge wrote:
> > > Any possibility that (some of) the following can be done before Pycon?
> > > Respectfully yours,
> > > André Roberge
> >
> > I've added the PEP as 3111. I made a few small modifications (and
> > committed it directly as Accepted) based on Guido's comments in this thread.
> >
> > The actual change still needs to be made, though.
> >
> > Cheers,
> > Nick.
> >
> > --
> > Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> > ---------------------------------------------------------------
> >              http://www.boredomandlaziness.org
> >
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

From greg.ewing at canterbury.ac.nz  Wed Feb 21 00:02:16 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 21 Feb 2007 12:02:16 +1300
Subject: [Python-3000] Thoughts on dictionary views
In-Reply-To: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>
References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>
Message-ID: <45DB7DF8.1010109@canterbury.ac.nz>

Raymond Hettinger wrote:

> * Maintaining a live (self-updating) view is a bit tricky from an implementation 
> point-of-view.

I don't understand what the alternative is. If mutating the
underlying object doesn't affect the view, then you don't
really have a view, just a copy of the data -- no different
from the existing dict keys() etc.

If you're saying that you shouldn't be able to mutate the
underlying object *through* the view, that's okay -- I don't
mind if the views are read-only in some or all cases.

> Let's make dicts as
> simple as possible and then introduce a new collections module entry with the 
> views bells and whistles. 

If the view methods are only available on a special dict
subclass and not on ordinary dicts, their usefulness will
be severely crippled, so you wouldn't learn much from
the experiment.

--
Greg

From guido at python.org  Wed Feb 21 00:25:28 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 20 Feb 2007 15:25:28 -0800
Subject: [Python-3000] Thoughts on dictionary views
In-Reply-To: <45DB7DF8.1010109@canterbury.ac.nz>
References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>
	<45DB7DF8.1010109@canterbury.ac.nz>
Message-ID: <ca471dc20702201525k2dad31aanfb7fbd08f2b9d7a3@mail.gmail.com>

On 2/20/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Raymond Hettinger wrote:
>
> > * Maintaining a live (self-updating) view is a bit tricky from an implementation
> > point-of-view.
>
> I don't understand what the alternative is. If mutating the
> underlying object doesn't affect the view, then you don't
> really have a view, just a copy of the data -- no different
> from the existing dict keys() etc.

FWIW, I didn't find the implementation tricky at all -- the views are
very small objects that simply contain a reference to the underlying
dict. All operations on the view defer to the dict one way or another.

The code in the PEP also shows how simple this is to do generically
for any underlying mapping object that implements __getitem__,
__contains__, __len__, and __iter__. (Or it will, once I am done
updating it. :-)
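
For readers without the PEP to hand, a minimal sketch of the idea, assuming only the four methods named above on the underlying mapping. The class names KeysViewSketch and ItemsViewSketch are made up here, and this is not the code from the PEP:

```python
class KeysViewSketch:
    """Hypothetical read-only view over the keys of any mapping."""

    def __init__(self, mapping):
        self._mapping = mapping          # a reference, not a copy

    def __len__(self):
        return len(self._mapping)

    def __contains__(self, key):
        return key in self._mapping

    def __iter__(self):
        return iter(self._mapping)


class ItemsViewSketch:
    """Hypothetical read-only view over (key, value) pairs."""

    def __init__(self, mapping):
        self._mapping = mapping

    def __len__(self):
        return len(self._mapping)

    def __contains__(self, item):
        key, value = item
        try:
            return self._mapping[key] == value
        except KeyError:
            return False

    def __iter__(self):
        for key in self._mapping:
            yield key, self._mapping[key]
```

Every operation goes back to the mapping at the time it is called, which is why no observer machinery is needed to keep the view current.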

> If you're saying that you shouldn't be able to mutate the
> underlying object *through* the view, that's okay -- I don't
> mind if the views are read-only in some or all cases.

While the PEP has some mutability, the implementation currently has
all views be read-only, and I like this enough to want to keep it that
way.

> > Let's make dicts as
> > simple as possible and then introduce a new collections module entry with the
> > views bells and whistles.
>
> If the view methods are only available on a special dict
> subclass and not on ordinary dicts, their usefulness will
> be severely crippled, so you wouldn't learn much from
> the experiment.

True. I'm also unclear on what "as simple as possible" would mean.
Perhaps delete iterkeys etc. and make keys etc. return iterators? That
was the *old* plan, which was never really challenged, and IMO it is
in every aspect inferior to the current plan.

BTW the PEP was incorrectly marked as accepted. I'll unmark it, and
remove the mutability.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Feb 21 00:46:11 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 20 Feb 2007 15:46:11 -0800
Subject: [Python-3000] Thoughts on dictionary views
In-Reply-To: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>
References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>
Message-ID: <ca471dc20702201546g71cdaf78se82d5899c4b45e2@mail.gmail.com>

On 2/20/07, Raymond Hettinger <raymond.hettinger at verizon.net> wrote:
> The Java concept of dictionary views seems to have caught-on here while I wasn't
> looking.  At the risk of covering some old ground, I would like to re-open the
> question.

Because it's coming from you I am reopening the discussion; didn't
mean to exclude you. But it is a bit of a pain that you weren't
looking while this was discussed. If we decide to roll it back it will
be painful (depending on the shape of the roll-back).

> Here are a few thoughts on the subject to kick-off the discussion:
>
> * Maintaining a live (self-updating) view is a bit tricky from an implementation
> point-of-view.  While it is clearly doable for dictionaries, it is not clear
> that it is a good idea for a general mapping API which can be wrapped around
> dbms, shelves, elementtrees, b-trees, and other wrascally rabbits.  I doubt that
> the underlying structures of other mapping types support the observer pattern
> necessary to keep views updated -- this is doubly true if the underlying data is
> on disk and can be updated by other processes, threads, etc.

No observer pattern is required. See the (updated) PEP 3106.

> * One of the purported benefits is to provide set-like behavior without the
> expense of copying to a new set object.  FWIW, I've updated the set
> implementation to be more interoperable with dictionaries so that the conversion
> costs are negligible (about the same as a dict resize operation -- one pass, no
> calls to PyObject_Hash, insertion into a presized, sparse table with very few
> collisions).

But it is still O(N) in time and space. Creating a dict view is O(1) in both.
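
A small illustration of that difference, using the behaviour Python 3 eventually shipped (keys() returning a view). The variable names are arbitrary:

```python
d = {"a": 1, "b": 2}

snapshot = set(d)     # O(N): every key is copied into a new set
view = d.keys()       # O(1): a small object that refers back to d

d["c"] = 3

snapshot_has_c = "c" in snapshot    # False: the copy was taken earlier
view_has_c = "c" in view            # True: the view looks at d itself

# Set operations work on the view directly; only the result is a new set.
common = view & {"a", "c", "z"}
```

The explicit set(d) conversion remains available whenever an independent snapshot is what you actually want.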

> * A dict is also one of Python's most basic APIs (along with lists).  Ideally,
> we should keep those two APIs as simple as possible (getting rid of setdefault()
> and unneeded methods is a step in the right direction).  IMO, the views will be
> the hardest part of the API to explain and interact with when learning the
> language -- to learn about dicts and lists, you already have to learn about
> mutability and hashability -- it doesn't help this situation if you then need to
> learn about self-updating views that can be deleted, have modified values, but
> cannot be added, and that have their own set-like operations but aren't really
> sets . . .

Perhaps it will be more palatable now that the views aren't mutable?
Also, I think you may have the wrong semantic model -- it's not a
self-updating view, it's just a different way to look at the same
underlying mapping. (Did you see PEP 3106? Since you don't quote it
this is not clear.)

> * ISTM that views offer three benefits:  re-iterability, set behavior, and
> self-updates.  IMO, the first is not commonly needed and is trivially served by
> writing list(mydict.items()) or somesuch.  The second is best served by an
> explicit conversion to a set or frozenset type -- those two types have been
> enormously successful in that they seem to offer a near zero learning curve --
> people seem to intuitively know how to use them right out of the box.

Yes, Greg Wilson did a super job on the API design.

(Though I keep having to remind Googlers about sets; they seem to have
lived in a world limited to Python 2.2 for too long. :-( )

> As long
> as that conversion is fast, I think the explicit conversion is the way to go --
> it is the way you would do it with any other Python type where you wanted set
> behavior.  Adding a handful of set methods to dict views would only complicate
> an otherwise simple situation and introduce unnecessary complexity (i.e. what
> should isinstance(d.d_keys, set) return?).

This I hope to address by introducing Abstract Base Classes.
Unfortunately that proposal isn't at all worked out, the best we have
is a wiki page by Bill Janssen (), but that is quite far removed from
what I would like to see.

> The third benefit (self-updates) is
> more interesting and does not have a direct analog with existing python tools,
> so the question is how valuable is self-updating behavior and are there
> compelling use cases that warrant a more complex API?

Just because it's new doesn't make it suspect, does it? It's been very
well received in Java.

> My recommendation is to take a more conservative route.  Let's make dicts as
> simple as possible

I'd like to see a concrete proposal here before I can judge which is
the better proposal.

>  and then introduce a new collections module entry with the
> views bells and whistles.  If the collections version proves itself as
> enormously popular, useful, understandable, and without a good equivalent, then
> it can ask for a promotion.  The collections module is a nice place to put in
> alternate datatypes that meet the more demanding needs of advanced users who
> know exactly what they want/need in terms of special behaviors or performance.

But that's not what dict views are about. They are about making the
mapping API easier for *all* users. (Anyway, Greg Ewing already shot
this down.)

> And, if we take the collections module route, there is no reason that it cannot
> be put into Py2.6 where people will either flock to it or ignore it, with either
> result providing us with good guidance for Py3.0.

Dict views can easily be added to 2.6 by using different method names
that can be automatically converted by the 2to3 converter. E.g.
d.viewkeys(), d.viewitems(), d.viewvalues(). The implementation should
plug right in. (Anthony and Thomas also have some more advanced ideas
on how to make keys/items/values return views when used in a module
declaring "from __future__ import dict_views".)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Feb 21 00:48:50 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 20 Feb 2007 15:48:50 -0800
Subject: [Python-3000] Thoughts on dictionary views
In-Reply-To: <79990c6b0702200713s69c90510o617595344fd17af@mail.gmail.com>
References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>
	<d11dcfba0702200708of12a6d3s82ec3fb1e65d1076@mail.gmail.com>
	<79990c6b0702200713s69c90510o617595344fd17af@mail.gmail.com>
Message-ID: <ca471dc20702201548l74c8dbb7he2106bd2eb87bf7@mail.gmail.com>

On 2/20/07, Paul Moore <p.f.moore at gmail.com> wrote:
> (I have similar concerns over the "new IO" proposals I've
> seen, but there's nothing concrete there yet, so I'll save that
> argument for another day...)

Then you should also have misgivings about the Unicode/str
unification. If you are cool with that, I don't see how we can avoid
redoing the I/O library.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Feb 21 00:51:01 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 20 Feb 2007 15:51:01 -0800
Subject: [Python-3000] Thoughts on dictionary views
In-Reply-To: <d11dcfba0702200708of12a6d3s82ec3fb1e65d1076@mail.gmail.com>
References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>
	<d11dcfba0702200708of12a6d3s82ec3fb1e65d1076@mail.gmail.com>
Message-ID: <ca471dc20702201551o1935104s3a7c45ca35635cb3@mail.gmail.com>

On 2/20/07, Steven Bethard <steven.bethard at gmail.com> wrote:
> Just to clarify, you're suggesting that we still change .keys(),
> .values() and .items() to iterators, right?

But this isn't really easier to explain to noobs than views, is it?
What's the advantage of

>>> {}.keys()
<dictionary-keyiterator object at 0xb7f82f60>
>>>

over

>>> {}.keys()
<dict_keys object at 0xb7fb6540>
>>>

???

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From steven.bethard at gmail.com  Wed Feb 21 01:10:53 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Tue, 20 Feb 2007 17:10:53 -0700
Subject: [Python-3000] Thoughts on dictionary views
In-Reply-To: <ca471dc20702201551o1935104s3a7c45ca35635cb3@mail.gmail.com>
References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>
	<d11dcfba0702200708of12a6d3s82ec3fb1e65d1076@mail.gmail.com>
	<ca471dc20702201551o1935104s3a7c45ca35635cb3@mail.gmail.com>
Message-ID: <d11dcfba0702201610g725a0ca0ncd7fb8acd100bd8e@mail.gmail.com>

On 2/20/07, Guido van Rossum <guido at python.org> wrote:
> On 2/20/07, Steven Bethard <steven.bethard at gmail.com> wrote:
> > Just to clarify, you're suggesting that we still change .keys(),
> > .values() and .items() to iterators, right?
>
> But this isn't really easier to explain to noobs than views, is it?
> What's the advantage of
>
> >>> {}.keys()
> <dictionary-keyiterator object at 0xb7f82f60>
> >>>
>
> over
>
> >>> {}.keys()
> <dict_keys object at 0xb7fb6540>
> >>>
>
> ???

No advantage at the interactive prompt of course. ;-)

The advantage is only in what you have to explain about the object. In
the former case, you can simply say "it's an iterator over the keys"
and they can understand it with their existing knowledge of iterators.
And if they don't know what iterators are, once they learn about them
for this case, they'll also know how iterators work in other
situations, e.g. list iterators, set iterators, deque iterators, etc.

On the other hand, when they're told "it's a dict key view object",
they can't use any existing knowledge. They have to go and look up the
API for what exactly a dict key view object does. And once they've
learned what API a dict key view object supports, that knowledge is
not really helpful in any new situations. They won't see key views on
lists, sets or deques, for example.

So it's mainly about keeping the mental footprint small. Knowing how
iterators work is a useful bit of knowledge that is widely applicable
across a variety of Python objects. Knowing how the various dict views
work is not so generally useful.
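
For comparison, a short sketch of what each object lets you do, written against Python 3 as it was eventually released (where keys() returns a view). It is only meant to show the two behaviours side by side, not to argue for either:

```python
d = {"a": 1, "b": 2}

# An iterator over the keys: the same protocol as for lists, sets, deques.
it = iter(d)
first_pass = list(it)
second_pass = list(it)      # the iterator is exhausted after one pass

# A key view: can be iterated repeatedly and asked for its length.
view = d.keys()
view_pass_1 = list(view)
view_pass_2 = list(view)
view_size = len(view)
```

The iterator needs no new concepts but is single-use; the view has to be learned separately but can be reused and queried.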

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From guido at python.org  Wed Feb 21 01:13:42 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 20 Feb 2007 16:13:42 -0800
Subject: [Python-3000] Thoughts on dictionary views
In-Reply-To: <45DAF693.1000004@gmail.com>
References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>
	<45DAF693.1000004@gmail.com>
Message-ID: <ca471dc20702201613l67b11539g2aa9bffcd90224d9@mail.gmail.com>

On 2/20/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Raymond Hettinger wrote:
> > The Java concept of dictionary views seems to have caught-on here while I wasn't
> > looking.  At the risk of covering some old ground, I would like to re-open the
> > question.  Here are a few thoughts on the subject to kick-off the discussion:
> >
> > * Maintaining a live (self-updating) view is a bit tricky from an implementation
> > point-of-view.  While it is clearly doable for dictionaries, it is not clear
> > that it is a good idea for a general mapping API which can be wrapped around
> > dbms, shelves, elementtrees, b-trees, and other wrascally rabbits.  I doubt that
> > the underlying structures of other mapping types support the observer pattern
> > necessary to keep views updated -- this is doubly true if the underlying data is
> > on disk and can be updated by other processes, threads, etc.
>
> FWIW, the py3k trunk is still somewhat broken from the implementation of
> this change. Without any test resources enabled, I still get more than
> half a dozen failures which appear at a glance to be related to
> unexpected mutation of dictionaries (test_anydbm, test_dumbdbm,
> test_mutants, test_compile, test_iter, test_iterlen, test_minidom,
> test_os, test_importhooks, test_unittest)

Yes, I plan to have those fixed before PyCon (so we won't have to
waste time on them at the sprint). But if someone wants to help that
would be great!

> > * One of the purported benefits is to provide set-like behavior without the
> > expense of copying to a new set object.  FWIW, I've updated the set
> > implementation to be more interoperable with dictionaries so that the conversion
> > costs are negligible (about the same as a dict resize operation -- one pass, no
> > calls to PyObject_Hash, insertion into a presized, sparse table with very few
> > collisions).
>
> The speed costs may become negligible, but I believe the main concern
> here is memory consumption (minimising memory usage is certainly the
> only reason I've ever made sure to use the dict.iter* methods).

As I said, it's still O(N) time and space, vs. O(1) for creating a view.
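
To put rough numbers on the O(N) versus O(1) point, here is a minimal
sketch. It assumes the view semantics that released Python 3 ended up
with, which may differ in detail from the 2007 py3k branch, and the
names d, view and copied are made up for illustration.

```python
import sys

d = {i: str(i) for i in range(100000)}

view = d.keys()    # O(1): a small object that only refers to d
copied = set(d)    # O(N): every key is inserted into a new hash table

view_size = sys.getsizeof(view)    # stays small however large d is
copy_size = sys.getsizeof(copied)  # grows with the number of keys
print(view_size, copy_size)
```

The exact byte counts depend on the interpreter build, so only the
relative sizes are meaningful here.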

[...]
> One of the things that's been suggested for working with strings in Py3k
> is a stringview type - a wrapper type around a string where slicing (and
> similar operations, like partition()) creates objects with a reference &
> offset into the original string rather than actually copying data
> around.

That must've been proposed while *I* was away. :-) I'm not at all
convinced that this kind of complexity is helpful at all. But note
that it's a different case from dict views, since string views can be
turned into copies with only some cost (or savings :-) in time and
space, but without semantic changes. Dict views have semantics.

> Standard strings would be entirely unaffected. The concept of
> use being that a function would accept a string-like object as an
> object, wrap the stringview around it, then convert the result back to a
> normal string before passing it back to the caller.
>
> Couldn't something similar serve as a replacement for the iter*()
> methods on dictionaries? That is, rather than copying the data into a
> different data structure,

Um, neither iterkeys() nor dict views do any copying. They just
reference the underlying dict in a tiny fixed-size object (literally
one pointer for dict views, a bit more for iterkeys(), in order to
detect mutations to the dict).
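
A small sketch of what "no copying" means in practice, assuming the
behaviour of dict views as they exist in released Python 3: since the
view only refers to the dict, later changes to the dict are visible
through a view that was created earlier.

```python
d = {"a": 1, "b": 2}
keys = d.keys()        # nothing is copied here

d["c"] = 3             # mutate the dict after the view was created

has_c = "c" in keys    # the view sees the new key
n = len(keys)          # and the new length
```

A list or set built from the keys at the same moment would still hold
only "a" and "b".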

> instead provide a group of wrapper classes,
> each of which operate on the wrapped mapping in a different way?
>
> The necessary wrapper classes needed would be:
>    - set API exposing keys of the original mapping
>    - multiset API exposing values of the underlying mapping
>    - keyed set API exposing (key, value) pairs of the original mapping
>    - mapping API that uses the above for keys(), values() & items(), but
>      otherwise delegates operations to the original mapping
>
> All except the last already exist in the Py3k branch (as the role of the
> last suggested wrapper type is currently being handled by the dict data
> type itself)

This is a clear explanation of the implementation; but can you also
explain the benefits?

> Similar to Raymond's suggestion of a new concrete container type (rather
> than a wrapper type), this could be included in the standard library for
> Python 2.6, making it significantly easier to write forward compatible code.

I doubt that writing forward compatible code will be hard anyways. The
most compatible code simply doesn't use any of the six affected
methods (keys(), iterkeys(), etc.) and instead relies on directly
manipulating or iterating over the dict. This is backwards compatible
all the way back to 2.2. Also, the conversion tool will make it easy
to write compatible code that uses iterkeys() but not keys().
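
For illustration, a sketch of code in that style. The function name
summarize is hypothetical; the point is only that iteration, len() and
the "in" test on the dict itself avoid all six affected methods.

```python
def summarize(d):
    """Walk a dict without calling keys(), values(), items() or iter*()."""
    total = 0
    for key in d:              # iterates over the keys directly
        total += d[key]
    return len(d), total, ("spam" in d)

result = summarize({"spam": 1, "eggs": 2})
```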

> Given such a new dict wrapper type (or standalone container type, if
> Raymond's approach is taken), then it would also be possible to change
> the basic mapping API to define different return types for the 3 methods
> that currently return lists:
>    - keys() would be changed to return a set
>    - values() would be changed to return a multiset (non-hash based)
>    - items() would be changed to return a keyed set (ala sort keys)

I'm not sure what you mean by "ala sort keys". I hope there's no
requirement that items() be sorted.

Note that the implementation of a multiset type will be quite tricky
(I'm punting on this in the rewrite of PEP 3106 that just got
refreshed on python.org).

Also, this implementation will make it hard to ensure that

list(zip(d.keys(), d.values())) == list(d.items())

as the keys() and values() return different object types.
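
The invariant is easy to check when all three methods walk the same
underlying table in the same order, as they do in released Python 3
(an assumption about the final implementation, not about the proposal
quoted above):

```python
d = {"x": 1, "y": 2, "z": 3}

# keys() and values() are traversed in matching order as long as
# the dict is not modified between the two calls.
pairs_from_zip = list(zip(d.keys(), d.values()))
pairs_from_items = list(d.items())
```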

> The difference from the current Py3k branch is that these would still
> involve copying the data from the original dictionary to a new concrete
> container object which is then returned.

But that's the main thing I'm trying to *avoid* with the new API!

> The more I've seen of these discussions (the original dict method one,
> as well as the string concatenation/slicing one), the more leery I
> become of including any view type behaviour in the basic data types.

I can't say I see much similarity between the two discussions. The
issues are all completely different -- for dicts they focus on API
semantics, while for strings they focus on performance in all sorts of
odd cases.

> One factor in this is that I've been getting back into C++ coding
> lately, and keep getting reminded of the various cases where the C++
> standard defaults to behaviours that are faster (use less memory,
> whatever) when they're valid, but silently do the wrong thing when
> they're inappropriate (default assignment operators and copy
> constructors that lead to significant memory double-free problems are a
> nice example, as is the fact that methods are non-virtual by default).
> So rather than spend the time to figure out whether or not the default
> behaviour is safe for each case, it becomes quicker and easier to just
> stick in the boilerplate to tell the compiler "don't do the default
> thing, it is probably wrong", thus completely invalidating the supposed
> performance improvement that was meant to be provided by the default
> behaviour (and requiring a programmer to put in a comment saying so when
> the default behaviour really is what they want).

Well, unless you plan to get rid of dict.__iter__(), that's a default
behavior that is wrong whenever you mutate the dict in the loop -- but
what are you going to do about it?
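
As a concrete sketch of that failure mode and the usual workaround
(behaviour as in released Python 3; the variable names are made up):
a size-changing mutation inside the loop is detected by the iterator,
and iterating over an explicit snapshot avoids it.

```python
d = {"a": 1, "b": 2, "c": 3}

try:
    for key in d:
        del d[key]             # changes the dict's size mid-iteration
except RuntimeError as exc:
    error = str(exc)
else:
    error = None

d = {"a": 1, "b": 2, "c": 3}
for key in list(d):            # explicit O(N) snapshot of the keys
    if d[key] % 2:
        del d[key]             # safe: the loop walks the list, not d
```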

> Typically, Python doesn't work that way - it defaults to 'safe'
> behaviour, but provides sufficient flexibility to permit optimisation
> when it is necessary (e.g., with the size of the data sets the NumPy
> folks sling around, view-based behaviour is essential, and Python lets
> them do it that way).
>
> My apologies for rambling a bit - I can't currently give a succinct
> explanation for why the current direction feels wrong, but I felt it was
> worth supporting Raymond on this point.

Apologies accepted -- but yes, you did ramble a bit, and I still wish
you'd collected your thoughts a bit more. If there are simple, clear
arguments it's easier for me to accept or reject them than with a
bunch of ramblings. Sorry to be grumpy, but given the implementation
stage this is in and how long the PEP has been sitting unchanged I'm a
bit annoyed that the criticism, valid or not, comes so late.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Feb 21 01:17:55 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 20 Feb 2007 16:17:55 -0800
Subject: [Python-3000] Thoughts on dictionary views
In-Reply-To: <d11dcfba0702201610g725a0ca0ncd7fb8acd100bd8e@mail.gmail.com>
References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>
	<d11dcfba0702200708of12a6d3s82ec3fb1e65d1076@mail.gmail.com>
	<ca471dc20702201551o1935104s3a7c45ca35635cb3@mail.gmail.com>
	<d11dcfba0702201610g725a0ca0ncd7fb8acd100bd8e@mail.gmail.com>
Message-ID: <ca471dc20702201617j5ad39ebbh231c5d82c860be74@mail.gmail.com>

On 2/20/07, Steven Bethard <steven.bethard at gmail.com> wrote:
> On 2/20/07, Guido van Rossum <guido at python.org> wrote:
> > On 2/20/07, Steven Bethard <steven.bethard at gmail.com> wrote:
> > > Just to clarify, you're suggesting that we still change .keys()
> > > .values() and .items() to iterators, right?
> >
> > But this isn't really easier to explain to noobs than views, is it?
> > What's the advantage of
> >
> > >>> {}.keys()
> > <dictionary-keyiterator object at 0xb7f82f60>
> > >>>
> >
> > over
> >
> > >>> {}.keys()
> > <dict_keys object at 0xb7fb6540>
> > >>>
> >
> > ???
>
> No advantage at the interactive prompt of course. ;-)
>
> The advantage is only in what you have to explain about the object. In
> the former case, you can simply say "it's an iterator over the keys"
> and they can understand it with their existing knowledge of iterators.

Uhm, you gotta be kidding. You don't seriously expect noobs to have a
priori understanding of iterators, do you? Those most likely come
*after* dict views.

> And if they don't know what iterators are, once they learn about them
> for this case, they'll also know how iterators work in other
> situations, e.g. list iterators, set iterators, deque iterators, etc.

Most of which one rarely needs to know about. In fact, I'd say that if
it wasn't for dict.iterkeys() we could probably hide iterators quite
effectively for a long time from noobs.

> On the other hand, when they're told "it's a dict key view object",
> they can't use any existing knowledge. They have to go and look up the
> API for what exactly a dict key view object does. And once they've
> learned what API a dict key view object supports, that knowledge is
> not really helpful in any new situations. They won't see key views on
> lists, sets or deques, for example.

But they will see them (I hope) on other mappings.

> So it's mainly about keeping the mental footprint small. Knowing how
> iterators work is a useful bit of knowledge that is widely applicable
> across a variety of Python objects. Knowing how the various dict views
> work is not so generally useful.

Since they mostly behave like sets (that you can't mutate directly)
they should be very low conceptual overhead. (Raymond already remarked
on the success of the set API and I wholeheartedly agree.)
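
A brief sketch of the set-like behaviour, assuming keys views as they
work in released Python 3; old and new are hypothetical dicts.

```python
old = {"a": 1, "b": 2}
new = {"b": 20, "c": 30}

common = old.keys() & new.keys()    # keys present in both dicts
added = new.keys() - old.keys()     # keys only in the newer dict

# The view has the read-only part of the set API, not the mutators.
can_mutate = hasattr(old.keys(), "add")
```

The results of & and - are ordinary sets, so they no longer track
either dict.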

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From steven.bethard at gmail.com  Wed Feb 21 01:35:15 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Tue, 20 Feb 2007 17:35:15 -0700
Subject: [Python-3000] Thoughts on dictionary views
In-Reply-To: <ca471dc20702201617j5ad39ebbh231c5d82c860be74@mail.gmail.com>
References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>
	<d11dcfba0702200708of12a6d3s82ec3fb1e65d1076@mail.gmail.com>
	<ca471dc20702201551o1935104s3a7c45ca35635cb3@mail.gmail.com>
	<d11dcfba0702201610g725a0ca0ncd7fb8acd100bd8e@mail.gmail.com>
	<ca471dc20702201617j5ad39ebbh231c5d82c860be74@mail.gmail.com>
Message-ID: <d11dcfba0702201635i540622b4had8e807542106fb2@mail.gmail.com>

On 2/20/07, Steven Bethard <steven.bethard at gmail.com> wrote:
> Just to clarify, you're suggesting that we still change .keys()
> .values() and .items() to iterators, right?

On 2/20/07, Guido van Rossum <guido at python.org> wrote:
> But this isn't really easier to explain to noobs than views, is it?

On 2/20/07, Steven Bethard <steven.bethard at gmail.com> wrote:
> On the other hand, when they're told "it's a dict key view object",
> they can't use any existing knowledge. They have to go and look up the
> API for what exactly a dict key view object does. And once they've
> learned what API a dict key view object supports, that knowledge is
> not really helpful in any new situations. They won't see key views on
> lists, sets or deques, for example.

On 2/20/07, Guido van Rossum <guido at python.org> wrote:
> But they will see them (I hope) on other mappings.

Presumably.

All I was really pointing out is that your average Python programmer
encounters more iterable objects than they do mapping-like objects.
(Inevitable, of course, since all mapping-like objects are iterable.)
My conclusion was therefore that iterability was a more basic part of
Python.

IMVHO, the fewer building blocks you have to understand to use the
basic Python types, the better. But I'm going to let the discussion go
for a while now, because it's a much better use of your time
convincing Raymond than it is convincing me. ;-)

STeVe
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From tdelaney at avaya.com  Wed Feb 21 01:33:42 2007
From: tdelaney at avaya.com (Delaney, Timothy (Tim))
Date: Wed, 21 Feb 2007 11:33:42 +1100
Subject: [Python-3000] Thoughts on dictionary views
Message-ID: <2773CAC687FD5F4689F526998C7E4E5F074468@au3010avexu1.global.avaya.com>

Steven Bethard wrote:

> The advantage is only in what you have to explain about the object. In
> the former case, you can simply say "it's an iterator over the keys"
> and they can understand it with their existing knowledge of iterators.

"it's an iterator over the keys"

They use their knowledge of iterators (a standard concept in Python
2.2+).

> On the other hand, when they're told "it's a dict key view object",
> they can't use any existing knowledge. They have to go and look up the

"it's a set view of the keys"

They use their knowledge of sets (a standard concept in Python 2.3+) and
views (a standard concept in Python 2.6+).

The standard concept of a view will be something like:

A view is a lightweight object that implements an interface by
delegating to an underlying object. The underlying object cannot be
changed through the view, but could be changed directly, in which case
the view will reflect the new contents of the object.

Note that some changes to the underlying object may invalidate the view,
in which case using it will throw an exception.

Note also that there is nothing preventing someone from creating a
view-like class that allows changing the underlying object through it,
but such a class should probably not be described as a view.

Tim Delaney

From guido at python.org  Wed Feb 21 01:45:24 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 20 Feb 2007 16:45:24 -0800
Subject: [Python-3000] Thoughts on dictionary views
In-Reply-To: <2773CAC687FD5F4689F526998C7E4E5F074468@au3010avexu1.global.avaya.com>
References: <2773CAC687FD5F4689F526998C7E4E5F074468@au3010avexu1.global.avaya.com>
Message-ID: <ca471dc20702201645i5a8a2008t9e69e30ff89c53a2@mail.gmail.com>

On 2/20/07, Delaney, Timothy (Tim) <tdelaney at avaya.com> wrote:
> Steven Bethard wrote:
>
> > The advantage is only in what you have to explain about the object. In
> > the former case, you can simply say "it's an iterator over the keys"
> > and they can understand it with their existing knowledge of iterators.
>
> "it's an iterator over the keys"
>
> They use their knowledge of iterators (a standard concept in Python
> 2.2+).
>
> > On the other hand, when they're told "it's a dict key view object",
> > they can't use any existing knowledge. They have to go and look up the
>
> "it's a set view of the keys"
>
> They use their knowledge of sets (a standard concept in Python 2.3+) and
> views (a standard concept in Python 2.6+).
>
> The standard concept of a view will be something like:
>
> A view is a lightweight object that implements an interface by
> delegating to an underlying object. The underlying object cannot be
> changed through the view, but could be changed directly, in which case
> the view will reflect the new contents of the object.
>
> Note that some changes to the underlying object may invalidate the view,
> in which case using it will throw an exception.

No, this only invalidates an in-progress iterator.
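
A sketch of the distinction, assuming the behaviour of released
Python 3: the view object itself stays usable after the dict changes
size, while an iterator that was already in progress raises.

```python
d = {"a": 1, "b": 2}
view = d.keys()

it = iter(view)
next(it)                 # the iterator is now in progress

d["c"] = 3               # size-changing mutation of the dict

try:
    next(it)
    iterator_failed = False
except RuntimeError:
    iterator_failed = True

view_still_works = sorted(view)   # a fresh pass over the same view
```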

> Note also that there is nothing preventing someone from creating a
> view-like class that allows changing the underlying object through it,
> but such a class should probably not be described as a view.

You can also think of dict views as a straightforward application of
the GoF adapter pattern.
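
For readers who have not met the pattern, a minimal sketch of a keys
view written as an adapter. KeysAdapter is a hypothetical class for
illustration, not the implementation in the py3k branch; it relies on
collections.abc.Set to supply the set operators.

```python
from collections.abc import Set

class KeysAdapter(Set):
    """Adapts a mapping to the read-only Set interface over its keys."""

    def __init__(self, mapping):
        self._mapping = mapping          # one reference, no copy

    def __len__(self):
        return len(self._mapping)

    def __iter__(self):
        return iter(self._mapping)

    def __contains__(self, key):
        return key in self._mapping

    @classmethod
    def _from_iterable(cls, it):
        # Results of &, |, - and ^ are plain sets, not new adapters.
        return set(it)

d = {"a": 1}
keys = KeysAdapter(d)
d["b"] = 2                               # visible through the adapter
```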

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From larry at hastings.org  Wed Feb 21 01:51:16 2007
From: larry at hastings.org (Larry Hastings)
Date: Tue, 20 Feb 2007 16:51:16 -0800
Subject: [Python-3000] Thoughts on dictionary views
In-Reply-To: <2773CAC687FD5F4689F526998C7E4E5F074468@au3010avexu1.global.avaya.com>
References: <2773CAC687FD5F4689F526998C7E4E5F074468@au3010avexu1.global.avaya.com>
Message-ID: <45DB9784.4030903@hastings.org>

Delaney, Timothy (Tim) wrote:
> A view is a lightweight object that implements an interface by
> delegating to an underlying object. The underlying object cannot be
> changed through the view, but could be changed directly, in which case
> the view will reflect the new contents of the object.
It certainly makes sense that views would *usually* be read-only, but is 
that really a *requirement*?  attrview(), recently discussed in 
Python-Dev, allowed changing the underlying object. See Martin v. 
Lowis's implementation of attrview() here:
    http://mail.python.org/pipermail/python-dev/2007-February/071044.html
It allowed setting attributes on the underlying object, like this:
    attrview(self)[method_name] = attrview(self.metadata)[method_name]
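
To show the general idea of a writable view, here is a rough sketch.
AttrView and Thing are hypothetical names, and this is not the code
from the linked post; it only maps item access onto getattr() and
setattr().

```python
class AttrView:
    """Hypothetical stand-in: item access reads and writes attributes."""

    def __init__(self, obj):
        self._obj = obj

    def __getitem__(self, name):
        return getattr(self._obj, name)

    def __setitem__(self, name, value):
        setattr(self._obj, name, value)   # writes through to the object

class Thing:
    pass

t = Thing()
AttrView(t)["answer"] = 42                # same effect as t.answer = 42
```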

Cheers,


/larry/

From tdelaney at avaya.com  Wed Feb 21 02:03:28 2007
From: tdelaney at avaya.com (Delaney, Timothy (Tim))
Date: Wed, 21 Feb 2007 12:03:28 +1100
Subject: [Python-3000] Thoughts on dictionary views
Message-ID: <2773CAC687FD5F4689F526998C7E4E5FF1ECAA@au3010avexu1.global.avaya.com>

Larry Hastings wrote:

> Delaney, Timothy (Tim) wrote:
>> A view is a lightweight object that implements an interface by
>> delegating to an underlying object. The underlying object cannot be
>> changed through the view, but could be changed directly, in which
>> case the view will reflect the new contents of the object.
>
> It certainly makes sense that views would *usually* be read-only, but
> is that really a *requirement*?

No, but I think it would be worthwhile (and definitely simplest) if the
standard concept of a view in python was read-only.

Then any non-read-only view becomes the exception, and needs to be
flagged as such.

Tim Delaney

From tdelaney at avaya.com  Wed Feb 21 02:08:36 2007
From: tdelaney at avaya.com (Delaney, Timothy (Tim))
Date: Wed, 21 Feb 2007 12:08:36 +1100
Subject: [Python-3000] Thoughts on dictionary views
Message-ID: <2773CAC687FD5F4689F526998C7E4E5FF1ECAB@au3010avexu1.global.avaya.com>

Guido van Rossum wrote:

>> Note that some changes to the underlying object may invalidate the
>> view, in which case using it will throw an exception.
> 
> No, this only invalidates an in-progress iterator.

Yeah - that's what I meant - just couldn't think if there were any other
situations that might (at least with the standard views).

>> Note also that there is nothing preventing someone from creating a
>> view-like class that allows changing the underlying object through
>> it, but such a class should probably not be described as a view.
> 
> You can also think of dict views as a straightforward application of
> the GoF adapter pattern.

Yep - and I think that would be a good secondary explanation, instantly
understandable by anyone with much programming experience.

I think it's important though to set the expectations of what a view
will normally be used for, so that any unqualified use of the term
"view" will have a common understanding. And I think that the
unqualified "view" should mean read-only.

Tim Delaney

From jcarlson at uci.edu  Wed Feb 21 03:07:57 2007
From: jcarlson at uci.edu (Josiah Carlson)
Date: Tue, 20 Feb 2007 18:07:57 -0800
Subject: [Python-3000] Thoughts on dictionary views
In-Reply-To: <2773CAC687FD5F4689F526998C7E4E5FF1ECAB@au3010avexu1.global.avaya.com>
References: <2773CAC687FD5F4689F526998C7E4E5FF1ECAB@au3010avexu1.global.avaya.com>
Message-ID: <20070220174008.ADCE.JCARLSON@uci.edu>


(merging a few replies to reduce traffic)

"Delaney, Timothy (Tim)" <tdelaney at avaya.com> wrote:
> Guido van Rossum wrote:
> > You can also think of dict views as a straightforward application of
> > the GoF adapter pattern.
> 
> Yep - and I think that would be a good secondary explanation, instantly
> understandable by anyone with much programming experience.

Not necessarily.  I have been programming for almost a decade (most of
it in Python), and I haven't yet taken a software engineering course
(none were offered or required during my undergrad, and I didn't take
any in my masters program); so the whole "patterns" thing is generally
opaque to me.  Some of them are self-evident in the name (observer,
visitor, etc.), but "GoF adapter pattern" is Greek to me.


"Guido van Rossum" <guido at python.org> wrote:
> Perhaps it will be more palatable now that the views aren't mutable?
> Also, I think you may have the wrong semantic model -- it's not a
> self-updating view, it's just a different way to look at the same
> underlying mapping. (Did you see PEP 3106? Since you don't quote it
> this is not clear.)

I was "eh, why bother?" prior to reading the updated PEP 3106, but now
can see the benefit to keys(), values(), and items() returning views. 
I'm not sure I would use the added features (I don't believe I've ever
compared the equalities of keys or values of two dictionaries separately,
and I tend to stick to .iter*() methods), but I can also see how it
would be useful to some users.


"Guido van Rossum" <guido at python.org> wrote:
> On 2/20/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> > The speed costs may become negligible, but I believe the main concern
> > here is memory consumption (minimising memory usage is certainly the
> > only reason I've ever made sure to use the dict.iter* methods).
> 
> As I said, it's still O(N) time and space, vs. O(1) for creating a view.

But that's only if the .keys(), .values(), .items() produced actual sets
versus producing a view.  In the case of .values(), I don't see how one
can do *any* better than O(n) for the a.values() == b.values() (or
really O(nlogn) for comparable objects, and O(n^2) when they are not).

There are going to be special cases that ruin performance with all 3
options (use Python 2.x equivalent .iter*(), use a view, use a set
variant).  While I can *see* the use of views, I can also see the
benefit with just renaming .iter*() as .*() .


 - Josiah


From jcarlson at uci.edu  Wed Feb 21 03:10:37 2007
From: jcarlson at uci.edu (Josiah Carlson)
Date: Tue, 20 Feb 2007 18:10:37 -0800
Subject: [Python-3000] Thoughts on dictionary views
In-Reply-To: <ca471dc20702201548l74c8dbb7he2106bd2eb87bf7@mail.gmail.com>
References: <79990c6b0702200713s69c90510o617595344fd17af@mail.gmail.com>
	<ca471dc20702201548l74c8dbb7he2106bd2eb87bf7@mail.gmail.com>
Message-ID: <20070220171429.ADC5.JCARLSON@uci.edu>


"Guido van Rossum" <guido at python.org> wrote:
> On 2/20/07, Paul Moore <p.f.moore at gmail.com> wrote:
> > (I have similar concerns over the "new IO" proposals I've
> > seen, but there's nothing concrete there yet, so I'll save that
> > argument for another day...)
> 
> Then you should also have misgivings about the Unicode/str
> unification. If you are cool with that, I don't see how we can avoid
> redoing the I/O library.

I'm not so sure.  The return type on socket.recv and os.read could be
changed to bytes (seemingly without much difficulty), and likely could
even be changed to *take* a bytes object as the destination buffer
(ditto for files opened as 'raw').  From there, aside from updating the
standard library to handle socket, os.read, etc., for incoming data
expecting a bytes object, and raising an exception when trying to write
a unicode object, that is the limit to the changes.

Of course, even with the proposed updated I/O library, every one of
those modules would have to be changed anyways.

Then again, I've been "eh?" on the whole I/O library thing, and
generally annoyed at the "everything is unicode" idea.  Converting all
libraries that currently deal with IO is going to be a pain, especially
if it does any sort of parsing of mixed binary and non-unicode textual
data (like http headers combined with binary posted data or a utf-8
encoded stream).
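
A sketch of the kind of mixed parsing meant here, written against the
bytes type of released Python 3. The message in raw is invented, and
real HTTP parsing has many more cases; the point is only that the
header part is decoded to text while the body stays binary.

```python
raw = (b"Content-Type: text/plain\r\n"
       b"Content-Length: 4\r\n"
       b"\r\n"
       b"\x00\xff\x10\x80")

# Split on the blank line while everything is still bytes.
head, sep, body = raw.partition(b"\r\n\r\n")

headers = {}
for line in head.decode("latin-1").split("\r\n"):   # text from here on
    name, _, value = line.partition(":")
    headers[name.strip().lower()] = value.strip()

length = int(headers["content-length"])
payload = body[:length]                              # still bytes
```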


As a heavy user of quite a few of the current standard library IO
modules (SocketServer, asyncore, urllib, socket, etc.) and as someone
who has the "opportunity" to write line-level protocols, I'd be quite
happy with the following...

1) add bytes (or add features to array)
2) rename unicode to text (or str)
3) renaming str to bin (or some other sufficiently clear name)
4) making string literals 'hello' be unicode
5) allow for b'constant' be the renamed str
6) add a mandatory 3rd argument to file/open which is the codec to use
for reading
7) offer a new function for opening 'binary' files (which are opened as
'rb' or 'wb' whenever 'r' or 'w' are passed, respectively), which will
remove confusion on Windows platforms

Indeed, it isn't as revolutionary as "everything is unicode", but it
would allow the standard library to be updated with a relative minimum
of fuss and muss, without needing to intermix...
    x = bytes.decode('latin-1').USEFUL_UNICODE_METHOD(...)
or
    sock.send(unicode.encode('latin-1'))


 - Josiah


From guido at python.org  Wed Feb 21 04:32:19 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 20 Feb 2007 19:32:19 -0800
Subject: [Python-3000] Thoughts on dictionary views
In-Reply-To: <20070220174008.ADCE.JCARLSON@uci.edu>
References: <2773CAC687FD5F4689F526998C7E4E5FF1ECAB@au3010avexu1.global.avaya.com>
	<20070220174008.ADCE.JCARLSON@uci.edu>
Message-ID: <ca471dc20702201932odb9caafjd691ac278647e3e9@mail.gmail.com>

On 2/20/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> I was "eh, why bother?" prior to reading the updated PEP 3106, but now
> can see the benefit to keys(), values(), and items() returning views.
> I'm not sure I would use the added features (I don't believe I've ever
> compared the equalities of keys or values of two dictionaries separately,
> and I tend to stick to .iter*() methods), but I can also see how it
> would be useful to some users.

Thanks for the (relative) vote of confidence. Was it an update I made
to the PEP, or did you not read it at all before?

> "Guido van Rossum" <guido at python.org> wrote:
> > On 2/20/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> > > The speed costs may become negligible, but I believe the main concern
> > > here is memory consumption (minimising memory usage is certainly the
> > > only reason I've ever made sure to use the dict.iter* methods).
> >
> > As I said, it's still O(N) time and space, vs. O(1) for creating a view.
>
> But that's only if the .keys(), .values(), .items() produced actual sets
> versus producing a view.

Which (producing an actual set) is Nick's (and Raymond's) proposal.

> In the case of .values(), I don't see how one
> can do *any* better than O(n) for the a.values() == b.values() (or
> really O(nlogn) for comparable objects, and O(n^2) when they are not).

The PEP's algorithm for comparing values is O(N**2); I'm not sure it's
worth attempting to optimize it, since I'm not aware of any use case;
but it still seems better to do this than to compare values views by
object identity.
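
A sketch of a quadratic comparison of this kind. values_equal is a
hypothetical helper, not text from the PEP; it needs only == on the
values, so it works for unhashable and unorderable objects.

```python
def values_equal(a, b):
    """Compare two collections of values as multisets using only ==."""
    remaining = list(b)
    for x in a:
        try:
            remaining.remove(x)   # linear scan, so O(N**2) overall
        except ValueError:
            return False          # x has no unmatched partner in b
    return not remaining          # leftovers mean b had extra values

same = values_equal({1: [1], 2: [2]}.values(), {3: [2], 4: [1]}.values())
different = values_equal({1: [1]}.values(), {1: [1], 2: [1]}.values())
```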

> There are going to be special cases that ruin performance with all 3
> options (use Python 2.x equivalent .iter*(), use a view, use a set
> variant).  While I can *see* the use of views, I can also see the
> benefit with just renaming .iter*() as .*() .

Name a special case that ruins performance with either PEP 3106 or
renaming .iter*() to .*()?

Methinks that both views and iterators can be optimal, at least for
.keys() and .items(), certainly in terms of O() notation; there may
be edge cases where many many lookups are done, where making a copy
into a set could be faster, but that requires the total # of lookups
to be much larger than N.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed Feb 21 04:44:22 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 20 Feb 2007 19:44:22 -0800
Subject: [Python-3000] Thoughts on new I/O library and bytecode
Message-ID: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>

[Note: changed subject]

On 2/20/07, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> "Guido van Rossum" <guido at python.org> wrote:
> > On 2/20/07, Paul Moore <p.f.moore at gmail.com> wrote:
> > > (I have similar concerns over the "new IO" proposals I've
> > > seen, but there's nothing concrete there yet, so I'll save that
> > > argument for another day...)
> >
> > Then you should also have misgivings about the Unicode/str
> > unification. If you are cool with that, I don't see how we can avoid
> > redoing the I/O library.
>
> I'm not so sure.  The return type on socket.recv and os.read could be
> changed to bytes (seemingly without much difficulty),

Yes, that's the plan anyway.

> and likely could
> even be changed to *take* a bytes object as the destination buffer
> (ditto for files opened as 'raw').

This already works -- bytes support the buffer API.

> From there, aside from updating the
> standard library to handle socket, os.read, etc., for incoming data
> expecting a bytes object, and raising an exception when trying to write
> a unicode object, that is the limit to the changes.

Sure.

> Of course, even with the proposed updated I/O library, every one of
> those modules would have to be changed anyways.

Right. But I expect the higher-level APIs (sock.makefile()) to be
relatively stable.

> Then again, I've been "eh?" on the whole I/O library thing, and
> generally annoyed at the "everything is unicode" idea.

Well, unless you remove the str type, how are you going to get rid of
the endless problems with unicode where mixing unicode and str
sometimes works and sometimes doesn't?

> Converting all
> libraries that currently deal with IO is going to be a pain, especially
> if it does any sort of parsing of mixed binary and non-unicode textual
> data (like http headers combined with binary posted data or a utf-8
> encoded stream).

Yeah, I'm not looking forward to that, but I expect it'll be
relatively straightforward once we figure out the right patterns;
there's just a lot of code to convert. But that's the whole Py3k plan.

> As a heavy user of quite a few of the current standard library IO
> modules (SocketServer, asyncore, urllib, socket, etc.) and as someone
> who has the "opportunity" to write line-level protocols, I'd be quite
> happy with the following...
>
> 1) add bytes (or add features to array)
> 2) rename unicode to text (or str)
> 3) renaming str to bin (or some other sufficiently clear name)

So you'd have THREE types (bytes, text, bin)? Or are you proposing bin
instead of bytes, contrary to what you suggested above?

> 4) making string literals 'hello' be unicode
> 5) allow for b'constant' be the renamed str
> 6) add a mandatory 3rd argument to file/open which is the codec to use
> for reading

And how does that help users or compatibility?

> 7) offer a new function for opening 'binary' files (which are opened as
> 'rb' or 'wb' whenever 'r' or 'w' are passed, respectively), which will
> remove confusion on Windows platforms

This is a red herring. Or I'm not sure I understand this part of your
proposal. What's wrong with 'rb'?

> Indeed, it isn't as revolutionary as "everything is unicode", but it
> would allow the standard library to be updated with a relative minimum
> of fuss and muss, without needing to intermix...
>     x = bytes.decode('latin-1').USEFUL_UNICODE_METHOD(...)
> or
>     sock.send(unicode.encode('latin-1'))

Actually, with the renamings and everything, it's just about as
disruptive as the current proposal, so I'm unclear why you think this
is so different.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jcarlson at uci.edu  Wed Feb 21 06:52:08 2007
From: jcarlson at uci.edu (Josiah Carlson)
Date: Tue, 20 Feb 2007 21:52:08 -0800
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
Message-ID: <20070220205651.ADD7.JCARLSON@uci.edu>


"Guido van Rossum" <guido at python.org> wrote:
> [Note: changed subject]
> On 2/20/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> > I'm not so sure.  The return type on socket.recv and os.read could be
> > changed to bytes (seemingly without much difficulty),
> 
> Yes, that's the plan anyway.

Better than returning unicode, but not as good as returning "binary".

> > and likely could
> > even be changed to *take* a bytes object as the destination buffer
> > (ditto for files opened as 'raw').
> 
> This already works -- bytes support the buffer API.

I was thinking of...

    buff = bytes(4096*[0])
    received = sock.recv(buff)

It's really only useful when you have a known protocol with fixed size
blocks, but need it to run more or less forever.  By fixing the buffer
size, you can have significantly reduced memory fragmentation.


> > Then again, I've been "eh?" on the whole I/O library thing, and
> > generally annoyed at the "everything is unicode" idea.
> 
> Well, unless you remove the str type, how are you going to get rid of
> the endless problems with unicode where mixing unicode and str
> sometimes works and sometimes doesn't?

Ooh, one of my favorite games!

* Explicit <conversion to unicode> is better than implicit.
* In the face of ambiguity, refuse the temptation to guess <what codec
to use to decode the string>.
* Errors <when adding strings to unicode> should never pass silently.

There are at least two approaches to solving the problem:
1) make everything unicode
2) make all implicit conversions an error.

Adding strings to unicode should produce an exception.  The fact that it
doesn't right now, I believe, is a result of implementation details
getting in the way of what should happen. Remove the ambiguity, codec
guessing, etc., raise a TypeError("cannot concatenate str and unicode
objects"), and move on.

Don't allow up-casting in u''.join() or ''.join() (or their equivalents
in py3k).
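
For reference, a minimal sketch of the "make implicit conversions an
error" behaviour, assuming a Python 3 interpreter as eventually released
(not the p3yk branch of this thread).  The variable names are made up
for illustration.

```python
# Concatenating bytes and text is refused rather than guessing a codec.
try:
    b"abc" + "def"
    mixed_concat_failed = False
except TypeError:
    mixed_concat_failed = True

# join() does not up-cast a mixed sequence either.
try:
    "".join(["abc", b"def"])
    mixed_join_failed = False
except TypeError:
    mixed_join_failed = True

# The conversion has to be spelled out, codec included.
explicit = b"abc".decode("latin-1") + "def"
```

Both mixed operations raise TypeError; only the explicit decode succeeds.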


> > Converting all
> > libraries that currently deal with IO is going to be a pain, especially
> > if it does any sort of parsing of mixed binary and non-unicode textual
> > data (like http headers combined with binary posted data or a utf-8
> > encoded stream).
> 
> Yeah, I'm not looking forward to that, but I expect it'll be
> relatively straightforward once we figure out the right patterns;
> there's just a lot of code to convert. But that's the whole Py3k plan.

No offense, but the plan to convert it all to use bytes, stinks.
Starting with the API defined in PEP 358, I started converting smtpd (as
an example), and I found myself *wanting* to use unicode because the
whole numeric constants and/or bytes('unicode', 'latin-1') got really
old really fast.


> > As a heavy user of quite a few of the current standard library IO
> > modules (SocketServer, asyncore, urllib, socket, etc.) and as someone
> > who has the "opportunity" to write line-level protocols, I'd be quite
> > happy with the following...
> >
> > 1) add bytes (or add features to array)
> > 2) rename unicode to text (or str)
> > 3) renaming str to bin (or some other sufficiently clear name)
> 
> So you'd have THREE types (bytes, text, bin)? Or are you proposing bin
> instead of bytes, contrary to what you suggested above?

While I would have some personal uses for bytes, all of them could be
fulfilled with an expanded array type.  If I could have my way
<dreaming>I'd rename string and unicode, fold some of the features of
bytes into array, and make socket, etc., return the renamed string
type</dreaming>. In the case of the standard library modules that deal with
sockets, the only change would generally be replacing 'const' with
b'const'.  That could *almost* be automatic, and would be significantly
faster (for a computer + human) than converting all of the .split(),
.find(), etc., uses in the ftplib, *Server, smtplib, smtpd, etc. to
bytes equivalents (or converting to and from unicode).

It would take me perhaps 20 minutes to update asyncore, asynchat and
smtpd with the b'binary' semantic.  Based on the last list of methods I
saw for bytes in PEP 358, I would be, more or less, doing bytes.decode
('latin-1') instead of trying to deal with the *crippled* interface that
bytes offers.

Regardless, the performance of those modules would likely suffer when
confronted with bytes rather than a renamed str, as the current bytes
type lacks a large number of convenience methods, that I previously
complained about it not having (which is why I brought up the string
view and sample implementation in late August/early September 2006).


> > 4) making string literals 'hello' be unicode
> > 5) allow for b'constant' to be the renamed str
> > 6) add a mandatory 3rd argument to file/open which is the codec to use
> > for reading
> 
> And how does that help users or compatibility?

Users who need binary literals (like every socket module in the standard
library, anyone who does processing of any non-unicode disk/socket/pipe
data, like marshal or pickle, etc.) wouldn't go insane and add bugs
trying to switch to the bytes type, or add performance overhead trying
to convert the received bytes to unicode to get a useful API.


> > 7) offer a new function for opening 'binary' files (which are opened as
> > 'rb' or 'wb' whenever 'r' or 'w' are passed, respectively), which will
> > remove confusion on Windows platforms
> 
> This is a red herring. Or I'm not sure I understand this part of your
> proposal. What's wrong with 'rb'?

Presumption:
    a = open(filename, 'r' or 'w' ['+'], codec)
will open a file as unicode in Py3k (if I am wrong, please correct me).

Proposal:
    b = somename(filename, 'r' or 'w' ['+'])
will be equivalent to:
    b = open(filename, 'rb' or 'wb' ['+'])
today.  This prevents the confusion over different argument values
resulting in different types being returned and accepted by certain
methods.


> > Indeed, it isn't as revolutionary as "everything is unicode", but it
> > would allow the standard library to be updated with a relative minimum
> > of fuss and muss, without needing to intermix...
> >     x = bytes.decode('latin-1').USEFUL_UNICODE_METHOD(...)
> > or
> >     sock.send(unicode.encode('latin-1'))
> 
> Actually, with the renamings and everything, it's just about as
> disruptive as the current proposal, so I'm unclear why you think this
> is so different.

    sock.send(b'Header: value\r\n')
              ^
The above change can be more or less automatic.  The below?

    sock.send(bytes('Header: value\r\n', 'latin-1'))

    sock.send('Header: value\r\n'.encode('latin-1'))

Either of the above is 17 characters of noise that really shouldn't need
to be there.
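
A quick check of the three spellings, assuming a Python 3 interpreter in
which the b'...' literal exists (in this thread it is still only a
proposal).  The variable names are illustrative.

```python
# The proposed literal form.
header_literal = b'Header: value\r\n'

# The two spellings that carry an explicit codec.
header_ctor = bytes('Header: value\r\n', 'latin-1')
header_encoded = 'Header: value\r\n'.encode('latin-1')

# All three produce the same byte sequence for ASCII-only content.
same = header_literal == header_ctor == header_encoded
```

For ASCII-only constants the codec argument changes nothing in the
result, which is the sense in which it is noise here.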


 - Josiah


From guido at python.org  Wed Feb 21 07:33:27 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 20 Feb 2007 22:33:27 -0800
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <20070220205651.ADD7.JCARLSON@uci.edu>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
	<20070220205651.ADD7.JCARLSON@uci.edu>
Message-ID: <ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>

On 2/20/07, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> "Guido van Rossum" <guido at python.org> wrote:
> > [Note: changed subject]
> > On 2/20/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> > > I'm not so sure.  The return type on socket.recv and os.read could be
> > > changed to bytes (seemingly without much difficulty),
> >
> > Yes, that's the plan anyway.
>
> Better than returning unicode, but not as good as returning "binary".

It never was the plan to have this return unicode BTW.

What's the difference between "binary" and "bytes"? To me, bytes *means* binary.

> > > and likely could
> > > even be changed to *take* a bytes object as the destination buffer
> > > (ditto for files opened as 'raw').
> >
> > This already works -- bytes support the buffer API.
>
> I was thinking of...
>
>     buff = bytes(4096*[0])
>     received = sock.recv(buff)
>
> It's really only useful when you have a known protocol with fixed size
> blocks, but need it to run more or less forever.  By fixing the buffer
> size, you can have significantly reduced memory fragmentation.

You can do that already with recv_into(), which takes anything that
supports the writable buffer API.
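
A minimal sketch of the fixed-buffer pattern with recv_into(), assuming
a modern Python 3 where bytearray is the mutable buffer type (an
assumption; the p3yk bytes type discussed here is a different object).
socketpair() is used only so the example needs no network.

```python
import socket

# Two connected sockets in one process, so no server is required.
sender, receiver = socket.socketpair()

# Allocate the buffer once; it can be reused for every later read.
buff = bytearray(4096)

sender.sendall(b"fixed-size block")

# recv_into() writes into buff and returns the number of bytes received.
received = receiver.recv_into(buff)
block = bytes(buff[:received])

sender.close()
receiver.close()
```

Only the slice buff[:received] holds fresh data; the rest of the buffer
is whatever was there before the call.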

> > > Then again, I've been "eh?" on the whole I/O library thing, and
> > > generally annoyed at the "everything is unicode" idea.
> >
> > Well, unless you remove the str type, how are you going to get rid of
> > the endless problems with unicode where mixing unicode and str
> > sometimes works and sometimes doesn't?
>
> Ooh, one of my favorite games!
>
> * Explicit <conversion to unicode> is better than implicit.
> * In the face of ambiguity, refuse the temptation to guess <what codec
> to use to decode the string>.
> * Errors <when adding strings to unicode> should never pass silently.
>
> There are at least two approaches to solving the problem:
> 1) make everything unicode
> 2) make all implicit conversions an error.

The plan is both.

> Adding strings to unicode should produce an exception.  The fact that it
> doesn't right now, I believe, is a result of implementation details
> getting in the way of what should happen.

No, it was by design to make things more compatible. I think we can
say that was a mistake; but it was done for that reason, not for
reasons of implementation details.

> Remove the ambiguity, codec
> guessing, etc., raise a TypeError("cannot concatenate str and unicode
> objects"), and move on.
>
> Don't allow up-casting in u''.join() or ''.join() (or their equivalents
> in py3k).

So what would you use the str type for?

> > > Converting all
> > > libraries that currently deal with IO is going to be a pain, especially
> > > if it does any sort of parsing of mixed binary and non-unicode textual
> > > data (like http headers combined with binary posted data or a utf-8
> > > encoded stream).
> >
> > Yeah, I'm not looking forward to that, but I expect it'll be
> > relatively straightforward once we figure out the right patterns;
> > there's just a lot of code to convert. But that's the whole Py3k plan.
>
> No offense, but the plan to convert it all to use bytes, stinks.
> Starting with the API defined in PEP 358, I started converting smtpd (as
> an example), and I found myself *wanting* to use unicode because the
> whole numeric constants and/or bytes('unicode', 'latin-1') got really
> old really fast.

Have you actually looked at the Py3k implementation? It's quite
different from that PEP.

But nevertheless, it's a good experiment; I'll have a look at this myself.

> > > As a heavy user of quite a few of the current standard library IO
> > > modules (SocketServer, asyncore, urllib, socket, etc.) and as someone
> > > who has the "opportunity" to write line-level protocols, I'd be quite
> > > happy with the following...
> > >
> > > 1) add bytes (or add features to array)
> > > 2) rename unicode to text (or str)
> > > 3) renaming str to bin (or some other sufficiently clear name)
> >
> > So you'd have THREE types (bytes, text, bin)? Or are you proposing bin
> > instead of bytes, contrary to what you suggested above?
>
> While I would have some personal uses for bytes, all of them could be
> fulfilled with an expanded array type.

Well, that's what it is, but without the baggage of being able to
specify how it maps to Python objects (that's up to the encode/decode
operations instead).

> If I could have my way
> <dreaming>I'd rename string and unicode, fold some of the features of
> bytes into array, and make socket, etc., return the renamed string
> type</dreaming>.

But which of the two renamed string types? The 8-bit or the unicode string?

> In the case of the standard library modules that deal with
> sockets, the only change would generally be replacing 'const' with
> b'const'.  That could *almost* be automatic, and would be significantly
> faster (for a computer + human) than converting all of the .split(),
> .find(), etc., uses in the ftplib, *Server, smtplib, smtpd, etc. to
> bytes equivalents (or converting to and from unicode).

Actually, while they don't exist now, I plan for the bytes type to
have .split() and .find() and most other string methods *except*
.lower() and .islower() and everything else that interprets bytes as
characters.
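
A sketch of what line-level protocol code looks like with such methods,
assuming the bytes type of Python 3 as eventually released (where
.find(), .split(), .strip() and .partition() all exist on bytes).  The
header line is invented for illustration.

```python
line = b"Content-Length: 42\r\n"

# find() and slicing work on bytes without any decoding step.
colon = line.find(b":")
name = line[:colon]
value = line[colon + 1:].strip()

# partition() splits once around a separator.
key, sep, rest = line.partition(b": ")

# split() takes a bytes separator and returns a list of bytes objects.
parts = b"a,b,c".split(b",")
```

Separators and search terms must themselves be bytes; passing text
raises TypeError.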

> It would take me perhaps 20 minutes to update asyncore, asynchat and
> smtpd with the b'binary' semantic.  Based on the last list of methods I
> saw for bytes in PEP 358, I would be, more or less, doing bytes.decode
> ('latin-1') instead of trying to deal with the *crippled* interface that
> bytes offers.

So forget that PEP and help add these methods to the bytes type in
the p3yk branch.

The b"..." literal proposal is not unpleasant, as long as we can limit
it to ASCII characters and hex/octal escapes.

> Regardless, the performance of those modules would likely suffer when
> confronted with bytes rather than a renamed str, as the current bytes
> type lacks a large number of convenience methods, that I previously
> complained about it not having (which is why I brought up the string
> view and sample implementation in late August/early September 2006).

I think you misunderstood the plans for bytes. The plan is for the
performance with bytes to scream, in part because they are immutable
so one would occasionally save copying a buffer an extra time.

> > > 4) making string literals 'hello' be unicode
> > > 5) allow for b'constant' to be the renamed str
> > > 6) add a mandatory 3rd argument to file/open which is the codec to use
> > > for reading
> >
> > And how does that help users or compatibility?
>
> Users who need binary literals (like every socket module in the standard
> library, anyone who does processing of any non-unicode disk/socket/pipe
> data, like marshal or pickle, etc.) wouldn't go insane and add bugs
> trying to switch to the bytes type, or add performance overhead trying
> to convert the received bytes to unicode to get a useful API.

Let's drop the hyperbole.

> > > 7) offer a new function for opening 'binary' files (which are opened as
> > > 'rb' or 'wb' whenever 'r' or 'w' are passed, respectively), which will
> > > remove confusion on Windows platforms
> >
> > This is a red herring. Or I'm not sure I understand this part of your
> > proposal. What's wrong with 'rb'?
>
> Presumption:
>     a = open(filename, 'r' or 'w' ['+'], codec)
> will open a file as unicode in Py3k (if I am wrong, please correct me).

Right.

> Proposal:
>     b = somename(filename, 'r' or 'w' ['+'])
> will be equivalent to:
>     b = open(filename, 'rb' or 'wb' ['+'])
> today.  This prevents the confusion over different argument values
> resulting in different types being returned and accepted by certain
> methods.

Possibly. Though if we keep the 'rb' semantics for open() and this is
just an alias, I'm not sure what we gain except Two Ways To Do It.

In your view, what *do* we gain by using separate factories for binary
and text files? (Except some opportunity for static typechecking, as
binary files don't have the same API!)

> > > Indeed, it isn't as revolutionary as "everything is unicode", but it
> > > would allow the standard library to be updated with a relative minimum
> > > of fuss and muss, without needing to intermix...
> > >     x = bytes.decode('latin-1').USEFUL_UNICODE_METHOD(...)
> > > or
> > >     sock.send(unicode.encode('latin-1'))
> >
> > Actually, with the renamings and everything, it's just about as
> > disruptive as the current proposal, so I'm unclear why you think this
> > is so different.
>
>     sock.send(b'Header: value\r\n')
>               ^
> The above change can be more or less automatic.  The below?
>
>     sock.send(bytes('Header: value\r\n', 'latin-1'))
>
>     sock.send('Header: value\r\n'.encode('latin-1'))
>
> Either of the above is 17 characters of noise that really shouldn't need
> to be there.

If the spelling of a bytes string with an ASCII character value is all
you are complaining about, you should have said so right away.

IMO the hard part with automatically converting sock.send('abc') to
either alternative is to know when to convert and when not to
convert; the conversion itself is trivial using the sandbox/2to3
refactoring tool. You really should have a look at that.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jcarlson at uci.edu  Wed Feb 21 09:22:56 2007
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 21 Feb 2007 00:22:56 -0800
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>
References: <20070220205651.ADD7.JCARLSON@uci.edu>
	<ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>
Message-ID: <20070220231135.ADDD.JCARLSON@uci.edu>


"Guido van Rossum" <guido at python.org> wrote:
> On 2/20/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> > Better than returning unicode, but not as good as returning "binary".
> 
> It never was the plan to have this return unicode BTW.
> 
> What's the difference between "binary" and "bytes"? To me, bytes *means* binary.

Bytes as the type defined in PEP 358 and in the p3yk branch.  Binary is
a renamed Python 2.x str.


> > Ooh, one of my favorite games!
> >
> > * Explicit <conversion to unicode> is better than implicit.
> > * In the face of ambiguity, refuse the temptation to guess <what codec
> > to use to decode the string>.
> > * Errors <when adding strings to unicode> should never pass silently.
> >
> > There are at least two approaches to solving the problem:
> > 1) make everything unicode
> > 2) make all implicit conversions an error.
> 
> The plan is both.

Indeed, but this train of thought was more or less along the lines of
'rename str to binary, rename unicode to text, make adding binary and
text raise an exception'.


> > Adding strings to unicode should produce an exception.  The fact that it
> > doesn't right now, I believe, is a result of implementation details
> > getting in the way of what should happen.
> 
> No, it was by design to make things more compatible. I think we can
> say that was a mistake; but it was done for that reason, not for
> reasons of implementation details.

Fair enough.  I didn't start using unicode until Python 2.3.


> > Remove the ambiguity, codec
> > guessing, etc., raise a TypeError("cannot concatenate str and unicode
> > objects"), and move on.
> >
> > Don't allow up-casting in u''.join() or ''.join() (or their equivalents
> > in py3k).
> 
> So what would you use the str type for?

The bytes API as defined in PEP 358 is crap.  Using that API for
anything involving sockets, file IO, marshal/pickle, etc., is worse than
writing in pure C.  But I'll get into how happy I am with that later.


> > No offense, but the plan to convert it all to use bytes, stinks.
> > Starting with the API defined in PEP 358, I started converting smtpd (as
> > an example), and I found myself *wanting* to use unicode because the
> > whole numeric constants and/or bytes('unicode', 'latin-1') got really
> > old really fast.
> 
> Have you actually looked at the Py3k implementation? It's quite
> different from that PEP.

Really?  The source tells me that it's more or less the same:
http://svn.python.org/view/python/branches/p3yk/Objects/bytesobject.c?rev=53064&view=auto

About the only thing it has gained is a .join() method, but seems to
have lost append, count, extend, index, insert, pop, remove.  From your
later comments, it seems as though the methods I'm looking for just
haven't been implemented yet, but are going in.


> > While I would have some personal uses for bytes, all of them could be
> > fulfilled with an expanded array type.
> 
> Well, that's what it is, but without the baggage of being able to
> specify how it maps to Python objects (that's up to the encode/decode
> operations instead).

Except that bytes(...)[0] is an integer in range(256).  That smells like
array.array('B', ...) to me.
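
A small illustration of the resemblance, assuming Python 3 as
eventually released; the sample values are arbitrary.

```python
import array

data = bytes([104, 105])

# Indexing bytes yields an integer in range(256), not a 1-character string.
first = data[0]

# array.array('B', ...) stores the same unsigned 8-bit values.
arr = array.array('B', data)
same_item = arr[0] == first
round_trip = arr.tobytes() == data
```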


> > If I could have my way
> > <dreaming>I'd rename string and unicode, fold some of the features of
> > bytes into array, and make socket, etc., return the renamed string
> > type</dreaming>.
> 
> But which of the two renamed string types? The 8-bit or the unicode string?

8-bit; unicode strings being returned from sockets, os.read(), etc.,
would be a waste of time and memory.


> > In the case of the standard library modules that deal with
> > sockets, the only change would generally be replacing 'const' with
> > b'const'.  That could *almost* be automatic, and would be significantly
> > faster (for a computer + human) than converting all of the .split(),
> > .find(), etc., uses in the ftplib, *Server, smtplib, smtpd, etc. to
> > bytes equivalents (or converting to and from unicode).
> 
> Actually, while they don't exist now, I plan for the bytes type to
> have .split() and .find() and most other string methods *except*
> .lower() and .islower() and everything else that interprets bytes as
> characters.

Thank Guido.  If bytes gets those methods, then 30% of my concerns
regarding the unicode conversion go out the window.


> > It would take me perhaps 20 minutes to update asyncore, asynchat and
> > smtpd with the b'binary' semantic.  Based on the last list of methods I
> > saw for bytes in PEP 358, I would be, more or less, doing bytes.decode
> > ('latin-1') instead of trying to deal with the *crippled* interface that
> > bytes offers.
> 
> So forget that PEP and help add these methods to the bytes type in
> the p3yk branch.
> 
> The b"..." literal proposal is not unpleasant, as long as we can limit
> it to ASCII characters and hex/octal escapes.

With a b"..." literal producing bytes (or even a renamed 8-bit string
type), another 30% of my concerns regarding the unicode conversion go
out the window.

Limiting it to ascii and hex/octal escapes is perfectly reasonable to me,
though I don't know enough about the underlying parser to know if such
restrictions are possible, with or without a defined coding: directive
at the beginning of the file.


> > Regardless, the performance of those modules would likely suffer when
> > confronted with bytes rather than a renamed str, as the current bytes
> > type lacks a large number of convenience methods, that I previously
> > complained about it not having (which is why I brought up the string
> > view and sample implementation in late August/early September 2006).
> 
> I think you misunderstood the plans for bytes. The plan is for the
> performance with bytes to scream, in part because they are immutable
> so one would occasionally save copying a buffer an extra time.

...mutable, but yeah - prior to your above statements saying 'we are
going to add find, split, and a bunch of other goodies', I was under the
impression that PEP 358 was more or less the API that we would be
getting - which just about made me cry, until I remembered Python 2.x.


> > > > 4) making string literals 'hello' be unicode
> > > > 5) allow for b'constant' to be the renamed str
> > > > 6) add a mandatory 3rd argument to file/open which is the codec to use
> > > > for reading
> > >
> > > And how does that help users or compatibility?
> >
> > Users who need binary literals (like every socket module in the standard
> > library, anyone who does processing of any non-unicode disk/socket/pipe
> > data, like marshal or pickle, etc.) wouldn't go insane and add bugs
> > trying to switch to the bytes type, or add performance overhead trying
> > to convert the received bytes to unicode to get a useful API.
> 
> Let's drop the hyperbole.

If bytes didn't get .find(), .split(), (hopefully .partition()), etc.,
that isn't hyperbole.  The PEP 358 API is horrible.  With bytes getting
those methods, the above statements are no longer relevant.


> > Presumption:
> >     a = open(filename, 'r' or 'w' ['+'], codec)
> > will open a file as unicode in Py3k (if I am wrong, please correct me).
> 
> Right.
> 
> > Proposal:
> >     b = somename(filename, 'r' or 'w' ['+'])
> > will be equivalent to:
> >     b = open(filename, 'rb' or 'wb' ['+'])
> > today.  This prevents the confusion over different argument values
> > resulting in different types being returned and accepted by certain
> > methods.
> 
> Possibly. Though if we keep the 'rb' semantics for open() and this is
> just an alias, I'm not sure what we gain except Two Ways To Do It.

Well, if we moved bytes reading/writing off to the alternate constructor,
then there would be one way to open a file containing unicode, and
another way to open a file containing binary data, which by definition
isn't text, so we should be able to ignore '\r\n' conversions (though I
would miss it, it may be a good idea).


> In your view, what *do* we gain by using separate factories for binary
> and text files? (Except some opportunity for static typechecking, as
> binary files don't have the same API!)

At one time there was a fairly substantial argument over foo(a, b)
returning different types if the *value* of b changed, or in the case of
a.foo(b). For example...

    def decode_codec(a, b):
        return a.decode(b)

    decode_codec('68656c6c6f20776f726c64', 'hex') -> 'hello world'
    decode_codec('hello world', 'latin-1') -> u'hello world'

By offering a secondary function that *only* dealt with the reading and
writing of bytes (or 8-bit renamed str), then we wouldn't have to worry
about...

    open(filename, 'r', 'latin-1').read()
    open(filename, 'r').read()

returning different types.  The latter would be spelled...

    somename(filename, 'r')

And it would be obvious to all readers that one is opening a binary file
and should expect to have .read() return bytes.
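
To make the type difference concrete, here is a sketch assuming Python 3
as eventually released, which kept a single open() and lets the mode
string select the type.  The encoding= keyword is the modern spelling of
the "codec" argument presumed above, and the temporary file is just a
stand-in for filename.

```python
import os
import tempfile

# Write a small file to read back in both modes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world\r\n")
    filename = tmp.name

try:
    # Text mode: .read() returns str; newline="" disables '\r\n' translation.
    with open(filename, "r", encoding="latin-1", newline="") as f:
        text = f.read()

    # Binary mode: .read() returns bytes, with no decoding and no translation.
    with open(filename, "rb") as f:
        data = f.read()
finally:
    os.remove(filename)
```

The same function thus returns two different types depending on an
argument *value*, which is exactly the situation a separate binary
factory would avoid.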


> >     sock.send(b'Header: value\r\n')
> >               ^
> > The above change can be more or less automatic.  The below?
> >
> >     sock.send(bytes('Header: value\r\n', 'latin-1'))
> >
> >     sock.send('Header: value\r\n'.encode('latin-1'))
> >
> > Either of the above is 17 characters of noise that really shouldn't need
> > to be there.
> 
> If the spelling of a bytes string with an ASCII character value is all
> you are complaining about, you should have said so right away.

Not just bytes with ascii character values, but not needing to jump
through hoops to send, write, etc., more or less 'fixed' data to a
handle.


> IMO the hard part with automatically converting sock.send('abc') to
> either alternative is to know when to convert and when not to
> convert; the conversion itself is trivial using the sandbox/2to3
> refactoring tool. You really should have a look at that.

In a few weeks when I'm done with my thesis defense.


If one adds my "concerns are reduced by X%" statements above, one will
notice that it only adds up to 60%.  The remaining 40% of my concerns are
more or less related to the pain of conversion.  b"..." and a usable
bytes API do help things significantly, but all conversions are a pain,
especially with a standard library the size of Python's.

A pessimist would say, "leave everything as it is, but make str+unicode
raise an exception" - and aside from pointing to the half-dozen "you are
so wrong" posts in response to my "unicode is easy" claim some time last
year, it would be hard for me to disagree with the "don't change"
position.  I'm sure I can get along *after* the changes, but the changes
aren't going to be pleasant.  Speaking of which, do all of the modules
have maintainers?  Make the maintainers convert them!


 - Josiah


From ncoghlan at gmail.com  Wed Feb 21 13:41:15 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 21 Feb 2007 22:41:15 +1000
Subject: [Python-3000] Thoughts on dictionary views
In-Reply-To: <ca471dc20702201613l67b11539g2aa9bffcd90224d9@mail.gmail.com>
References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>	
	<45DAF693.1000004@gmail.com>
	<ca471dc20702201613l67b11539g2aa9bffcd90224d9@mail.gmail.com>
Message-ID: <45DC3DEB.1060702@gmail.com>

Guido van Rossum wrote:
> On 2/20/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> My apologies for rambling a bit - I can't currently give a succinct
>> explanation for why the current direction feels wrong, but I felt it was
>> worth supporting Raymond on this point.
> 
> Apologies accepted -- but yes, you did ramble a bit, and I still wish
> you'd collected your thoughts a bit more. If there are simple clear
> arguments it's easier for me to accept or reject them than with a
> bunch of ramblings. Sorry to be grumpy, but given the implementation
> stage this is in and how long the PEP has been sitting unchanged I'm a
> bit annoyed that the criticism, valid or not, comes so late.

Views that don't allow you to modify the contents of the original 
mapping make me *much* happier - I hadn't realised you'd left that 
aspect out of the implementation.

Simply having different views of the underlying object is something with 
a strong precedent in normal iterators (and, in fact, it would be 
perfectly possible to *teach* keys(), values() and items() that way, 
leaving the introduction of their other features until later in the 
learning process).

Using multiple access points to edit the same data set, while a powerful 
idea, can be pretty difficult to keep straight while writing code - and 
I think having such a feature in the basic dict API is what was really 
bothering me. (It bothers me significantly less in more advanced APIs, 
like NumPy, or the attrview wrapper class recipe)

As penance for my doubts, I've committed fixes for various dict-related 
test failures in the py3k branch :)

Cheers,
Nick.

P.S. I don't have bsddb in my devel tree, so I couldn't fix that, and 
there are a couple of other failures that require further investigation 
to figure out what is going on. I updated the BROKEN file to reflect the 
current status (that seems like a good way to avoid cluttering the SF 
tracker until we get the tree into a state where we want to start 
running buildbots on it)

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From guido at python.org  Wed Feb 21 17:01:40 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 21 Feb 2007 08:01:40 -0800
Subject: [Python-3000] Thoughts on dictionary views
In-Reply-To: <45DC3DEB.1060702@gmail.com>
References: <001701c754cc$f0f1aba0$6d00a8c0@RaymondLaptop1>
	<45DAF693.1000004@gmail.com>
	<ca471dc20702201613l67b11539g2aa9bffcd90224d9@mail.gmail.com>
	<45DC3DEB.1060702@gmail.com>
Message-ID: <ca471dc20702210801y33d74ad1rec9354273bddedc6@mail.gmail.com>

On 2/21/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> As penance for my doubts, I've committed fixes for various dict-related
> test failures in the py3k branch :)

Thanks!!! That was well beyond penance. :-)

> Cheers,
> Nick.
>
> P.S. I don't have bsddb in my devel tree, so I couldn't fix that, and
> there are a couple of other failures that require further investigation
> to figure out what is going on. I updated the BROKEN file to reflect the
> current status (that seems like a good way to avoid cluttering the SF
> tracker until we get the tree into a state where we want to start
> running buildbots on it)

Right -- thanks for this too.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jimjjewett at gmail.com  Wed Feb 21 19:03:10 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Wed, 21 Feb 2007 13:03:10 -0500
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
	<20070220205651.ADD7.JCARLSON@uci.edu>
	<ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>
Message-ID: <fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>

On 2/21/07, Guido van Rossum <guido at python.org> wrote:
> If the spelling of a bytes string with an ASCII character value is all
> you are complaining about, you should have said so right away.

That is my main objection.

A literal form does clear it up, though I'm not sure "b" is the right
prefix.  (I keep wanting to read "binary" or "boolean", rather than
"ASCII")

To be honest, it would probably be enough if there were an ascii
builtin, or if the example uses of the bytes constructor showed

    bytes(text)   # no encoding

just copying the low-order byte, and raising exceptions if any
high-order bytes were non-zero.
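
For concreteness, a minimal sketch of that behaviour, written against
released Python 3 rather than the 2007 p3yk branch. The helper name
low_order_bytes is hypothetical; nothing in this thread defines it.

```python
def low_order_bytes(text):
    """Copy the low-order byte of each code point; reject anything wider.

    Hypothetical helper for illustration, not a proposed builtin.
    """
    out = bytearray()
    for ch in text:
        code_point = ord(ch)
        if code_point > 0xFF:
            # A non-zero high-order byte cannot be dropped silently.
            raise ValueError(
                "code point U+%04X does not fit in one byte" % code_point
            )
        out.append(code_point)
    return bytes(out)


ascii_result = low_order_bytes("GET")    # plain ASCII passes through
high_result = low_order_bytes("\xff")    # 0xFF still fits in one byte
```

Anything above U+00FF, such as "\u20ac", raises ValueError in this sketch.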

-jJ

From jimjjewett at gmail.com  Wed Feb 21 19:11:17 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Wed, 21 Feb 2007 13:11:17 -0500
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
	<20070220205651.ADD7.JCARLSON@uci.edu>
	<ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>
Message-ID: <fb6fbf560702211011p195e2db1ic92a3b2671f478bd@mail.gmail.com>

Are bytes supposed to be mutable?

Josiah:
> even be changed to *take* a bytes object as the destination buffer

Guido:
> This already works -- bytes support the buffer API.

but later:

> I think you misunderstood the plans for bytes. The plan is for the
> performance with bytes to scream, in part because they are immutable
> so one would occasionally save copying a buffer an extra time.

Or did you mean that (C code only?) could pass a newly constructed
bytes object to be filled in?

Josiah mentioned several dropped methods (append, extend, remove, pop)
that don't really make sense with an immutable.  Was this just a
set-difference observation, or are those methods you actually need on
a bytes type?

-jJ

From guido at python.org  Wed Feb 21 19:13:34 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 21 Feb 2007 10:13:34 -0800
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <fb6fbf560702211011p195e2db1ic92a3b2671f478bd@mail.gmail.com>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
	<20070220205651.ADD7.JCARLSON@uci.edu>
	<ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>
	<fb6fbf560702211011p195e2db1ic92a3b2671f478bd@mail.gmail.com>
Message-ID: <ca471dc20702211013n3e1a19b7hdd07ec6693ba6866@mail.gmail.com>

Sorry, that was an unfortunate typo. bytes are Mutable. (It's the same
as in Java, really.)

On 2/21/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> Are bytes supposed to be mutable?
>
> Josiah:
> > even be changed to *take* a bytes object as the destination buffer
>
> Guido:
> > This already works -- bytes support the buffer API.
>
> but later:
>
> > I think you misunderstood the plans for bytes. The plan is for the
> > performance with bytes to scream, in part because they are immutable
> > so one would occasionally save copying a buffer an extra time.
>
> Or did you mean that (C code only?) could pass a newly constructed
> bytes object to be filled in?
>
> Josiah mentioned several dropped methods (append, extend, remove, pop)
> that don't really make sense with an immutable.  Was this just a
> set-difference observation, or are those methods you actually need on
> a bytes type?

Even though bytes are mutable sequences, I'm not sure that they need
to support every method that lists have. I expect an in-place +=
operator solves most needs. Slices copy.
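
A small sketch of those semantics. It assumes released Python 3, where
the mutable type discussed here ended up spelled bytearray (and bytes is
immutable); the variable names are only illustrative.

```python
buf = bytearray(b"abc")
alias = buf                 # second name for the same object

buf += b"def"               # in-place: extends the existing object
same_object = alias is buf  # the alias still refers to the extended buffer

part = buf[0:3]             # slicing produces a copy
part[0] = ord("X")          # mutating the copy leaves buf untouched
```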

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jcarlson at uci.edu  Wed Feb 21 19:32:50 2007
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 21 Feb 2007 10:32:50 -0800
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>
References: <ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>
	<fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>
Message-ID: <20070221102243.ADF1.JCARLSON@uci.edu>


"Jim Jewett" <jimjjewett at gmail.com> wrote:
> 
> On 2/21/07, Guido van Rossum <guido at python.org> wrote:
> > If the spelling of a bytes string with an ASCII character value is all
> > you are complaining about, you should have said so right away.
> 
> That is my main objection.
> 
> A literal form does clear it up, though I'm not sure "b" is the right
> prefix.  (I keep wanting to read "binary" or "boolean", rather than
> "ASCII")
> 
> To be honest, it would probably be enough if there were an ascii
> builtin, or if the example uses of the bytes constructor showed
> 
>     bytes(text)   # no encoding
> 
> just copying the low-order byte, and raising exceptions if any
> high-order bytes were non-zero.

That's more or less changing the signature of bytes to be bytes(<text>,
codec='ascii'), but it breaks when faced with hex or octal escapes
greater than 127.  Making it codec='latin-1' is marginally better, but
having a default, regardless of the default, is begging for trouble
(especially when dealing with unicode).
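
The failure modes can be sketched with str.encode in released Python 3,
used here as a stand-in for a hypothetical bytes(text) with a default
codec. The variable names are illustrative only.

```python
text = "\xff"                    # a hex escape greater than 127

try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False             # ASCII stops at 127

latin1_bytes = text.encode("latin-1")   # latin-1 covers 0..255

try:
    "\u20ac".encode("latin-1")   # EURO SIGN lies outside latin-1
    euro_ok = True
except UnicodeEncodeError:
    euro_ok = False
```

Either default works for some inputs and fails for others, and the
failure only appears at run time, when the offending text arrives.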

 - Josiah


From guido at python.org  Wed Feb 21 20:22:50 2007
From: guido at python.org (Guido van Rossum)
Date: Wed, 21 Feb 2007 11:22:50 -0800
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <20070221102243.ADF1.JCARLSON@uci.edu>
References: <ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>
	<fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>
	<20070221102243.ADF1.JCARLSON@uci.edu>
Message-ID: <ca471dc20702211122u10505688yea2db83c5215d04c@mail.gmail.com>

Right. The b"..." literal doesn't have this problem because problems
always show up in the bytecode compilation stage; that's the beauty of
b"...". Patch anyone?
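
A sketch of what "shows up at compilation" looks like, using compile()
in released Python 3. The rules in the 2007 branch may have differed,
and the helper name compiles is hypothetical.

```python
def compiles(source):
    """Return True if source compiles as an expression, False on SyntaxError."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


ascii_literal_ok = compiles('b"abc"')
escaped_literal_ok = compiles(r'b"\xff"')      # non-ASCII byte via an escape
non_ascii_literal_ok = compiles('b"\u00e9"')   # literal e-acute inside b"..."
```

The last literal is rejected before any code runs, so no codec default
is ever consulted.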

On 2/21/07, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> "Jim Jewett" <jimjjewett at gmail.com> wrote:
> >
> > On 2/21/07, Guido van Rossum <guido at python.org> wrote:
> > > If the spelling of a bytes string with an ASCII character value is all
> > > you are complaining about, you should have said so right away.
> >
> > That is my main objection.
> >
> > A literal form does clear it up, though I'm not sure "b" is the right
> > prefix.  (I keep wanting to read "binary" or "boolean", rather than
> > "ASCII")
> >
> > To be honest, it would probably be enough if there were an ascii
> > builtin, or if the example uses of the bytes constructor showed
> >
> >     bytes(text)   # no encoding
> >
> > just copying the low-order byte, and raising exceptions if any
> > high-order bytes were non-zero.
>
> That's more or less changing the signature of bytes to be bytes(<text>,
> codec='ascii'), but it breaks when faced with hex or octal escapes
> greater than 127.  Making it codec='latin-1' is marginally better, but
> having a default, regardless of the default, is begging for trouble
> (especially when dealing with unicode).
>
>  - Josiah
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Thu Feb 22 01:21:23 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 22 Feb 2007 13:21:23 +1300
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
	<20070220205651.ADD7.JCARLSON@uci.edu>
	<ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>
	<fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>
Message-ID: <45DCE203.2010804@canterbury.ac.nz>

Jim Jewett wrote:

> A literal form does clear it up, though I'm not sure "b" is the right
> prefix.  (I keep wanting to read "binary" or "boolean", rather than
> "ASCII")

It means "bytes". The ASCII part is that you've
written characters in quotes after it.

--
Greg

From guido at python.org  Fri Feb 23 01:01:42 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 22 Feb 2007 16:01:42 -0800
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <45DCE203.2010804@canterbury.ac.nz>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
	<20070220205651.ADD7.JCARLSON@uci.edu>
	<ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>
	<fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>
	<45DCE203.2010804@canterbury.ac.nz>
Message-ID: <ca471dc20702221601s703a9fe4i8fc69810b8fb8da6@mail.gmail.com>

FWIW, I've updated PEP 358 (the bytes object) to more closely reflect
my plans for it, showing the preservation of most string methods. It
should be updated on the website in a few minutes.

If someone would like to volunteer a small PEP on the b"..." literal I
would appreciate it. The main concern here is that bytes objects are
mutable; I think the right semantics will be that each time a b"..."
literal is evaluated a *new* bytes object is created, just like [1, 2,
3] constructs a new list each time it is evaluated. The alternative
would be a literal that could be modified in place, which reminds me
of the worst of Fortran.
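
A sketch of the intended semantics, assuming released Python 3, where
bytes literals are immutable, so bytearray(b"...") stands in for the
mutable literal discussed here. The function names are hypothetical.

```python
def fresh_list():
    return [1, 2, 3]            # a new list on every evaluation


def fresh_buffer():
    # Stand-in for a mutable b"..." literal: a new object per evaluation.
    return bytearray(b"abc")


first, second = fresh_buffer(), fresh_buffer()
first += b"!"                   # mutating one result must not affect the other
```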

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tjreedy at udel.edu  Fri Feb 23 02:11:36 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 22 Feb 2007 20:11:36 -0500
Subject: [Python-3000] Thoughts on new I/O library and bytecode
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com><20070220205651.ADD7.JCARLSON@uci.edu><ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com><fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com><45DCE203.2010804@canterbury.ac.nz>
	<ca471dc20702221601s703a9fe4i8fc69810b8fb8da6@mail.gmail.com>
Message-ID: <erlf07$q0p$1@sea.gmane.org>


"Guido van Rossum" <guido at python.org> wrote in message 
news:ca471dc20702221601s703a9fe4i8fc69810b8fb8da6 at mail.gmail.com...
| FWIW, I've updated PEP 358 (the bytes object) to more closely reflect
| my plans for it, showing the preservation of most string methods. It
| should be updated on the website in a few minutes.
|
| If someone would like to volunteer a small PEP on the b"..." literal I
| would appreciate it. The main concern here is that bytes objects are
| mutable; I think the right semantics will be that each time a b"..."
| literal is evaluated a *new* bytes object is created, just like [1, 2,
| 3] constructs a new list each time it is evaluated.

I always thought that aliasing of immutable objects was an 
implementation-dependent optimization, so that seems right.  Certainly,

a=[]
for i in range(3): a.append(b'bytes')

had better append three separate objects.  Someone who wants just one can 
write

a = 3*[b'bytes']
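
The difference between the two spellings, sketched with bytearray from
released Python 3 as the mutable stand-in (variable names are
illustrative):

```python
separate = []
for i in range(3):
    separate.append(bytearray(b"bytes"))   # a new object on each iteration

shared = 3 * [bytearray(b"bytes")]         # one object, referenced three times

separate[0] += b"!"    # changes only the first element
shared[0] += b"!"      # visible through all three references
```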

Is a separate PEP really needed, rather than a few lines in PEP 358?

tjr





From jason.orendorff at gmail.com  Fri Feb 23 17:29:02 2007
From: jason.orendorff at gmail.com (Jason Orendorff)
Date: Fri, 23 Feb 2007 11:29:02 -0500
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <ca471dc20702221601s703a9fe4i8fc69810b8fb8da6@mail.gmail.com>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
	<20070220205651.ADD7.JCARLSON@uci.edu>
	<ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>
	<fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>
	<45DCE203.2010804@canterbury.ac.nz>
	<ca471dc20702221601s703a9fe4i8fc69810b8fb8da6@mail.gmail.com>
Message-ID: <bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>

On 2/22/07, Guido van Rossum <guido at python.org> wrote:
> If someone would like to volunteer a small PEP on the b"..." literal I
> would appreciate it.

I'll do this, unless someone tells me not to.  A few questions.

The grammar for string literals is already changing in py3k (removing
the tolerance of bogus escape sequences and the u"" prefix, I think).
Is the new grammar documented anywhere?  p3yk/Doc/ref/ref2.tex seems
to still have the 2.x grammar, and I didn't see anything in the PEPs.

How do you feel about raw byte-strings (br'a\b\c') and long
byte-strings (b'''...''')?
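
For reference, both spellings exist in released Python 3 (where bytes is
immutable). A short sketch of what they produce there, which says
nothing about what was decided at this point in the thread:

```python
raw = br'a\b\c'                    # raw prefix: backslashes are kept as-is
cooked = b'a\bc'                   # \b is the backspace escape (0x08)
long_form = b'''it's "quoted"'''   # triple quotes allow both quote characters

raw_values = list(raw)
cooked_values = list(cooked)
```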

> The main concern here is that bytes objects are
> mutable; I think the right semantics will be that each time a b"..."
> literal is evaluated a *new* bytes object is created, just like [1, 2,
> 3] constructs a new list each time it is evaluated. The alternative
> would be a literal that could be modified in place, which reminds me
> of the worst of Fortran.

Yes, that seems clear.

-j

From bwinton at latte.ca  Fri Feb 23 17:48:58 2007
From: bwinton at latte.ca (Blake Winton)
Date: Fri, 23 Feb 2007 11:48:58 -0500
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>	<20070220205651.ADD7.JCARLSON@uci.edu>	<ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>	<fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>	<45DCE203.2010804@canterbury.ac.nz>	<ca471dc20702221601s703a9fe4i8fc69810b8fb8da6@mail.gmail.com>
	<bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>
Message-ID: <45DF1AFA.2090909@latte.ca>

Jason Orendorff wrote:
> On 2/22/07, Guido van Rossum <guido at python.org> wrote:
>> If someone would like to volunteer a small PEP on the b"..." literal I
>> would appreciate it.
> How do you feel about raw byte-strings (br'a\b\c')

Not that my opinion particularly matters, but I would say "sure" to this 
one.  On the other hand, I really don't use raw strings that often, and 
the places I do are pretty much solely regexes, which shouldn't really 
be passed bytes.

 > and long byte-strings (b'''...''')?

What would:
b"""abc
def"""
translate into, exactly?
[ 97, 98, 99, 10, 100, 101, 102 ]?
[ 97, 98, 99, 13, 10, 100, 101, 102 ]?
Platform-dependent?  (Ewwww!)

Later,
Blake.

From g.brandl at gmx.net  Fri Feb 23 18:09:20 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 23 Feb 2007 18:09:20 +0100
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <45DF1AFA.2090909@latte.ca>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>	<20070220205651.ADD7.JCARLSON@uci.edu>	<ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>	<fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>	<45DCE203.2010804@canterbury.ac.nz>	<ca471dc20702221601s703a9fe4i8fc69810b8fb8da6@mail.gmail.com>	<bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>
	<45DF1AFA.2090909@latte.ca>
Message-ID: <ern741$3pt$1@sea.gmane.org>

Blake Winton schrieb:
> Jason Orendorff wrote:
>> On 2/22/07, Guido van Rossum <guido at python.org> wrote:
>>> If someone would like to volunteer a small PEP on the b"..." literal I
>>> would appreciate it.
>> How do you feel about raw byte-strings (br'a\b\c')
> 
> Not that my opinion particularly matters, but I would say "sure" to this 
> one.  On the other hand, I really don't use raw strings that often, and 
> the places I do are pretty much solely regexes, which shouldn't really 
> be passed bytes.
> 
>  > and long byte-strings (b'''...''')?
> 
> What would:
> b"""abc
> def"""
> translate into, exactly?
> [ 97, 98, 99, 10, 100, 101, 102 ]?
> [ 97, 98, 99, 13, 10, 100, 101, 102 ]?
> Platform-dependent?  (Ewwww!)

The same that """abc
def""" translates to today, which is, "abc\ndef" on every platform.
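
This can be checked directly in released Python 3 (a sketch; the
variable names are illustrative):

```python
data = b"""abc
def"""

values = list(data)   # the embedded newline is the single byte 10
```

The tokenizer normalizes the source file's line endings, so the result
does not depend on the platform.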

Georg


From thomas at python.org  Fri Feb 23 18:17:08 2007
From: thomas at python.org (Thomas Wouters)
Date: Fri, 23 Feb 2007 09:17:08 -0800
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
	<20070220205651.ADD7.JCARLSON@uci.edu>
	<ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>
	<fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>
	<45DCE203.2010804@canterbury.ac.nz>
	<ca471dc20702221601s703a9fe4i8fc69810b8fb8da6@mail.gmail.com>
	<bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>
Message-ID: <9e804ac0702230917j2112c083ld49704657365d77e@mail.gmail.com>

I'm not telling you not to do this, but I already wrote a preliminary patch
(well, it's not actually *working* yet, but the hard part, the grammar
changes, are working ;) Of course, it may be fun to compare implementations.

On 2/23/07, Jason Orendorff <jason.orendorff at gmail.com> wrote:
>
> On 2/22/07, Guido van Rossum <guido at python.org> wrote:
> > If someone would like to volunteer a small PEP on the b"..." literal I
> > would appreciate it.
>
> I'll do this, unless someone tells me not to.  A few questions.
>
> The grammar for string literals is already changing in py3k (removing
> the tolerance of bogus escape sequences and the u"" prefix, I think).
> Is the new grammar documented anywhere?  p3yk/Doc/ref/ref2.tex seems
> to still have the 2.x grammar, and I didn't see anything in the PEPs.
>
> How do you feel about raw byte-strings (br'a\b\c') and long
> byte-strings (b'''...''')?
>
> > The main concern here is that bytes objects are
> > mutable; I think the right semantics will be that each time a b"..."
> > literal is evaluated a *new* bytes object is created, just like [1, 2,
> > 3] constructs a new list each time it is evaluated. The alternative
> > would be a literal that could be modified in place, which reminds me
> > of the worst of Fortran.
>
> Yes, that seems clear.
>
> -j



-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From brett at python.org  Fri Feb 23 18:16:52 2007
From: brett at python.org (Brett Cannon)
Date: Fri, 23 Feb 2007 09:16:52 -0800
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
	<20070220205651.ADD7.JCARLSON@uci.edu>
	<ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>
	<fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>
	<45DCE203.2010804@canterbury.ac.nz>
	<ca471dc20702221601s703a9fe4i8fc69810b8fb8da6@mail.gmail.com>
	<bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>
Message-ID: <bbaeab100702230916l624c54b7p2baf991912009e4a@mail.gmail.com>

On 2/23/07, Jason Orendorff <jason.orendorff at gmail.com> wrote:
> On 2/22/07, Guido van Rossum <guido at python.org> wrote:
> > If someone would like to volunteer a small PEP on the b"..." literal I
> > would appreciate it.
>
> I'll do this, unless someone tells me not to.  A few questions.
>

Thomas Wouters has been working on it while here at PyCon.  I would
wait to see what he says before you dive into it.

-Brett

> The grammar for string literals is already changing in py3k (removing
> the tolerance of bogus escape sequences and the u"" prefix, I think).
> Is the new grammar documented anywhere?  p3yk/Doc/ref/ref2.tex seems
> to still have the 2.x grammar, and I didn't see anything in the PEPs.
>
> How do you feel about raw byte-strings (br'a\b\c') and long
> byte-strings (b'''...''')?
>
> > The main concern here is that bytes objects are
> > mutable; I think the right semantics will be that each time a b"..."
> > literal is evaluated a *new* bytes object is created, just like [1, 2,
> > 3] constructs a new list each time it is evaluated. The alternative
> > would be a literal that could be modified in place, which reminds me
> > of the worst of Fortran.
>
> Yes, that seems clear.
>
> -j

From thomas at python.org  Fri Feb 23 19:02:53 2007
From: thomas at python.org (Thomas Wouters)
Date: Fri, 23 Feb 2007 10:02:53 -0800
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <ca471dc20702211122u10505688yea2db83c5215d04c@mail.gmail.com>
References: <ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>
	<fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>
	<20070221102243.ADF1.JCARLSON@uci.edu>
	<ca471dc20702211122u10505688yea2db83c5215d04c@mail.gmail.com>
Message-ID: <9e804ac0702231002p58f99b7bg557116c3168de633@mail.gmail.com>

On 2/21/07, Guido van Rossum <guido at python.org> wrote:
>
> Patch anyone?


See attachment. It's preliminary -- it just calls the global name 'bytes'
currently (and not even using the 'right' AST concretion mechanism) which
means you can override what the bytes literal creates by assigning to
'bytes' (although I'm sure there's people out there that would love to keep
it that way ;-P) It should probably get its own bytecode (no pun intended.)
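
A sketch of the name-lookup effect described above. It does not
reproduce the attached patch; it only assumes, as stated, that the
literal turns into a call to whatever the name 'bytes' refers to. The
string desugared and the replacement function are hypothetical.

```python
# Stand-in for the literal b"abc" after such a transformation.
desugared = "bytes([0x61, 0x62, 0x63])"

# With an empty globals dict, 'bytes' resolves to the builtin.
normal = eval(desugared, {})

# If the namespace rebinds 'bytes', the "literal" yields something else.
shadowed = eval(desugared, {"bytes": lambda values: "hijacked"})
```

A dedicated opcode or constant would avoid the lookup, so rebinding the
name could no longer change what a literal produces.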

On 2/21/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> >
> > "Jim Jewett" <jimjjewett at gmail.com> wrote:
> > >
> > > On 2/21/07, Guido van Rossum <guido at python.org> wrote:
> > > > If the spelling of a bytes string with an ASCII character value is
> all
> > > > you are complaining about, you should have said so right away.
> > >
> > > That is my main objection.
> > >
> > > A literal form does clear it up, though I'm not sure "b" is the right
> > > prefix.  (I keep wanting to read "binary" or "boolean", rather than
> > > "ASCII")
> > >
> > > To be honest, it would probably be enough if there were an ascii
> > > builtin, or if the example uses of the bytes constructor showed
> > >
> > >     bytes(text)   # no encoding
> > >
> > > just copying the low-order byte, and raising exceptions if any
> > > high-order bytes were non-zero.
> >
> > That's more or less changing the signature of bytes to be bytes(<text>,
> > codec='ascii'), but it breaks when faced with hex or octal escapes
> > greater than 127.  Making it codec='latin-1' is marginally better, but
> > having a default, regardless of the default, is begging for trouble
> > (especially when dealing with unicode).
> >
> >  - Josiah
> >
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)



-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bytesliteral.diff
Type: text/x-patch
Size: 7011 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070223/4a12c812/attachment-0001.bin 

From jason.orendorff at gmail.com  Fri Feb 23 19:49:15 2007
From: jason.orendorff at gmail.com (Jason Orendorff)
Date: Fri, 23 Feb 2007 13:49:15 -0500
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <9e804ac0702231002p58f99b7bg557116c3168de633@mail.gmail.com>
References: <ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>
	<fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>
	<20070221102243.ADF1.JCARLSON@uci.edu>
	<ca471dc20702211122u10505688yea2db83c5215d04c@mail.gmail.com>
	<9e804ac0702231002p58f99b7bg557116c3168de633@mail.gmail.com>
Message-ID: <bb8868b90702231049p530720f8i66215433afbaa34e@mail.gmail.com>

On 2/23/07, Thomas Wouters <thomas at python.org> wrote:
> On 2/21/07, Guido van Rossum <guido at python.org> wrote:
> > Patch anyone?
>
> See attachment. It's preliminary -- it just calls the global name 'bytes'
> currently (and not even using the 'right' AST concretion mechanism) which
> means you can override what the bytes literal creates by assigning to
> 'bytes' (although I'm sure there's people out there that would love to keep
> it that way ;-P) It should probably get its own bytecode (no pun intended.)

Cool!  I finished writing up the PEP about the same time I got this,
but the PEP isn't executable. :)  I would attach it, but I have a
feeling the PEP is probably unnecessary at this point...?

-j

From g.brandl at gmx.net  Fri Feb 23 22:22:16 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 23 Feb 2007 22:22:16 +0100
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <bb8868b90702231049p530720f8i66215433afbaa34e@mail.gmail.com>
References: <ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>	<fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>	<20070221102243.ADF1.JCARLSON@uci.edu>	<ca471dc20702211122u10505688yea2db83c5215d04c@mail.gmail.com>	<9e804ac0702231002p58f99b7bg557116c3168de633@mail.gmail.com>
	<bb8868b90702231049p530720f8i66215433afbaa34e@mail.gmail.com>
Message-ID: <ernlu8$qnb$1@sea.gmane.org>

Jason Orendorff schrieb:
> On 2/23/07, Thomas Wouters <thomas at python.org> wrote:
>> On 2/21/07, Guido van Rossum <guido at python.org> wrote:
>> > Patch anyone?
>>
>> See attachment. It's preliminary -- it just calls the global name 'bytes'
>> currently (and not even using the 'right' AST concretion mechanism) which
>> means you can override what the bytes literal creates by assigning to
>> 'bytes' (although I'm sure there's people out there that would love to keep
>> it that way ;-P) It should probably get its own bytecode (no pun intended.)
> 
> Cool!  I finished writing up the PEP about the same time I got this,
> but the PEP isn't executable. :)  I would attach it, but I have a
> feeling the PEP is probably unnecessary at this point...?

Not really - I wrote one for the print function too, when most of the semantics
were already fixed - but I think this could be added to the existing bytes
PEP, as a new section.

Georg


From greg.ewing at canterbury.ac.nz  Sat Feb 24 00:40:10 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 24 Feb 2007 12:40:10 +1300
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <45DF1AFA.2090909@latte.ca>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
	<20070220205651.ADD7.JCARLSON@uci.edu>
	<ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>
	<fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>
	<45DCE203.2010804@canterbury.ac.nz>
	<ca471dc20702221601s703a9fe4i8fc69810b8fb8da6@mail.gmail.com>
	<bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>
	<45DF1AFA.2090909@latte.ca>
Message-ID: <45DF7B5A.7080503@canterbury.ac.nz>

Blake Winton wrote:

> What would:
> b"""abc
> def"""
> translate into, exactly?
> [ 97, 98, 99, 10, 100, 101, 102 ]?
> [ 97, 98, 99, 13, 10, 100, 101, 102 ]?
> Platform-dependent?  (Ewwww!)

No, presumably it would always translate the newline
into "\n" regardless of platform, as with current strings.

--
Greg

From thomas at python.org  Sat Feb 24 01:09:20 2007
From: thomas at python.org (Thomas Wouters)
Date: Fri, 23 Feb 2007 16:09:20 -0800
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <45DF7B5A.7080503@canterbury.ac.nz>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
	<20070220205651.ADD7.JCARLSON@uci.edu>
	<ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>
	<fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>
	<45DCE203.2010804@canterbury.ac.nz>
	<ca471dc20702221601s703a9fe4i8fc69810b8fb8da6@mail.gmail.com>
	<bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>
	<45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz>
Message-ID: <9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com>

That's exactly what it does in current p3yk:

Python 3.0x (p3yk:53867M, Feb 23 2007, 20:06:03)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> b"""abc
... def"""
bytes([0x61, 0x62, 0x63, 0x0a, 0x64, 0x65, 0x66])

On 2/23/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>
> Blake Winton wrote:
>
> > What would:
> > b"""abc
> > def"""
> > translate into, exactly?
> > [ 97, 98, 99, 10, 100, 101, 102 ]?
> > [ 97, 98, 99, 13, 10, 100, 101, 102 ]?
> > Platform-dependent?  (Ewwww!)
>
> No, presumably it would always translate the newline
> into "\n" regardless of platform, as with current strings.
>
> --
> Greg



-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From g.brandl at gmx.net  Sat Feb 24 11:20:46 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 24 Feb 2007 11:20:46 +0100
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>	<20070220205651.ADD7.JCARLSON@uci.edu>	<ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>	<fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>	<45DCE203.2010804@canterbury.ac.nz>	<ca471dc20702221601s703a9fe4i8fc69810b8fb8da6@mail.gmail.com>	<bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>	<45DF1AFA.2090909@latte.ca>
	<45DF7B5A.7080503@canterbury.ac.nz>
	<9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com>
Message-ID: <erp3hu$36b$1@sea.gmane.org>

Thomas Wouters schrieb:
> 
> That's exactly what it does in current p3yk:
> 
> Python 3.0x (p3yk:53867M, Feb 23 2007, 20:06:03)
> [GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>  >>> b"""abc
> ... def"""
> bytes([0x61, 0x62, 0x63, 0x0a, 0x64, 0x65, 0x66])

Seeing that, I made a patch that makes bytes_repr output a bytes literal,
see attached diff.
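
For comparison, this is how the repr looks in released Python 3, which
does print a literal. It is a sketch of the end result, not of the
attached diff; the variable names are illustrative.

```python
data = bytes([0x61, 0x62, 0x63, 0x0a, 0x64, 0x65, 0x66])

text = repr(data)         # a b'...' literal with the newline escaped
round_trip = eval(text)   # the repr evaluates back to an equal object
```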

Happy PyCon-ing,
Georg

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bytes-repr.diff
Url: http://mail.python.org/pipermail/python-3000/attachments/20070224/cac06f37/attachment.diff 

From rasky at develer.com  Sat Feb 24 12:21:04 2007
From: rasky at develer.com (Giovanni Bajo)
Date: Sat, 24 Feb 2007 12:21:04 +0100
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <erp3hu$36b$1@sea.gmane.org>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>	<20070220205651.ADD7.JCARLSON@uci.edu>	<ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>	<fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>	<45DCE203.2010804@canterbury.ac.nz>	<ca471dc20702221601s703a9fe4i8fc69810b8fb8da6@mail.gmail.com>	<bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>	<45DF1AFA.2090909@latte.ca>	<45DF7B5A.7080503@canterbury.ac.nz>	<9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com>
	<erp3hu$36b$1@sea.gmane.org>
Message-ID: <erp730$c5d$1@sea.gmane.org>

On 24/02/2007 11.20, Georg Brandl wrote:

> Thomas Wouters wrote:
>>
>> That's exactly what it does in current p3yk:
>>
>> Python 3.0x (p3yk:53867M, Feb 23 2007, 20:06:03)
>> [GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>>  >>> b"""abc
>> ... def"""
>> bytes([0x61, 0x62, 0x63, 0x0a, 0x64, 0x65, 0x66])
> 
> Seeing that, I made a patch that makes bytes_repr output a bytes literal,
> see attached diff.

I thought that the repr format of bytes was a deliberate choice to make life
harder for people trying to use bytes to handle text.
-- 
Giovanni Bajo


From g.brandl at gmx.net  Sat Feb 24 13:50:23 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 24 Feb 2007 13:50:23 +0100
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <erp730$c5d$1@sea.gmane.org>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>	<20070220205651.ADD7.JCARLSON@uci.edu>	<ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>	<fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>	<45DCE203.2010804@canterbury.ac.nz>	<ca471dc20702221601s703a9fe4i8fc69810b8fb8da6@mail.gmail.com>	<bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>	<45DF1AFA.2090909@latte.ca>	<45DF7B5A.7080503@canterbury.ac.nz>	<9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com>	<erp3hu$36b$1@sea.gmane.org>
	<erp730$c5d$1@sea.gmane.org>
Message-ID: <erpcaf$qiq$1@sea.gmane.org>

Giovanni Bajo wrote:
> On 24/02/2007 11.20, Georg Brandl wrote:
> 
>> Thomas Wouters wrote:
>>>
>>> That's exactly what it does in current p3yk:
>>>
>>> Python 3.0x (p3yk:53867M, Feb 23 2007, 20:06:03)
>>> [GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
>>> Type "help", "copyright", "credits" or "license" for more information.
>>>  >>> b"""abc
>>> ... def"""
>>> bytes([0x61, 0x62, 0x63, 0x0a, 0x64, 0x65, 0x66])
>> 
>> Seeing that, I made a patch that makes bytes_repr output a bytes literal,
>> see attached diff.
> 
> I thought that the repr format of bytes was a deliberate choice to make life 
> harder to people trying to use bytes to handle text.

That contradicts the "consenting adults" mantra. If a bytes object contains
readable text (and that's not going to be exceptional), it should not be
obscured -- in any case, I can just call str() on it and get my text.

PEP 358 now states "Now that a b"..." literal exists, shouldn't repr()
return one?" which suggests that the repr was the most canonical way
to represent the bytes object at a time when there was no literal.

Georg


From guido at python.org  Sat Feb 24 15:25:59 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 24 Feb 2007 08:25:59 -0600
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <erpcaf$qiq$1@sea.gmane.org>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
	<45DCE203.2010804@canterbury.ac.nz>
	<ca471dc20702221601s703a9fe4i8fc69810b8fb8da6@mail.gmail.com>
	<bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>
	<45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz>
	<9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com>
	<erp3hu$36b$1@sea.gmane.org> <erp730$c5d$1@sea.gmane.org>
	<erpcaf$qiq$1@sea.gmane.org>
Message-ID: <ca471dc20702240625v34511a74t114eac4fe48e756e@mail.gmail.com>

Georg is channeling me well. Also, my thinking has evolved some after
talking to various folks here at PyCon.

Georg, please check it in! Feel free to update the PEP if you will.

--Guido

On 2/24/07, Georg Brandl <g.brandl at gmx.net> wrote:
> Giovanni Bajo wrote:
> > On 24/02/2007 11.20, Georg Brandl wrote:
> >
> >> Thomas Wouters wrote:
> >>>
> >>> That's exactly what it does in current p3yk:
> >>>
> >>> Python 3.0x (p3yk:53867M, Feb 23 2007, 20:06:03)
> >>> [GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
> >>> Type "help", "copyright", "credits" or "license" for more information.
> >>>  >>> b"""abc
> >>> ... def"""
> >>> bytes([0x61, 0x62, 0x63, 0x0a, 0x64, 0x65, 0x66])
> >>
> >> Seeing that, I made a patch that makes bytes_repr output a bytes literal,
> >> see attached diff.
> >
> > I thought that the repr format of bytes was a deliberate choice to make life
> > harder to people trying to use bytes to handle text.
>
> That contradicts the "consenting adults" mantra. If a bytes object contains
> readable text (and that's not going to be exceptional), it should not be
> obscured -- in any case, I can just call str() on it and get my text.
>
> PEP 358 now states "Now that a b"..." literal exists, shouldn't repr()
> return one?" which suggests that the repr was the most canonical way
> to represent the bytes object at a time when there was no literal.
>
> Georg


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From python-dev at zesty.ca  Sat Feb 24 20:07:54 2007
From: python-dev at zesty.ca (Ka-Ping Yee)
Date: Sat, 24 Feb 2007 13:07:54 -0600 (CST)
Subject: [Python-3000] Bytes <-> string conversion methods
Message-ID: <Pine.LNX.4.58.0702241305470.6744@server1.LFW.org>

Hi Guido,

I'm in your keynote and looking at a slide right now that says

    * bytes has .encode() method returning a string
    * str has a .decode() method returning bytes

Should the names of those two methods be swapped?  I think it
makes more sense to say that an encoding is something that
transforms a string into a sequence of bytes.
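
A minimal sketch of the direction described above, using the method names as they exist in released Python 3 (not necessarily the exact API on the slide): encoding goes from text to bytes, and decoding goes back. The sample string and the choice of UTF-8 are assumptions made for illustration.

```python
# Encoding turns text (str) into bytes; decoding turns bytes back into text.
text = "caf\u00e9"                    # sample text, chosen for illustration
data = text.encode("utf-8")           # str -> bytes
round_trip = data.decode("utf-8")     # bytes -> str

print(type(data).__name__, data)
print(type(round_trip).__name__, round_trip == text)
```
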


-- ?!ng

From g.brandl at gmx.net  Sat Feb 24 20:44:46 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 24 Feb 2007 20:44:46 +0100
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <ca471dc20702240625v34511a74t114eac4fe48e756e@mail.gmail.com>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>	<45DCE203.2010804@canterbury.ac.nz>	<ca471dc20702221601s703a9fe4i8fc69810b8fb8da6@mail.gmail.com>	<bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>	<45DF1AFA.2090909@latte.ca>
	<45DF7B5A.7080503@canterbury.ac.nz>	<9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com>	<erp3hu$36b$1@sea.gmane.org>
	<erp730$c5d$1@sea.gmane.org>	<erpcaf$qiq$1@sea.gmane.org>
	<ca471dc20702240625v34511a74t114eac4fe48e756e@mail.gmail.com>
Message-ID: <erq4je$s8$1@sea.gmane.org>

Guido van Rossum wrote:
> Georg is channeling me well. Also, my thinking has evolved some after
> talking to various folks here at PyCon.
> 
> Georg, please check it in! Feel free to update the PEP if you will.

I will, if you answer me one question: in Python 2.6, should the repr()
return "bytes(<string literal>)" or still "bytes(<list of ints>)"?

Georg


From collinw at gmail.com  Sat Feb 24 22:27:43 2007
From: collinw at gmail.com (Collin Winter)
Date: Sat, 24 Feb 2007 15:27:43 -0600
Subject: [Python-3000] Transition to Python 3's raise syntax
Message-ID: <43aa6ff70702241327m67a70812odd414ad2c0428db2@mail.gmail.com>

(Finally getting back around to this)

On 2/9/07, Phillip J. Eby <pje at telecommunity.com> wrote:
[snip]
> Hm.  Actually, that's not necessary.  We could include .with_traceback(T)
> in 2.6, and just have old-style except: clauses delete the traceback from
> the returned objects.  New-style except: clauses would work just as they
> would in 3.0.

What do you mean by "new-style" and "old-style except: clauses"? Are
"new-style" except clauses the ones spelled "except E as NAME" while
"old-style" ones are spelled "except E, NAME"?

> To summarize, in 2.6 we could support .with_traceback() and create
> exception instances with traceback attributes, but the old-style except:
> clauses could discard them to prevent cycles.

Clear enough.

> Raising an exception
> instance with a __traceback__ attribute would get some special handling so
> that it's equivalent to 3-argument raise in today's Python.  Likewise,
> generator.throw() would need the same special handling in 2.6.

What happens in this case:

e = Exception()
e.__traceback__ = T1
raise Exception, e, T2

Which traceback takes precedence? My preference would be to raise an
exception in this case.
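
For comparison, a small sketch of how the same situation looks in Python 3 as it was eventually released, where the three-argument raise no longer exists and the traceback lives on the instance. This is not the 2.6 behaviour being discussed; the helper name `capture_traceback` is made up for the example. Calling `with_traceback()` simply replaces whatever `__traceback__` was already attached.

```python
import sys


def capture_traceback():
    """Return the traceback of a throwaway exception (helper for this sketch)."""
    try:
        raise ValueError("throwaway")
    except ValueError:
        return sys.exc_info()[2]


t1 = capture_traceback()
t2 = capture_traceback()

e = Exception("demo")
e.__traceback__ = t1            # attach T1 by hand
same = e.with_traceback(t2)     # explicitly supplying T2 replaces T1

print(same is e)                # with_traceback() returns the same instance
print(e.__traceback__ is t2)
```
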

Collin Winter

From pje at telecommunity.com  Sat Feb 24 23:23:53 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Sat, 24 Feb 2007 17:23:53 -0500
Subject: [Python-3000] Transition to Python 3's raise syntax
In-Reply-To: <43aa6ff70702241327m67a70812odd414ad2c0428db2@mail.gmail.com>
Message-ID: <5.1.1.6.0.20070224172003.01be64d8@sparrow.telecommunity.com>

At 03:27 PM 2/24/2007 -0600, Collin Winter wrote:
>Are "new-style" except clauses the ones spelled "except E as NAME" while
>"old-style" ones are spelled "except E, NAME"?

Yes.


>What happens in this case:
>
>e = Exception()
>e.__traceback__ = T1
>raise Exception, e, T2
>
>Which traceback takes precedence? My preference would be to raise an
>exception in this case.

Hm.  How would you get that case in normal code?  I guess if you had a 
new-style except: that then used a 3-argument raise, you could end up with 
that.  I'm not sure if that's really a problem though.

In 2.6, we'll still be using the old exception machinery, so the raise will
"do the right thing"; it's just that the exception instance will have a
redundant __traceback__.

I guess, if anything, my inclination would be to have the three-argument 
"raise" delete e.__traceback__.  T2 will get put on it if it's caught by a 
new-style except: clause.


From oliphant.travis at ieee.org  Sat Feb 24 23:47:12 2007
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Sat, 24 Feb 2007 15:47:12 -0700
Subject: [Python-3000] Pre-PEP: Altering buffer protocol (tp_as_buffer)
In-Reply-To: <Pine.LNX.4.58.0702241305470.6744@server1.LFW.org>
References: <Pine.LNX.4.58.0702241305470.6744@server1.LFW.org>
Message-ID: <erqf99$tq5$1@sea.gmane.org>


Hi everybody,

It was great to see so many of you at PyCon --- even if we saw each 
other for too long during the PSF meeting.

After hearing Guido's keynote talk today and realizing that the alpha
release of Python 3.0 is so soon, I decided that the right way for me to
push the array protocol/interface is to propose it as a replacement for,
or enhancement of, the buffer protocol in Python 3.0.

I will write a PEP and the implementation but I would like to start a 
discussion about what concerns or issues developers have with a proposal 
like this and try to weed most of these out.

I've started a Wiki page where we can document the issues that are 
raised.  Perhaps this Wiki can become the PEP within a few weeks.

The Wiki is at http://wiki.python.org/moin/ArrayInterface

The basic idea is given below (a copy of the web-page).

Thanks for any and all feedback.

Best regards,

-Travis Oliphant



This pre-PEP proposes enhancing the buffer protocol in Python 3000 to 
implement the array interface (protocol).

= Overview =

The buffer protocol allows different Python types to exchange a pointer 
to a sequence of internal buffers.  This functionality is 
'''extremely''' useful for sharing large segments of memory between 
different high-level objects, but it's too limited and has issues.

   1. There is the little-used "sequence-of-segments" option.
   2. There is no way for a consumer to tell the protocol-exporting 
object it is "finished" with its view of the memory and therefore no way 
for the object to be sure that it can reallocate the pointer to the 
memory that it owns (the array object reallocating its memory after 
sharing it with the buffer object led to the infamous buffer-object 
problem).
   3. Memory is just a pointer. There is no way to describe what's "in" 
the memory (float, int, C-structure, etc.)
   4. There is no shape information provided for the memory, even though
several array-like Python types could make use of a standard way to
describe the shape of the memory (wxPython, GTK, CVXOPT, PyVox, audio
and video libraries, ctypes, NumPy).

= Proposal =

   1. Replace the buffer protocol that allows sharing of a single 
pointer to memory
   2. Have the protocol define a way to describe what's in the memory 
location (this should unify what is done now in struct, array, ctypes, 
and NumPy)
   3. Have the protocol be able to share information about shape (and 
striding if any)
   4. Allow exporting objects to define some function that should be 
called when the consumer object is "done" with the view.

= Idea =

All that is needed is to create a Python "memory_view" object that can 
contain all the information needed and be returned when the buffer 
protocol is called --- when it is garbage-collected, the 
"bp_release_view" function is called on the exporting object.

This "memory_view" is essentially the old Numeric C-structure (including 
the fact that the data-format is described by another C-structure).

This object is what the buffer protocol should return.
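
For later readers, a rough sketch of what such a view looks like from Python code. It uses the `memoryview` type from released Python 3 (the outcome of PEP 3118), which differs in detail from this pre-PEP; the `array` exporter and the sample values are assumptions chosen for illustration. The view carries a format, item size, shape and strides, shares the exporter's memory, and can be released explicitly.

```python
import array

a = array.array("d", [1.0, 2.0, 3.0, 4.0])   # exporter: four C doubles
view = memoryview(a)                         # a view of the same memory, no copy

# Description of what is "in" the memory and how it is laid out.
fmt, itemsize = view.format, view.itemsize
shape, strides = view.shape, view.strides
print(fmt, itemsize, shape, strides)

view[0] = 9.5          # writes through to the array's memory
print(a[0])

view.release()         # tell the exporter we are done with the view
```
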


From guido at python.org  Sun Feb 25 05:46:11 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 24 Feb 2007 22:46:11 -0600
Subject: [Python-3000] Bytes <-> string conversion methods
In-Reply-To: <Pine.LNX.4.58.0702241305470.6744@server1.LFW.org>
References: <Pine.LNX.4.58.0702241305470.6744@server1.LFW.org>
Message-ID: <ca471dc20702242046x536997e1q293a8a37aac695c9@mail.gmail.com>

Yup, that was a typo. Someone else noticed it too. It's fixed in the
version of the slides I'll post to python.org.

--Guido

On 2/24/07, Ka-Ping Yee <python-dev at zesty.ca> wrote:
> Hi Guido,
>
> I'm in your keynote and looking at a slide right now that says
>
>     * bytes has .encode() method returning a string
>     * str has a .decode() method returning bytes
>
> Should the names of those two methods be swapped?  I think it
> makes more sense to say that an encoding is something that
> transforms a string into a sequence of bytes.
>
>
> -- ?!ng


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sun Feb 25 05:52:00 2007
From: guido at python.org (Guido van Rossum)
Date: Sat, 24 Feb 2007 22:52:00 -0600
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <erq4je$s8$1@sea.gmane.org>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
	<bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>
	<45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz>
	<9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com>
	<erp3hu$36b$1@sea.gmane.org> <erp730$c5d$1@sea.gmane.org>
	<erpcaf$qiq$1@sea.gmane.org>
	<ca471dc20702240625v34511a74t114eac4fe48e756e@mail.gmail.com>
	<erq4je$s8$1@sea.gmane.org>
Message-ID: <ca471dc20702242052r50889661y5cee9d0ac48a9931@mail.gmail.com>

Why not add the literal to 2.6 too?

If that's deemed undesirable, make it bytes(<string literal>), as long
as that actually works when read back.

--Guido

On 2/24/07, Georg Brandl <g.brandl at gmx.net> wrote:
> Guido van Rossum wrote:
> > Georg is channeling me well. Also, my thinking has evolved some after
> > talking to various folks here at PyCon.
> >
> > Georg, please check it in! Feel free to update the PEP if you will.
>
> I will, if you answer me one question: in Python 2.6, should the repr()
> return "bytes(<string literal>)" or still "bytes(<list of ints>)"?
>
> Georg


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Sun Feb 25 22:21:42 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 26 Feb 2007 10:21:42 +1300
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <erp3hu$36b$1@sea.gmane.org>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
	<20070220205651.ADD7.JCARLSON@uci.edu>
	<ca471dc20702202233p4529a2e4t212b000334ab3c83@mail.gmail.com>
	<fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>
	<45DCE203.2010804@canterbury.ac.nz>
	<ca471dc20702221601s703a9fe4i8fc69810b8fb8da6@mail.gmail.com>
	<bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>
	<45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz>
	<9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com>
	<erp3hu$36b$1@sea.gmane.org>
Message-ID: <45E1FDE6.6060200@canterbury.ac.nz>

Georg Brandl wrote:

> Seeing that, I made a patch that makes bytes_repr output a bytes literal,

I'm not sure that's a good idea. Any given bytes object
is as likely to have been constructed using bytes(...)
as using b"...". There's no way of being sure whether
displaying it as a string is appropriate or not.

I suppose you could scan it for non-ascii codes or
something, but that seems a bit dwimish.

--
Greg

From thomas at python.org  Sun Feb 25 22:29:09 2007
From: thomas at python.org (Thomas Wouters)
Date: Sun, 25 Feb 2007 13:29:09 -0800
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <45E1FDE6.6060200@canterbury.ac.nz>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
	<fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>
	<45DCE203.2010804@canterbury.ac.nz>
	<ca471dc20702221601s703a9fe4i8fc69810b8fb8da6@mail.gmail.com>
	<bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>
	<45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz>
	<9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com>
	<erp3hu$36b$1@sea.gmane.org> <45E1FDE6.6060200@canterbury.ac.nz>
Message-ID: <9e804ac0702251329k216d69d4odb96a8feee53250f@mail.gmail.com>

I'm not sure what makes you say that. There isn't anyone actually using
bytes() right now, so what makes you think you know how it will be created?
Besides, lists can be created with list("foo") too, but they still repr()
as ['f', 'o', 'o'].

On 2/25/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>
> Georg Brandl wrote:
>
> > Seeing that, I made a patch that makes bytes_repr output a bytes
> literal,
>
> I'm not sure that's a good idea. Any given bytes object
> is as likely to have been constructed using bytes(...)
> as using b"...". There's no way of being sure whether
> displaying it as a string is appropriate or not.
>
> I suppose you could scan it for non-ascii codes or
> something, but that seems a bit dwimish.
>
> --
> Greg



-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From greg.ewing at canterbury.ac.nz  Sun Feb 25 22:46:55 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 26 Feb 2007 10:46:55 +1300
Subject: [Python-3000] Pre-PEP: Altering buffer protocol (tp_as_buffer)
In-Reply-To: <erqf99$tq5$1@sea.gmane.org>
References: <Pine.LNX.4.58.0702241305470.6744@server1.LFW.org>
	<erqf99$tq5$1@sea.gmane.org>
Message-ID: <45E203CF.2030307@canterbury.ac.nz>

Travis Oliphant wrote:

>    2. There is no way for a consumer to tell the protocol-exporting 
> object it is "finished" with its view of the memory and therefore no way 
> for the object to be sure that it can reallocate the pointer to the 
> memory that it owns (the array object reallocating its memory after 
> sharing it with the buffer object led to the infamous buffer-object 
> problem).

I'm not sure I'd categorise this problem that way -- it was
more the buffer object's fault for assuming that it could
hold on to a C pointer to the memory long-term.

I'm a bit worried about having a get/release kind of thing
in the protocol, because it risks forcing all objects which
implement the protocol to provide some kind of refcounting
and locking mechanism for their data. Some objects may not
be able to do that easily or efficiently, especially if
they're wrapping some external library that has no such
notion.

> All that is needed is to create a Python "memory_view" object that can 
> contain all the information needed and be returned when the buffer 
> protocol is called --- when it is garbage-collected, the 
> "bp_release_view" function is called on the exporting object.

That sounds too heavyweight. Getting a memory view through
this protocol should be a very lightweight operation -- ideally
it shouldn't require allocating any memory at all, and it
certainly shouldn't require creating a Python object.

--
Greg

From greg.ewing at canterbury.ac.nz  Sun Feb 25 23:15:20 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 26 Feb 2007 11:15:20 +1300
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <9e804ac0702251329k216d69d4odb96a8feee53250f@mail.gmail.com>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
	<fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>
	<45DCE203.2010804@canterbury.ac.nz>
	<ca471dc20702221601s703a9fe4i8fc69810b8fb8da6@mail.gmail.com>
	<bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>
	<45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz>
	<9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com>
	<erp3hu$36b$1@sea.gmane.org> <45E1FDE6.6060200@canterbury.ac.nz>
	<9e804ac0702251329k216d69d4odb96a8feee53250f@mail.gmail.com>
Message-ID: <45E20A78.60902@canterbury.ac.nz>

Thomas Wouters wrote:
> 
> I'm not sure what makes you say that. There isn't anyone actually using 
> bytes() right now, so what makes you think how it's created?

That's my point -- you *don't* know how any given bytes
object was created, so there's no reason to display it
in anything other than the most generic way.

Another thing is that the idea of displaying a mutable
object in a way that closely resembles a non-mutable
literal makes me uncomfortable. Actually, writing that
sort of literal makes me uncomfortable too, but I'm
not sure what to do about that.

--
Greg

From tdelaney at avaya.com  Sun Feb 25 23:28:20 2007
From: tdelaney at avaya.com (Delaney, Timothy (Tim))
Date: Mon, 26 Feb 2007 09:28:20 +1100
Subject: [Python-3000] Thoughts on new I/O library and bytecode
Message-ID: <2773CAC687FD5F4689F526998C7E4E5F07446B@au3010avexu1.global.avaya.com>

Greg Ewing wrote:

> Another thing is that the idea of displaying a mutable
> object in a way that closely resembles a non-mutable
> literal makes me uncomfortable. Actually, writing that
> sort of literal makes me uncomfortable too, but I'm
> not sure what to do about that.

We obviously need another quote character. I think we're going to have
to dip into unicode to get it ;)

Just to get it out of the way - as a totally unfeasible and bad idea ...
why not use double quotes for unicode literals, and single quotes for
byte literals?

"this is a unicode literal"
'this is a byte literal'

There - got it out of my system. Although, now something else is
suggesting that backticks could be repurposed for the job ...

`this is a byte literal`

Tim Delaney

From exarkun at divmod.com  Sun Feb 25 23:28:47 2007
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Sun, 25 Feb 2007 17:28:47 -0500
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <45E20A78.60902@canterbury.ac.nz>
Message-ID: <20070225222847.25807.213332275.divmod.quotient.32012@ohm>

On Mon, 26 Feb 2007 11:15:20 +1300, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>Thomas Wouters wrote:
>>
>> I'm not sure what makes you say that. There isn't anyone actually using
>> bytes() right now, so what makes you think how it's created?
>
>That's my point -- you *don't* know how any given bytes
>object was created, so there's no reason to display it
>in anything other than the most generic way.
>
>Another thing is that the idea of displaying a mutable
>object in a way that closely resembles a non-mutable
>literal makes me uncomfortable. Actually, writing that
>sort of literal makes me uncomfortable too, but I'm
>not sure what to do about that.
>

   [1, 2, 3]
   (1, 2, 3)

:)

Jean-Paul

From thomas at python.org  Sun Feb 25 23:29:11 2007
From: thomas at python.org (Thomas Wouters)
Date: Sun, 25 Feb 2007 14:29:11 -0800
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <45E20A78.60902@canterbury.ac.nz>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
	<ca471dc20702221601s703a9fe4i8fc69810b8fb8da6@mail.gmail.com>
	<bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>
	<45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz>
	<9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com>
	<erp3hu$36b$1@sea.gmane.org> <45E1FDE6.6060200@canterbury.ac.nz>
	<9e804ac0702251329k216d69d4odb96a8feee53250f@mail.gmail.com>
	<45E20A78.60902@canterbury.ac.nz>
Message-ID: <9e804ac0702251429i36623d1bg7ea69d0ad7945429@mail.gmail.com>

I think you're confused. There isn't anything 'less generic' about the bytes
literal. Both bytes([...]) and b"..." can express the full 256 value range.
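
A quick check of that claim, written against released Python 3 (where `bytes` ended up immutable, unlike the mutable type under discussion here): every value from 0 to 255 survives a round trip through the literal form, and the list form and the literal form compare equal.

```python
# All 256 byte values, built from integers.
all_values = bytes(range(256))

# repr() gives a b"..." literal; evaluating it rebuilds the same object.
literal_form = repr(all_values)
rebuilt = eval(literal_form)
print(rebuilt == all_values)

# The list-of-ints spelling and the literal spelling denote the same data.
from_ints = bytes([0x61, 0x62, 0x63, 0x0A, 0x64, 0x65, 0x66])
print(from_ints == b"abc\ndef")
```
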

On 2/25/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>
> Thomas Wouters wrote:
> >
> > I'm not sure what makes you say that. There isn't anyone actually using
> > bytes() right now, so what makes you think how it's created?
>
> That's my point -- you *don't* know how any given bytes
> object was created, so there's no reason to display it
> in anything other than the most generic way.
>
> Another thing is that the idea of displaying a mutable
> object in a way that closely resembles a non-mutable
> literal makes me uncomfortable. Actually, writing that
> sort of literal makes me uncomfortable too, but I'm
> not sure what to do about that.
>
> --
> Greg



-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From nas at arctrix.com  Sun Feb 25 23:37:09 2007
From: nas at arctrix.com (Neil Schemenauer)
Date: Sun, 25 Feb 2007 22:37:09 +0000 (UTC)
Subject: [Python-3000] Thoughts on new I/O library and bytecode
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
	<fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>
	<45DCE203.2010804@canterbury.ac.nz>
	<ca471dc20702221601s703a9fe4i8fc69810b8fb8da6@mail.gmail.com>
	<bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>
	<45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz>
	<9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com>
	<erp3hu$36b$1@sea.gmane.org> <45E1FDE6.6060200@canterbury.ac.nz>
	<9e804ac0702251329k216d69d4odb96a8feee53250f@mail.gmail.com>
	<45E20A78.60902@canterbury.ac.nz>
Message-ID: <ert32l$nls$1@sea.gmane.org>

Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> That's my point -- you *don't* know how any given bytes
> object was created, so there's no reason to display it
> in anything other than the most generic way.

Practicality beats purity here, I think.  For example, if I'm
debugging a network protocol, I'd prefer

    b"EHLO ...\x0d\x0a"

over 

   bytes([69, 72, 76, 79, ..., 13, 10])
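
A small sketch of the two views side by side, using released Python 3. The hostname is a made-up placeholder, not part of the original example.

```python
packet = b"EHLO example.org\r\n"   # hostname is a placeholder for illustration

as_literal = repr(packet)          # readable when the payload is mostly ASCII
as_ints = list(packet)             # the same data as a list of integers

print(as_literal)
print(as_ints)
```
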

Cheers,

  Neil


From nas at arctrix.com  Mon Feb 26 00:25:38 2007
From: nas at arctrix.com (Neil Schemenauer)
Date: Sun, 25 Feb 2007 23:25:38 +0000 (UTC)
Subject: [Python-3000] Weird error message from bytes type
Message-ID: <ert5ti$3p6$1@sea.gmane.org>

>>> x = b'a'
>>> x[0] = b'a'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'bytes' object cannot be interpreted as an index

Huh?  0 is not a 'bytes' object and I don't see how the RHS is being
used as an index.  Obviously I wanted something like:

>>> x[0] = ord(b'a')
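
The same experiment can be sketched against released Python 3, with the assumption that `bytearray` stands in for the mutable bytes type discussed here. Item assignment wants an integer, so a length-1 bytes object is rejected with a TypeError, while `ord()` or indexing gives an acceptable value.

```python
x = bytearray(b"a")        # mutable byte sequence in released Python 3
error = None

try:
    x[0] = b"a"            # a bytes object, not an integer: rejected
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

x[0] = ord(b"a")           # an integer in range(256) is accepted
x[0] = b"a"[0]             # indexing bytes also yields an integer
print(x)
```
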


From guido at python.org  Mon Feb 26 00:26:36 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 25 Feb 2007 17:26:36 -0600
Subject: [Python-3000] Pre-PEP: Altering buffer protocol (tp_as_buffer)
In-Reply-To: <45E203CF.2030307@canterbury.ac.nz>
References: <Pine.LNX.4.58.0702241305470.6744@server1.LFW.org>
	<erqf99$tq5$1@sea.gmane.org> <45E203CF.2030307@canterbury.ac.nz>
Message-ID: <ca471dc20702251526g39ce377bsa9261197077a063e@mail.gmail.com>

On 2/25/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Travis Oliphant wrote:
>
> >    2. There is no way for a consumer to tell the protocol-exporting
> > object it is "finished" with its view of the memory and therefore no way
> > for the object to be sure that it can reallocate the pointer to the
> > memory that it owns (the array object reallocating its memory after
> > sharing it with the buffer object led to the infamous buffer-object
> > problem).
>
> I'm not sure I'd categorise this problem that way -- it was
> more the buffer object's fault for assuming that it could
> hold on to a C pointer to the memory long-term.
>
> I'm a bit worried about having a get/release kind of thing
> in the protocol, because it risks forcing all objects which
> implement the protocol to provide some kind of refcounting
> and locking mechanism for their data. Some objects may not
> be able to do that easily or efficiently, especially if
> they're wrapping some external library that has no such
> notion.

Only if their buffer can actually move; if the buffer can't be moved
or resized once the object is created, the acquire and release can be
no-ops.

Another problem that would be solved by this is the current unsafety
of blocking I/O operations like file.readinto() and
socket.recv_into(). These operations do roughly the following:

(1) get the pointer and length from the buffer API
(2) release the GIL
(3) call the blocking read() or recv() system call with the pointer and length
(4) reacquire the GIL

The problem is that while the GIL is released, another thread with
access to the object whose buffer is being read into could modify it,
causing the buffer to be moved in memory; the read() or recv()
operation would then overwrite freed memory (or worse, memory allocated
for a different purpose).

I realized this while thinking about the 3.0 bytes object, but the 2.x array
object has the same problems, and probably every other object that
uses the buffer API and has a mutable size (if there are any).
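
A minimal sketch of the acquire/release idea. It does not use the API
being designed in this thread; it assumes a present-day CPython, where
bytearray and memoryview behave this way: an outstanding view pins the
buffer, and a resize is refused until the view is released.

```python
buf = bytearray(b"hello")

view = memoryview(buf)       # "acquire": the exporter knows a consumer holds its memory
try:
    buf.extend(b" world")    # resizing could move the buffer, so it is refused
    resized = True
except BufferError:
    resized = False

view.release()               # "release": the exporter may reallocate again
buf.extend(b" world")        # now the resize goes through
```

An object whose buffer can never move would have nothing to do in
either step, which is the no-op case mentioned above.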

> > All that is needed is to create a Python "memory_view" object that can
> > contain all the information needed and be returned when the buffer
> > protocol is called --- when it is garbage-collected, the
> > "bp_release_view" function is called on the exporting object.
>
> That sounds too heavyweight. Getting a memory view through
> this protocol should be a very lightweight operation -- ideally
> it shouldn't require allocating any memory at all, and it
> certainly shouldn't require creating a Python object.

I agree that getting the pointer and length should be separated from
finding out how the bytes should be interpreted. I'd like to propose a
simple stack or hierarchy of classes to address (what I think are)
Travis's needs:

- At the bottom is a redesigned buffer API: add locking, remove
segcount and char buffers.

- This API is implemented by things like mmap, and also by a "raw
bytes" object which allocates a buffer from the heap; other libraries
may have their own objects that implement this (e.g. numpy, PIL).

- There is a mixin class (at least conceptually it's a mixin) which
takes anything implementing the redesigned buffer API and adds the
bytes API (see recently updated PEP 358); operations like .strip() or
slicing should return copies (of the same or a different type) or
views at the discretion of the underlying object. (Maybe there should
be a read-only and read-write version of this; note that read-only is
not the same as immutable, since the underlying buffer may be modified
by other APIs, if it allows this.)

- *Another* API built on top of the redesigned buffer API would be
something more aligned with numpy's needs, adding (a) a shape
descriptor indicating the size, offset and stride of each dimension,
and (b) a record descriptor indicating the interpretation of one
element of the array. For (a), a list of 3-tuples of ints would
probably be sufficient (constrained so that no valid combination of
indexes points outside the buffer); for (b), I propose (with Jim
Hugunin who first suggested this at PyCon) to use the same concise but
expressive format-string-like notation used by the struct module. (The
bytes API is not quite a special case of this, since it provides more
string-like operations.)
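
A rough sketch of how (a) and (b) could fit together. The names
get_item and shape are hypothetical, and sizes, offsets and strides
are assumed to be counted in bytes; only the struct calls are real.

```python
import struct

fmt = "<i"                         # record descriptor: one little-endian 32-bit int
itemsize = struct.calcsize(fmt)    # 4 bytes per element

# Shape descriptor: one (size, offset, stride) tuple per dimension.
# This describes a 2 x 3 array stored row by row.
shape = [(2, 0, 3 * itemsize), (3, 0, itemsize)]

buf = struct.pack("<6i", 10, 11, 12, 20, 21, 22)

def get_item(buf, shape, fmt, index):
    """Return the element at a multi-dimensional index (hypothetical helper)."""
    if len(index) != len(shape):
        raise IndexError("wrong number of indexes")
    pos = 0
    for i, (size, offset, stride) in zip(index, shape):
        if not 0 <= i < size:
            raise IndexError("index out of range")
        pos += offset + i * stride
    return struct.unpack_from(fmt, buf, pos)[0]

corner = get_item(buf, shape, fmt, (1, 2))
```

The range check on each index is what keeps every valid combination
of indexes inside the buffer.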

The crucial idea here (like so often :-) is not to use inheritance but
composition. This means that we can separate management of the buffer
(e.g. malloc, mmap, whatever) from providing APIs on top of this
(either the bytes API or the multi-dimensional array API).
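
A small sketch of the composition idea, assuming present-day
memoryview as the stand-in for the redesigned buffer API. BytesAPI is
a hypothetical name, and strip() returning a copy is just one of the
choices left open above.

```python
import array

class BytesAPI:
    """Hypothetical wrapper adding string-like operations to any buffer provider."""

    def __init__(self, provider):
        # Composition: hold a view of the provider instead of subclassing it.
        self._view = memoryview(provider)

    def __len__(self):
        return len(self._view)

    def strip(self):
        # Returns a copy; a view would also be allowed by the design.
        return bytes(self._view).strip()

    def startswith(self, prefix):
        return bytes(self._view[:len(prefix)]) == bytes(prefix)

# Two unrelated providers, one API layered on top of both.
heap_backed = BytesAPI(bytearray(b"  spam  "))
array_backed = BytesAPI(array.array("B", b"  eggs  "))
```

The wrapper never needs to know how the memory was obtained, only
that the provider can hand out a buffer.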

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From thomas at python.org  Mon Feb 26 00:35:19 2007
From: thomas at python.org (Thomas Wouters)
Date: Sun, 25 Feb 2007 15:35:19 -0800
Subject: [Python-3000] Weird error message from bytes type
In-Reply-To: <ert5ti$3p6$1@sea.gmane.org>
References: <ert5ti$3p6$1@sea.gmane.org>
Message-ID: <9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com>

This is because, unlike a string (a sequence of length-1 strings), a bytes
object is not a sequence of bytes objects. It's a sequence of small integer
values, so you need to assign a small integer value to it. You can assign
b'a'[0] to it, or assign b'a' to x[:1]. I guess we could special-case
length-1 bytes to make this work 'naturally', but I'm not sure that's the
right approach. Guido?

On 2/25/07, Neil Schemenauer <nas at arctrix.com> wrote:
>
> >>> x = b'a'
> >>> x[0] = b'a'
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: 'bytes' object cannot be interpreted as an index
>
> Huh?  0 is not a 'bytes' object and I don't see how the RHS is being
> used as an index.  Obviously I wanted something like:
>
> >>> x[0] = ord(b'a')
>



-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!

From guido at python.org  Mon Feb 26 00:40:12 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 25 Feb 2007 17:40:12 -0600
Subject: [Python-3000] Weird error message from bytes type
In-Reply-To: <9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com>
References: <ert5ti$3p6$1@sea.gmane.org>
	<9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com>
Message-ID: <ca471dc20702251540k550ee3e3m332ed5e8e52a025c@mail.gmail.com>

Thomas is correct. You can only assign ints in range(256) to a single
index. This would work:

x[:1] = b"a"

The error comes from the call to PyNumber_AsSsize_t() in
bytes_setitem(), which apparently looks for __index__ or the tp_index
slot.
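
A short illustration of the rule, assuming today's bytearray as a
stand-in for the mutable bytes type discussed in this thread:

```python
x = bytearray(b"a")

x[0] = ord("b")        # a single index takes an int in range(256)
x[:1] = b"c"           # a slice takes a bytes-like object

try:
    x[0] = b"d"        # a length-1 bytes object is still not an int
except TypeError:
    rejected = True
else:
    rejected = False

first = x[0]           # reading a single index gives back an int
```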

--Guido

On 2/25/07, Thomas Wouters <thomas at python.org> wrote:
>
> This is because a bytes object is not a sequence of bytes objects, like
> strings. It's a sequence of small integer values, so you need to assign a
> small integer value to it. You can assign b'a'[0] to it, or assign b'a' to
> x[:1]. I guess we could special-case length-1 bytes to make this work
> 'naturally', but I'm not sure that's the right approach. Guido?
>
>
> On 2/25/07, Neil Schemenauer <nas at arctrix.com> wrote:
> >
> > >>> x = b'a'
> > >>> x[0] = b'a'
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > TypeError: 'bytes' object cannot be interpreted as an index
> >
> > Huh?  0 is not a 'bytes' object and I don't see how the RHS is being
> > used as an index.  Obviously I wanted something like:
> >
> > >>> x[0] = ord(b'a')

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From g.brandl at gmx.net  Mon Feb 26 00:52:30 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 26 Feb 2007 00:52:30 +0100
Subject: [Python-3000] Weird error message from bytes type
In-Reply-To: <9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com>
References: <ert5ti$3p6$1@sea.gmane.org>
	<9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com>
Message-ID: <ert7fu$81j$1@sea.gmane.org>

Thomas Wouters schrieb:
> 
> This is because a bytes object is not a sequence of bytes objects, like 
> strings. It's a sequence of small integer values, so you need to assign 
> a small integer value to it. You can assign b'a'[0] to it, or assign 
> b'a' to x[:1]. I guess we could special-case length-1 bytes to make this 
> work 'naturally', but I'm not sure that's the right approach. Guido?

If it is deemed right, see attached patch.

BTW, is it intentional that the setitem/setslice code is duplicated in
bytesobject.c?

Georg
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bytes-ass.diff
Url: http://mail.python.org/pipermail/python-3000/attachments/20070226/faef94aa/attachment.diff 

From guido at python.org  Mon Feb 26 00:56:42 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 25 Feb 2007 17:56:42 -0600
Subject: [Python-3000] Weird error message from bytes type
In-Reply-To: <ert7fu$81j$1@sea.gmane.org>
References: <ert5ti$3p6$1@sea.gmane.org>
	<9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com>
	<ert7fu$81j$1@sea.gmane.org>
Message-ID: <ca471dc20702251556q68fbc5d7xb67c11944d3763cc@mail.gmail.com>

No, I don't want length-1-bytes to get special treatment here. That
would just perpetuate confusion, since b[0] *returns* an int no matter
what you might have set it to.

--Guido

On 2/25/07, Georg Brandl <g.brandl at gmx.net> wrote:
> Thomas Wouters schrieb:
> >
> > This is because a bytes object is not a sequence of bytes objects, like
> > strings. It's a sequence of small integer values, so you need to assign
> > a small integer value to it. You can assign b'a'[0] to it, or assign
> > b'a' to x[:1]. I guess we could special-case length-1 bytes to make this
> > work 'naturally', but I'm not sure that's the right approach. Guido?
>
> If it is deemed right, see attached patch.
>
> BTW, is it intentional that the setitem/setslice code is duplicated in
> bytesobject.c?
>
> Georg
>
> Index: Objects/bytesobject.c
> ===================================================================
> --- Objects/bytesobject.c       (Revision 53912)
> +++ Objects/bytesobject.c       (Arbeitskopie)
> @@ -451,7 +451,18 @@
>              slicelen = 1;
>          }
>          else {
> -            Py_ssize_t ival = PyNumber_AsSsize_t(values, PyExc_ValueError);
> +            Py_ssize_t ival;
> +            /* if the value is a length-one bytes object, assign it */
> +            if (PyBytes_Check(values)) {
> +                if (PyBytes_GET_SIZE(values) != 1) {
> +                    PyErr_SetString(PyExc_ValueError, "cannot assign bytes "
> +                                    "object of length != 1");
> +                    return -1;
> +                }
> +                self->ob_bytes[i] = ((PyBytesObject *)values)->ob_bytes[0];
> +                return 0;
> +            }
> +            ival = PyNumber_AsSsize_t(values, PyExc_ValueError);
>              if (ival == -1 && PyErr_Occurred())
>                  return -1;
>              if (ival < 0 || ival >= 256) {
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nas at arctrix.com  Mon Feb 26 01:29:31 2007
From: nas at arctrix.com (Neil Schemenauer)
Date: Sun, 25 Feb 2007 18:29:31 -0600
Subject: [Python-3000] Weird error message from bytes type
In-Reply-To: <ca471dc20702251540k550ee3e3m332ed5e8e52a025c@mail.gmail.com>
References: <ert5ti$3p6$1@sea.gmane.org>
	<9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com>
	<ca471dc20702251540k550ee3e3m332ed5e8e52a025c@mail.gmail.com>
Message-ID: <20070226002930.GB3067@python.ca>

On Sun, Feb 25, 2007 at 05:40:12PM -0600, Guido van Rossum wrote:
> Thomas is correct. You can only assign ints in range(256) to a single
> index.

Yes, I understand that.  I think the error message is bad though.

> The error comes from the call to PyNumber_AsSsize_t() in
> bytes_setitem(), which apparently looks for __index__ or the tp_index
> slot.

I think PyNumber_AsSsize_t is being used on the RHS operand.  That's
perhaps convenient but makes for a confusing message.  There was
nothing wrong with the value I was using for an index.

  Neil

From guido at python.org  Mon Feb 26 01:39:16 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 25 Feb 2007 18:39:16 -0600
Subject: [Python-3000] Weird error message from bytes type
In-Reply-To: <20070226002930.GB3067@python.ca>
References: <ert5ti$3p6$1@sea.gmane.org>
	<9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com>
	<ca471dc20702251540k550ee3e3m332ed5e8e52a025c@mail.gmail.com>
	<20070226002930.GB3067@python.ca>
Message-ID: <ca471dc20702251639p6ea22e2o5c2363a2df2b6066@mail.gmail.com>

Correct (I wasn't saying it was used on the lhs operand :-).

I find it important to use that API since anything that wants to
behave like a (small) int should be acceptable. Can you suggest a
better way to formulate the error from that API?

--Guido

On 2/25/07, Neil Schemenauer <nas at arctrix.com> wrote:
> On Sun, Feb 25, 2007 at 05:40:12PM -0600, Guido van Rossum wrote:
> > Thomas is correct. You can only assign ints in range(256) to a single
> > index.
>
> Yes, I understand that.  I think the error message is bad though.
>
> > The error comes from the call to PyNumber_AsSsize_t() in
> > bytes_setitem(), which apparently looks for __index__ or the tp_index
> > slot.
>
> I think PyNumber_AsSsize_t is being used on the RHS operand.  That's
> perhaps convenient but makes for a confusing message.  There was
> nothing wrong with the value I was using for an index.
>
>   Neil
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Mon Feb 26 04:04:22 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 26 Feb 2007 16:04:22 +1300
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <20070225222847.25807.213332275.divmod.quotient.32012@ohm>
References: <20070225222847.25807.213332275.divmod.quotient.32012@ohm>
Message-ID: <45E24E36.7030005@canterbury.ac.nz>

Jean-Paul Calderone wrote:
> > Actually, writing that
> > sort of literal makes me uncomfortable too, but I'm
> > not sure what to do about that.
>
>    [1, 2, 3]
>    (1, 2, 3)

Not quite sure what your point is. My point is that
I'm thoroughly conditioned to think of anything in
quotes as immutable, and it will take a while to
get out of that habit, I suspect.

Also I'm a little worried about the pedagogical
implications of teaching people that x"..." is
a unicode string for all values of x *except* b,
whereupon it's not unicode and isn't even a string.

I'm wondering whether it would be better to have
the compiler recognise bytes("...") and special
case it. At least it *looks* like a constructor
call then, which is what b"..." would actually
be.

--
Greg

From greg.ewing at canterbury.ac.nz  Mon Feb 26 04:12:37 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 26 Feb 2007 16:12:37 +1300
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <9e804ac0702251429i36623d1bg7ea69d0ad7945429@mail.gmail.com>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
	<ca471dc20702221601s703a9fe4i8fc69810b8fb8da6@mail.gmail.com>
	<bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>
	<45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz>
	<9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com>
	<erp3hu$36b$1@sea.gmane.org> <45E1FDE6.6060200@canterbury.ac.nz>
	<9e804ac0702251329k216d69d4odb96a8feee53250f@mail.gmail.com>
	<45E20A78.60902@canterbury.ac.nz>
	<9e804ac0702251429i36623d1bg7ea69d0ad7945429@mail.gmail.com>
Message-ID: <45E25025.6020107@canterbury.ac.nz>

Thomas Wouters wrote:
> 
> I think you're confused. There isn't anything 'less generic' about the 
> bytes literal. Both bytes([...]) and b"..." can express the full 256 
> value range.

Yes, but it only makes sense to try to display it as
characters if it's meant to represent characters in
the first place. Otherwise you get something that
looks like line noise.

BTW, I don't really think that bytes([104, 101, 108,
108, 111]) is the right way to display it either.
There ought to be some kind of compact hex format.
Maybe something like

    $[68656C6C6F]

--
Greg

From greg.ewing at canterbury.ac.nz  Mon Feb 26 04:19:47 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 26 Feb 2007 16:19:47 +1300
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <ert32l$nls$1@sea.gmane.org>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
	<fb6fbf560702211003o213f7b95p6fff0a32a38c819e@mail.gmail.com>
	<45DCE203.2010804@canterbury.ac.nz>
	<ca471dc20702221601s703a9fe4i8fc69810b8fb8da6@mail.gmail.com>
	<bb8868b90702230829k11f2a589w3f3b6829b38ca602@mail.gmail.com>
	<45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz>
	<9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com>
	<erp3hu$36b$1@sea.gmane.org> <45E1FDE6.6060200@canterbury.ac.nz>
	<9e804ac0702251329k216d69d4odb96a8feee53250f@mail.gmail.com>
	<45E20A78.60902@canterbury.ac.nz> <ert32l$nls$1@sea.gmane.org>
Message-ID: <45E251D3.7050601@canterbury.ac.nz>

Neil Schemenauer wrote:

> Practicality beats purity here, I think.  For example, if I'm
> debugging a network protocol, I'd prefer
> 
>     b"EHLO ...\x0d\x0a"

But what if I'm *not* debugging a network protocol,
and my bytes objects all look like random gibberish
when displayed as characters?

To put it another way: If bytes objects are displayed
in hex by default (see previous post), I can easily get
them displayed as characters if that's what I want using
str(b, suitable_encoding).

But if they're displayed as characters by default,
what do I do to get them displayed as not-characters?

--
Greg

From greg.ewing at canterbury.ac.nz  Mon Feb 26 04:42:46 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Mon, 26 Feb 2007 16:42:46 +1300
Subject: [Python-3000] Weird error message from bytes type
In-Reply-To: <ca471dc20702251639p6ea22e2o5c2363a2df2b6066@mail.gmail.com>
References: <ert5ti$3p6$1@sea.gmane.org>
	<9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com>
	<ca471dc20702251540k550ee3e3m332ed5e8e52a025c@mail.gmail.com>
	<20070226002930.GB3067@python.ca>
	<ca471dc20702251639p6ea22e2o5c2363a2df2b6066@mail.gmail.com>
Message-ID: <45E25736.2010404@canterbury.ac.nz>

Guido van Rossum wrote:

> I find it important to use that API since anything that wants to
> behave like a (small) int should be acceptable.

But by calling __index__ and giving error messages about
indexes, PyNumber_AsSsize_t seems to be assuming that the
value is going to be used as an index.

If that's the true purpose of PyNumber_AsSsize_t, then it
shouldn't be getting called in this situation.

If it's not, then it shouldn't be giving error messages
that talk about indexes, and there should be another
API such as PyObject_AsIndex for values that really
are going to be used as indexes.

--
Greg

From jcarlson at uci.edu  Mon Feb 26 04:58:39 2007
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sun, 25 Feb 2007 19:58:39 -0800
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <45E25025.6020107@canterbury.ac.nz>
References: <9e804ac0702251429i36623d1bg7ea69d0ad7945429@mail.gmail.com>
	<45E25025.6020107@canterbury.ac.nz>
Message-ID: <20070225194742.AE4B.JCARLSON@uci.edu>


Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> 
> Thomas Wouters wrote:
> > 
> > I think you're confused. There isn't anything 'less generic' about the 
> > bytes literal. Both bytes([...]) and b"..." can express the full 256 
> > value range.
> 
> Yes, but it only makes sense to try to display it as
> characters if it's meant to represent characters in
> the first place. Otherwise you get something that
> looks like line noise.
> 
> BTW, I don't really think that bytes([104, 101, 108,
> 108, 111]) is the right way to display it either.
> There ought to be some kind of compact hex format.
> Maybe something like
> 
>     $[68656C6C6F]

I think it's a bad idea to choose a representation with any format that
isn't able to do the eval(repr(obj)) loop.  I'm not a fan of 'bytes([101,
108, ...])', nor do I like 'bytes([0xd7, 0x19, ...])'. 'bytes(b"stuff")'
is a bit redundant, but it would get the point across. I'm not sure I
*like* b"stuff", but I don't loathe it like I do the other two that are
passed lists.  Maybe 'bytes("stuff", "latin-1")', but then it is
underlying platform and/or file encoding sensitive.  It may be the case
that b"stuff" is the most concise and reasonable repr form...
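
A quick check of the eval(repr(obj)) loop for the b"..." form,
assuming the bytes repr of present-day Python 3 (the samples are made
up for illustration):

```python
samples = [
    b"stuff",
    bytes([0xd7, 0x19]),
    b"EHLO ...\r\n",
    bytes(range(256)),       # every possible byte value
]

# Each repr is a b"..." literal that evaluates back to an equal object.
round_trips = [eval(repr(s)) == s for s in samples]
text_form = repr(b"stuff")
```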


 - Josiah


From guido at python.org  Mon Feb 26 06:36:39 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 25 Feb 2007 23:36:39 -0600
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <45E251D3.7050601@canterbury.ac.nz>
References: <ca471dc20702201944x7dd5f899nd0cf0c6169d00a69@mail.gmail.com>
	<45DF1AFA.2090909@latte.ca> <45DF7B5A.7080503@canterbury.ac.nz>
	<9e804ac0702231609t3f463c60xa8bee6e736c3bb@mail.gmail.com>
	<erp3hu$36b$1@sea.gmane.org> <45E1FDE6.6060200@canterbury.ac.nz>
	<9e804ac0702251329k216d69d4odb96a8feee53250f@mail.gmail.com>
	<45E20A78.60902@canterbury.ac.nz> <ert32l$nls$1@sea.gmane.org>
	<45E251D3.7050601@canterbury.ac.nz>
Message-ID: <ca471dc20702252136ld80621dqad34728895dbf720@mail.gmail.com>

On 2/25/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> But if they're displayed as characters by default,
> what do I do to get them displayed as not-characters?

Well anything that's not an ASCII printable is \x escaped anyway. If
you want all hex, use the .hex() method described in the PEP.
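
For illustration, assuming the bytes type of present-day Python 3,
where hex() and fromhex() are available as methods:

```python
data = bytes([104, 101, 108, 108, 111])

as_text = repr(data)                     # printable ASCII is shown as characters
noise = repr(bytes([0xd7, 0x19]))        # anything else is \x escaped
as_hex = data.hex()                      # the all-hex form
decoded = str(data, "ascii")             # explicit decoding to text
restored = bytes.fromhex("68656C6C6F")   # and back again from hex
```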

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon Feb 26 06:38:26 2007
From: guido at python.org (Guido van Rossum)
Date: Sun, 25 Feb 2007 23:38:26 -0600
Subject: [Python-3000] Weird error message from bytes type
In-Reply-To: <45E25736.2010404@canterbury.ac.nz>
References: <ert5ti$3p6$1@sea.gmane.org>
	<9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com>
	<ca471dc20702251540k550ee3e3m332ed5e8e52a025c@mail.gmail.com>
	<20070226002930.GB3067@python.ca>
	<ca471dc20702251639p6ea22e2o5c2363a2df2b6066@mail.gmail.com>
	<45E25736.2010404@canterbury.ac.nz>
Message-ID: <ca471dc20702252138oce44b50vf83e1ce146cc447c@mail.gmail.com>

Please give the poor function a break. It was added to 2.5 and used
only for indexing there. In 3.0 it is more generally useful (I want to
use it whenever an int is needed), but in our pre-alpha code the error
messages haven't been fixed yet. That's the whole story.

On 2/25/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
>
> > I find it important to use that API since anything that wants to
> > behave like a (small) int should be acceptable.
>
> But by calling __index__ and giving error messages about
> indexes, PyNumber_AsSsize_t seems to be assuming that the
> value is going to be used as an index.
>
> If that's the true purpose of PyNumber_AsSsize_t, then it
> shouldn't be getting called in this situation.
>
> If it's not, then it shouldn't be giving error messages
> that talk about indexes, and there should be another
> API such as PyObject_AsIndex for values that really
> are going to be used as indexes.
>
> --
> Greg
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From ncoghlan at gmail.com  Mon Feb 26 11:10:36 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 26 Feb 2007 20:10:36 +1000
Subject: [Python-3000] Weird error message from bytes type
In-Reply-To: <ca471dc20702252138oce44b50vf83e1ce146cc447c@mail.gmail.com>
References: <ert5ti$3p6$1@sea.gmane.org>	<9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com>	<ca471dc20702251540k550ee3e3m332ed5e8e52a025c@mail.gmail.com>	<20070226002930.GB3067@python.ca>	<ca471dc20702251639p6ea22e2o5c2363a2df2b6066@mail.gmail.com>	<45E25736.2010404@canterbury.ac.nz>
	<ca471dc20702252138oce44b50vf83e1ce146cc447c@mail.gmail.com>
Message-ID: <45E2B21C.1030101@gmail.com>

Guido van Rossum wrote:
> Please give the poor function a break. It was added to 2.5 and used
> only for indexing there. In 3.0 it is more generally useful (I want to
> use it whenever an int is needed), but in our pre-alpha code the error
> messages haven't been fixed yet. That's the whole story.

A couple of locations in the 2.5 standard library actually had to deal 
with the same problem. They currently make their own calls to 
PyIndex_Check() in order to override the default error message. This 
happens in:
   sequence_repeat (abstract.c)
   _GetMapSize (mmapmodule.c)

slice_indices (sliceobject.c) also uses PyNumber_AsSsize_t to check the 
length argument that is passed in, but it just allows the PyNumber_Index 
error message to flow through:

>>> slice(2).indices('1')
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
TypeError: 'str' object cannot be interpreted as an index

> On 2/25/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>> But by calling __index__ and giving error messages about
>> indexes, PyNumber_AsSsize_t seems to be assuming that the
>> value is going to be used as an index.
>>
>> If that's the true purpose of PyNumber_AsSsize_t, then it
>> shouldn't be getting called in this situation.
>>
>> If it's not, then it shouldn't be giving error messages
>> that talk about indexes, and there should be another
>> API such as PyObject_AsIndex for values that really
>> are going to be used as indexes.

Generating a different error message when passing invalid types to 
PyNumber_AsSsize_t (as opposed to PyNumber_Index) wasn't particularly 
high on the to-do list when we were trying to fix the __index__() 
clipping bugs for the 2.5 release - the exception raised by the eventual 
implementation was of the correct type, even if the message wasn't 
perfect. Further complicating a C API that was already somewhat complex 
(a type checking function, plus two different conversion functions, one 
with an extra argument relating to overflow handling) didn't seem to be 
a desirable thing to do.

With additional usage in non-index contexts (and a bit more time to do 
the work!), it probably makes sense to modify PyNumber_Index and 
PyNumber_AsSsize_t to call a common static function which allows them to 
specify different format strings for the type error.

However, given that the error message has to make sense even when the 
object involved is a float, finding appropriate wording that doesn't 
mention the __index__ slot is somewhat challenging. For example, simply 
replacing 'index' with 'integer' could lead to a different kind of 
confusion:
   TypeError: 'float' object cannot be interpreted as an integer
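
A Python-level sketch of the distinction the C functions have to
draw. SmallInt is a hypothetical type; operator.index is the Python
counterpart of PyNumber_Index. Anything that defines __index__ is
accepted as an int, while a float is rejected with a TypeError.

```python
import operator

class SmallInt:
    """Hypothetical int-like type that opts in through __index__."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value

as_int = operator.index(SmallInt(65))   # accepted: behaves like an int
picked = "abc"[SmallInt(1)]             # accepted as a sequence index too

try:
    operator.index(1.5)                 # a float has no __index__
except TypeError:
    float_rejected = True
else:
    float_rejected = False
```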

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From oliphant.travis at ieee.org  Mon Feb 26 19:43:26 2007
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Mon, 26 Feb 2007 11:43:26 -0700
Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer)
Message-ID: <erv9or$53s$1@sea.gmane.org>


It was so nice to see many of you at PyCon this year.  The event was 
very well handled and congratulations are deserved all around.

I brought up the idea of the array interface several times.  After I 
heard Guido's keynote and saw the scheduled time-lines, I realized that 
my approach should be to push for the array interface into Python 3.0 as 
an enhancement/adaptation of the buffer protocol (which I have not heard 
or seen much discussion about).

Later we can back-port the result to Python 2.6.

To encourage a useful discussion, I've started a Wiki that describes the 
idea behind my proposal and placed it at:

http://wiki.python.org/moin/ArrayInterface

The basic idea is to define a memory-view object which is returned by 
the buffer-protocol and contains not just a pointer to the memory but 
also shape, stride, and data-format information.
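
To make the idea concrete, here is a sketch of the kind of
information such an object would carry. It assumes the memoryview
type of present-day Python 3 as a stand-in, not the interface
proposed on the Wiki page.

```python
import struct

itemsize = struct.calcsize("i")                    # size of a native C int
raw = struct.pack("6i", 10, 11, 12, 20, 21, 22)    # flat block of memory

# Reinterpret the flat bytes as a 2 x 3 array of native ints.
view = memoryview(raw).cast("i", shape=[2, 3])

info = (view.format, view.itemsize, view.shape, view.strides)
rows = view.tolist()
```

The pointer itself stays hidden; a consumer only needs the format,
shape and strides to interpret the memory.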

It would be nice if there were also some additions to the Python C-API 
to make it easy to work with this memory-view object, but I don't 
envision needing to make this object available to Python directly.

I'm willing to work on this buffer protocol and maintain it as well in 
the future.

Thanks for any comments and/or feedback.

Best regards,

-Travis







From oliphant.travis at ieee.org  Mon Feb 26 19:46:42 2007
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Mon, 26 Feb 2007 11:46:42 -0700
Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer)
In-Reply-To: <erv9or$53s$1@sea.gmane.org>
References: <erv9or$53s$1@sea.gmane.org>
Message-ID: <erv9ui$53s$2@sea.gmane.org>



I'm sorry for creating two threads on the buffer protocol.  I didn't see 
the first one because I mistakenly put it as a reply to another thread.

This one is independent and should be more helpful as a discussion 
place-holder.

-Travis


From oliphant.travis at ieee.org  Mon Feb 26 19:53:35 2007
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Mon, 26 Feb 2007 11:53:35 -0700
Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer)
In-Reply-To: <erv9or$53s$1@sea.gmane.org>
References: <erv9or$53s$1@sea.gmane.org>
Message-ID: <ervabi$822$1@sea.gmane.org>


> 
>>   2. There is no way for a consumer to tell the protocol-exporting 
>>object it is "finished" with its view of the memory and therefore no way 
>>for the object to be sure that it can reallocate the pointer to the 
>>memory that it owns (the array object reallocating its memory after 
>>sharing it with the buffer object led to the infamous buffer-object 
>>problem).
> 
>
> I'm a bit worried about having a get/release kind of thing
> in the protocol, because it risks forcing all objects which
> implement the protocol to provide some kind of refcounting
> and locking mechanism for their data. Some objects may not
> be able to do that easily or efficiently, especially if
> they're wrapping some external library that has no such
> notion.

If they can't do it easily, then they don't have to define the 
release-function and Python will never call it.

> 
> 
>>All that is needed is to create a Python "memory_view" object that can 
>>contain all the information needed and be returned when the buffer 
>>protocol is called --- when it is garbage-collected, the 
>>"bp_release_view" function is called on the exporting object.
> 
> 
> That sounds too heavyweight. Getting a memory view through
> this protocol should be a very lightweight operation -- ideally
> it shouldn't require allocating any memory at all, and it
> certainly shouldn't require creating a Python object.

If you want shape information you are going to have to allocate memory. 
   If you are going to do that you might as well return a Python object 
so you can manage this memory easily.

If you don't want or need shape or detailed type information, I could 
also see and have no objection to keeping a lightweight version of the 
protocol that only returns simple integers.

I'll put that in the PEP.

-Travis


From oliphant.travis at ieee.org  Mon Feb 26 20:24:39 2007
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Mon, 26 Feb 2007 12:24:39 -0700
Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer)
In-Reply-To: <erv9or$53s$1@sea.gmane.org>
References: <erv9or$53s$1@sea.gmane.org>
Message-ID: <ervc5o$fff$1@sea.gmane.org>

Guido van Rossum wrote:
> On 2/25/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> 
>>Travis Oliphant wrote:
>>
>>
>>>   2. There is no way for a consumer to tell the protocol-exporting
>>>object it is "finished" with its view of the memory and therefore no way
>>>for the object to be sure that it can reallocate the pointer to the
>>>memory that it owns (the array object reallocating its memory after
>>>sharing it with the buffer object led to the infamous buffer-object
>>>problem).
>>
>
> Another problem that would be solved by this is the current unsafety
> of blocking I/O operations like file.readinto() and
> socket.recv_into(). These operations do roughly the following:
> 
> (1) get the pointer and length from the buffer API
> (2) release the GIL
> (3) call the blocking read() or recv() system call with the pointer and length
> (4) reacquire the GIL
> 
> The problem is that while the GIL is released, another thread with
> access to the object whose buffer is being read into could resize or
> otherwise modify it, causing the buffer to be moved in memory. The
> read() or recv() call would then overwrite freed memory (or worse,
> memory allocated for a different purpose).
> 
> I realized this while thinking about the 3.0 bytes object, but the 2.x
> array object has the same problem, as probably does every other object
> that uses the buffer API and has a mutable size (if there are any).

Yes, the NumPy object has this problem as well (although it has *very* 
conservative checks so that if the reference count on the array is not 
1, memory is not reallocated).

> 
> I agree that getting the pointer and length should be separated from
> finding out how the bytes should be interpreted. I'd like to propose a
> simple stack or hierarchy of classes to address (what I think are)
> Travis's needs:
> 
> - At the bottom is a redesigned buffer API: add locking, remove
> segcount and char buffers.

Great.  I have no problem with this.  Is your idea of locking the same 
as mine (i.e. a function in the API for release)?

> 
> - There is a mixin class (at least conceptually it's a mixin) which
> takes anything implementing the redesigned buffer API and adds the
> bytes API (see recently updated PEP 358); operations like .strip() or
> slicing should return copies (of the same or a different type) or
> views at the discretion of the underlying object. (Maybe there should
> be a read-only and read-write version of this; note that read-only is
> not the same as immutable, since the underlying buffer may be modified
> by other APIs, if it allows this.)

I'm not sure what this mixin class is.  Is this a base class for the 
bytes object?   I need to understand this better in order to write a PEP.

> 
> - *Another* API built on top of the redesigned buffer API would be
> something more aligned with numpy's needs, adding (a) a shape
> descriptor indicating the size, offset and stride of each dimension,
> and (b) a record descriptor indicating the interpretation of one
> element of the array. For (a), a list of 3-tuples of ints would
> probably be sufficient (constrained so that no valid combination of
> indexes points outside the buffer); for (b), I propose (with Jim
> Hugunin who first suggested this at PyCon) to use the same concise but
> expressing format-string-like notation used by the struct module. (The
> bytes API is not quite a special case of this, since it provides more
> string-like operations.)
> 

Great.  NumPy has already adopted the struct standard for its "hidden"
character codes.

We also need to add some format codes for complex-data ('F','D','G') and 
for long doubles ('g').    I would also propose that we make an 
enumeration in Python so we can refer to these codes in C/C++ as constants:

PYFORMAT_LONG
PYFORMAT_UINT

etc.


a) I would prefer a 3-tuple of lists for the shape descriptor
(shape list, stride list, offset list)

That way default striding could be given as None, and the offset list
could be omitted as well.

My view on the offset is that it is not necessary as the start of the 
array is already given by the memory pointer.  But, if others see a 
strong need for it, I have no problem with including it.
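A small sketch of how a consumer might use such a descriptor, assuming strides and per-dimension offsets are expressed in bytes and that None means "C-contiguous" for strides and "all zeros" for offsets. The function names default_strides and byte_offset are made up for this example.

```python
def default_strides(shape, itemsize):
    """C-contiguous strides in bytes, used when the stride list is None."""
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return list(reversed(strides))


def byte_offset(index, shape, strides=None, offsets=None, itemsize=1):
    """Byte position of one element relative to the memory pointer."""
    if strides is None:
        strides = default_strides(shape, itemsize)
    if offsets is None:
        offsets = [0] * len(shape)
    for i, extent in zip(index, shape):
        if not 0 <= i < extent:
            raise IndexError("index out of range")
    return sum(off + i * st for i, st, off in zip(index, strides, offsets))


# A 3x4 array of 8-byte items: element (1, 2) is the seventh item.
position = byte_offset((1, 2), shape=[3, 4], itemsize=8)
```

With both optional lists left as None, the consumer needs only the shape and the item size, which is the simplification argued for above.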


b) I'm also fine with just returning a string for the record descriptor 
like the struct module uses.
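For illustration, this is how a struct-style string can describe one element of a buffer today. The record layout (an int id followed by two doubles) is invented for the example; only the struct calls themselves are existing API.

```python
import struct

# Hypothetical record: a 32-bit int id followed by two doubles (x, y),
# little-endian with no padding.
record_format = "<idd"
itemsize = struct.calcsize(record_format)

# Two records laid out back to back, as an exporter's memory might be.
raw = (struct.pack(record_format, 7, 1.5, -2.0)
       + struct.pack(record_format, 8, 0.25, 4.0))

# Interpret each element of the buffer using the descriptor string.
records = [struct.unpack_from(record_format, raw, i * itemsize)
           for i in range(len(raw) // itemsize)]
```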


-Travis









> The crucial idea here (like so often :-) is not to use inheritance but
> composition. This means that we can separate management of the buffer
> (e.g. malloc, mmap, whatever) from providing APIs on top of this
> (either the bytes API or the multi-dimensional array API).
> 


From guido at python.org  Mon Feb 26 21:28:47 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 26 Feb 2007 14:28:47 -0600
Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer)
In-Reply-To: <ervc5o$fff$1@sea.gmane.org>
References: <erv9or$53s$1@sea.gmane.org> <ervc5o$fff$1@sea.gmane.org>
Message-ID: <ca471dc20702261228s6bf3760ay981b94a2e6030501@mail.gmail.com>

On 2/26/07, Travis Oliphant <oliphant.travis at ieee.org> wrote:
> Guido van Rossum wrote:
> > I realized this thinking about the 3.0 bytes object, but the 2.x array
> > object has the same problems, and probably every other object that
> > uses the buffer API and has a mutable size (if there are any).
>
> Yes, the NumPy object has this problem as well (although it has *very*
> conservative checks so that if the reference count on the array is not
> 1, memory is not reallocated).

That would be *too* conservative for me -- just passing it as an
argument to another function increfs it (for the duration of the
call).

> > I agree that getting the pointer and length should be separated from
> > finding out how the bytes should be interpreted. I'd like to propose a
> > simple stack or hierarchy of classes to address (what I think are)
> > Travis's needs:
> >
> > - At the bottom is a redesigned buffer API: add locking, remove
> > segcount and char buffers.
>
> Great.  I have no problem with this.  Is your idea of locking the same
> as mine (i.e. a function in the API for release?)

Right.

> > - There is a mixin class (at least conceptually it's a mixin) which
> > takes anything implementing the redesigned buffer API and adds the
> > bytes API (see recently updated PEP 358); operations like .strip() or
> > slicing should return copies (of the same or a different type) or
> > views at the discretion of the underlying object. (Maybe there should
> > be a read-only and read-write version of this; note that read-only is
> > not the same as immutable, since the underlying buffer may be modified
> > by other APIs, if it allows this.)
>
> I'm not sure what this mixin class is.  Is this a base class for the
> bytes object?   I need to understand this better in order to write a PEP.

Yes, that's a good way to describe it.

> > - *Another* API built on top of the redesigned buffer API would be
> > something more aligned with numpy's needs, adding (a) a shape
> > descriptor indicating the size, offset and stride of each dimension,
> > and (b) a record descriptor indicating the interpretation of one
> > element of the array. For (a), a list of 3-tuples of ints would
> > probably be sufficient (constrained so that no valid combination of
> > indexes points outside the buffer); for (b), I propose (with Jim
> > Hugunin who first suggested this at PyCon) to use the same concise but
> > expressing format-string-like notation used by the struct module. (The
> > bytes API is not quite a special case of this, since it provides more
> > string-like operations.)
>
> Great.  NumPy has already adopted the struct standard for it's "hidden"
> character codes.

Glad to get agreement.

> We also need to add some format codes for complex-data ('F','D','G') and
> for long doubles ('g').

No problem. Just make this a separate section in your PEP ("proposed
additions for the struct module").

> I would also propose that we make an
> enumeration in Python so we can refer to these codes in C/C++ as constants:
>
> PYFORMAT_LONG
> PYFORMAT_UINT
>
> etc.

Not sure I follow but sounds fine; hopefully the PEP draft will clarify this.

> a) I would prefer a 3-tuple of lists for the shape descriptor
> (shape list, stride list, offset list)
>
> That way default striding could be given as None and there would not
> have to be any offset as well.

Of course. I don't know much about the traditional way of representing
MD array structure.

> My view on the offset is that it is not necessary as the start of the
> array is already given by the memory pointer.  But, if others see a
> strong need for it, I have no problem with including it.

Well, don't you end up with an offset as soon as you take a rectangular
slice out of a 2d array?

> b) I'm also fine with just returning a string for the record descriptor
> like the struct module uses.

Excellent. Are we all set then?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From oliphant.travis at ieee.org  Mon Feb 26 21:37:32 2007
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Mon, 26 Feb 2007 13:37:32 -0700
Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer)
In-Reply-To: <ca471dc20702261228s6bf3760ay981b94a2e6030501@mail.gmail.com>
References: <erv9or$53s$1@sea.gmane.org> <ervc5o$fff$1@sea.gmane.org>
	<ca471dc20702261228s6bf3760ay981b94a2e6030501@mail.gmail.com>
Message-ID: <ervged$mh$1@sea.gmane.org>

Guido van Rossum wrote:
> On 2/26/07, Travis Oliphant <oliphant.travis at ieee.org> wrote:
> 
>>Guido van Rossum wrote:
>>
>>>I realized this thinking about the 3.0 bytes object, but the 2.x array
>>>object has the same problems, and probably every other object that
>>>uses the buffer API and has a mutable size (if there are any).
>>
>>Yes, the NumPy object has this problem as well (although it has *very*
>>conservative checks so that if the reference count on the array is not
>>1, memory is not reallocated).
> 
> 
> That would be *too* conservative for me -- just passing it as an
> argument to another function increfs it (for the duration of the
> call).
> 

It's too conservative for us too.  We just don't see any way around it
without the locking mechanism (right now you can override the ref-count
checking if you know what you are doing).

>>
>>I'm not sure what this mixin class is.  Is this a base class for the
>>bytes object?   I need to understand this better in order to write a PEP.
> 
> 
> Yes, that's a good way to describe it.
> 
> 
>>>- *Another* API built on top of the redesigned buffer API would be
>>>something more aligned with numpy's needs, adding (a) a shape
>>>descriptor indicating the size, offset and stride of each dimension,
>>>and (b) a record descriptor indicating the interpretation of one
>>>element of the array. For (a), a list of 3-tuples of ints would
>>>probably be sufficient (constrained so that no valid combination of
>>>indexes points outside the buffer); for (b), I propose (with Jim
>>>Hugunin who first suggested this at PyCon) to use the same concise but
>>>expressing format-string-like notation used by the struct module. (The
>>>bytes API is not quite a special case of this, since it provides more
>>>string-like operations.)
>>
>>Great.  NumPy has already adopted the struct standard for it's "hidden"
>>character codes.
> 
> 
> Glad to get agreement.
> 
> 
>>We also need to add some format codes for complex-data ('F','D','G') and
>>for long doubles ('g').
> 
> 
> No problem. Just make this  a separate section in your PEP ("proposed
> additions for the struct module").
> 

O.K. great.


> 
>>I would also propose that we make an
>>enumeration in Python so we can refer to these codes in C/C++ as constants:
>>
>>PYFORMAT_LONG
>>PYFORMAT_UINT
>>
>>etc.
> 
> 
> Not sure I follow but sounds fine; hopefully the PEP draft will clarify this.
> 

This is just some header magic (either #defines or an enum statement) so
you don't have to remember character codes in C/C++.
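The proposal concerns a C header, but the idea can be sketched in Python as named constants standing in for struct codes. The constant names below follow the PYFORMAT_* pattern from the message; PYFORMAT_DOUBLE and the helper standard_size are additions made up for this example.

```python
import struct

# Hypothetical names in the spirit of PYFORMAT_*; the values are existing
# struct format characters.
PYFORMAT_LONG = "l"
PYFORMAT_UINT = "I"
PYFORMAT_DOUBLE = "d"


def standard_size(code):
    """Size in bytes of one item, using struct's standard (non-native) sizes."""
    return struct.calcsize("=" + code)


sizes = {name: standard_size(code)
         for name, code in [("long", PYFORMAT_LONG),
                            ("uint", PYFORMAT_UINT),
                            ("double", PYFORMAT_DOUBLE)]}
```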

> 
>>a) I would prefer a 3-tuple of lists for the shape descriptor
>>(shape list, stride list, offset list)
>>
>>That way default striding could be given as None and there would not
>>have to be any offset as well.
> 
> 
> Of course. I don't know much about the traditional way of representing
> MD array structure.
> 
> 
>>My view on the offset is that it is not necessary as the start of the
>>array is already given by the memory pointer.  But, if others see a
>>strong need for it, I have no problem with including it.
> 
> 
> Well don't you end up with an offset as soon as you take a rectangular
> slice out of a 2d array?

You can either 1) keep the same base memory pointer and create an offset 
list, or 2) have no offset and change the starting memory pointer.

NumPy uses option 2 (it stores the starting point of the array).
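The two options describe the same elements. A minimal sketch, using a single scalar byte offset rather than a per-dimension offset list, and a memoryview slice as a stand-in for "changing the starting memory pointer":

```python
# A 3x4 array of unsigned bytes, stored row by row.
base = bytes(range(12))
row_stride, col_stride = 4, 1


def element(memory, start, row, col):
    """Read one element given the byte position where the array starts."""
    return memory[start + row * row_stride + col * col_stride]


# The rectangular slice [1:3, 2:4] starts at element (1, 2) of the original.
slice_start = 1 * row_stride + 2 * col_stride

# Option 1: keep the base pointer and carry an explicit offset.
option1 = [[element(base, slice_start, r, c) for c in range(2)]
           for r in range(2)]

# Option 2: move the starting pointer and carry no offset.
moved = memoryview(base)[slice_start:]
option2 = [[element(moved, 0, r, c) for c in range(2)]
           for r in range(2)]
```

In both cases the strides of the slice are those of the original array; only the place where the starting position is recorded differs.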


> 
> 
>>b) I'm also fine with just returning a string for the record descriptor
>>like the struct module uses.
> 
> 
> Excellent. Are we all set then?

I think so.  I have some additional ideas about the string format 
description that I will explain in the PEP.   The draft is coming along at

http://wiki.python.org/moin/ArrayInterface


Feel free to make changes there.

-Travis






From oliphant.travis at ieee.org  Mon Feb 26 21:44:01 2007
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Mon, 26 Feb 2007 13:44:01 -0700
Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer)
In-Reply-To: <ca471dc20702261228s6bf3760ay981b94a2e6030501@mail.gmail.com>
References: <erv9or$53s$1@sea.gmane.org> <ervc5o$fff$1@sea.gmane.org>
	<ca471dc20702261228s6bf3760ay981b94a2e6030501@mail.gmail.com>
Message-ID: <ervgqh$1b3$1@sea.gmane.org>

Guido van Rossum wrote:
> On 2/26/07, Travis Oliphant <oliphant.travis at ieee.org> wrote:
> 
>>Guido van Rossum wrote:
>>
> 
> 
> Excellent. Are we all set then?
> 

One more question:  what is the reason for separate read/write getbuffer
calls?  What is the problem with just one getbuffer call with a flag to
indicate whether or not you want a writeable memory area?

I prefer fewer function pointers because it means that extension types 
must implement fewer functions.  But, either way.

I know there is some stylistic distaste for "flags" in APIs.  One could 
still keep two C-API calls for getting read-only and writeable buffers.


-Travis



From guido at python.org  Mon Feb 26 22:12:47 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 26 Feb 2007 15:12:47 -0600
Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer)
In-Reply-To: <ervgqh$1b3$1@sea.gmane.org>
References: <erv9or$53s$1@sea.gmane.org> <ervc5o$fff$1@sea.gmane.org>
	<ca471dc20702261228s6bf3760ay981b94a2e6030501@mail.gmail.com>
	<ervgqh$1b3$1@sea.gmane.org>
Message-ID: <ca471dc20702261312t766aaa7an68b0ad186f3f56ae@mail.gmail.com>

On 2/26/07, Travis Oliphant <oliphant.travis at ieee.org> wrote:
> One more question?  What is the reason for separate read/write getbuffer
> calls.  What is the problem with just one getbuffer call with a flag to
> indicate whether or not you want a writeable memory area?

I'm not sure; that API grew somewhat organically. I guess having
separate functions makes it possible to test whether the buffer is
writable at all, but IMO checking for an error is just as expedient,
so as long as we're redesigning the whole API you can design whatever
you want.
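A sketch of the single-entry-point variant, where the consumer finds out about writability by checking for an error. The class ReadOnlyExporter and the method name getbuffer(writeable=...) are assumptions for illustration and do not describe the existing C-level slots.

```python
class ReadOnlyExporter:
    """Hypothetical exporter whose memory may not be written."""

    def __init__(self, payload):
        self._payload = bytes(payload)

    def getbuffer(self, writeable=False):
        # One entry point: the flag says what the consumer needs.
        if writeable:
            raise BufferError("object does not provide a writeable buffer")
        return memoryview(self._payload)


exporter = ReadOnlyExporter(b"abc")
view = exporter.getbuffer()
try:
    exporter.getbuffer(writeable=True)
    writeable_ok = True
except BufferError:
    writeable_ok = False
```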

> I prefer fewer function pointers because it means that extension types
> must implement fewer functions.  But, either way.

Right.

> I know there is some stylistic distaste for "flags" in APIs.

That's more a Python-level preference.

> One could still keep two C-API calls for getting read-only and writeable buffers.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From mike.verdone at gmail.com  Mon Feb 26 22:35:54 2007
From: mike.verdone at gmail.com (Mike Verdone)
Date: Mon, 26 Feb 2007 15:35:54 -0600
Subject: [Python-3000] Draft PEP for New IO system
Message-ID: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>

Hi all,

Daniel Stutzbach and I have prepared a draft PEP for the new IO system
for Python 3000. This document is, hopefully, true to the info that
Guido wrote on the whiteboards here at PyCon. This is still a draft
and there are quite a few decisions that need to be made. Feedback is
welcomed.

We've published it on Google Docs here:
http://docs.google.com/Doc?id=dfksfvqd_1cn5g5m

What follows is a plaintext version.

Thanks,

Mike.


PEP: XXX
Title: New IO
Version:
Last-Modified:
Authors: Daniel Stutzbach, Mike Verdone
Status: Draft
Type:
Created: 26-Feb-2007

Rationale and Goals
Python allows for a variety of file-like objects that can be worked
with via bare read() and write() calls using duck typing. Anything
that provides read() and write() is stream-like. However, more exotic
and extremely useful functions like readline() or seek() may or may
not be available on a file-like object. Python needs a specification
for basic byte-based IO streams to which we can add buffering and
text-handling features.

Once we have a defined raw byte-based IO interface, we can add
buffering and text-handling layers on top of any byte-based IO class.
The same buffering and text handling logic can be used for files,
sockets, byte arrays, or custom IO classes developed by Python
programmers. Developing a standard definition of a stream lets us
separate stream-based operations like read() and write() from
implementation specific operations like fileno() and isatty(). It
encourages programmers to write code that uses streams as streams and
not require that all streams support file-specific or socket-specific
operations.

The new IO spec is intended to be similar to the Java IO libraries,
but generally less confusing. Programmers who don't want to muck about
in the new IO world can expect that the open() factory method will
produce an object backwards-compatible with old-style file objects.
Specification
The Python I/O Library will consist of three layers: a raw I/O layer,
a buffered I/O layer, and a text I/O layer.  Each layer is defined by an
abstract base class, which may have multiple implementations.  The raw
I/O and buffered I/O layers deal with units of bytes, while the text I/O
layer deals with units of characters.
Raw I/O
The abstract base class for raw I/O is RawIOBase.  It has several
methods which are wrappers around the appropriate operating system
call.  If one of these functions would not make sense on the object,
the implementation must raise an IOError exception.  For example, if a
file is opened read-only, the .write() method will raise an IOError.
As another example, if the object represents a socket, then .seek(),
.tell(), and .truncate() will raise an IOError.

    .read()
    .write()
    .seek()
    .tell()
    .truncate()
    .close()

Additionally, it defines a few other methods:

    (should these "is_" functions be attributes instead?
"file.readable == True")

    .is_readable()

       Returns True if the object was opened for reading, False
otherwise.  If False, .read() will raise an IOError if called.

    .is_writable()

       Returns True if the object was opened for writing, False
otherwise.  If False, .write() and .truncate() will raise an IOError
if called.

    .is_seekable()  (Should this be called .is_random()?  or
.is_sequential() with opposite return values?)

       Returns True if the object supports random-access (such as disk
files), or False if the object only supports sequential access (such
as sockets, pipes, and ttys).  If False, .seek(), .tell(), and
.truncate() will raise an IOError if called.

Iff a RawIOBase implementation operates on an underlying file
descriptor, it must additionally provide a .fileno() member function.
This could be defined specifically by the implementation, or a mix-in
class could be used (Need to decide about this).

    .fileno()

       Returns the underlying file descriptor (an integer)
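As a rough sketch of the contract described above (not the eventual implementation), a base class can make every operation fail with IOError and let concrete classes override what they support. The names SketchRawIO and ReadOnlyBytesRaw are hypothetical, and the method signatures are simplified.

```python
class SketchRawIO:
    """Hypothetical base class: every operation fails unless overridden."""

    def read(self, n):
        raise IOError("read not supported")

    def write(self, data):
        raise IOError("write not supported")

    def seek(self, pos):
        raise IOError("seek not supported")

    def is_readable(self):
        return False

    def is_writable(self):
        return False

    def is_seekable(self):
        return False


class ReadOnlyBytesRaw(SketchRawIO):
    """Read-only, sequential source backed by an in-memory bytes object."""

    def __init__(self, payload):
        self._payload = payload
        self._pos = 0

    def read(self, n):
        chunk = self._payload[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk

    def is_readable(self):
        return True


raw = ReadOnlyBytesRaw(b"hello world")
first = raw.read(5)
try:
    raw.write(b"x")
    write_failed = False
except IOError:
    write_failed = True
```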

Initially, three implementations will be provided that implement the
RawIOBase interface: FileIO, SocketIO, and BytesIO (also MMapIO?).
Each implementation must determine whether the object supports random
access as the information provided by the user may not be sufficient
(consider open("/dev/tty", "rw") or open("/tmp/named-pipe", "rw")).  As
an example, FileIO can determine this by calling the lseek() system
call; if it returns an error, the object does not support random
access.  Each implementation may provide additional methods
appropriate to its type.  The BytesIO object is analogous to Python 2's
cStringIO library, but operating on the new bytes type instead of
strings.
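The probing idea can be sketched with os.lseek: ask for the current position and treat an error as "sequential access only". The helper name fd_is_seekable is made up, and the pipe result shown assumes a POSIX system, where seeking on a pipe fails.

```python
import os
import tempfile


def fd_is_seekable(fd):
    """Probe random access by asking for the current position."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)
    except OSError:
        return False
    return True


# A regular disk file supports random access.
with tempfile.TemporaryFile() as disk_file:
    disk_seekable = fd_is_seekable(disk_file.fileno())

# A pipe only supports sequential access.
read_end, write_end = os.pipe()
try:
    pipe_seekable = fd_is_seekable(read_end)
finally:
    os.close(read_end)
    os.close(write_end)
```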
Buffered I/O
The next layer is the Buffered I/O layer which provides more efficient
access to file-like objects.  The abstract base class for all Buffered
I/O implementations is BufferedIOBase, which provides similar methods
to RawIOBase:

    .read()
    .write()
    .seek()
    .tell()
    .truncate()
    .close()
    .is_readable()
    .is_writable()
    .is_seekable()

Additionally, the abstract base class provides one member variable:

    .raw

       Provides a reference to the underlying RawIOBase object.

The BufferedIOBase methods' syntax is identical to that of RawIOBase,
but may have different semantics.  In particular, BufferedIOBase
implementations may read more data than requested or delay writing
data using buffers.  For the most part, this will be transparent to
the user (unless, for example, they open the same file through a
different descriptor).
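A minimal sketch of "reading more data than requested", assuming a raw object with a read(n) method. CountingRaw and SketchBufferedReader are hypothetical names, and the tiny buffer size is chosen only to keep the example readable.

```python
class CountingRaw:
    """Hypothetical raw source that counts how often it is asked for data."""

    def __init__(self, payload):
        self._payload = payload
        self._pos = 0
        self.calls = 0

    def read(self, n):
        self.calls += 1
        chunk = self._payload[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk


class SketchBufferedReader:
    def __init__(self, raw, buffer_size=8):
        self.raw = raw
        self._buffer_size = buffer_size
        self._buffer = b""

    def read(self, n):
        # Refill with a full block even if the caller asked for less.
        while len(self._buffer) < n:
            block = self.raw.read(max(self._buffer_size, n - len(self._buffer)))
            if not block:
                break  # end of stream
            self._buffer += block
        result, self._buffer = self._buffer[:n], self._buffer[n:]
        return result


reader = SketchBufferedReader(CountingRaw(b"abcdefghijkl"))
pieces = [reader.read(2) for _ in range(4)]
```

Four two-byte reads are served by a single eight-byte read on the raw object, which is the efficiency gain the buffered layer is meant to provide.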

There are four implementations of the BufferedIOBase abstract base
class, described below.
BufferedReader
The BufferedReader implementation is for sequential-access read-only
objects.  It does not provide a .flush() method, since there is no
sensible circumstance where the user would want to discard the read
buffer.
BufferedWriter
The BufferedWriter implementation is for sequential-access write-only
objects.  It provides a .flush() method, which forces all cached data
to be written to the underlying RawIOBase object.
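The delayed-write behaviour and the role of .flush() can be sketched as follows. SketchBufferedWriter and ListRaw are hypothetical names; the raw object is assumed to offer only write(data).

```python
class ListRaw:
    """Hypothetical raw sink that records each write it receives."""

    def __init__(self):
        self.writes = []

    def write(self, data):
        self.writes.append(data)
        return len(data)


class SketchBufferedWriter:
    def __init__(self, raw, buffer_size=8):
        self.raw = raw
        self._buffer_size = buffer_size
        self._pending = bytearray()

    def write(self, data):
        # Cache the data; only touch the raw object when the buffer fills.
        self._pending += data
        if len(self._pending) >= self._buffer_size:
            self.flush()
        return len(data)

    def flush(self):
        # Force all cached data out to the underlying raw object.
        if self._pending:
            self.raw.write(bytes(self._pending))
            self._pending.clear()


sink = ListRaw()
writer = SketchBufferedWriter(sink)
writer.write(b"abc")
writes_before_flush = list(sink.writes)
writer.flush()
```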
BufferedRWPair
The BufferedRWPair implementation is for sequential-access read-write
objects such as sockets and ttys.  As the read and write streams of
these objects are completely independent, it could be implemented by
simply incorporating a BufferedReader and BufferedWriter instance.  It
provides a .flush() method that has the same semantics as a
BufferedWriter's .flush() method.
BufferedRandom
The BufferedRandom implementation is for all random-access objects,
whether they are read-only, write-only, or read-write.  Compared to
the previous classes that operate on sequential-access objects, the
BufferedRandom class must contend with the user calling .seek() to
reposition the stream.  Therefore, an instance of BufferedRandom must
keep track of both the logical and true position within the object.
It provides a .flush() method that forces all cached write data to be
written to the underlying RawIOBase object and all cached read data to
be forgotten (so that future reads are forced to go back to the disk).
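The distinction between logical and true position can be illustrated with a read-only sketch. The class SketchBufferedRandom is hypothetical, write support is omitted, and today's io.BytesIO is used merely as a stand-in for a seekable raw object.

```python
import io


class SketchBufferedRandom:
    """Read-only sketch: the logical position is tracked separately."""

    def __init__(self, raw, buffer_size=8):
        self.raw = raw
        self._buffer_size = buffer_size
        self._buffer = b""
        self._buffer_start = 0  # raw offset of self._buffer[0]
        self._logical = 0       # position the user sees

    def tell(self):
        return self._logical

    def seek(self, pos):
        self._logical = pos  # no call on the raw object is needed yet

    def read(self, n):
        end = self._buffer_start + len(self._buffer)
        hit = self._buffer_start <= self._logical and self._logical + n <= end
        if not hit:
            # Miss: reposition the raw object and refill the buffer.
            self.raw.seek(self._logical)
            self._buffer_start = self._logical
            self._buffer = self.raw.read(max(self._buffer_size, n))
        begin = self._logical - self._buffer_start
        result = self._buffer[begin:begin + n]
        self._logical += len(result)
        return result

    def flush(self):
        # Forget cached read data so the next read goes back to the raw object.
        self._buffer = b""
        self._buffer_start = self._logical


raw = io.BytesIO(b"0123456789abcdef")
reader = SketchBufferedRandom(raw)
first = reader.read(2)
reader.seek(4)
second = reader.read(2)       # served from the buffer
logical_position = reader.tell()
true_position = raw.tell()    # the raw object is further along
```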

Q: Do we want to mandate in the specification that switching from
reading to writing on a read-write object implies a .flush()?  Or is
that an implementation convenience that users should not rely on?

For a read-only BufferedRandom object, .is_writable() returns False and
the .write() and .truncate() methods raise IOError.

For a write-only BufferedRandom object, .is_readable() returns False and
the .read() method raises IOError.
Text I/O
The text I/O layer provides functions to read and write strings from
streams. Some new features include universal newlines and character
set encoding and decoding.  The Text I/O layer is defined by a
TextIOBase abstract base class.  It provides several methods that are
similar to the BufferedIOBase methods, but operate on a per-character
basis instead of a per-byte basis.  These methods are:

    .read()
    .write()
    .seek()
    .tell()
    .truncate()

TextIOBase implementations also provide several methods that are
pass-throughs to the underlying BufferedIOBase objects:

    .close()
    .is_readable()
    .is_writable()
    .is_seekable()

TextIOBase class implementations additionally provide the following methods:

    .readline(self)

       Read until newline or EOF and return the line.

    .readlinesiter()

       Returns an iterator that returns lines from the file (which
happens to be 'self').

    .next()

       Same as readline()

    .__iter__()

       Same as readlinesiter()

    .__enter__()

       Context management protocol. Returns self.

    .__exit__()

       Context management protocol. No-op.

Two implementations will be provided by the Python library.  The
primary implementation, TextIOWrapper, wraps a Buffered I/O object.
Each TextIOWrapper object has a property named ".buffer" that provides
a reference to the underlying BufferedIOBase object.  Its initializer
has the following signature:

    .__init__(self, buffer, encoding=None, universal_newlines=True, crlf=None)

       Buffer is a reference to the BufferedIOBase object to be wrapped
with the TextIOWrapper.  "Encoding" refers to an encoding to be used
for translating between the byte-representation and
character-representation.  If "None", then the system's locale setting
will be used as the default.  If "universal_newlines" is true, then
the TextIOWrapper will automatically translate the bytes "\r\n" into a
single newline character during reads.  If "crlf" is True, then a
newline will be written as "\r\n".  If "crlf" is False, then a newline
will be written as "\n".  If "crlf" is None, then a system-specific
default will be used.

Another way to do it is as follows (we should pick one or the other):

    .__init__(self, buffer, encoding=None, newline=None)

       Same as above, but if newline is not None, use that as the
newline pattern (for reading and writing).  If newline is not set,
attempt to find the newline pattern from the file, and if we can't for
some reason, use the system default newline pattern.
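One possible reading of this alternative, sketched in Python. The helper names detect_newline and split_lines are hypothetical, and the detection rule (use the first line ending found, otherwise fall back to os.linesep) is an assumption; the draft does not specify how the pattern is found.

```python
import os


def detect_newline(sample):
    """Guess the newline pattern from text already read.

    Falls back to the system default when the sample has no line ending.
    """
    for position, char in enumerate(sample):
        if char == "\r":
            following = sample[position + 1:position + 2]
            return "\r\n" if following == "\n" else "\r"
        if char == "\n":
            return "\n"
    return os.linesep


def split_lines(text, newline=None):
    # An explicit pattern wins; otherwise detect it from the data.
    pattern = newline if newline is not None else detect_newline(text)
    return text.split(pattern)


dos_lines = split_lines("one\r\ntwo\r\nthree")
unix_lines = split_lines("one\ntwo", newline="\n")
```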

Another implementation, StringIO, creates a file-like TextIO
implementation without an underlying Buffer I/O object.  While similar
functionality could be provided by wrapping a BytesIO object in a
Buffered I/O object in a TextIOWrapper, the String I/O object allows
for much greater efficiency as it does not need to actually perform
encoding and decoding.  A String I/O object can just store the encoded
string as-is.  The String I/O object's __init__ signature is similar
to the TextIOWrapper, but without the "buffer" parameter.

END OF PEP

From steven.bethard at gmail.com  Tue Feb 27 00:00:39 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Mon, 26 Feb 2007 16:00:39 -0700
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
Message-ID: <d11dcfba0702261500o1bd360a1n5fd5de8e3697e9ba@mail.gmail.com>

On 2/26/07, Mike Verdone <mike.verdone at gmail.com> wrote:
> Daniel Stutzbach and I have prepared a draft PEP for the new IO system
> for Python 3000.

Thanks for doing this! Generally, it looks pretty good.

> Additionally, it defines a few other methods:
>
>     (should these "is_" functions be attributes instead?
> "file.readable == True")
>
>     .is_readable()
[snip]
>     .is_writable()
[snip]
>     .is_seekable()
[snip]
> Additionally, the abstract base class provides one member variable:
>
>     .raw
[snip]

I gather that the reason for methods instead of attributes is that
it's easier to delegate to a method than it is to an attribute?  That
is::

    def is_readable(self):
        return self.raw.is_readable()

is easier to write than::

    @property
    def readable(self):
        return self.raw.readable

If that's the motivation, I'd assume that we'd want a ``get_raw()``
method instead of the ``.raw`` attribute.  FWLIW, as a user, I'd
rather just work with attributes.

> TextIOBase class implementations additionally provide the following methods:
>
>     .readline(self)
>        Read until newline or EOF and return the line.
>
>     .readlinesiter()
>        Returns an iterator that returns lines from the file (which
> happens to be 'self').
>
>     .next()
>        Same as readline()
>
>     .__iter__()
>        Same as readlinesiter()

If they do the same thing, why do we want them?  I gather that the
next()/readline() duplication is for backwards compatibility, but why
the __iter__()/readlinesiter() duplication?

> Another way to do it is as follows (we should pick one or the other):
>
>     .__init__(self, buffer, encoding=None, newline=None)
>
>        Same as above but if newline is not None use that as the
> newline pattern (for reading and writing), and if newline is not set
> attempt to find the newline pattern from the file and if we can't for
> some reason use the system default newline pattern.

I like this API better, but I'm not certain I understand the proposal.
 If I call::

    TextIOWrapper(buffer, newline='\n')

does that mean that any '\r\n' strings in the file will appear as
'\n'?  Likewise, if I call::

    TextIOWrapper(buffer, newline='\r\n')

does that mean that any bare '\n' strings will appear as '\r\n'?  If
not, how do I get universal newline support with this API?  (FWLIW,
I'd be happy with the you-only-see-newlines-like-you-asked-for-them
semantics above.)

> Another implementation, StringIO, creates a file-like TextIO
> implementation without an underlying Buffer I/O object.  While similar
> functionality could be provided by wrapping a BytesIO object in a
> Buffered I/O object in a TextIOWrapper, the String I/O object allows
> for much greater efficiency as it does not need to actually performing
> encoding and decoding.

Sorry, I didn't understand this part. The StringIO won't have to do
encoding/decoding when ``.next()`` is called?

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From greg.ewing at canterbury.ac.nz  Tue Feb 27 00:22:52 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 27 Feb 2007 12:22:52 +1300
Subject: [Python-3000] Weird error message from bytes type
In-Reply-To: <ca471dc20702252138oce44b50vf83e1ce146cc447c@mail.gmail.com>
References: <ert5ti$3p6$1@sea.gmane.org>
	<9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com>
	<ca471dc20702251540k550ee3e3m332ed5e8e52a025c@mail.gmail.com>
	<20070226002930.GB3067@python.ca>
	<ca471dc20702251639p6ea22e2o5c2363a2df2b6066@mail.gmail.com>
	<45E25736.2010404@canterbury.ac.nz>
	<ca471dc20702252138oce44b50vf83e1ce146cc447c@mail.gmail.com>
Message-ID: <45E36BCC.6030102@canterbury.ac.nz>

Guido van Rossum wrote:
> In 3.0 it is more generally useful (I want to
> use it whenever an int is needed). but in our pre-alpha code the error
> messages haven't been fixed yet.

Okay, that's fine, thanks.

--
Greg

From guido at python.org  Tue Feb 27 00:48:13 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 26 Feb 2007 17:48:13 -0600
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <d11dcfba0702261500o1bd360a1n5fd5de8e3697e9ba@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
	<d11dcfba0702261500o1bd360a1n5fd5de8e3697e9ba@mail.gmail.com>
Message-ID: <ca471dc20702261548g1111bbd3n74b5f0338974df77@mail.gmail.com>

On 2/26/07, Steven Bethard <steven.bethard at gmail.com> wrote:
> On 2/26/07, Mike Verdone <mike.verdone at gmail.com> wrote:
> > Daniel Stutzbach and I have prepared a draft PEP for the new IO system
> > for Python 3000.
>
> Thanks for doing this! Generally, it looks pretty good.

Agreed. I made some changes to the published doc; you may want to refresh it.

> > Additionally, it defines a few other methods:
> >
> >     (should these "is_" functions be attributes instead?
> > "file.readable == True")
> >
> >     .is_readable()
> [snip]
> >     .is_writable()
> [snip]
> >     .is_seekable()
> [snip]

These are now .readable() etc.

> > Additionally, the abstract base class provides one member variable:
> >
> >     .raw
> [snip]
>
> I gather that the reason for methods instead of attributes is that
> it's easier to delegate to a method than it is to an attribute?  That
> is::
>
>     def is_readable(self):
>         return self.raw.is_readable()
>
> is easier to write than::
>
>     @property
>     def readable(self):
>         return self.raw.readable
>
> If that's the motivation, I'd assume that we'd want a ``get_raw()``
> method instead of the ``.raw`` attribute.  FWLIW, as a user, I'd
> rather just work with attributes.

No, the difference in API styles has more to do with the fact that
readable() etc. *may* require actual work to be done to come up with a
value (especially seekable() may require one to try an lseek() syscall
to see if it works).

> > TextIOBase class implementations additionally provide the following methods:
> >
> >     .readline(self)
> >        Read until newline or EOF and return the line.
> >
> >     .readlinesiter()
> >        Returns an iterator that returns lines from the file (which
> > happens to be 'self').
> >
> >     .next()
> >        Same as readline()
> >
> >     .__iter__()
> >        Same as readlinesiter()
>
> If they do the same thing, why do we want them?  I gather that the
> next()/readline() duplication is for backwards compatibility, but why
> the __iter__()/readlinesiter() duplication?

Right. readlinesiter() is gone.

> > Another way to do it is as follows (we should pick one or the other):
> >
> >     .__init__(self, buffer, encoding=None, newline=None)
> >
> >        Same as above but if newline is not None use that as the
> > newline pattern (for reading and writing), and if newline is not set
> > attempt to find the newline pattern from the file and if we can't for
> > some reason use the system default newline pattern.
>
> I like this API better, but I'm not certain I understand the proposal.

Me neither. I'll think about this some more.

>  If I call::
>
>     TextIOWrapper(buffer, newline='\n')
>
> does that mean that any '\r\n' strings in the file will appear as
> '\n'?  Likewise, if I call::
>
>     TextIOWrapper(buffer, newline='\r\n')
>
> does that mean that any bare '\n' strings will appear as '\r\n'?  If
> not, how do I get universal newline support with this API?  (FWLIW,
> I'd be happy with the you-only-see-newlines-like-you-asked-for-them
> semantics above.)
>
> > Another implementation, StringIO, creates a file-like TextIO
> > implementation without an underlying Buffer I/O object.  While similar
> > functionality could be provided by wrapping a BytesIO object in a
> > Buffered I/O object in a TextIOWrapper, the String I/O object allows
> > for much greater efficiency as it does not need to actually perform
> > encoding and decoding.
>
> Sorry, I didn't understand this part. The StringIO won't have to do
> encoding/decoding when ``.next()`` is called?

The idea is that this should work like StringIO.py in Python 2.x when
you only write unicode strings to it. It will then store everything as
Unicode strings and the seek positions count characters, not bytes.
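
A small sketch of the difference, using the io module as it exists in
present-day CPython (an assumption; the draft under discussion may
differ in detail):

```python
import io

text = "na\u00efve"          # 5 characters, 6 bytes in UTF-8

s = io.StringIO()
s.write(text)
char_pos = s.tell()          # position counted in characters

b = io.BytesIO()
b.write(text.encode("utf-8"))
byte_pos = b.tell()          # position counted in bytes

s.seek(2)
tail = s.read()              # reads from the third character on
```

No encoding or decoding happens on the StringIO path; the text is stored
as-is.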

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Tue Feb 27 00:59:36 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 27 Feb 2007 12:59:36 +1300
Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer)
In-Reply-To: <ervabi$822$1@sea.gmane.org>
References: <erv9or$53s$1@sea.gmane.org> <ervabi$822$1@sea.gmane.org>
Message-ID: <45E37468.9060400@canterbury.ac.nz>

Travis Oliphant wrote:

> If they can't do it easily, then they don't have to define the 
> release-function and Python will never call it.

The case I'm worried about is where the data can move,
so it really *needs* to be locked, yet the object has
no way of ensuring that. It would be impossible for
the object to correctly implement this kind of buffer
protocol.

> If you want shape information you are going to have to allocate memory.

But only when the shape changes, not every time you
want a pointer to the memory.

I like Guido's idea of separating the shape/type info
from getting the memory pointer.

--
Greg

From guido at python.org  Tue Feb 27 01:16:28 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 26 Feb 2007 18:16:28 -0600
Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer)
In-Reply-To: <45E37468.9060400@canterbury.ac.nz>
References: <erv9or$53s$1@sea.gmane.org> <ervabi$822$1@sea.gmane.org>
	<45E37468.9060400@canterbury.ac.nz>
Message-ID: <ca471dc20702261616i48335f5ch236ec0ecb5029ab9@mail.gmail.com>

On 2/26/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Travis Oliphant wrote:
>
> > If they can't do it easily, then they don't have to define the
> > release-function and Python will never call it.
>
> The case I'm worried about is where the data can move,
> so it really *needs* to be locked, yet the object has
> no way of ensuring that. It would be impossible for
> the object to correctly implement this kind of buffer
> protocol.

Are you aware of an object that has such a requirement? I would think
that the object is in charge of moving its own buffer. If it doesn't
have control over when the buffer moves it shouldn't claim to
implement the buffer protocol.
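
As an illustration only, this is how the "object is in charge" idea
plays out with bytearray and memoryview in present-day CPython. That is
an assumption about a later implementation, not the pre-PEP text:

```python
data = bytearray(b"abcdef")
view = memoryview(data)      # export the buffer; the bytearray is now "locked"

try:
    data.extend(b"gh")       # resizing could move the memory
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

view.release()               # the release step: the exporter may move again
data.extend(b"gh")
```

The exporter refuses to move its buffer while a view is outstanding and
becomes resizable again once the view is released.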

> > If you want shape information you are going to have to allocate memory.
>
> But only when the shape changes, not every time you
> want a pointer to the memory.
>
> I like Guido's idea of separating the shape/type info
> from getting the memory pointer.

Well it's not my area of expertise. I thought that in order to
describe a generalized 3d slice of a 3d array you might need offsets
for at least some of the dimensions, but I could be wrong.
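
A worked sketch of why a starting offset (plus per-dimension strides) is
needed. The helper element_offset and the 4x5x6 layout are made up for
illustration and are not taken from the pre-PEP:

```python
def element_offset(index, strides, start=0):
    """Byte offset of an element given per-dimension strides (hypothetical helper)."""
    return start + sum(i * s for i, s in zip(index, strides))


# A C-contiguous 4x5x6 array of 1-byte items has strides (30, 6, 1).
full_strides = (30, 6, 1)

# The slice a[1:3, ::2, 2:5] needs its own shape, strides, and a
# starting offset into the parent buffer.
slice_start = element_offset((1, 0, 2), full_strides)   # 1*30 + 0*6 + 2*1
slice_shape = (2, 3, 3)
slice_strides = (30, 12, 1)          # step 2 doubles the middle stride

# Element [1, 2, 1] of the slice is element [2, 4, 3] of the parent.
via_slice = element_offset((1, 2, 1), slice_strides, slice_start)
via_parent = element_offset((2, 4, 3), full_strides)
```

Both computations land on the same byte, which is only possible because
the slice carries a start offset in addition to its strides.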

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Feb 27 01:30:40 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 26 Feb 2007 18:30:40 -0600
Subject: [Python-3000] Pre-PEP: Simple input built-in in Python 3000
In-Reply-To: <ca471dc20702200909x22d0184eg3f56d6e9c8bacd26@mail.gmail.com>
References: <7528bcdd0612220545u147f07a4gb476dd43733dfe46@mail.gmail.com>
	<ca471dc20612221347k7c30071drb0654df98ccd51aa@mail.gmail.com>
	<7528bcdd0702191032n4347e8c8p6987553deb9be445@mail.gmail.com>
	<45DAFEDE.4030109@gmail.com>
	<ca471dc20702200751i30371e37yc5a4c5d187c7f8b@mail.gmail.com>
	<7528bcdd0702200901r62f8cc4fu7ea7f1e59725e4b6@mail.gmail.com>
	<ca471dc20702200909x22d0184eg3f56d6e9c8bacd26@mail.gmail.com>
Message-ID: <ca471dc20702261630t1cc8862bh9d5c7585683a17f1@mail.gmail.com>

We implemented this at today's sprint. Andre wrote the transformations
for the 2to3 tools, I copied the raw_input() implementation from the
trunk back into the p3yk branch. Thanks Andre for your efforts in
writing the PEP, pushing for its implementation, and writing the
transformations!

--Guido

On 2/20/07, Guido van Rossum <guido at python.org> wrote:
> Consider the PEP accepted.
>
> Regarding the conversion, please do use the sandbox/2to3 framework.
> Write me if you have trouble understanding the many examples already
> in fixes/.
>
> On 2/20/07, Andre Roberge <andre.roberge at gmail.com> wrote:
> > On 2/20/07, Guido van Rossum <guido at python.org> wrote:
> > > Why do you want this *before* PyCon? It would be much easier to do
> > > this as part of the Py3k sprint.
> > >
> >
> > My main interest was to have, prior to Pycon, the PEP recorded as
> > such; it had been close to 2 months since the last post on this issue
> > on the list.
> >
> > As for the actual work, I'd be willing to volunteer to write the
> > required code (with test cases) that could be used to do the conversion
> > input(...)  ->  eval(input(...))
> > raw_input(...)  ->  input(...)
> >
> > Unfortunately, I will not be participating in any sprints.
> >
> > André
> >
> >
> >
> > > On 2/20/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
> > > > Andre Roberge wrote:
> > > > > Any possibility that (some of) the following can be done before Pycon?
> > > > > Respectfully yours,
> > > > > André Roberge
> > > >
> > > > I've added the PEP as 3111. I made a few small modifications (and
> > > > committed it directly as Accepted) based on Guido's comments in this thread.
> > > >
> > > > The actual change still needs to be made, though.
> > > >
> > > > Cheers,
> > > > Nick.
> > > >
> > > > --
> > > > Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> > > > ---------------------------------------------------------------
> > > >              http://www.boredomandlaziness.org
> > > > _______________________________________________
> > > > Python-3000 mailing list
> > > > Python-3000 at python.org
> > > > http://mail.python.org/mailman/listinfo/python-3000
> > > > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
> > > >
> > >
> > >
> > > --
> > > --Guido van Rossum (home page: http://www.python.org/~guido/)
> > >
> >
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Tue Feb 27 01:37:37 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 27 Feb 2007 13:37:37 +1300
Subject: [Python-3000] Weird error message from bytes type
In-Reply-To: <45E2B21C.1030101@gmail.com>
References: <ert5ti$3p6$1@sea.gmane.org>
	<9e804ac0702251535i7fc7aabiaabddbb0209ea4f1@mail.gmail.com>
	<ca471dc20702251540k550ee3e3m332ed5e8e52a025c@mail.gmail.com>
	<20070226002930.GB3067@python.ca>
	<ca471dc20702251639p6ea22e2o5c2363a2df2b6066@mail.gmail.com>
	<45E25736.2010404@canterbury.ac.nz>
	<ca471dc20702252138oce44b50vf83e1ce146cc447c@mail.gmail.com>
	<45E2B21C.1030101@gmail.com>
Message-ID: <45E37D51.4050201@canterbury.ac.nz>

Nick Coghlan wrote:
> For example, simply 
> replacing 'index' with 'integer' could lead to a different kind of 
> confusion:
>   TypeError: 'float' object cannot be interpreted as an integer

Maybe something like

   TypeError: 'float' object cannot be used as an integer in this context

Or maybe require the caller to pass in an error message
format, or at least have a version of the call which
allows that.
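
A rough Python-level sketch of that second idea. The real change would
be in the C API; as_index, set_byte and the message wording here are
hypothetical names chosen for illustration:

```python
import operator

DEFAULT_FMT = "'{type}' object cannot be used as an integer in this context"


def as_index(obj, fmt=DEFAULT_FMT):
    """Convert obj via __index__, raising TypeError built from fmt (hypothetical helper)."""
    try:
        return operator.index(obj)
    except TypeError:
        raise TypeError(fmt.format(type=type(obj).__name__)) from None


def set_byte(buf, pos, value):
    # The caller supplies a message that names the actual operation.
    buf[pos] = as_index(value, "cannot store '{type}' object in a bytearray")
```

Callers that do not care get the generic default; callers that know the
context can say something more useful.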

--
Greg

From greg.ewing at canterbury.ac.nz  Tue Feb 27 01:39:49 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 27 Feb 2007 13:39:49 +1300
Subject: [Python-3000] Thoughts on new I/O library and bytecode
In-Reply-To: <20070225194742.AE4B.JCARLSON@uci.edu>
References: <9e804ac0702251429i36623d1bg7ea69d0ad7945429@mail.gmail.com>
	<45E25025.6020107@canterbury.ac.nz>
	<20070225194742.AE4B.JCARLSON@uci.edu>
Message-ID: <45E37DD5.2050809@canterbury.ac.nz>

Josiah Carlson wrote:
> Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>
>>    $[68656C6C6F]
> 
> I think it's a bad idea to choose a representation with any format that
> isn't able to do the eval(repr(obj)) loop.

The intention was for that to be a valid literal
syntax as well.

> It may be the case
> that b"stuff" is the most concise and reasonable repr form...

I can only see it being the most concise when most of
the bytes can be meaningfully interpreted as characters.
Otherwise it's full of \xyy escapes, making it up to
twice as long as necessary and harder to read.

I can't help feeling the people arguing for b"..." as the
repr format haven't really accepted the fact that text and
binary data will be distinct things in py3k, and are thinking
of bytes as being a replacement for the old string type. But
that's not true -- most of the time, *unicode* will be the
replacement for str when it is used to represent characters,
and bytes will mostly be used only for non-text.

I know that there will be exceptions, such as when writing
code to deal with raw SMTP connections and such like. But
how often do people write code like that? Usually it's
written once and put in a library. I think these cases will
be in the minority.

Guido wrote:
> If you want all hex, use the .hex() method described in the PEP.

That seems back-to-front to me. The default repr should
not be making assumptions about the meaning of the bytes.
It would make more sense to have a .chars() method or
something for when you want it interpreted that way.
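
To put rough numbers on the length argument, here is a sketch using the
bytes type as it ended up in present-day Python 3 (an assumption; it has
the b"..." repr plus hex()/fromhex()):

```python
text_like = b"hello"
binary = bytes([0x00, 0xFF, 0x80, 0x81])

repr_text = repr(text_like)       # compact when the bytes look like ASCII text
repr_binary = repr(binary)        # every byte becomes a \xNN escape
hex_binary = binary.hex()         # two hex digits per byte

round_trip = bytes.fromhex("68656C6C6F")   # the digits from the $[...] example

print(len(repr_binary), len(hex_binary))
```

For data that is not text, the escaped form costs four characters per
byte against two for plain hex.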

--
Greg

From tdelaney at avaya.com  Tue Feb 27 01:55:39 2007
From: tdelaney at avaya.com (Delaney, Timothy (Tim))
Date: Tue, 27 Feb 2007 11:55:39 +1100
Subject: [Python-3000] Weird error message from bytes type
Message-ID: <2773CAC687FD5F4689F526998C7E4E5F07446D@au3010avexu1.global.avaya.com>

Greg Ewing wrote:

> Nick Coghlan wrote:
>> For example, simply
>> replacing 'index' with 'integer' could lead to a different kind of
>> confusion:
>>   TypeError: 'float' object cannot be interpreted as an integer
> 
> Maybe something like
> 
>    TypeError: 'float' object cannot be used as an integer in this context

Going back to the original proposal:

>>> x = b'a'
>>> x[0] = b'a'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'bytes' object cannot be used as an integer in this context

Not too bad. +1 as a default.

> Or maybe require the caller to pass in an error message
> format, or at least have a version of the call which
> allows that.

+1

>>> x = b'a'
>>> x[0] = b'a'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Cannot assign 'bytes' object to 'bytes' element

>>> x = b'a'
>>> x[0] = [1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Cannot assign 'list' object to 'bytes' element

Tim Delaney

From p.f.moore at gmail.com  Tue Feb 27 11:57:27 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 27 Feb 2007 10:57:27 +0000
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
Message-ID: <79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com>

On 26/02/07, Mike Verdone <mike.verdone at gmail.com> wrote:
> Daniel Stutzbach and I have prepared a draft PEP for the new IO system
> for Python 3000. This document is, hopefully, true to the info that
> Guido wrote on the whiteboards here at PyCon. This is still a draft
> and there's quite a few decisions that need to be made. Feedback is
> welcomed.

Generally, this looks nice. A couple of minor points:

> The new IO spec is intended to be similar to the Java IO libraries,
> but generally less confusing. Programmers who don't want to muck about
> in the new IO world can expect that the open() factory method will
> produce an object backwards-compatible with old-style file objects.

Documenting the revised open() factory in this PEP would be useful. It
needs to address encoding issues, so it's not a simple copy of the
existing open().

Also, should there be a factory method for opening raw byte streams?
Once we start down this route, we open the can of worms, of course
(does socket.socket need to be specified in terms of the new IO
layers? what about the mmap module, the gzip/zipfile/tarfile modules,
etc?) These should probably be noted in an "open issues" section, and
otherwise deferred for now.

> The BufferedReader implementation is for sequential-access read-only
> objects.  It does not provide a .flush() method, since there is no
> sensible circumstance where the user would want to discard the read
> buffer.

It's not something I've done personally, but programs sometimes flush
a read buffer before (eg) reading a password from stdin, to avoid
typeahead problems. I don't know if that would be relevant here.

>     .readlinesiter()
>     .__iter__()

I was going to object to the name readlinesiter, but I see it's gone already :-)

> Another way to do it is as follows (we should pick one or the other):
>
>     .__init__(self, buffer, encoding=None, newline=None)
>
>        Same as above but if newline is not None use that as the
> newline pattern (for reading and writing), and if newline is not set
> attempt to find the newline pattern from the file and if we can't for
> some reason use the system default newline pattern.

I'm not sure that can work - the point of universal newlines is that
*any* of \n, \r or \r\n count as a newline, so there's no one pattern.
So I think that explicitly specifying universal newlines is necessary
(even though it's clunky).

Regards,
Paul.

From rhamph at gmail.com  Tue Feb 27 16:00:37 2007
From: rhamph at gmail.com (Adam Olsen)
Date: Tue, 27 Feb 2007 08:00:37 -0700
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
Message-ID: <aac2c7cb0702270700rb0a2a8ek31d4fff0bc264aa1@mail.gmail.com>

On 2/26/07, Mike Verdone <mike.verdone at gmail.com> wrote:
> Text I/O
> The text I/O layer provides functions to read and write strings from
> streams. Some new features include universal newlines and character
> set encoding and decoding.  The Text I/O layer is defined by a
> TextIOBase abstract base class.  It provides several methods that are
> similar to the BufferIOBase methods, but operate on a per-character
> basis instead of a per-byte basis.  These methods are:

"per-character" needs some clarification.  I'm guessing this will only
return entire code points, but the unicode type will expose them as
code units, so it could be seen as both per-code-point and
per-code-unit.

To be really pedantic, neither of them are truly "per-character" in
unicode parlance, despite the fact that they store "character data".
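
A small sketch of the three notions. It assumes a present-day Python 3
(3.3 or later), where str is indexed by code point regardless of build,
which was not yet the case for the unicode type being discussed:

```python
import unicodedata

astral = "\U0001D11E"                 # MUSICAL SYMBOL G CLEF, outside the BMP
code_points = len(astral)             # counted per code point
utf16_units = len(astral.encode("utf-16-le")) // 2   # a surrogate pair in UTF-16

decomposed = "e\u0301"                # 'e' + COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E9
```

The G clef is one code point but two UTF-16 code units, while the
accented "e" is one user-perceived character that can be either one or
two code points.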

-- 
Adam Olsen, aka Rhamphoryncus

From steven.bethard at gmail.com  Tue Feb 27 16:09:21 2007
From: steven.bethard at gmail.com (Steven Bethard)
Date: Tue, 27 Feb 2007 08:09:21 -0700
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
	<79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com>
Message-ID: <d11dcfba0702270709q55385136v23c7c5ba20d70b23@mail.gmail.com>

On 2/27/07, Paul Moore <p.f.moore at gmail.com> wrote:
> >     .__init__(self, buffer, encoding=None, newline=None)
> >
> >        Same as above but if newline is not None use that as the
> > newline pattern (for reading and writing), and if newline is not set
> > attempt to find the newline pattern from the file and if we can't for
> > some reason use the system default newline pattern.
>
> I'm not sure that can work - the point of universal newlines is that
> *any* of \n, \r or \r\n count as a newline, so there's no one pattern.
> So I think that explicitly specifying universal newlines is necessary
> (even though it's clunky).

Maybe there could be a special UNIVERSAL constant, so you'd write
something like::

    TextIOWrapper(buffer, newline=UNIVERSAL)

?

Steve
-- 
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy

From guido at python.org  Tue Feb 27 17:38:23 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 27 Feb 2007 10:38:23 -0600
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
	<79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com>
Message-ID: <ca471dc20702270838j63e680ydbbce30dfbb44688@mail.gmail.com>

On 2/27/07, Paul Moore <p.f.moore at gmail.com> wrote:
[...]
> Documenting the revised open() factory in this PEP would be useful. It
> needs to address encoding issues, so it's not a simple copy of the
> existing open().

Check the doc again. I added one at the end. It could use some review.
I also added an elaboration into the p3yk branch in svn; that could
use some review as well.

> Also, should there be a factory method for opening raw byte streams?

The open() I added returns a raw byte stream when you specify binary
mode with buffering=0.
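
For reference, a sketch of how the layers show up through open() in the
io module of present-day Python 3. This is an assumption about the
eventual implementation; the version in svn at the time may differ:

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "rb", buffering=0) as raw:
        raw_is_raw = isinstance(raw, io.RawIOBase)            # unbuffered: raw layer
    with open(path, "rb") as buffered:
        buffered_is_buffered = isinstance(buffered, io.BufferedIOBase)
    with open(path, "r", encoding="utf-8") as text:
        text_is_text = isinstance(text, io.TextIOBase)        # text layer on top
finally:
    os.remove(path)
```

The same factory hands back a different layer depending only on the mode
and buffering arguments.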

> Once we start down this route, we open the can of worms, of course
> (does socket.socket need to be specified in terms of the new IO
> layers?

No, but check the io.py in svn; it has a SocketIO class that wraps a
socket. Sockets themselves are much lower level than this; they have
all sorts of other APIs. The SocketIO class only works for stream
sockets (e.g., TCP/IP).

> what about the mmap module, the gzip/zipfile/tarfile modules,
> etc?) These should probably be noted in an "open issues" section, and
> otherwise deferred for now.

Agreed that we should add these to the open issues section. I don't
think we should mess with mmap, but *perhaps* a mmap wrapper could be
provided (by the mmap module). gzip, bzip2 etc. should probably be
redefined in terms of the buffered (bytes) reader/writer protocol.
zipfile and tarfile should take bytes readers/writers; the API they
*provide* should be defined in terms of bytes and perhaps (when
appropriate, I don't recall if they have read/write methods) in terms
of buffered byte streams.

It *may* even be useful if many of these would support non-blocking
I/O; we're currently considering adding a standard API for returning
"EWOULDBLOCK" errors (e.g. return None from read() and write()) --
though we won't be providing an API to turn that on (since it depends
on the underlying implementation, e.g. sockets vs. files).
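
A minimal sketch of the "return None" convention, assuming a Unix-like
system and present-day Python 3, where os.set_blocking() is the
OS-specific switch and the raw reader reports "would block" as None:

```python
import os

r, w = os.pipe()
os.set_blocking(r, False)            # the "turn it on" step is OS-specific
raw = open(r, "rb", buffering=0)     # raw (unbuffered) reader over the pipe

empty_read = raw.read(16)            # nothing written yet
os.write(w, b"ping")
data_read = raw.read(16)

raw.close()
os.close(w)
```

The stream API itself has no method to request non-blocking mode; it
only defines what read() returns once the descriptor is in that mode.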

> > The BufferedReader implementation is for sequential-access read-only
> > objects.  It does not provide a .flush() method, since there is no
> > sensible circumstance where the user would want to discard the read
> > buffer.
>
> It's not something I've done personally, but programs sometimes flush
> a read buffer before (eg) reading a password from stdin, to avoid
> typeahead problems. I don't know if that would be relevant here.

We discussed this briefly at the sprint and came to the conclusion
that this is outside the scope of the PEP; you can do this by
(somehow) enabling non-blocking mode and then reading until you get
None.

> > Another way to do it is as follows (we should pick one or the other):
> >
> >     .__init__(self, buffer, encoding=None, newline=None)
> >
> >        Same as above but if newline is not None use that as the
> > newline pattern (for reading and writing), and if newline is not set
> > attempt to find the newline pattern from the file and if we can't for
> > some reason use the system default newline pattern.
>
> I'm not sure that can work - the point of universal newlines is that
> *any* of \n, \r or \r\n count as a newline, so there's no one pattern.
> So I think that explicitly specifying universal newlines is necessary
> (even though it's clunky).

I think for input we should always accept all three line endings so
you never need to specify anything; for output, we should pick a
platform default (\r\n on Windows, \n everywhere else) and have an API
to override it. So the API you quote above sounds about right:

  .__init__(self, buffer, encoding=None, newline=None)

I'd like to constrain newline to be either \n or \r\n for writing; for
reading IMO it should not be specified.
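
A sketch of those semantics using TextIOWrapper as it exists in
present-day Python 3 (an assumption; the final signature also grew an
errors argument and a few more newline values):

```python
import io

# Reading: with newline=None all three endings are recognised and
# handed back as "\n".
source = io.BytesIO(b"one\r\ntwo\rthree\n")
reader = io.TextIOWrapper(source, encoding="ascii", newline=None)
lines = reader.readlines()

# Writing: newline="\r\n" overrides the platform default on output.
sink = io.BytesIO()
writer = io.TextIOWrapper(sink, encoding="ascii", newline="\r\n")
writer.write("one\ntwo\n")
writer.flush()
written = sink.getvalue()
```

With newline left at None on the writing side, the platform default
would be used instead.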

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From p.f.moore at gmail.com  Tue Feb 27 17:59:24 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 27 Feb 2007 16:59:24 +0000
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <ca471dc20702270838j63e680ydbbce30dfbb44688@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
	<79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com>
	<ca471dc20702270838j63e680ydbbce30dfbb44688@mail.gmail.com>
Message-ID: <79990c6b0702270859s55ba98can384f45dc2cd47778@mail.gmail.com>

On 27/02/07, Guido van Rossum <guido at python.org> wrote:
> On 2/27/07, Paul Moore <p.f.moore at gmail.com> wrote:
> [...]
> > Documenting the revised open() factory in this PEP would be useful. It
> > needs to address encoding issues, so it's not a simple copy of the
> > existing open().
>
> Check the doc again. I added one at the end. It could use some review.
> I also added an elaboration into the p3yk branch in svn; that could
> use some review as well.

Sorry, I hadn't checked the updated version. I'll take a look.

[...]
> I think for input we should always accept all three line endings so
> you never need to specify anything; for output, we should pick a
> platform default (\r\n on Windows, \n everywhere else) and have an API
> to override it. So the API you quote above sounds about right:
>
>   .__init__(self, buffer, encoding=None, newline=None)
>
> I'd like to constrain newline to be either \n or \r\n for writing; for
> reading IMO it should not be specified.

Ah. If that's the intent, I agree - in effect universal newlines is
always on, and output uses platform semantics unless you force it to
be overridden.

Forcing only \n or \r\n sounds fine to me.

Paul.

From jimjjewett at gmail.com  Tue Feb 27 19:22:48 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 27 Feb 2007 13:22:48 -0500
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <ca471dc20702270838j63e680ydbbce30dfbb44688@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
	<79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com>
	<ca471dc20702270838j63e680ydbbce30dfbb44688@mail.gmail.com>
Message-ID: <fb6fbf560702271022pda2ed8ckb66e4ab4309bedad@mail.gmail.com>

On 2/27/07, Guido van Rossum <guido at python.org> wrote:
> On 2/27/07, Paul Moore <p.f.moore at gmail.com> wrote:

> It *may* even be useful if many of these would support non-blocking
> I/O; we're currently considering adding a standard API for returning
> "EWOULDBLOCK" errors (e.g. return None from read() and write()) --
> though we won't be providing an API to turn that on (since it depends
> on the underlying implementation, e.g. sockets vs. files).

I thought the point of the IO subsystem was to abstract away those differences.

Trying to set (non-)blocking may raise an exception on some streams,
but that still seems better than having to know the internal details
before you can even ask.

> > > The BufferedReader implementation is for sequential-access read-only
> > > objects.  It does not provide a .flush() method, since there is no
> > > sensible circumstance where the user would want to discard the read
> > > buffer.

> > ... typeahead problems.

> ... outside the scope of the PEP; you can do this by
> (somehow) enabling non-blocking mode and then reading until you get
> None.

That does sound like a use case, and flush() is the obvious method.

Are you concerned that having the (rarely needed) method available may
be an attractive nuisance or source of confusion?

> I think for input we should always accept all three line endings so
> you never need to specify anything; for output, we should pick ...

So saving a text file can cause (whitespace) changes all over?

That might be OK, but it should at least be called out, so that
editors wanting minimal change will know that they have to implement
their own Text layer.

-jJ

From jimjjewett at gmail.com  Tue Feb 27 19:41:46 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 27 Feb 2007 13:41:46 -0500
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <aac2c7cb0702270700rb0a2a8ek31d4fff0bc264aa1@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
	<aac2c7cb0702270700rb0a2a8ek31d4fff0bc264aa1@mail.gmail.com>
Message-ID: <fb6fbf560702271041w51c50136k7c01217f63f61b64@mail.gmail.com>

On 2/27/07, Adam Olsen <rhamph at gmail.com> wrote:
> On 2/26/07, Mike Verdone <mike.verdone at gmail.com> wrote:
> > Text I/O
> > ... operate on a per-character basis instead of a per-byte basis.

> "per-character" needs some clarification.  I'm guessing this will only
> return entire code points, but the unicode type will expose them as
> code units, so it could be seen as both per-code-point and
> per-code-unit.

Does this just mean that you assume
(1) UTF32
(2) surrogate pairs will show up as two characters
(3) diacritics may (or may not) show up separately from their base characters?

This does suggest that error-correction should be specified (or at
least explicitly not specified).  If the underlying input byte-stream
contains an invalid sequence, will the TextIO raise a
UnicodeDecodeError?  Or will its error/replace/delete behavior be
settable?

Does the Text class promise to catch things like an invalid
combination of surrogates?

-jJ

From guido at python.org  Tue Feb 27 19:51:47 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 27 Feb 2007 12:51:47 -0600
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <fb6fbf560702271022pda2ed8ckb66e4ab4309bedad@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
	<79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com>
	<ca471dc20702270838j63e680ydbbce30dfbb44688@mail.gmail.com>
	<fb6fbf560702271022pda2ed8ckb66e4ab4309bedad@mail.gmail.com>
Message-ID: <ca471dc20702271051q21999822ye6c423bf4749217e@mail.gmail.com>

On 2/27/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 2/27/07, Guido van Rossum <guido at python.org> wrote:
> > On 2/27/07, Paul Moore <p.f.moore at gmail.com> wrote:
>
> > It *may* even be useful if many of these would support non-blocking
> > I/O; we're currently considering adding a standard API for returning
> > "EWOULDBLOCK" errors (e.g. return None from read() and write()) --
> > though we won't be providing an API to turn that on (since it depends
> > on the underlying implementation, e.g. sockets vs. files).
>
> I thought the point of the IO subsystem was to abstract away those differences.

We will abstract away the differences of how you *use* a stream that's
in non-blocking (or timeout) mode, but we can't abstract away the APIs
used to *request* those modes since the API depends on the abilities
of the system object -- sockets, pipes and disk files all have
different semantics here.

> Trying to set (non-)blocking may raise an exception on some streams,
> but that still seems better than having to know the internal details
> before you can even ask.

I doubt it -- non-blocking mode is pretty specialized. I want it to be
*possible* to use the new I/O library with file descriptors that can
return EWOULDBLOCK; I don't necessarily want to make it *easy*.

> > > > The BufferedReader implementation is for sequential-access read-only
> > > > objects.  It does not provide a .flush() method, since there is no
> > > > sensible circumstance where the user would want to discard the read
> > > > buffer.
>
> > > ... typeahead problems.
>
> > ... outside the scope of the PEP; you can do this by
> > (somehow) enabling non-blocking mode and then reading until you get
> > None.
>
> That does sound like a use case, and flush() is the obvious method.

No it isn't. Calling flush() for writing has no semantics at the
highest-level abstraction: you can insert flush() calls whenever you
want or omit them and the data will still be written; the only time
you care is when the abstraction is broken and you lose a buffer due
to a segfault etc. The semantics of this use case are very different;
perhaps we can add a reset() or discard() method which throws away the
buffer contents but that's as far as I want to go. The passwd-reading
example ought to be hidden in the getpass module.

> Are you concerned that having the (rarely needed) method available may
> be an attractive nuisance or source of confusion?

Perhaps; people will latch on to a name and call it; or they will
mindlessly copy code that happens to contain it and a new voodoo
religion or superstition is easily born. Also whether this makes sense
or not depends a lot on what kind of device you are reading; I can't
imagine a socket use case for example.

> > I think for input we should always accept all three line endings so
> > you never need to specify anything; for output, we should pick ...
>
> So saving a text file can cause (whitespace) changes all over?

It would only normalize line endings, but yeah.

> That might be OK, but it should at least be called out, so that
> editors wanting minimal change will know that they have to implement
> their own Text layer.

I expect them to do that anyway. But I would not be against being able
to specify newline="\n" on input and have it mean that \r\n line
endings remain in the data where present. I'm not sure that I would
like newline="\r\n" to mean that a lone \n should not be considered a
line ending, even if some stupid Windows apps behave that way.

A compromise would be to support what "U" mode currently does -- it
makes the line endings actually encountered available as an attribute
on the file.
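
Both options can be sketched with TextIOWrapper from present-day Python 3.
This is an assumption about how things eventually turned out, not what
the draft specified:

```python
import io

payload = b"one\r\ntwo\nthree\rfour"

# Universal newlines: endings are translated, and the ones actually
# seen are recorded on the stream.
translated = io.TextIOWrapper(io.BytesIO(payload), encoding="ascii", newline=None)
text = translated.read()
seen = translated.newlines

# newline="\n": only "\n" ends a line and nothing is translated, so
# "\r\n" stays in the data.
untranslated = io.TextIOWrapper(io.BytesIO(payload), encoding="ascii", newline="\n")
first_line = untranslated.readline()
```

An editor that wants minimal change could use the second form, or use
the recorded endings to decide what to write back.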

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Feb 27 20:02:20 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 27 Feb 2007 13:02:20 -0600
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <fb6fbf560702271041w51c50136k7c01217f63f61b64@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
	<aac2c7cb0702270700rb0a2a8ek31d4fff0bc264aa1@mail.gmail.com>
	<fb6fbf560702271041w51c50136k7c01217f63f61b64@mail.gmail.com>
Message-ID: <ca471dc20702271102t7038c15esd1159f70b1495a20@mail.gmail.com>

The encoding/decoding behavior should be no different from that of the
encode() and decode() methods on unicode strings and byte arrays.

Certainly no normalization of diacritics will be done; surrogate
handling depends on the encoding and whether the unicode string
implementation uses 16 or 32 bits per character.

I agree that we need to be able to specify the error handling as well.
UnicodeErrors may be raised.
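
A short sketch of what "no different from decode()" looks like, using
the error handlers of present-day Python 3 and the errors argument that
TextIOWrapper eventually gained (an assumption relative to the draft):

```python
import io

bad = b"caf\xe9"                      # Latin-1 bytes, not valid UTF-8

try:
    bad.decode("utf-8")               # default errors="strict"
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

replaced = bad.decode("utf-8", errors="replace")
ignored = bad.decode("utf-8", errors="ignore")

# The same choice is available on the text layer.
stream = io.TextIOWrapper(io.BytesIO(bad), encoding="utf-8", errors="replace")
from_stream = stream.read()
```

The text layer simply forwards the chosen handler to the codec, so its
behavior matches the plain decode() call.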

--Guido

On 2/27/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 2/27/07, Adam Olsen <rhamph at gmail.com> wrote:
> > On 2/26/07, Mike Verdone <mike.verdone at gmail.com> wrote:
> > > Text I/O
> > > ... operate on a per-character basis instead of a per-byte basis.
>
> > "per-character" needs some clarification.  I'm guessing this will only
> > return entire code points, but the unicode type will expose them as
> > code units, so it could be seen as both per-code-point and
> > per-code-unit.
>
> Does this just mean that you assume
> (1) UTF32
> (2) surrogate pairs will show up as two characters
> (3) diacritics may (or may not) show up separately from their base characters?
>
> This does suggest that error-correction should be specified (or at
> least explicitly not specified).  If the underlying input byte-stream
> contains an invalid sequence, will the TextIO raise a
> UnicodeDecodeError?  Or will its error/replace/delete behavior be
> settable?
>
> Does the Text class promise to catch things like an invalid
> combination of surrogates?
>
> -jJ
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jimjjewett at gmail.com  Tue Feb 27 20:18:31 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 27 Feb 2007 14:18:31 -0500
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <ca471dc20702271051q21999822ye6c423bf4749217e@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
	<79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com>
	<ca471dc20702270838j63e680ydbbce30dfbb44688@mail.gmail.com>
	<fb6fbf560702271022pda2ed8ckb66e4ab4309bedad@mail.gmail.com>
	<ca471dc20702271051q21999822ye6c423bf4749217e@mail.gmail.com>
Message-ID: <fb6fbf560702271118y661b8ea7o24e70b3a60e72f0c@mail.gmail.com>

On 2/27/07, Guido van Rossum <guido at python.org> wrote:
> On 2/27/07, Jim Jewett <jimjjewett at gmail.com> wrote:

> > Trying to set (non-)blocking may raise an exception on some streams,
> > but that still seems better than having to know the internal details
> > before you can even ask.

> I doubt it -- non-blocking mode is pretty specialized. I want it to be
> *possible* to use the new I/O library with file descriptors that can
> return EWOULDBLOCK; I don't necessarily want to make it *easy*.

Rewording to see if I understand:

source.read() will always block, unless something out-of-band has changed it.

*If* it has been changed out-of-band, then None is used to indicate this.

Therefore, normal code can ignore the possibility, or (to be really
robust against someone else messing with the input stream) add an "if
result is None: continue" clause to its loops.

> No it isn't. Calling flush() for writing has no semantics at the
> highest-level abstraction:

Are you saying that flush() need not be a blocking operation?
That makes it a bit hard to force interaction.

>> So saving a text file can cause (whitespace) changes all over?

> It would only normalize line endings, but yeah.

> > That might be OK, but it should at least be called out, so that
> > editors wanting minimal change will know that they have to implement
> > their own Text layer.

> I expect them to do that anyway.

I don't.  Wanting to minimize diffs doesn't imply any interest in unicode.

> But I would not be against being able
> to specify newline="\n" on input and have it mean that \r\n line
> endings remain in the data where present.

That sort of passthrough mode is enough for me.  Thank you.

-jJ

From guido at python.org  Tue Feb 27 21:39:25 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 27 Feb 2007 14:39:25 -0600
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <fb6fbf560702271118y661b8ea7o24e70b3a60e72f0c@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
	<79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com>
	<ca471dc20702270838j63e680ydbbce30dfbb44688@mail.gmail.com>
	<fb6fbf560702271022pda2ed8ckb66e4ab4309bedad@mail.gmail.com>
	<ca471dc20702271051q21999822ye6c423bf4749217e@mail.gmail.com>
	<fb6fbf560702271118y661b8ea7o24e70b3a60e72f0c@mail.gmail.com>
Message-ID: <ca471dc20702271239y49523ecfuea95c7d520271948@mail.gmail.com>

On 2/27/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 2/27/07, Guido van Rossum <guido at python.org> wrote:
> > On 2/27/07, Jim Jewett <jimjjewett at gmail.com> wrote:
>
> > > Trying to set (non-)blocking may raise an exception on some streams,
> > > but that still seems better than having to know the internal details
> > > before you can even ask.
>
> > I doubt it -- non-blocking mode is pretty specialized. I want it to be
> > *possible* to use the new I/O library with file descriptors that can
> > return EWOULDBLOCK; I don't necessarily want to make it *easy*.
>
> Rewording to see if I understand:
>
> source.read() will always block, unless something out-of-band has changed it.
>
> *If* it has been changed out-of-band, then None is used to indicate this.

Imprecise language, but I understand what you mean. More exact would
be: None is returned instead of raising an IOError with errno set to
EWOULDBLOCK (or whatever its equivalent is on Windows).

> Therefore, normal code can ignore the possibility, or (to be really
> robust against someone else messing with the input stream) add an "if
> result is None: continue" clause to its loops.

No, since that would mean busy-waiting while the I/O isn't ready,
unless there's a select or similar at the top of the loop, in which
case you're not "normal code". Better raise an exception if you get
this. Better even not to check for this at all if you're not prepared
to handle it -- attempting to use None as a string will raise an
exception for you. You could also treat it as EOF.
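
A sketch of the non-busy-waiting alternative, assuming a Unix system and the io module as it later shipped in Python 3 (io.FileIO, os.set_blocking); the helper name read_when_ready is hypothetical:

```python
import io
import os
import select


def read_when_ready(raw, size, timeout):
    """Read from a non-blocking raw stream, waiting in select() instead of spinning."""
    data = raw.read(size)
    if data is not None:
        return data
    # None means "would block": wait until the descriptor is readable.
    ready, _, _ = select.select([raw], [], [], timeout)
    if not ready:
        raise TimeoutError("no data within %r seconds" % timeout)
    return raw.read(size)


r_fd, w_fd = os.pipe()
os.set_blocking(r_fd, False)          # the out-of-band change
raw = io.FileIO(r_fd, "r")

first = raw.read(16)                  # nothing written yet, so None
os.write(w_fd, b"hello")
second = read_when_ready(raw, 16, 1.0)

raw.close()
os.close(w_fd)
```

Code that is not prepared to sit in a select loop can simply skip the None check and let the first string operation on the result fail.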

> > No it isn't. Calling flush() for writing has no semantics at the
> > highest-level abstraction:
>
> Are you saying that flush() need not be a blocking operation?
> That makes it a bit hard to force interaction.

I didn't intend to say that. Depending on whether and how often you
call flush(), the other side could see your bytes at different times,
but it should see the same data in the same order regardless (except
if you never flush your final writes).

FWIW we just discovered that the buffered writers need a __del__
method that calls flush()...
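
To illustrate the point about timing, a small sketch using io.BufferedWriter as it later shipped in Python 3, with a BytesIO standing in for "the other side" (both are assumptions; the draft classes may differ):

```python
import io

sink = io.BytesIO()                  # stands in for the receiving side
writer = io.BufferedWriter(sink)

writer.write(b"hello ")
before_flush = sink.getvalue()       # still sitting in the writer's buffer
writer.flush()
after_flush = sink.getvalue()

writer.write(b"world")
writer.flush()
final = sink.getvalue()              # same bytes, same order, just later
```

If the last flush() were omitted and nothing flushed on destruction, the final write would never reach the sink.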

> >> So saving a text file can cause (whitespace) changes all over?
>
> > It would only normalize line endings, but yeah.
>
> > > That might be OK, but it should at least be called out, so that
> > > editors wanting minimal change will know that they have to implement
> > > their own Text layer.
>
> > I expect them to do that anyway.
>
> I don't.  Wanting to minimize diffs doesn't imply any interest in unicode.
>
> > But I would not be against being able
> > to specify newline="\n" on input and have it mean that \r\n line
> > endings remain in the data where present.
>
> That sort of passthrough mode is enough for me.  Thank you.

OK, I'll update the PEP text.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From walter at livinglogic.de  Tue Feb 27 21:39:48 2007
From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Tue, 27 Feb 2007 21:39:48 +0100
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <ca471dc20702271102t7038c15esd1159f70b1495a20@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>	<aac2c7cb0702270700rb0a2a8ek31d4fff0bc264aa1@mail.gmail.com>	<fb6fbf560702271041w51c50136k7c01217f63f61b64@mail.gmail.com>
	<ca471dc20702271102t7038c15esd1159f70b1495a20@mail.gmail.com>
Message-ID: <45E49714.9060003@livinglogic.de>

Guido van Rossum wrote:

> The encoding/decoding behavior should be no different from that of the
> encode() and decode() methods on unicode strings and byte arrays.

Except that it must work in incremental mode. The new (in 2.5) 
incremental codecs should be usable for that.

> Certainly no normalization of diacritics will be done; surrogate
> handling depends on the encoding and whether the unicode string
> implementation uses 16 or 32 bits per character.
> 
> I agree that we need to be able to specify the error handling as well.

Should it be possible to change the error handling during the lifetime 
of a stream? Then this change would have to be passed through to the 
underlying codec.

> UnicodeErrors may be raised.

Servus,
    Walter

> On 2/27/07, Jim Jewett <jimjjewett at gmail.com> wrote:
>> On 2/27/07, Adam Olsen <rhamph at gmail.com> wrote:
>>> On 2/26/07, Mike Verdone <mike.verdone at gmail.com> wrote:
>>>> Text I/O
>>>> ... operate on a per-character basis instead of a per-byte basis.
>>> "per-character" needs some clarification.  I'm guessing this will only
>>> return entire code points, but the unicode type will expose them as
>>> code units, so it could be seen as both per-code-point and
>>> per-code-unit.
>> Does this just mean that you assume
>> (1) UTF32
>> (2) surrogate pairs will show up as two characters
>> (3) diacritics may (or may not) show up separately from their base characters?
>>
>> This does suggest that error-correction should be specified (or at
>> least explicitly not specified).  If the underlying input byte-stream
>> contains an invalid sequence, will the TextIO raise a
>> UnicodeDecodeError?  Or will its error/replace/delete behavior be
>> settable?
>>
>> Does the Text class promise to catch things like an invalid
>> combination of surrogates?
>>
>> -jJ
> 
> 


From guido at python.org  Tue Feb 27 21:44:25 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 27 Feb 2007 14:44:25 -0600
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <45E49714.9060003@livinglogic.de>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
	<aac2c7cb0702270700rb0a2a8ek31d4fff0bc264aa1@mail.gmail.com>
	<fb6fbf560702271041w51c50136k7c01217f63f61b64@mail.gmail.com>
	<ca471dc20702271102t7038c15esd1159f70b1495a20@mail.gmail.com>
	<45E49714.9060003@livinglogic.de>
Message-ID: <ca471dc20702271244u41aad6e9vcc1f9d54850475c2@mail.gmail.com>

On 2/27/07, Walter Dörwald <walter at livinglogic.de> wrote:
> Guido van Rossum wrote:
>
> > The encoding/decoding behavior should be no different from that of the
> > encode() and decode() methods on unicode strings and byte arrays.
>
> Except that it must work in incremental mode. The new (in 2.5)
> incremental codecs should be usable for that.

Thanks for reminding! Do the incremental codecs have internal state? I
wonder how this interacts with non-blocking reads. (I know
next-to-nothing about incremental codecs beyond that they exist. :-)

> > Certainly no normalization of diacritics will be done; surrogate
> > handling depends on the encoding and whether the unicode string
> > implementation uses 16 or 32 bits per character.
> >
> > I agree that we need to be able to specify the error handling as well.
>
> Should it be possible to change the error handling during the lifetime
> of a stream? Then this change would have to be passed through to the
> underlying codec.

Not unless you have a really good use case handy...

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From oliphant.travis at ieee.org  Tue Feb 27 22:14:11 2007
From: oliphant.travis at ieee.org (Travis E. Oliphant)
Date: Tue, 27 Feb 2007 14:14:11 -0700
Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer)
In-Reply-To: <ca471dc20702261228s6bf3760ay981b94a2e6030501@mail.gmail.com>
References: <erv9or$53s$1@sea.gmane.org> <ervc5o$fff$1@sea.gmane.org>
	<ca471dc20702261228s6bf3760ay981b94a2e6030501@mail.gmail.com>
Message-ID: <es26h4$1mh$1@sea.gmane.org>

Guido van Rossum wrote:
> On 2/26/07, Travis Oliphant <oliphant.travis at ieee.org> wrote:
>> Guido van Rossum wrote:
>> Great.  I have no problem with this.  Is your idea of locking the same
>> as mine (i.e. a function in the API for release?)
> 
> Right.

My understanding of this locking mechanism would require objects that 
wish to use it to keep track of how many views they have "exported" and 
refuse to re-allocate memory until the views have all been released.

In my understanding this would require the addition of at least one 
integer to the object structure.

So, for example, the bytesobject would need to at least add

int ob_views

to its C-structure:

/* Object layout */
typedef struct {
     PyObject_VAR_HEAD
     Py_ssize_t ob_alloc; /* How many bytes allocated */
     int ob_views; /* Number of views to these bytes */
     char *ob_bytes;
} PyBytesObject;


On creation, ob_views would be initialized to 0 and whenever getbuffer 
was called it would increase this number and whenever releasebuffer was 
called it would decrease this number.
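
The counting scheme can be modelled in a few lines of Python. This is only a sketch: the class CountedBytes and its method names are hypothetical stand-ins for the C-level getbuffer/releasebuffer slots, not a proposed API.

```python
class CountedBytes:
    """Hypothetical model of an object that counts its exported views."""

    def __init__(self, data=b""):
        self._bytes = bytearray(data)
        self._views = 0                      # plays the role of ob_views

    def getbuffer(self):
        self._views += 1
        return self._bytes

    def releasebuffer(self):
        if self._views == 0:
            raise RuntimeError("releasebuffer without a matching getbuffer")
        self._views -= 1

    def resize(self, new_size):
        # Refuse to re-allocate while any view is outstanding.
        if self._views:
            raise BufferError("%d view(s) still exported" % self._views)
        resized = bytearray(new_size)
        keep = min(new_size, len(self._bytes))
        resized[:keep] = self._bytes[:keep]
        self._bytes = resized


obj = CountedBytes(b"abc")
obj.getbuffer()
try:
    obj.resize(10)
    refused = False
except BufferError:
    refused = True

obj.releasebuffer()
obj.resize(10)                               # allowed once the view is released
```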

Am I missing something here?

-Travis


From guido at python.org  Tue Feb 27 22:18:33 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 27 Feb 2007 15:18:33 -0600
Subject: [Python-3000] pre-PEP: Enhancing buffer protocol (tp_as_buffer)
In-Reply-To: <es26h4$1mh$1@sea.gmane.org>
References: <erv9or$53s$1@sea.gmane.org> <ervc5o$fff$1@sea.gmane.org>
	<ca471dc20702261228s6bf3760ay981b94a2e6030501@mail.gmail.com>
	<es26h4$1mh$1@sea.gmane.org>
Message-ID: <ca471dc20702271318x358fd360nf1877e261f61b2ff@mail.gmail.com>

On 2/27/07, Travis E. Oliphant <oliphant.travis at ieee.org> wrote:
> My understanding of this locking mechanism would require objects that
> wish to use it to keep track of how many views they have "exported" and
> refuse to re-allocate memory until the views have all been released.

Right.

> In my understanding this would require the addition of at least one
> integer to the object structure.

Right.

> So, for example, the bytesobject would need to at least add
>
> int ob_views
>
> to its C-structure:
>
> /* Object layout */
> typedef struct {
>      PyObject_VAR_HEAD
>      Py_ssize_t ob_alloc; /* How many bytes allocated */
>      int ob_views; /* Number of views to these bytes */
>      char *ob_bytes;
> } PyBytesObject;
>
> On creation, ob_views would be initialized to 0 and whenever getbuffer
> was called it would increase this number and whenever releasebuffer was
> called it would decrease this number.
>
> Am I missing something here?

I don't think so -- this is exactly what I was thinking of.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From walter at livinglogic.de  Tue Feb 27 22:27:02 2007
From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Tue, 27 Feb 2007 22:27:02 +0100
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <ca471dc20702271244u41aad6e9vcc1f9d54850475c2@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>	
	<aac2c7cb0702270700rb0a2a8ek31d4fff0bc264aa1@mail.gmail.com>	
	<fb6fbf560702271041w51c50136k7c01217f63f61b64@mail.gmail.com>	
	<ca471dc20702271102t7038c15esd1159f70b1495a20@mail.gmail.com>	
	<45E49714.9060003@livinglogic.de>
	<ca471dc20702271244u41aad6e9vcc1f9d54850475c2@mail.gmail.com>
Message-ID: <45E4A226.9010908@livinglogic.de>

Guido van Rossum wrote:

> On 2/27/07, Walter Dörwald <walter at livinglogic.de> wrote:
>> Guido van Rossum wrote:
>>
>> > The encoding/decoding behavior should be no different from that of the
>> > encode() and decode() methods on unicode strings and byte arrays.
>>
>> Except that it must work in incremental mode. The new (in 2.5)
>> incremental codecs should be usable for that.
> 
> Thanks for reminding! Do the incremental codecs have internal state?

They might have, however in all *decoding* cases (except the CJK codecs, 
which I know nothing about) this is just undecoded input. E.g. if the 
UTF-16-LE incremental decoder (which is a BufferedIncrementalDecoder) 
gets passed an odd number of bytes in the decode() call, it decodes as 
much as possible and keeps the last byte in a buffer, which will be 
reused on the next call to decode().
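
A short sketch of that buffering, written for the Python 3 spelling of the codecs API (codecs.getincrementaldecoder and bytes literals are assumptions relative to the 2.5 code in this thread):

```python
import codecs

decoder = codecs.getincrementaldecoder("utf-16-le")()

# Three bytes: one complete code unit ("a") plus half of the next one.
first = decoder.decode(b"a\x00b")

# The dangling byte was kept; supplying its other half completes "b".
second = decoder.decode(b"\x00")
```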

AFAICR the only *encoder* that keeps state is the UTF-16 encoder: it has 
to remember whether a BOM has been output.

I don't know whether the CJK codecs do keep any state besides undecoded 
input for decoding. (E.g. a greedy UTF-7 incremental decoder might have to).

> I
> wonder how this interacts with non-blocking reads.

Non-blocking reads were the reason for implementing the incremental 
codecs: The codec decodes as much of the available input as possible and 
keeps the undecoded rest until the next decode() call.

> (I know
> next-to-nothing about incremental codecs beyond that they exist. :-)

The basic principle is that these codecs can encode strings and decode 
bytes in multiple chunks. If you want to encode a unicode string u in 
UTF-16 you can do it in one go:
    s = u.encode("utf-16")
or character by character:
    encoder = codecs.lookup("utf-16").incrementalencoder()
    s = "".join(encoder.encode(c) for c in u) + encoder.encode(u"", True)
The incremental encoder makes sure that the result contains only one BOM.

Decoding works in the same way:
    decoder = codecs.lookup("utf-16").incrementaldecoder()
    u = u"".join(decoder.decode(c) for c in s) + decoder.decode("", True)
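
For readers on Python 3, roughly the same round trip looks like this. It is a sketch under the assumption that str/bytes replace unicode/str and that codecs.getincrementalencoder/getincrementaldecoder are used for the lookup; the sample text is made up.

```python
import codecs

text = "h\xe9llo"

# Encode one character at a time; only the first chunk carries the BOM.
encoder = codecs.getincrementalencoder("utf-16")()
data = b"".join(encoder.encode(ch) for ch in text) + encoder.encode("", True)

# Decode one byte at a time; incomplete input is held back until it is usable.
decoder = codecs.getincrementaldecoder("utf-16")()
pieces = [decoder.decode(bytes([b])) for b in data]
pieces.append(decoder.decode(b"", True))
round_trip = "".join(pieces)
```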

>> > Certainly no normalization of diacritics will be done; surrogate
>> > handling depends on the encoding and whether the unicode string
>> > implementation uses 16 or 32 bits per character.
>> >
>> > I agree that we need to be able to specify the error handling as well.
>>
>> Should it be possible to change the error handling during the lifetime
>> of a stream? Then this change would have to be passed through to the
>> underlying codec.
> 
> Not unless you have a really good use case handy...

Not for decoding, but for encoding: If you're outputting XML and use an 
encoding that can't encode all unicode characters, then it makes sense 
to switch to "xmlcharrefreplace" error handling during the output of 
text nodes (and back to "strict" for element names etc.).
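
A sketch of that use case in Python 3 spelling; the helper encode_element is hypothetical, and it deliberately ignores escaping of "&" and "<" in the text:

```python
import codecs


def encode_element(name, text, encoding="ascii"):
    """Encode <name>text</name>, strict for markup, character refs for text."""
    enc = codecs.getincrementalencoder(encoding)(errors="strict")
    out = [enc.encode("<" + name + ">")]
    enc.errors = "xmlcharrefreplace"     # text nodes may fall back to &#NNN;
    out.append(enc.encode(text))
    enc.errors = "strict"                # element names must encode exactly
    out.append(enc.encode("</" + name + ">", True))
    return b"".join(out)


encoded = encode_element("p", "caf\xe9")

try:
    encode_element("caf\xe9", "x")
    name_rejected = False
except UnicodeEncodeError:
    name_rejected = True
```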

Servus,
    Walter

From guido at python.org  Tue Feb 27 22:37:29 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 27 Feb 2007 15:37:29 -0600
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <45E4A226.9010908@livinglogic.de>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
	<aac2c7cb0702270700rb0a2a8ek31d4fff0bc264aa1@mail.gmail.com>
	<fb6fbf560702271041w51c50136k7c01217f63f61b64@mail.gmail.com>
	<ca471dc20702271102t7038c15esd1159f70b1495a20@mail.gmail.com>
	<45E49714.9060003@livinglogic.de>
	<ca471dc20702271244u41aad6e9vcc1f9d54850475c2@mail.gmail.com>
	<45E4A226.9010908@livinglogic.de>
Message-ID: <ca471dc20702271337s625a1d41v42786b30dbf3ea5@mail.gmail.com>

On 2/27/07, Walter Dörwald <walter at livinglogic.de> wrote:
> Guido van Rossum wrote:
>
> > On 2/27/07, Walter Dörwald <walter at livinglogic.de> wrote:
> >> Guido van Rossum wrote:
> >>
> >> > The encoding/decoding behavior should be no different from that of the
> >> > encode() and decode() methods on unicode strings and byte arrays.
> >>
> >> Except that it must work in incremental mode. The new (in 2.5)
> >> incremental codecs should be usable for that.
> >
> > Thanks for reminding! Do the incremental codecs have internal state?
>
> They might have, however in all *decoding* cases (except the CJK codecs,
> which I know nothing about) this is just undecoded input. E.g. if the
> UTF-16-LE incremental decoder (which is a BufferedIncrementalDecoder)
> gets passed an odd number of bytes in the decode() call, it decodes as
> much as possible and keeps the last byte in a buffer, which will be
> reused on the next call to decode().
>
> AFAICR the only *encoder* that keeps state is the UTF-16 encoder: it has
> to remember whether a BOM has been output.
>
> I don't know whether the CJK codecs do keep any state besides undecoded
> input for decoding. (E.g. a greedy UTF-7 incremental decoder might have to).
>
> > I
> > wonder how this interacts with non-blocking reads.
>
> Non-blocking reads were the reason for implementing the incremental
> codecs: The codec decodes as much of the available input as possible and
> keeps the undecoded rest until the next decode() call.
>
> > (I know
> > next-to-nothing about incremental codecs beyond that they exist. :-)
>
> The basic principle is that these codecs can encode strings and decode
> bytes in multiple chunks. If you want to encode a unicode string u in
> UTF-16 you can do it in one go:
>     s = u.encode("utf-16")
> or character by character:
>     encoder = codecs.lookup("utf-16").incrementalencoder()
>     s = "".join(encoder.encode(c) for c in u) + encoder.encode(u"", True)
> The incremental encoder makes sure that the result contains only one BOM.
>
> Decoding works in the same way:
>     decoder = codecs.lookup("utf-16").incrementaldecoder()
>     u = u"".join(decoder.decode(c) for c in s) + decoder.decode("", True)

Thanks for the explanations, it is a little bit clearer now!

> >> > Certainly no normalization of diacritics will be done; surrogate
> >> > handling depends on the encoding and whether the unicode string
> >> > implementation uses 16 or 32 bits per character.
> >> >
> >> > I agree that we need to be able to specify the error handling as well.
> >>
> >> Should it be possible to change the error handling during the lifetime
> >> of a stream? Then this change would have to be passed through to the
> >> underlying codec.
> >
> > Not unless you have a really good use case handy...
>
> Not for decoding, but for encoding: If you're outputting XML and use an
> encoding that can't encode all unicode characters, then it makes sense
> to switch to "xmlcharrefreplace" error handling during the output of
> text nodes (and back to "strict" for element names etc.).

So do the incremental codecs allow this switching?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From walter at livinglogic.de  Tue Feb 27 22:47:26 2007
From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Tue, 27 Feb 2007 22:47:26 +0100
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <ca471dc20702271337s625a1d41v42786b30dbf3ea5@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>	
	<aac2c7cb0702270700rb0a2a8ek31d4fff0bc264aa1@mail.gmail.com>	
	<fb6fbf560702271041w51c50136k7c01217f63f61b64@mail.gmail.com>	
	<ca471dc20702271102t7038c15esd1159f70b1495a20@mail.gmail.com>	
	<45E49714.9060003@livinglogic.de>	
	<ca471dc20702271244u41aad6e9vcc1f9d54850475c2@mail.gmail.com>	
	<45E4A226.9010908@livinglogic.de>
	<ca471dc20702271337s625a1d41v42786b30dbf3ea5@mail.gmail.com>
Message-ID: <45E4A6EE.6010903@livinglogic.de>

Guido van Rossum wrote:

> On 2/27/07, Walter Dörwald <walter at livinglogic.de> wrote:
> [...]
>> The basic principle is that these codecs can encode strings and decode
>> bytes in multiple chunks. If you want to encode a unicode string u in
>> UTF-16 you can do it in one go:
>>     s = u.encode("utf-16")
>> or character by character:
>>     encoder = codecs.lookup("utf-16").incrementalencoder()
>>     s = "".join(encoder.encode(c) for c in u) + encoder.encode(u"", True)
>> The incremental encoder makes sure that the result contains only one BOM.
>>
>> Decoding works in the same way:
>>     decoder = codecs.lookup("utf-16").incrementaldecoder()
>>     u = u"".join(decoder.decode(c) for c in s) + decoder.decode("", True)
> 
> Thanks for the explanations, it is a little bit clearer now!
> 
> [...]
>> >> Should it be possible to change the error handling during the lifetime
>> >> of a stream? Then this change would have to be passed through to the
>> >> underlying codec.
>> >
>> > Not unless you have a really good use case handy...
>>
>> Not for decoding, but for encoding: If you're outputting XML and use an
>> encoding that can't encode all unicode characters, then it makes sense
>> to switch to "xmlcharrefreplace" error handling during the output of
>> text nodes (and back to "strict" for element names etc.).
> 
> So do the incremental codecs allow this switching?

Yes:

 >>> import codecs
 >>> ci = codecs.lookup("ascii")
 >>> enc = ci.incrementalencoder(errors="xmlcharrefreplace")
 >>> enc.encode(u"\xff")
'&#255;'
 >>> enc.errors = "strict"
 >>> enc.encode(u"\xff")
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/usr/local/lib/python2.5/encodings/ascii.py", line 22, in encode
     return codecs.ascii_encode(input, self.errors)[0]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 0: ordinal not in range(128)

And it's documented that changing the errors attribute is allowed:
    http://docs.python.org/lib/incremental-encoder-objects.html
    http://docs.python.org/lib/incremental-decoder-objects.html

Servus,
    Walter


From jimjjewett at gmail.com  Tue Feb 27 23:17:50 2007
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 27 Feb 2007 17:17:50 -0500
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <ca471dc20702271239y49523ecfuea95c7d520271948@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
	<79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com>
	<ca471dc20702270838j63e680ydbbce30dfbb44688@mail.gmail.com>
	<fb6fbf560702271022pda2ed8ckb66e4ab4309bedad@mail.gmail.com>
	<ca471dc20702271051q21999822ye6c423bf4749217e@mail.gmail.com>
	<fb6fbf560702271118y661b8ea7o24e70b3a60e72f0c@mail.gmail.com>
	<ca471dc20702271239y49523ecfuea95c7d520271948@mail.gmail.com>
Message-ID: <fb6fbf560702271417l3e96343ds878c6ecc5125ac21@mail.gmail.com>

On 2/27/07, Guido van Rossum <guido at python.org> wrote:
> On 2/27/07, Jim Jewett <jimjjewett at gmail.com> wrote:

> > Therefore, normal code can ignore the possibility, or (to be really
> > robust against someone else messing with the input stream) add an "if
> > result is None: continue" clause to its loops.

> No, since that would mean busy-waiting while the I/O isn't ready,

Then should I assume that:

(1)  Read with a timeout is in the "better know your concrete object" category.

(2)  Dealing with possibly unready objects in a library/framework
(yield the timeslot?) should generally be framework specific.

> FWIW we just discovered that the buffered writers need a __del__
> method that calls flush()...

All they really need is a __close__ method -- you don't want it to
cause gc cycles, and it is OK if the flush happens more than once.

(I'll stop for now, as the __del__ semantics are a different long thread.)

-jJ

From greg.ewing at canterbury.ac.nz  Tue Feb 27 23:17:11 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 28 Feb 2007 11:17:11 +1300
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <ca471dc20702270838j63e680ydbbce30dfbb44688@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
	<79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com>
	<ca471dc20702270838j63e680ydbbce30dfbb44688@mail.gmail.com>
Message-ID: <45E4ADE7.7040709@canterbury.ac.nz>

Guido van Rossum wrote:

> I'd like to constrain newline to be either \n or \r\n for writing;

What about \r?

--
Greg

From guido at python.org  Tue Feb 27 23:37:18 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 27 Feb 2007 16:37:18 -0600
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <45E4ADE7.7040709@canterbury.ac.nz>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
	<79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com>
	<ca471dc20702270838j63e680ydbbce30dfbb44688@mail.gmail.com>
	<45E4ADE7.7040709@canterbury.ac.nz>
Message-ID: <ca471dc20702271437m4855be57j76b3905c3c3584de@mail.gmail.com>

On 2/27/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
>
> > I'd like to constrain newline to be either \n or \r\n for writing;
>
> What about \r?

Mac OS 9 has been dead and unsupported for many years now.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue Feb 27 23:39:12 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 27 Feb 2007 16:39:12 -0600
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <fb6fbf560702271417l3e96343ds878c6ecc5125ac21@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
	<79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com>
	<ca471dc20702270838j63e680ydbbce30dfbb44688@mail.gmail.com>
	<fb6fbf560702271022pda2ed8ckb66e4ab4309bedad@mail.gmail.com>
	<ca471dc20702271051q21999822ye6c423bf4749217e@mail.gmail.com>
	<fb6fbf560702271118y661b8ea7o24e70b3a60e72f0c@mail.gmail.com>
	<ca471dc20702271239y49523ecfuea95c7d520271948@mail.gmail.com>
	<fb6fbf560702271417l3e96343ds878c6ecc5125ac21@mail.gmail.com>
Message-ID: <ca471dc20702271439u5975e908j1cd22b46d6834c9c@mail.gmail.com>

On 2/27/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 2/27/07, Guido van Rossum <guido at python.org> wrote:
> > On 2/27/07, Jim Jewett <jimjjewett at gmail.com> wrote:
>
> > > Therefore, normal code can ignore the possibility, or (to be really
> > > robust against someone else messing with the input stream) add an "if
> > > result is None: continue" clause to its loops.
>
> > No, since that would mean busy-waiting while the I/O isn't ready,
>
> Then should I assume that:
>
> (1)  Read with a timeout is in the "better know your concrete object" category.

Using these shouldn't necessarily need to be (but you *should* know to
expect EWOULDBLOCK); but setting the timeout should be, yes.

> (2)  Dealing with possibly unready objects in a library/framework
> (yield the timeslot?) should generally be framework specific.

Yeah, event loop business typically is.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Wed Feb 28 01:20:24 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 28 Feb 2007 13:20:24 +1300
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <ca471dc20702271437m4855be57j76b3905c3c3584de@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
	<79990c6b0702270257w459129bbp566385ec1e5f2646@mail.gmail.com>
	<ca471dc20702270838j63e680ydbbce30dfbb44688@mail.gmail.com>
	<45E4ADE7.7040709@canterbury.ac.nz>
	<ca471dc20702271437m4855be57j76b3905c3c3584de@mail.gmail.com>
Message-ID: <45E4CAC8.1070902@canterbury.ac.nz>

Guido van Rossum wrote:
> On 2/27/07, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>
> > What about \r?
> 
> Mac OS 9 has been dead and unsupported for many years now.

Even if the Python code isn't running on MacOS 9, it
might want to write a file that will be read by a
MacOS 9 system.

--
Greg

From oliphant.travis at ieee.org  Wed Feb 28 02:10:23 2007
From: oliphant.travis at ieee.org (Travis E. Oliphant)
Date: Tue, 27 Feb 2007 18:10:23 -0700
Subject: [Python-3000] PEP Draft:  Enhancing the buffer protcol
Message-ID: <es2kbv$f0r$1@sea.gmane.org>


Attached is my current draft of the enhanced buffer protocol for Python 
3000.  It is basically what has been discussed except for some issues 
with non single-segment memory areas (such as a sub-array).

Comments are welcome.

-Travis Oliphant

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: pep_buffer.txt
Url: http://mail.python.org/pipermail/python-3000/attachments/20070227/6a49daa4/attachment-0001.txt 

From daniel at stutzbachenterprises.com  Wed Feb 28 03:00:12 2007
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Tue, 27 Feb 2007 20:00:12 -0600
Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol
In-Reply-To: <es2kbv$f0r$1@sea.gmane.org>
References: <es2kbv$f0r$1@sea.gmane.org>
Message-ID: <eae285400702271800t29722cf3yf6a066ccf925edb2@mail.gmail.com>

I know I'm joining this discussion late in the game, so I apologize if
my look through the list archives was not sufficiently exhaustive and
this has been proposed and shot down before...

What if the locking mechanism were put into the array's memory instead
of the container's memory?  If the array-memory is a PyObject, then
the existing reference counting mechanism can be used, instead of
inventing a new one.  We can introduce a new type, PyArray, that is
pretty much opaque (bare minimum of methods).  A PyArray is just a
PyObject_HEAD (one type pointer plus the reference counter) followed
by the data that would normally be there.

When an array-like container allocates memory, it allocates a PyArray
to store the actual data in.  When a caller requests a view, the
container increments the PyArray's reference counter and returns a
pointer to the PyArray.  The caller is responsible for decrementing
the reference counter when it is done with the view, so
bf_releasebuffer becomes unnecessary.

The container cannot reallocate the memory unless the reference
counter on the PyArray is exactly 1.

Basically, I'm wondering if it makes sense to move the new reference
counter into the buffered memory rather than putting it in the
container, so that there is only one reference counter implementation.

Different question: what is a container supposed to do if a view is
locking its memory and it needs to reallocate to complete some
operation?  I assume it would raise an exception, but it would be nice
to spell this out in the PEP.
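
A toy model of the idea, written in Python purely to make the counting rule
concrete. SharedBlock, Container, get_view, release_view and resize are all
hypothetical names invented for this sketch; the real proposal would live in C
and use the ordinary object reference count.

```python
class SharedBlock:
    """Hypothetical stand-in for the proposed PyArray: data plus one counter."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.refs = 1  # the owning container's own reference


class Container:
    """Array-like container that stores its data in a SharedBlock."""

    def __init__(self, size):
        self._block = SharedBlock(size)

    def get_view(self):
        # Handing out a view is just an increment on the shared block.
        self._block.refs += 1
        return self._block

    @staticmethod
    def release_view(block):
        # The caller is responsible for this decrement when it is done.
        block.refs -= 1

    def resize(self, size):
        # Reallocation is allowed only when the container holds the sole reference.
        if self._block.refs != 1:
            raise BufferError("outstanding views; cannot reallocate")
        new_block = SharedBlock(size)
        n = min(size, len(self._block.data))
        new_block.data[:n] = self._block.data[:n]
        self._block = new_block


c = Container(4)
v = c.get_view()
try:
    c.resize(8)
    blocked = False
except BufferError:
    blocked = True

Container.release_view(v)
c.resize(8)  # succeeds once the view has been released
```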

-- 
Daniel Stutzbach, Ph.D.             President, Stutzbach Enterprises LLC

From ncoghlan at gmail.com  Wed Feb 28 03:19:30 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 28 Feb 2007 12:19:30 +1000
Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol
In-Reply-To: <eae285400702271800t29722cf3yf6a066ccf925edb2@mail.gmail.com>
References: <es2kbv$f0r$1@sea.gmane.org>
	<eae285400702271800t29722cf3yf6a066ccf925edb2@mail.gmail.com>
Message-ID: <45E4E6B2.8090506@gmail.com>

Daniel Stutzbach wrote:
> When an array-like container allocates memory, it allocates a PyArray
> to store the actual data in.  When a caller requests a view, the
> container increments the PyArray's reference counter and returns a
> pointer to the PyArray.  The caller is responsible for decrementing
> the reference counter when it is done with the view, so
> bf_releasebuffer becomes unnecessary.
> 
> The container cannot reallocate the memory unless the reference
> counter on the PyArray is exactly 1.
> 
> Basically, I'm wondering if it makes sense to move the new reference
> counter into the buffered memory rather than putting it in the
> container, so that there is only one reference counter implementation.

An object can use a similar approach (by calling Py_INCREF/DECREF in the 
get/release methods), but there is no need for it to be the *only* 
approach (TOOWTDI is given significantly less emphasis in the C API, 
while speed & memory efficiency concerns are higher on the priority list).

> Different question: what is a container supposed to do if a view is
> locking its memory and it needs to reallocate to complete some
> operation?  I assume it would raise an exception, but it would be nice
> to spell this out in the PEP.
> 

I was wondering this, too. I'd also like to know what should happen if 
the object's Python refcount drops to zero, but the view count is still 
greater than 0.

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From jcarlson at uci.edu  Wed Feb 28 03:48:03 2007
From: jcarlson at uci.edu (Josiah Carlson)
Date: Tue, 27 Feb 2007 18:48:03 -0800
Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol
In-Reply-To: <eae285400702271800t29722cf3yf6a066ccf925edb2@mail.gmail.com>
References: <es2kbv$f0r$1@sea.gmane.org>
	<eae285400702271800t29722cf3yf6a066ccf925edb2@mail.gmail.com>
Message-ID: <20070227182352.AE67.JCARLSON@uci.edu>


"Daniel Stutzbach" <daniel at stutzbachenterprises.com> wrote:
[snip]
> The container cannot reallocate the memory unless the reference
> counter on the PyArray is exactly 1.

Alternatively, some objects could opt to create a new PyArray of
sufficient size, copy data as necessary, and leave all previous views to
point to the old data.  If done periodically, this could lead to an
interesting versioning mechanism (especially if we could teach Python to
virtualize itself and pull memory from a specific buffer), but that is
a different discussion for a different day :)


About the only issue I can see with implementing the mechanism as you
describe is that everything that wants to offer the buffer interface
would need to store its data in a PyArray structure.  Bytes, unicode,
array.array, mmap, etc.  Most of the difference will essentially be a
call to PyArray_New() rather than PyMalloc(), and an indirection via
macro of PyArray_ASSTRINGANDSIZE() to get the pointer and length of the
buffer. I would suspect that such overhead would be minimal, but without
implementing and testing it on something that is used often (maybe
Python 2.x strings as the simplest example?), it would be hard to say.

The benefit to implementing the interface as described by Travis is that
if an object is read-only (like unicode), the acquire/release is (as in
the PyArray version) an incref/decref, and no other structural changes
are necessary.

Then again, after switching to PyArrays, all views are more or less an
incref or decref and the allocation of a "view" object to describe
memory layout.


 - Josiah


From daniel at stutzbachenterprises.com  Wed Feb 28 04:38:39 2007
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Tue, 27 Feb 2007 21:38:39 -0600
Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol
In-Reply-To: <20070227182352.AE67.JCARLSON@uci.edu>
References: <es2kbv$f0r$1@sea.gmane.org>
	<eae285400702271800t29722cf3yf6a066ccf925edb2@mail.gmail.com>
	<20070227182352.AE67.JCARLSON@uci.edu>
Message-ID: <eae285400702271938x48c5a6c5va7341ef44a769c2f@mail.gmail.com>

On 2/27/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> About the only issue I can see with implementing the mechanism as you
> describe is that everything that wants to offer the buffer interface
> would need to store its data in a PyArray structure.  Bytes, unicode,
> array.array, mmap, etc.  Most of the difference will essentially be a
> call to PyArray_New() rather than PyMalloc(), and an indirection via
> macro of PyArray_ASSTRINGANDSIZE() to get the pointer and length of the
> buffer. I would suspect that such overhead would be minimal, but without
> implementing and testing it on something that is used often (maybe
> Python 2.x strings as the simplest example?), it would be hard to say.

Each type can implement its own PyArray subtype, so there'd be no
need for a macro/function to do the indirection.  For example, if we
wanted to build a C integer-based array type for some reason, we
could create its PyArray subtype as follows:

typedef struct {
    PyObject_HEAD
    int ival[1];
} PyIntArray;

The data can then be accessed cleanly like this:

PyIntArray *my_array = allocate_some_memory();
my_array->ival[some_index] = v;

Possibly on some architectures accessing the data will be very
slightly slower because ival isn't at the top of the structure.  I
wrote a short test program just now and didn't see a difference on my
architecture (Intel Duo).

> The benefit to implementing the interface as described by Travis is that
> if an object is read-only (like unicode), the acquire/release is (as in
> the PyArray version) an incref/decref, and no other structural changes
> are necessary.

If I read the source right, the current Unicode implementation
converts the unicode string to a regular string using the default
encoding when a buffer is requested.  Presumably this will need to be
re-thought for Python 3000 since non-unicode strings are going away.

However, for certain read-only types (like 2.5-style strings) their
implementation is already a PyObject with an array tacked on to the
end.  These could be subtypes of the PyArray type with very little
trouble, and it would only be necessary to maintain one reference
counter for them instead of two.

-- 
Daniel Stutzbach, Ph.D.             President, Stutzbach Enterprises LLC

From oliphant.travis at ieee.org  Wed Feb 28 05:05:23 2007
From: oliphant.travis at ieee.org (Travis E. Oliphant)
Date: Tue, 27 Feb 2007 21:05:23 -0700
Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol
In-Reply-To: <eae285400702271800t29722cf3yf6a066ccf925edb2@mail.gmail.com>
References: <es2kbv$f0r$1@sea.gmane.org>
	<eae285400702271800t29722cf3yf6a066ccf925edb2@mail.gmail.com>
Message-ID: <es2uk3$7cj$1@sea.gmane.org>

Daniel Stutzbach wrote:
> I know I'm joining this discussion late in the game, so I apologize if
> my look through the list archives was not sufficiently exhaustive and
> this has been proposed and shot down before...

No, I don't think you are late.  But this discussion has been going on 
off and on for at least 10 years :-)  We don't all remember all the 
issues, though.

> 
> What if the locking mechanism were put into the array's memory instead
> of the container's memory?  

Basically, my first proposal was to have a single view object and you 
would get at the memory through it.  But, having a light-weight API that 
returns a pointer to memory like the current one does is desirable.

> If the array-memory is a PyObject, then
> the existing reference counting mechanism can be used, instead of
> inventing a new one.  We can introduce a new type, PyArray, that is
> pretty much opaque (bare minimum of methods).  A PyArray is just a
> PyObject_HEAD (one type pointer plus the reference counter) followed
> by the data that would normally be there.
> 

The original object still needs to distinguish between normal references 
and "view-based references."  Thus, even with your proposal it seems you 
will need another counter on the objects that wish to track 
buffer-interface views.


> When an array-like container allocates memory, it allocates a PyArray
> to store the actual data in.  When a caller requests a view, the
> container increments the PyArray's reference counter and returns a
> pointer to the PyArray.  The caller is responsible for decrementing
> the reference counter when it is done with the view, so
> bf_releasebuffer becomes unnecessary.

Maybe I'm not understanding you correctly.  Perhaps what you are saying 
is that we should have all memory allocation go through a light-weight 
memory-object.  Then, you would get this object + an offset when you 
wanted a pointer into memory.

This way, the memory would never be deallocated until nothing was 
referencing it.  I think this approach would work.   However, you could 
still have the case, where an object reallocated memory while another 
object which thought it had a view of that object ended up with an 
"out-dated" view.   You just wouldn't segfault in that case.

You could check the reference count on the memory object, before 
reallocating, I suppose. But I've heard that the reference counts on 
Python objects can be larger than 1 in some cases (even though there 
isn't really anything "viewing" the memory).


> 
> The container cannot reallocate the memory unless the reference
> counter on the PyArray is exactly 1.

I'm not sure we can guarantee this would work.  It seems like for 
various reasons depending on the state of the interpreter, reference 
counts increase.

> 
> Basically, I'm wondering if it makes sense to move the new reference
> counter into the buffered memory rather than putting it in the
> container, so that there is only one reference counter implementation.
> 

This is an idea I've thought of too, but we would be enforcing a 
"use-python for all shared memory allocations" restriction.


> Different question: what is a container supposed to do if a view is
> locking its memory and it needs to reallocate to complete some
> operation?  I assume it would raise an exception, but it would be nice
> to spell this out in the PEP.

It would raise an exception.
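
For comparison, this is observable in present-day CPython, where bytearray
refuses to resize while a memoryview of it is outstanding. The snippet is only
an illustration of the "raise an exception" rule, not part of the draft.

```python
data = bytearray(b"abcd")
view = memoryview(data)  # acquiring a view locks the buffer's size

try:
    data.extend(b"ef")   # would need to reallocate
    resized_while_locked = True
except BufferError:
    resized_while_locked = False

view.release()           # drop the view explicitly
data.extend(b"ef")       # now the resize goes through
```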

-Travis




From greg.ewing at canterbury.ac.nz  Wed Feb 28 04:58:07 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 28 Feb 2007 16:58:07 +1300
Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol
In-Reply-To: <45E4E6B2.8090506@gmail.com>
References: <es2kbv$f0r$1@sea.gmane.org>
	<eae285400702271800t29722cf3yf6a066ccf925edb2@mail.gmail.com>
	<45E4E6B2.8090506@gmail.com>
Message-ID: <45E4FDCF.6060408@canterbury.ac.nz>

Nick Coghlan wrote:
> I'd also like to know what should happen if 
> the object's Python refcount drops to zero, but the view count is still 
> greater than 0.

That shouldn't happen, because the code using the view
ought to be responsible for holding a reference to the
containing object as long as it's using the view.

--
Greg

From ncoghlan at gmail.com  Wed Feb 28 05:09:34 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 28 Feb 2007 14:09:34 +1000
Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol
In-Reply-To: <45E4FDCF.6060408@canterbury.ac.nz>
References: <es2kbv$f0r$1@sea.gmane.org>	<eae285400702271800t29722cf3yf6a066ccf925edb2@mail.gmail.com>	<45E4E6B2.8090506@gmail.com>
	<45E4FDCF.6060408@canterbury.ac.nz>
Message-ID: <45E5007E.3000203@gmail.com>

Greg Ewing wrote:
> Nick Coghlan wrote:
>> I'd also like to know what should happen if 
>> the object's Python refcount drops to zero, but the view count is still 
>> greater than 0.
> 
> That shouldn't happen, because the code using the view
> ought to be responsible for holding a reference to the
> containing object as long as it's using the view.

That's what I thought, but it should probably be mentioned explicitly in 
the PEP (and the eventual docs) that the memory reference needs to be in 
addition to a normal object reference, rather than instead of.
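
As a small illustration of "in addition to", here is how the memoryview object
in present-day Python 3 behaves: the view itself keeps a normal reference to
the exporting object (available as .obj), so the memory stays valid even after
the caller's own name for the object is gone. The function name is made up for
this sketch.

```python
def first_bytes():
    data = bytearray(b"hello world")
    # The local name 'data' disappears on return; the view keeps the object alive.
    return memoryview(data)[:5]


view = first_bytes()
owner = view.obj  # the exporting bytearray, still referenced by the view
```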

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From greg.ewing at canterbury.ac.nz  Wed Feb 28 05:20:07 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 28 Feb 2007 17:20:07 +1300
Subject: [Python-3000] PEP Draft:  Enhancing the buffer protcol
In-Reply-To: <es2kbv$f0r$1@sea.gmane.org>
References: <es2kbv$f0r$1@sea.gmane.org>
Message-ID: <45E502F7.4070603@canterbury.ac.nz>

Travis E. Oliphant wrote:

>     typedef char *(*formatbufferproc)(PyObject *view, int *itemsize)
> 
>       Get the format-string of the memory using the struct-module
>       string syntax

I'm not sure whether a struct-format string would be
the most convenient form for use by C-level code, as
it could require some tedious parsing to extract
useful information from it.
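
To make the concern concrete: from Python the struct module does the parsing,
as in the sketch below, but C-level consumers of a format string would need
their own parser to recover the same facts (item size, field types). The format
"<hd" is just an example chosen here.

```python
import struct

# "<" selects little-endian with standard sizes and no padding:
# "h" is a 2-byte integer, "d" is an 8-byte float.
fmt = "<hd"

itemsize = struct.calcsize(fmt)     # 2 + 8 bytes
record = struct.pack(fmt, 7, 1.5)   # one item laid out per the format
fields = struct.unpack(fmt, record)
```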

>     typedef PyObject *(*shapebufferproc)(PyObject *view)
> 
>       Return a 2-tuple of lists containing shape information: (shape,
>       strides).

I'm also not sure about using Python data structures
to represent this, as it will force C-level code to
use Python API calls to pull it apart. What would be
wrong with a C array of structs containing two integers
each?

The buffer API is for the use of C code, and it should
be designed with the convenience of C code in mind.
Using Python data structures unnecessarily seems like
the wrong way to go about that.

The following alternative would seem to provide most of
the things that Travis's proposal does without involving
Python objects:

     struct pybuffer_shape {
        Py_ssize_t length;
        Py_ssize_t stride;
     };

     typedef int (*getbufferproc)(PyObject *obj,
        void **buf, Py_ssize_t *len,
        char **format,
        struct pybuffer_shape **shape, int *ndim);

        /* Any of buf, format and shape may be NULL if you're
           not interested in them. */

     typedef int (*releasebufferproc)(PyObject *obj);
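
A rough Python illustration of what the per-dimension (length, stride) pairs
would contain for a C-contiguous block. The helper c_contiguous_strides is
hypothetical; the memoryview.cast call from present-day Python 3 is used only
to cross-check the arithmetic.

```python
import struct


def c_contiguous_strides(shape, itemsize):
    """Byte strides for a C-contiguous array: the last dimension varies fastest."""
    strides = []
    step = itemsize
    for length in reversed(shape):
        strides.append(step)
        step *= length
    return tuple(reversed(strides))


itemsize = struct.calcsize("i")  # native int size on this platform
shape = (2, 3)

# One (length, stride) pair per dimension, as in struct pybuffer_shape.
pairs = list(zip(shape, c_contiguous_strides(shape, itemsize)))

# Cross-check against a 2x3 view of native ints over a flat byte buffer.
view = memoryview(bytearray(6 * itemsize)).cast("i", shape=list(shape))
```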

--
Greg

From greg.ewing at canterbury.ac.nz  Wed Feb 28 05:21:50 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 28 Feb 2007 17:21:50 +1300
Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol
In-Reply-To: <20070227182352.AE67.JCARLSON@uci.edu>
References: <es2kbv$f0r$1@sea.gmane.org>
	<eae285400702271800t29722cf3yf6a066ccf925edb2@mail.gmail.com>
	<20070227182352.AE67.JCARLSON@uci.edu>
Message-ID: <45E5035E.1050602@canterbury.ac.nz>

Josiah Carlson wrote:

> About the only issue I can see with implementing the mechanism as you
> describe is that everything that wants to offer the buffer interface
> would need to store its data in a PyArray structure.

And that wouldn't be acceptable, because the point of
the buffer interface is to provide access to memory
which is *not* kept in any standard kind of container.
Often the memory is allocated and managed by an
external library, and we have no control over it.

--
Greg

From oliphant.travis at ieee.org  Wed Feb 28 05:53:42 2007
From: oliphant.travis at ieee.org (Travis E. Oliphant)
Date: Tue, 27 Feb 2007 21:53:42 -0700
Subject: [Python-3000] PEP Draft:  Enhancing the buffer protcol
In-Reply-To: <45E502F7.4070603@canterbury.ac.nz>
References: <es2kbv$f0r$1@sea.gmane.org> <45E502F7.4070603@canterbury.ac.nz>
Message-ID: <es31en$d8i$1@sea.gmane.org>

Greg Ewing wrote:
> Travis E. Oliphant wrote:
> 
>>     typedef char *(*formatbufferproc)(PyObject *view, int *itemsize)
>>
>>       Get the format-string of the memory using the struct-module
>>       string syntax
> 
> I'm not sure whether a struct-format string would be
> the most convenient form for use by C-level code, as
> it could require some tedious parsing to extract
> useful information from it.

Yes, this was the reason for my dtype object.  But, I think that folks 
felt it was too much, especially since the struct-style syntax is 
already there in Python.

Do you have any other suggestions?

> 
>>     typedef PyObject *(*shapebufferproc)(PyObject *view)
>>
>>       Return a 2-tuple of lists containing shape information: (shape,
>>       strides).
> 
> I'm also not sure about using Python data structures
> to represent this, as it will force C-level code to
> use Python API calls to pull it apart. What would be
> wrong with a C array of structs containing two integers
> each?

Nothing except memory management.  Now, you have to worry about 
allocating and deallocating memory.

> 
> The buffer API is for the use of C code, and it should
> be designed with the convenience of C code in mind.

I agree.  I would like to use something besides Python objects, but 
handling the memory allocation is non-trivial.

On the other hand, Python tuples are pretty simple wrappers around 
integers.

> Using Python data structures unnecessarily seems like
> the wrong way to go about that.
> 
> The following alternative would seem to provide most of
> the things that Travis's proposal does without involving
> Python objects:
> 
>      struct pybuffer_shape {
>         Py_ssize_t length;
>         Py_ssize_t stride;
>      };
> 
>      typedef int (*getbufferproc)(PyObject *obj,
>         void **buf, Py_ssize_t *len,
>         char **format,
>         struct pybuffer_shape **shape, int *ndim);
> 
>         /* Any of buf, format and shape may be NULL if you're
>            not interested in them. */
>

Besides not allowing for the request of a "contiguous" buffer from the 
object or a writeable one you are also not describing how allocation for 
this array of structs will be handled.

I'm not opposed in principle.  In fact, I would like to get rid of the 
Python objects in the protocol (in the array_struct interface for NumPy 
we have the shape and strides in an array of integers).

The memory management is the only issue.

-Travis


From oliphant.travis at ieee.org  Wed Feb 28 06:00:19 2007
From: oliphant.travis at ieee.org (Travis E. Oliphant)
Date: Tue, 27 Feb 2007 22:00:19 -0700
Subject: [Python-3000] PEP Draft:  Enhancing the buffer protcol
In-Reply-To: <es31en$d8i$1@sea.gmane.org>
References: <es2kbv$f0r$1@sea.gmane.org> <45E502F7.4070603@canterbury.ac.nz>
	<es31en$d8i$1@sea.gmane.org>
Message-ID: <es31r3$e1h$1@sea.gmane.org>

Travis E. Oliphant wrote:
> 
> The memory management is the only issue.

In fact, the PEP still has the issue of who manages the memory for the 
format-description string when it is returned.

The easiest thing to do is to return a Python String and let reference 
counting handle the memory management.

What if we were also to return from the shape call a Python C-Object 
that loosely wrapped the shape and strides c-arrays.  Then, it would 
free the memory on deallocation.

A C-API call that created such a C-Object from two arrays of integers 
could be provided to make it easy.

-Travis


From oliphant.travis at ieee.org  Wed Feb 28 06:34:28 2007
From: oliphant.travis at ieee.org (Travis E. Oliphant)
Date: Tue, 27 Feb 2007 22:34:28 -0700
Subject: [Python-3000] PEP Draft:  Enhancing the buffer protcol
In-Reply-To: <45E502F7.4070603@canterbury.ac.nz>
References: <es2kbv$f0r$1@sea.gmane.org> <45E502F7.4070603@canterbury.ac.nz>
Message-ID: <es33r3$htu$1@sea.gmane.org>

Greg Ewing wrote:
> 
> The buffer API is for the use of C code, and it should
> be designed with the convenience of C code in mind.
> Using Python data structures unnecessarily seems like
> the wrong way to go about that.
> 
> The following alternative would seem to provide most of
> the things that Travis's proposal does without involving
> Python objects:
> 


In my latest version of the PEP, I suggest using Python CObjects as 
loose wrappers around C-structures for both the char * format string and 
the structure

int ndim;
Py_ssize_t *shape;
Py_ssize_t *strides;

This way, we get the benefit of Python object counting for memory 
management but easy-access to the relevant C-objects.

I've also added simple functions to the proposed C-API to construct 
these C-objects.

-Travis


From oliphant.travis at ieee.org  Wed Feb 28 06:35:54 2007
From: oliphant.travis at ieee.org (Travis E. Oliphant)
Date: Tue, 27 Feb 2007 22:35:54 -0700
Subject: [Python-3000] PEP Draft:  Enhancing the buffer protcol
In-Reply-To: <es2kbv$f0r$1@sea.gmane.org>
References: <es2kbv$f0r$1@sea.gmane.org>
Message-ID: <es33tp$htu$2@sea.gmane.org>

Travis E. Oliphant wrote:
> 
> Attached is my current draft of the enhanced buffer protocol for Python 
> 3000.  It is basically what has been discussed except for some issues 
> with non single-segment memory areas (such as a sub-array).
> 
> Comments are welcome.
> 

The latest version of the PEP is always available here:

http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/doc/pep_buffer.txt


-Travis


From rasky at develer.com  Wed Feb 28 09:20:01 2007
From: rasky at develer.com (Giovanni Bajo)
Date: Wed, 28 Feb 2007 09:20:01 +0100
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
Message-ID: <es3dvg$a2d$1@sea.gmane.org>

[reposting since the first time it didn't get through...]


On 26/02/2007 22.35, Mike Verdone wrote:

 > Daniel Stutzbach and I have prepared a draft PEP for the new IO system
 > for Python 3000. This document is, hopefully, true to the info that
 > Guido wrote on the whiteboards here at PyCon. This is still a draft
 > and there's quite a few decisions that need to be made. Feedback is
 > welcomed.

Thanks for this!


 > Raw I/O
 > The abstract base class for raw I/O is RawIOBase.  It has several
 > methods which are wrappers around the appropriate operating system
 > call.  If one of these functions would not make sense on the object,
 > the implementation must raise an IOError exception.  For example, if a
 > file is opened read-only, the .write() method will raise an IOError.
 > As another example, if the object represents a socket, then .seek(),
 > .tell(), and .truncate() will raise an IOError.
 >
 >    .read(n: int) -> bytes
 >    .readinto(b: bytes) -> int
 >    .write(b: bytes) -> int

What are the requirements here?

- Can read()/readinto() return *fewer* bytes than specified?
- Can read() return a 0-sized byte object (=no data available)?
- Can read() return *more* bytes than specified (think of a datagram socket or 
a decompressing stream)?
- Can readinto() read *fewer* bytes than specified?
- Can readinto() read zero bytes?
- Should read()/readinto() raise EOFError?
- Can write() write fewer bytes than specified?
- Can write() write zero bytes?

Please, see also the examples at the end of the mail before providing an answer :)

 >    .seek(pos: int, whence: int = 0) -> None
 >    .tell() -> int
 >    .truncate(n: int = None) -> None
 >    .close() -> None

Why should this very low-level basic type define *two* read methods? Assuming 
that readinto() is the most primitive, can we have the ABC RawIOBase provide a 
default read() method that calls readinto?

Consider providing more ABC/mixins to help implementations. 
ReadIOBase/WriteIOBase are pretty obvious:

class RawIOBase:
     def readable(self): return False
     def writeable(self): return False
     def seekable(self): return False

     def read(self,n): raise IOError
     def readinto(self,b): raise IOError
     def write(self,b): raise IOError
     def seek(self,pos,wh): raise IOError
     def tell(self): raise IOError
     def truncate(self,n=None): raise IOError


class ReadIOBase(RawIOBase):
     def readable(self): return True
     def read(self, n):
         b = bytes(n)  #whatever
         self.readinto(b)
         return b


class MySpecialReader(ReadIOBase):
     def readinto(self, b):
         # ....
         # must implement only this and nothing else
         pass

# WriteIOBase would be the analogous mixin for write(); not shown.
class MySpecialReaderWriter(ReadIOBase, WriteIOBase):
     def readinto(self, b):
         # ....
         pass
     def write(self, b):
         # ....
         pass
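
The same idea can be tried against the io module in present-day Python 3,
where io.RawIOBase already supplies a default read() built on readinto(). This
is a minimal sketch, not the draft API; ChunkReader and its attributes are
names invented here. It also shows one possible answer to the questions above:
short reads are allowed, and end of data is signalled by an empty bytes object
rather than EOFError.

```python
import io


class ChunkReader(io.RawIOBase):
    """Raw reader over an in-memory bytes object; only readinto() is written."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Copy as much as fits into the caller's buffer and report the count.
        chunk = self._data[self._pos:self._pos + len(b)]
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


reader = ChunkReader(b"abcdef")
first = reader.read(3)    # exact read
second = reader.read(10)  # short read: only 3 bytes remain
third = reader.read(10)   # end of data
```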


 >     (should these "is_" functions be attributes instead?
 > "file.readable == True")

Yes, I think readable/writeable/seekable/fileno *perfectly* match the good 
usage of attributes/properties. They all provide a value without any 
side-effect and that can be computed without doing O(n)-style computations.


 > Buffered I/O
 > The next layer is the Buffer I/O layer which provides more efficient
 > access to file-like objects. The abstract base class for all Buffered

I think you probably want the buffer size to be optionally specified by the 
user, for the standard 4 implementations.

 > Q: Do we want to mandate in the specification that switching between
 > reading to writing on a read-write object implies a .flush()?  Or is
 > that an implementation convenience that users should not rely on?

I'd be glad if using flush() wasn't a requirement for users of the class. It 
always strikes me as an abstraction leak.

 > TextIOBase class implementations additionally provide the following methods:
 >
 >     .readline(self)
 >
 >        Read until newline or EOF and return the line.
 >
 >     .readlinesiter()
 >
 >        Returns an iterator that returns lines from the file (which
 > happens to be 'self').
 >
 >     .next()
 >
 >        Same as readline()
 >
 >     .__iter__()
 >
 >        Same as readlinesiter()

Not sure why you need "readlinesiter()" at all. I thought Py3k was disposing 
most of the "fooiter()" functions (thinking of dicts...).


 > Another way to do it is as follows (we should pick one or the other):
 >
 >     .__init__(self, buffer, encoding=None, newline=None)

I think this is clearer. I can't find a good real-world usecase for requiring 
the two parameters version.

==========================================================================

Now for a real example. Let's say I'm given a readable RawIOBase object. 
I'm told that it's a foobar-compressed utf-8 text-file. I have this API available:

     class Foobar:
        # initialize decompressor
        __init__()

        # feed compressed bytes and get uncompressed bytes.
        # The uncompressed data can be smaller, equal or larger
        # than the compressed data
        decompress(bytes) -> bytes

        # finish decompression and get tail
        flush() -> bytes


This is basically similar to the way zlib.decompress/flush works. I would like 
to wrap the readable RawIOBase object in a way that I obtain a textual 
file-like with readline() etc.

This is pretty hard to do with the current I/O library (you need to write a 
lot of code). It'd be good if the new I/O library makes it easier to achieve.

Let's see. I start with a raw I/O reader:

class FoobarRaw(RawIOBase):
     def __init__(self, raw):
         self.raw = raw
         self._d = Foobar()
         self._buf = bytes()

     def readable(self):
         return True

     # I assume RawIOBase.read() must return the
     #   exact number of bytes (unless at the end).
     # I assume RawIOBase.read() raises EOFError when done
     # I assume readinto() does not exist...
     def read(self, n):
         try:
             while len(self._buf) < n:
                 b = self.raw.read(n)
                 self._buf += self._d.decompress(b)
         except EOFError:
             self._buf += self._d.flush()

         d = self._buf[:n]
         del self._buf[:n]
         if not d:
             raise EOFError
         return d

and complete the job:

def foobar_open(raw):
     return TextIOWrapper(BufferedReader(FoobarRaw(raw)), encoding="utf-8")

for L in foobar_open(sock):
     print(L)


Uhm, looks great!
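
For reference, a runnable variant of the same layering using zlib in place of
the imaginary Foobar codec, written against the io module in present-day
Python 3. It is a sketch under stated assumptions: ZlibRaw and zlib_open are
names made up here, the source is assumed to be blocking, and end of data is
signalled by a zero-length read rather than EOFError.

```python
import io
import zlib


class ZlibRaw(io.RawIOBase):
    """Raw reader that decompresses a zlib stream read from another object."""

    def __init__(self, raw):
        self._raw = raw
        self._d = zlib.decompressobj()
        self._buf = b""
        self._eof = False

    def readable(self):
        return True

    def readinto(self, b):
        # Pull compressed chunks until there is output to hand over, or EOF.
        while not self._buf and not self._eof:
            chunk = self._raw.read(4096)
            if chunk:
                self._buf = self._d.decompress(chunk)
            else:
                self._buf = self._d.flush()
                self._eof = True
        n = min(len(b), len(self._buf))
        b[:n] = self._buf[:n]
        self._buf = self._buf[n:]
        return n  # 0 means end of stream


def zlib_open(raw):
    return io.TextIOWrapper(io.BufferedReader(ZlibRaw(raw)), encoding="utf-8")


text = "first line\nsecond line\n"
source = io.BytesIO(zlib.compress(text.encode("utf-8")))
lines = list(zlib_open(source))
```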

==========================================================================

Now, it might be interesting to play with the different semantics of 
RawIOBase.read(), which I proposed above, and see how the implementation of 
FoobarRaw.read() changes.

For instance (now being radical): why don't we drop the "n" argument 
altogether? We could just define it like this:

     # Returns a block of data, whose size is implementation-defined
     # and may vary between calls. It never returns a zero-sized block.
     # Raises EOFError when done.
     read() -> bytes

After all, there's a BufferedIO layer to handle buffering and exact-size 
reads/writes. If we go this way, the above example is even easier:

     def read(self):
         try:
            b = self.raw.read() # any size!
            return self._d.decompress(b)
         except EOFError:
            b = self._d.flush()
            if not b:
               raise EOFError
            return b

It would also work well for sockets, since they would return exactly the 
buffer of data that arrived from the network, and simply block once if there's 
no data available.
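
A minimal sketch of how a buffering layer could sit on top of such an
argument-less read(), turning variable-sized blocks into exact-size reads. The
names make_exact_reader and read_block are hypothetical, and the block source
follows the convention proposed above: never an empty block, EOFError when
done.

```python
def make_exact_reader(read_block):
    """Wrap a block source so callers can ask for exactly n bytes."""
    pending = bytearray()

    def read_exact(n):
        # Accumulate blocks until n bytes are buffered or the source is done.
        while len(pending) < n:
            try:
                pending.extend(read_block())
            except EOFError:
                break
        out = bytes(pending[:n])
        del pending[:n]
        return out  # shorter than n only at end of data

    return read_exact


blocks = iter([b"ab", b"cde", b"f"])


def read_block():
    # Stand-in for a raw read(): blocks of whatever size happens to arrive.
    try:
        return next(blocks)
    except StopIteration:
        raise EOFError


read_exact = make_exact_reader(read_block)
a = read_exact(4)
b = read_exact(4)
c = read_exact(1)
```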
-- 
Giovanni Bajo


From theller at ctypes.org  Wed Feb 28 13:07:53 2007
From: theller at ctypes.org (Thomas Heller)
Date: Wed, 28 Feb 2007 13:07:53 +0100
Subject: [Python-3000] PEP Draft:  Enhancing the buffer protcol
In-Reply-To: <es2kbv$f0r$1@sea.gmane.org>
References: <es2kbv$f0r$1@sea.gmane.org>
Message-ID: <es3rao$77c$1@sea.gmane.org>

Travis E. Oliphant schrieb:
> Attached is my current draft of the enhanced buffer protocol for Python 
> 3000.  It is basically what has been discussed except for some issues 
> with non single-segment memory areas (such as a sub-array).
> 
> Comments are welcome.
> 
> -Travis Oliphant


> Additions to the struct string-syntax
> 
>    The struct string-syntax is missing some characters to fully
>    implement data-format descriptions already available elsewhere (in
>    ctypes and NumPy for example).  Here are the proposed additions:
> 
>    Character         Description
>    ==================================
>    '1'               bit (number before states how many bits)
>    '?'               platform _Bool type 

In SVN trunk (2.6), the struct module already supports _Bool, but the
format character used is 't'.  Not a big issue, though, and I like '?'
better.

>    'g'               long double  
>    'F'               complex float  
>    'D'               complex double 
>    'G'               complex long double 

IIUC, in the latest PEP draft you have apparently changed to two-letter codes
for complex types; which is inconsistent with previous conventions in struct.

>    'c'               ucs-1 (latin-1) encoding 
>    'u'               ucs-2 
>    'w'               ucs-4 
>    'O'               pointer to Python Object 
>    'T{}'             structure (detailed layout inside {}) 
>    '(k1,k2,...,kn)'  multi-dimensional array of whatever follows 
>    ':name:'          optional name of the preceding element 
>    '&'               specific pointer (prefix before another character) 
>    'X{}'             pointer to a function (optional function 
>                                              signature inside {})
> 
>    The struct module will be changed to understand these as well and
>    return appropriate Python objects on unpacking.  Un-packing a
>    long-double will return a c-types long_double.

This is probably because there is no way for current Python to support
the long double datatype.  The question for ctypes is: How should ctypes
support that?  Should the .value attribute of a c_longdouble have two
components, should it expose the value as decimal, should Python itself
switch to using long double internally, or are there other possibilities?

>  Unpacking 'u' or
>    'w' will return Python unicode.  Unpacking a multi-dimensional
>    array will return a list of lists.  Un-packing a pointer will
>    return a ctypes pointer object.

ctypes does not support pointer objects of non-native byte order;
should they be forbidden?

>  Un-packing a bit will return a
>    Python Bool.
> 
>    Endian-specification ('=','>','<') is also allowed inside the
>    string so that it can change if needed.  The previously-specified
>    endian string is enforced at all times.  The default endian is '='.
> 
>    According to the struct-module, a number can precede a character
>    code to specify how many of that type there are.  The
>    (k1,k2,...,kn) extension also allows specifying if the data is
>    supposed to be viewed as a (C-style contiguous, last-dimension
>    varies the fastest) multi-dimensional array of a particular format.
> 
>    Functions should be added to ctypes to create a ctypes object from
>    a struct description, and add long-double, and ucs-2 to ctypes.

Well, ucs-4 should probably be added to ctypes as well.  The current ctypes.c_wchar
type corresponds to the C WCHAR type; its size is configuration dependent.
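
Both platform dependencies mentioned in this message can be inspected
from Python. The sketch below uses ctypes as it exists today (c_wchar,
c_double and c_longdouble are real ctypes types); the printed sizes vary
by platform, so no output is shown.

```python
import ctypes

# Width of the C wide-character type on this build:
# commonly 2 bytes on Windows and 4 bytes on most Unix systems.
wchar_size = ctypes.sizeof(ctypes.c_wchar)
print("c_wchar:", wchar_size)

# On some platforms long double is no wider than double.
double_size = ctypes.sizeof(ctypes.c_double)
longdouble_size = ctypes.sizeof(ctypes.c_longdouble)
print("c_double:", double_size, "c_longdouble:", longdouble_size)
```
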

Thomas


From daniel at stutzbachenterprises.com  Wed Feb 28 14:39:33 2007
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Wed, 28 Feb 2007 07:39:33 -0600
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <es3dvg$a2d$1@sea.gmane.org>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
	<es3dvg$a2d$1@sea.gmane.org>
Message-ID: <eae285400702280539t4f2d4652g111ed815ec995ebc@mail.gmail.com>

Note:  to make my answers true, I had to change the Non-blocking I/O
part of the PEP so that .read(), .write(), and .readinto() all return
None if no data is available from a non-blocking object.  Previously
it had specified that .readinto() would return 0, but I realized this
would be ambiguous with an EOF condition.

I'll work on fleshing out the PEP with answers to these questions
within a couple hours.

On 2/28/07, Giovanni Bajo <rasky at develer.com> wrote:
>  > Raw I/O
>  >
>  >    .read(n: int) -> bytes
>  >    .readinto(b: bytes) -> int
>  >    .write(b: bytes) -> int
>
> What are the requirements here?
>
> - Can read()/readinto() return *less* bytes than specified?

Yes.

> - Can read() return a 0-sized byte object (=no data available)?

A 0-sized byte object indicates end-of-file.

> - Can read() return *more* bytes than specified (think of a datagram socket or
> a decompressing stream)?

No.  For a Raw I/O object, any such extra bytes are either buffered in
the kernel or lost.  For a Buffered IO object, extra bytes are
buffered.

> - Can readinto() read *less* bytes than specified?

For a Raw I/O object, yes.  For a Buffered I/O object in non-blocking
mode, yes.  For a Buffered I/O object in blocking mode, no.

> - Can readinto() read zero bytes?

Only on end-of-file.

> - Should read()/readinto() raise EOFError?

On EOF, they return a length-0 object or 0 instead.  If the user tries
to read again *after* hitting EOF, then an EOFError is raised.

> - Can write() write less bytes than specified?

For a Raw I/O or non-blocking Buffered I/O object, yes.  For a
blocking Buffered I/O object, no.

> - Can write() write zero bytes?

Only if requested by the user. ;)

Exception to a few of the answers above: a zero-byte read/readinto/write
can occur on a non-blocking object, but the functions return None to
distinguish this case from an EOF condition.
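
The three possible outcomes of read() under these answers (data, a
zero-length object at EOF, None when a non-blocking object has nothing
yet) can be told apart as in the sketch below. ScriptedRaw and drain are
hypothetical names used only to make the example self-contained.

```python
def drain(raw, n=4096):
    """Collect everything currently readable from `raw`.

    Returns (data, at_eof); at_eof is False when the object merely had
    nothing to offer right now.
    """
    parts = []
    while True:
        chunk = raw.read(n)
        if chunk is None:        # non-blocking object, no data available
            return b"".join(parts), False
        if not chunk:            # zero-length result: end-of-file
            return b"".join(parts), True
        parts.append(chunk)


class ScriptedRaw:
    """Hypothetical stand-in that replays a fixed list of read() results."""

    def __init__(self, results):
        self._results = list(results)

    def read(self, n):
        return self._results.pop(0)


print(drain(ScriptedRaw([b"ab", b"cd", None])))
print(drain(ScriptedRaw([b"ab", b""])))
```

Testing "chunk is None" before "not chunk" matters, since both None and
an empty bytes object are false in a boolean context.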

> Please, see also the examples at the end of the mail before providing an answer :)
>
>  >    .seek(pos: int, whence: int = 0) -> None
>  >    .tell() -> int
>  >    .truncate(n: int = None) -> None
>  >    .close() -> None
>
> Why should this very low-level basic type define *two* read methods? Assuming
> that readinto() is the most primitive, can we have the ABC RawIOBase provide a
> default read() method that calls readinto?

> Yes, I think readable/writeable/seekable/fileno *perfectly* match the good
> usage of attributes/properties. They all provide a value without any
> side-effect and that can be computed without doing O(n)-style computations.

Unfortunately, seekable() may need to call .seek() to figure it out.
I favor calling .seek() (or using stat()) once when constructing the
object and storing the value (since we'll almost certainly need to do
this anyway to figure out what kind of Buffered I/O object to use).
If we do that, then we can make these attributes.
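
A minimal sketch of the probe-once idea, assuming a wrapper built
directly on a file descriptor. ProbedFile is a hypothetical name and not
the PEP's FileIO; os.lseek and os.pipe are the real os-module calls.

```python
import os
import tempfile


class ProbedFile:
    """Hypothetical wrapper that decides seekability at construction."""

    def __init__(self, fd):
        self.fd = fd
        try:
            os.lseek(fd, 0, os.SEEK_CUR)   # no-op seek; typically fails on pipes
            self.seekable = True
        except OSError:
            self.seekable = False


with tempfile.TemporaryFile() as f:
    regular = ProbedFile(f.fileno())
    print("regular file:", regular.seekable)

r, w = os.pipe()
try:
    print("pipe:", ProbedFile(r).seekable)
finally:
    os.close(r)
    os.close(w)
```

Because the result is stored, reading the attribute later has no side
effects and costs nothing.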

> Now for some real example. Let's say I'm given a readable RawIOBase object.
> I'm told that it's a foobar-compressed utf-8 text-file. I have this API available:
>
>      class Foobar:
>         # initialize decompressor
>         __init__()
>
>         # feed compressed bytes and get uncompressed bytes.
>         # The uncompressed data can be smaller, equal or larger
>         # than the compressed data
>         decompress(bytes) -> bytes
>
>         # finish decompression and get tail
>         flush() -> bytes
>
>
> This is basically similar to the way zlib.decompress/flush works. I would like
> to wrap the readable RawIOBase object in a way that I obtain a textual
> file-like with readline() etc.

The easy way to do this is for the zlib decompressor to wrap the
RawIOBase object in an appropriate BufferIOBase object first.  Then
read() can be called with no argument and return as many bytes as are
available.  It sounds like you want to force RawIOBase objects to have
a buffer, too, which defeats the point of having layers.  Most
use-cases will want to use a BufferIOBase object to buffer the bytes
coming out of the raw object.  In a few cases though, it really is
useful to get down to the system-call level.  Part of the motivation
for reworking the I/O interface is to make this possible.
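
The layering can be sketched with the io module as it eventually shipped,
which may differ from this draft in its details. DecompressingRaw is a
hypothetical class, and zlib stands in for the foobar codec. Only
readinto() is written by hand; io.RawIOBase supplies read() in terms of
it, and the buffered and text layers provide readline().

```python
import io
import zlib


class DecompressingRaw(io.RawIOBase):
    """Hypothetical raw layer that inflates zlib data read from `source`."""

    def __init__(self, source, chunk_size=4096):
        super().__init__()
        self._source = source          # any binary file-like object
        self._d = zlib.decompressobj()
        self._pending = b""            # decompressed bytes not yet handed out
        self._chunk_size = chunk_size
        self._eof = False

    def readable(self):
        return True

    def readinto(self, b):
        # Refill until there is something to return or the source is done.
        while not self._pending and not self._eof:
            chunk = self._source.read(self._chunk_size)
            if chunk:
                self._pending = self._d.decompress(chunk)
            else:
                self._pending = self._d.flush()
                self._eof = True
        n = min(len(b), len(self._pending))
        b[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n                       # 0 means end-of-file


payload = "first line\nsecond line\n".encode("utf-8")
source = io.BytesIO(zlib.compress(payload))
text = io.TextIOWrapper(io.BufferedReader(DecompressingRaw(source)),
                        encoding="utf-8")
first_line = text.readline()
print(repr(first_line))
```

The raw class keeps only the leftover output of its last decompress()
call; all read-ahead and line splitting live in the layers above it.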

-- 
Daniel Stutzbach, Ph.D.             President, Stutzbach Enterprises LLC

From exarkun at divmod.com  Wed Feb 28 15:10:46 2007
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Wed, 28 Feb 2007 09:10:46 -0500
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <eae285400702280539t4f2d4652g111ed815ec995ebc@mail.gmail.com>
Message-ID: <20070228141046.17852.1877073320.divmod.quotient.899@ohm>

On Wed, 28 Feb 2007 07:39:33 -0600, Daniel Stutzbach <daniel at stutzbachenterprises.com> wrote:
>
> [snip]
>
>> - Should read()/readinto() raise EOFError?
>
>On EOF, they return a length-0 object or 0 instead.  If the user tries
>to read again *after* hitting EOF, then an EOFError is raised.
>

What is the motivation for having two different ways to signal EOF?  How
is this case handled?

   >>> f = file('name', 'w')
   >>> g = file('name', 'r')
   >>> g.read(10)
   ''
   >>> f.write('bytes')
   >>> f.flush()
   >>> g.read(10)
   'bytes'
   >>>

Jean-Paul

From agthorr at barsoom.org  Wed Feb 28 05:52:54 2007
From: agthorr at barsoom.org (Daniel Stutzbach)
Date: Tue, 27 Feb 2007 22:52:54 -0600
Subject: [Python-3000] PEP Draft: Enhancing the buffer protcol
In-Reply-To: <es2uk3$7cj$1@sea.gmane.org>
References: <es2kbv$f0r$1@sea.gmane.org>
	<eae285400702271800t29722cf3yf6a066ccf925edb2@mail.gmail.com>
	<es2uk3$7cj$1@sea.gmane.org>
Message-ID: <eae285400702272052h640d4650i1d60ccd4f573ccf@mail.gmail.com>

On 2/27/07, Travis E. Oliphant <oliphant.travis at ieee.org> wrote:
> Maybe I'm not understanding you correctly.  Perhaps what you are saying
> is that we should have all memory allocation go through a light-weight
> memory-object.  Then, you would get this object + an offset when you
> wanted a pointer into memory.
>
> This way, the memory would never be deallocated until nothing was
> referencing it.  I think this approach would work.   However, you could
> still have the case, where an object reallocated memory while another
> object which thought it had a view of that object ended up with a
> "out-dated" view.   You just wouldn't segfault in that case.
>
> You could check the reference count on the memory object, before
> reallocating, I suppose.

You have understood me correctly.

(though I see that Greg Ewing has raised a good objection so it's a moot point)

> But I've heard that the reference counts on
> Python objects can be larger than 1 in some cases (even though there
> isn't really anything "viewing" the memory).

Is that true?

I'm writing an extension module (for my own use and to scratch an
itch) that relies on the following notion:

    If a C module never exposes an object to the user, then the
object's reference counter is only incremented/decremented by the
module.

Does the garbage collector sometimes temporarily increment reference
counters in the course of its operation?  I looked through the code,
but didn't see anything to that effect (except with regard to weak
reference objects).  I can't see how anything other than the garbage
collector would even find such an object.

-- 
Daniel Stutzbach, Ph.D.             President, Stutzbach Enterprises LLC

From shredwheat at gmail.com  Wed Feb 28 06:50:16 2007
From: shredwheat at gmail.com (Pete Shinners)
Date: Tue, 27 Feb 2007 21:50:16 -0800
Subject: [Python-3000] unit test for advanced formatting
Message-ID: <cfd22a7c0702272150w73ed4f5dl49c78ee7979c429@mail.gmail.com>

I've gone over PEP3101 to create an initial unittest for the advanced
formatting. Based on this intro to the formatting syntax, I thought I'd also
share my thoughts. I've also experimented with this against the python
prototype of the formatting.

I have commented out the tests where that implementation fails, but should
work (by my interpretation). If anything these tests will provide a preview
look at the way the formatting looks.

1. The early python implementation does not allow "reusing" an argument
either by index or by keyword name. The PEP has not defined this behavior. I
think it is important to be allowed to reuse any of the argument objects
given to format.

2. The implementation we have always requires a "fill" argument in the
format, if a width is specified. It would be a big improvement if space
characters were default.

3. The specification is deep. It will take an intense amount of unit testing
of corner cases to make sure this is actually doing what is correct. It may
be too complex, but it is hard to know what might be yagni.

4. The PEP still leaves a bit of wiggle room in the design, but since an
implementation is underway, I think more experimentation would be better
before locking down the design.

5. The "strict mode" activation through a global state on the string object
is a bad idea. I would prefer some sort of "flags" argument passed to each
function. I would prefer the "strict" mode where exceptions are raised by
default. But I do not want the strict behavior of requiring all arguments to
be used.

6. Security on the attribute lookups is probably an unending topic. A simple
minimum would be to not allow attribute lookups on names starting with an
underscore.
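
For reference, points 1, 2 and 6 can be illustrated with str.format and
string.Formatter as they later shipped, which need not match the
prototype discussed here. NoPrivateFormatter is a hypothetical subclass;
get_field is the real string.Formatter hook, and the regular expression
is only a rough check.

```python
import re
import string

# Point 1: an argument may be referenced more than once.
reused_index = "{0} and {0} again".format("spam")
reused_name = "{name}, {name}!".format(name="eggs")
print(reused_index)
print(reused_name)

# Point 2: with a width and no fill character, padding is spaces.
default_fill = "[{0:6}]".format("ab")
explicit_fill = "[{0:*>6}]".format("ab")
print(default_fill)
print(explicit_fill)


class NoPrivateFormatter(string.Formatter):
    """Hypothetical formatter for point 6: refuse '._name' lookups."""

    def get_field(self, field_name, args, kwargs):
        if re.search(r"\._", field_name):
            raise ValueError("private attribute in %r" % field_name)
        return super().get_field(field_name, args, kwargs)


safe = NoPrivateFormatter()
print(safe.format("{0.real}", 3))
try:
    safe.format("{0.__class__}", 3)
except ValueError as exc:
    print("rejected:", exc)
```
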
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_format.py
Type: text/x-python
Size: 5990 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-3000/attachments/20070227/47ef8085/attachment-0001.py 

From agthorr at barsoom.org  Wed Feb 28 15:24:21 2007
From: agthorr at barsoom.org (Daniel Stutzbach)
Date: Wed, 28 Feb 2007 08:24:21 -0600
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <20070228141046.17852.1877073320.divmod.quotient.899@ohm>
References: <eae285400702280539t4f2d4652g111ed815ec995ebc@mail.gmail.com>
	<20070228141046.17852.1877073320.divmod.quotient.899@ohm>
Message-ID: <eae285400702280624w6604a174r71264cd7cce38aaa@mail.gmail.com>

On 2/28/07, Jean-Paul Calderone <exarkun at divmod.com> wrote:
> >On EOF, they return a length-0 object or 0 instead.  If the user tries
> >to read again *after* hitting EOF, then an EOFError is raised.
>
> What is the motivation for having two different ways to signal EOF?  How
> is this case handled?

I checked how Python 2.5 handles this, and you're right.  Read
operations should continue to return 0 bytes if the user keeps trying
to read at EOF.  Not sure what I was thinking.
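
The same scenario, restated as a script for the bytes-based file objects
of Python 3 (an assumption about the eventual API, not something this
draft fixes). Both files are opened unbuffered so that each call maps
onto one raw read or write.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "wb", buffering=0) as f, \
         open(path, "rb", buffering=0) as g:
        first = g.read(10)     # nothing written yet: end-of-file
        second = g.read(10)    # asking again is still not an error
        f.write(b"bytes")
        third = g.read(10)     # the writer has appended data since
    print(first, second, third)
finally:
    os.remove(path)
```

A reader that raised EOFError on the second call could not pick up the
data that arrives afterwards without special handling.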

-- 
Daniel Stutzbach, Ph.D.             President, Stutzbach Enterprises LLC

From mike.verdone at gmail.com  Wed Feb 28 17:20:08 2007
From: mike.verdone at gmail.com (Mike Verdone)
Date: Wed, 28 Feb 2007 10:20:08 -0600
Subject: [Python-3000] unit test for advanced formatting
In-Reply-To: <cfd22a7c0702272150w73ed4f5dl49c78ee7979c429@mail.gmail.com>
References: <cfd22a7c0702272150w73ed4f5dl49c78ee7979c429@mail.gmail.com>
Message-ID: <5487f95e0702280820y17bdb171i2f4cb50fb62a54f0@mail.gmail.com>

Hi Pete,

These look very good. My comments to your comments below,

> 1. The early python implementation does not allow "reusing" an argument
> either by index or by keyword name. The PEP has not defined this behavior. I
> think it is important to be allowed to reuse any of the argument objects
> given to format.

I just sort of assumed it would be possible to do. Hopefully someone
can add it to the spec officially.

> 5. The "strict mode" activation through a global state on the string object
> is a bad idea. I would prefer some sort of "flags" argument passed to each
> function. I would prefer the "strict" mode where exceptions are raised by
> default. But I do not want the strict behavior of requiring all arguments to
> be used.

I agree. It feels kind of Perl-like. I have nightmares of someone
setting strict mode on the string and having unrelated modules start
blowing up. Could the strict formatting string be a subclass of
string?

strictformat("my format string {0}").format(...)

Alternately maybe strings should be strict by default and you'd have a
lenientformat type for lenient mode. Old-style formatting would blow
up when you were missing arguments. New format should be just as
strict unless you ask it to be nice.
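
One way to keep strictness out of global state is to attach it to the
object doing the formatting. The sketch below uses the
check_unused_args hook of string.Formatter as it later shipped;
StrictFormatter is a hypothetical name, and "strict" here means only
"every argument must be used".

```python
import string


class StrictFormatter(string.Formatter):
    """Hypothetical 'strict' variant: every argument must be used."""

    def check_unused_args(self, used_args, args, kwargs):
        # used_args holds the positional indexes and keyword names
        # that the format string actually referenced.
        unused = [i for i in range(len(args)) if i not in used_args]
        unused += [k for k in kwargs if k not in used_args]
        if unused:
            raise ValueError("unused arguments: %r" % (unused,))


lenient = string.Formatter()
strict = StrictFormatter()

print(lenient.format("{0}", "a", "b"))          # extra argument is ignored
print(strict.format("{0} {key}", "a", key="b"))
try:
    strict.format("{0}", "a", "b")
except ValueError as exc:
    print("strict:", exc)
```

Code that wants leniency keeps using the plain formatter, so unrelated
modules are unaffected by the choice.
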

Just my 2c.

Mike.


On 2/27/07, Pete Shinners <shredwheat at gmail.com> wrote:
> I've gone over PEP3101 to create an initial unittest for the advanced
> formatting. Based on this intro to the formatting syntax, I thought I'd also
> share my thoughts. I've also experimented with this against the python
> prototype of the formatting.
>
> I have commented out the tests where that implementation fails, but should
> work (by my interpretation). If anything these tests will provide a preview
> look at the way the formatting looks.
>
> 1. The early python implementation does not allow "reusing" an argument
> either by index or by keyword name. The PEP has not defined this behavior. I
> think it is important to be allowed to reuse any of the argument objects
> given to format.
>
> 2. The implementation we have always requires a "fill" argument in the
> format, if a width is specified. It would be a big improvement if space
> characters were default.
>
> 3. The specification is deep. It will take an intense amount of unit testing
> of corner cases to make sure this is actually doing what is correct. It may
> be too complex, but it is hard to know what might be yagni.
>
> 4. The PEP still leaves a bit of wiggle room in the design, but since an
> implementation is underway, I think more experimentation would be better
> before locking down the design.
>
> 5. The "strict mode" activation through a global state on the string object
> is a bad idea. I would prefer some sort of "flags" argument passed to each
> function. I would prefer the "strict" mode where exceptions are raised by
> default. But I do not want the strict behavior of requiring all arguments to
> be used.
>
> 6. Security on the attribute lookups is probably an unending topic. A simple
> minimum would be to not allow attribute lookups on names starting with an
> underscore.

From daniel at stutzbachenterprises.com  Wed Feb 28 18:13:39 2007
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Wed, 28 Feb 2007 11:13:39 -0600
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
Message-ID: <eae285400702280913s7b45cb58ueb12d7e09b3a2a1e@mail.gmail.com>

Should FileIO objects define the following methods and properties that
the Python 2 file object defines?

    mode
    name
    closed
    isatty

Secondly, should any of these be bumped up to the Raw I/O ABC?

-- 
Daniel Stutzbach, Ph.D.             President, Stutzbach Enterprises LLC

From oliphant.travis at ieee.org  Wed Feb 28 18:56:16 2007
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Wed, 28 Feb 2007 10:56:16 -0700
Subject: [Python-3000] PEP Draft:  Enhancing the buffer protcol
In-Reply-To: <es3rao$77c$1@sea.gmane.org>
References: <es2kbv$f0r$1@sea.gmane.org> <es3rao$77c$1@sea.gmane.org>
Message-ID: <es4fob$1bj$1@sea.gmane.org>

Thomas Heller wrote:
> 
>>Additions to the struct string-syntax
>>
>>   The struct string-syntax is missing some characters to fully
>>   implement data-format descriptions already available elsewhere (in
>>   ctypes and NumPy for example).  Here are the proposed additions:
>>
>>   Character         Description
>>   ==================================
>>   '1'               bit (number before states how many bits)
>>   '?'               platform _Bool type 
> 
> 
> In SVN trunk (2.6), the struct module already supports _Bool, but the
> format character used is 't'.  Not a big issue, though, and I like '?'
> better.
> 

I think 't' should be used for the bit type also (because '1' is 
confusing when you have something like '71b'  which looks like 71 signed 
chars but is actually 7 bits + 1 signed char).

I've changed this in the current PEP.
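
The ambiguity comes from the existing rule that a leading number is a
repeat count. A short check with the struct module as it later shipped,
where '?' is the _Bool code:

```python
import struct

# A leading number is a repeat count, so "71b" means 71 signed chars.
size_71b = struct.calcsize("71b")
print(size_71b)

# "?" packs and unpacks a C _Bool as a Python bool.
packed = struct.pack("?", True)
unpacked = struct.unpack("?", packed)
print(unpacked)
```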

> 
>>   'g'               long double  
>>   'F'               complex float  
>>   'D'               complex double 
>>   'G'               complex long double 
> 
> 
> IIUC, in the latest PEP draft you have apparently changed to two-letter codes
> for complex types; which is inconsistent with previous conventions in struct.

Yeah, I've introduced two-letter codes for pointers as well. But, there 
is a certain logic to it because 'Zd' would be similar to 'dd' except 
you would know that the two are supposed to be treated as a complex number.


> 
> 
>>   'c'               ucs-1 (latin-1) encoding 
>>   'u'               ucs-2 
>>   'w'               ucs-4 
>>   'O'               pointer to Python Object 
>>   'T{}'             structure (detailed layout inside {}) 
>>   '(k1,k2,...,kn)'  multi-dimensional array of whatever follows 
>>   ':name:'          optional name of the preceding element 
>>   '&'               specific pointer (prefix before another character) 
>>   'X{}'             pointer to a function (optional function 
>>                                             signature inside {})
>>
>>   The struct module will be changed to understand these as well and
>>   return appropriate Python objects on unpacking.  Un-packing a
>>   long-double will return a c-types long_double.
> 
> 
> This is probably because there is no way for current Python to support
> the long double datatype. 

Right.   On some platforms there is no difference between double and 
long double.  I guess returning a decimal object might actually be the 
easiest solution.


> The question for ctypes is: How should ctypes
> support that?  Should the .value attribute of a c_longdouble have two
> components, should it expose the value as decimal, should Python itself
> switch to using long double internally, or are there other possibilities?
> 

I think I like the decimal object solution better.

> 
>> Unpacking 'u' or
>>   'w' will return Python unicode.  Unpacking a multi-dimensional
>>   array will return a list of lists.  Un-packing a pointer will
>>   return a ctypes pointer object.
> 
> 
> ctypes does not support pointer objects of non-native byte order;
> should they be forbidden?

Yes, I'm fine with them being forbidden.

> 
> 
>>
>>   Functions should be added to ctypes to create a ctypes object from
>>   a struct description, and add long-double, and ucs-2 to ctypes.
> 
> 
> Well, ucs-4 should probably be added to ctypes as well.  The current ctypes.c_wchar
> type corresponds to the C WCHAR type, its size is configuration dependend.

I think you are right.  In the discussions for unifying string/unicode I 
really like the proposals that are leaning toward having a unicode 
object be an immutable string of either ucs-1, ucs-2, or ucs-4 depending 
on what is in the string.

This does create some conversion issues that must be handled, but I 
think it is the best option.   In the Python 3.0 version of NumPy, I 
think that's what we are going to have (three different string types 
ucs-1, ucs-2, ucs-4).

-Travis


From jcarlson at uci.edu  Wed Feb 28 19:55:21 2007
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 28 Feb 2007 10:55:21 -0800
Subject: [Python-3000] PEP Draft:  Enhancing the buffer protcol
In-Reply-To: <es4fob$1bj$1@sea.gmane.org>
References: <es3rao$77c$1@sea.gmane.org> <es4fob$1bj$1@sea.gmane.org>
Message-ID: <20070228104438.AE6E.JCARLSON@uci.edu>


Travis Oliphant <oliphant.travis at ieee.org> wrote:
> I think you are right.  In the discussions for unifying string/unicode I 
> really like the proposals that are leaning toward having a unicode 
> object be an immutable string of either ucs-1, ucs-2, or ucs-4 depending 
> on what is in the string.

Except that it's not going to happen.  The width of the unicode
representation is going to be fixed at compile time, generally utf-16 or
ucs-4.  I say utf-16 because the representation allows for surrogate
pairs, etc., but each value of the pair is considered a "character",
whereas (according to my potentially flawed memory of reading the spec)
ucs-2 doesn't allow for surrogates.
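
The difference is easy to see by encoding one character from outside the
Basic Multilingual Plane. This sketch uses only the standard codecs; the
character is an arbitrary choice.

```python
ch = "\U0001D11E"                  # MUSICAL SYMBOL G CLEF, outside the BMP

raw16 = ch.encode("utf-16-le")
raw32 = ch.encode("utf-32-le")
utf16_units = len(raw16) // 2      # number of 16-bit code units
ucs4_units = len(raw32) // 4       # number of 32-bit code units
print(utf16_units, ucs4_units)

# The two UTF-16 code units form a high/low surrogate pair.
high = int.from_bytes(raw16[:2], "little")
low = int.from_bytes(raw16[2:], "little")
print(hex(high), hex(low))
```

Indexing by 16-bit unit therefore returns half of a pair for such
characters, which is the trade-off described in the next paragraph.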

Note that I previously offered an overlay structure that could support
the O(logn) time access of arbitrary full characters regardless of
encoding (utf-8, utf-16 or ucs-4) using O(logn) space, but it was
decided by Guido that Python should return partial character (half of a
surrogate pair) rather than offer non-constant character access time.*

 - Josiah

* As a side note, the space and time are really a function of how often
surrogates or their equivalent in utf-8, etc., occur.  The worst case is
O(logn) for both, but the actual cost depends on the structure of
occurrences of the non-constant character lengths.


From jackdied at jackdied.com  Wed Feb 28 21:52:12 2007
From: jackdied at jackdied.com (Jack Diederich)
Date: Wed, 28 Feb 2007 15:52:12 -0500
Subject: [Python-3000] PEP Draft: Class Decorators
Message-ID: <20070228205212.GD5537@performancedrivers.com>

Greetings from PyCon!

I read hundreds of emails in the dozens of threads about class decorators and
there was surprisingly little content (most of the arguments were about syntax
which is no longer up for debate). As a result this PEP is quite plain.

If any IronPython or Jython folks could throw in their two bits it would be
appreciated.  

PEP: 3XXX
Title: Class Decorators
Version: 1
Last-Modified: 28-Feb-2007
Authors: Jack Diederich
Implementation: SF#1671208
Status: Draft
Type: Standards Track
Created: 26-Feb-2007

Abstract
========

Extending the decorator syntax to allow the decoration of classes.

Rationale
=========

Allowing classes to be decorated serves many of the same purposes as
the ability to decorate functions.  Decorators move factory registration
and class manipulation to the top of the class definition instead of the
current alternate methods of post-processing or the action-at-a-distance
of metaclasses.

    import myfactory
      
    @myfactory.register
    class MyClass:
        pass
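
A self-contained sketch of the registration pattern, for illustration.
The names registry and register are hypothetical stand-ins for whatever
myfactory would provide; the second class shows the post-processing
spelling that the decorator replaces.

```python
registry = {}


def register(cls):
    """Record the class under its name and hand it back unchanged."""
    registry[cls.__name__] = cls
    return cls


@register
class MyClass:
    pass


# Without the decorator syntax, the same effect needs a statement
# after the class body:
class OtherClass:
    pass

OtherClass = register(OtherClass)

print(sorted(registry))
```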

History and Implementation
==========================

Class decorators were originally proposed in PEP318 [1]_ and were rejected
by Guido [2]_ for lack of use cases.  Two years later he saw a use case
he liked and gave the go-ahead for a PEP and patch [3]_.

The current patch is loosely based on a pre-2.4 patch [4]_ and updated to
use the new AST.

Grammar/Grammar is changed from

   funcdef: [decorators] 'def' NAME parameters ['->' test] ':' suite

to

    decorated_thing: decorators (classdef | funcdef)
    funcdef: 'def' NAME parameters ['->' test] ':' suite

References
==========
If you enjoyed this PEP you might also enjoy:

.. [1] PEP 318, "Decorators for Functions and Methods"
  http://www.python.org/dev/peps/pep-0318/

.. [2] Class decorators rejection
  http://mail.python.org/pipermail/python-dev/2004-March/043458.html

.. [3] Class decorator go-ahead
  http://mail.python.org/pipermail/python-dev/2006-March/062942.html

.. [4] 2.4 class decorator patch
  http://python.org/sf/1007991

.. [5] 3.x class decorator patch
  http://python.org/sf/1671208


From oliphant at ee.byu.edu  Wed Feb 28 21:28:12 2007
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Wed, 28 Feb 2007 13:28:12 -0700
Subject: [Python-3000] PEP Draft:  Enhancing the buffer protcol
In-Reply-To: <20070228104438.AE6E.JCARLSON@uci.edu>
References: <es3rao$77c$1@sea.gmane.org> <es4fob$1bj$1@sea.gmane.org>
	<20070228104438.AE6E.JCARLSON@uci.edu>
Message-ID: <45E5E5DC.9040103@ee.byu.edu>

Josiah Carlson wrote:

>Travis Oliphant <oliphant.travis at ieee.org> wrote:
>  
>
>>I think you are right.  In the discussions for unifying string/unicode I 
>>really like the proposals that are leaning toward having a unicode 
>>object be an immutable string of either ucs-1, ucs-2, or ucs-4 depending 
>>on what is in the string.
>>    
>>
>
>Except that its not going to happen.  The width of the unicode
>representation is going to be fixed at compile time, generally utf-16 or
>ucs-4.  
>
Are you sure about this?  Guido was still talking about the 
multiple-version representation at PyCon a few days ago.

-Travis


From jcarlson at uci.edu  Wed Feb 28 22:40:06 2007
From: jcarlson at uci.edu (Josiah Carlson)
Date: Wed, 28 Feb 2007 13:40:06 -0800
Subject: [Python-3000] PEP Draft:  Enhancing the buffer protcol
In-Reply-To: <45E5E5DC.9040103@ee.byu.edu>
References: <20070228104438.AE6E.JCARLSON@uci.edu>
	<45E5E5DC.9040103@ee.byu.edu>
Message-ID: <20070228132631.AE7A.JCARLSON@uci.edu>


Travis Oliphant <oliphant at ee.byu.edu> wrote:
> Josiah Carlson wrote:
> >Travis Oliphant <oliphant.travis at ieee.org> wrote:
> >>I think you are right.  In the discussions for unifying string/unicode I 
> >>really like the proposals that are leaning toward having a unicode 
> >>object be an immutable string of either ucs-1, ucs-2, or ucs-4 depending 
> >>on what is in the string.
> >
> >Except that its not going to happen.  The width of the unicode
> >representation is going to be fixed at compile time, generally utf-16 or
> >ucs-4.  
>
> Are you sure about this?  Guido was still talking about the 
> multiple-version representation at PyCon a few days ago.

I was thinking of Guido's message from August 31, 2006 with the subject
of "Re: [Python-3000] UTF-16"; in that message he states that he would
like it to be a configure (presumably during compilation) option.

If he's talking about different runtime representations, then there's an
entire thread discussing it with the subject of "How will unicode get
used?" in September of 2006, and an earlier thread prior to that.  While
I was an early proponent of 'represent minimally', I'm not terribly
worried about it either way at this point, and was merely attempting to
state what had been expressed in the past.


 - Josiah


From collinw at gmail.com  Wed Feb 28 23:15:30 2007
From: collinw at gmail.com (Collin Winter)
Date: Wed, 28 Feb 2007 16:15:30 -0600
Subject: [Python-3000] PEP Draft: Class Decorators
In-Reply-To: <20070228205212.GD5537@performancedrivers.com>
References: <20070228205212.GD5537@performancedrivers.com>
Message-ID: <43aa6ff70702281415o2c7ccd75n7fb3db167506abfe@mail.gmail.com>

On 2/28/07, Jack Diederich <jackdied at jackdied.com> wrote:
[snip]
> History and Implementation
> ==========================
>
> Class decorators were originally proposed in PEP318 [1]_ and were rejected
> by Guido [2]_ for lack of use cases.  Two years later he saw a use case
> he liked and gave the go-ahead for a PEP and patch [3]_.

While I can look up the use-case that prompted Guido to change his
mind via the footnote, I'd appreciate having a sampling of use-cases
listed in the PEP itself.

[snip]
> Grammar/Grammar is changed from
>
>    funcdef: [decorators] 'def' NAME parameters ['->' test] ':' suite
>
> to
>
>     decorated_thing: decorators (classdef | funcdef)
>     funcdef: 'def' NAME parameters ['->' test] ':' suite

The PEP should show how 'decorated_thing' fits into the existing grammar.

Thanks,
Collin Winter

From talin at acm.org  Wed Feb 28 21:37:13 2007
From: talin at acm.org (Talin)
Date: Wed, 28 Feb 2007 12:37:13 -0800
Subject: [Python-3000] unit test for advanced formatting
In-Reply-To: <cfd22a7c0702272150w73ed4f5dl49c78ee7979c429@mail.gmail.com>
References: <cfd22a7c0702272150w73ed4f5dl49c78ee7979c429@mail.gmail.com>
Message-ID: <45E5E7F9.2060004@acm.org>

Pete Shinners wrote:
> I've gone over PEP3101 to create an initial unittest for the advanced
> formatting. Based on this intro to the formatting syntax, I thought I'd 
> also
> share my thoughts. I've also experimented with this against the python
> prototype of the formatting.
> 
> I have commented out the tests where that implementation fails, but should
> work (by my interpretation). If anything these tests will provide a preview
> look at the way the formatting looks.
> 
> 1. The early python implementation does not allow "reusing" an argument
> either by index or by keyword name. The PEP has not defined this 
> behavior. I
> think it is important to be allowed to reuse any of the argument objects
> given to format.

Sounds good to me. I think that may have been a side effect of trying to 
ensure that all arguments were used at least once.

> 2. The implementation we have always requires a "fill" argument in the
> format, if a width is specified. It would be a big improvement if space
> characters were default.

Concur.
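
To make points 1 and 2 concrete, here is a small sketch using `str.format` as it behaves in current Python 3, which is not necessarily how the 2007 prototype under discussion behaved. The sample strings are made up for illustration.

```python
# Point 1: the same argument may be referenced more than once,
# both by index and by keyword name.
greeting = "{0} and {0} again, {name}/{name}".format("spam", name="eggs")

# Point 2: a width with no explicit fill character pads with spaces
# (strings are left-aligned by default).
padded = "[{0:5}]".format("ab")

print(greeting)  # spam and spam again, eggs/eggs
print(padded)    # [ab   ]
```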

> 3. The specification is deep. It will take an intense amount of unit
> testing of corner cases to make sure this is actually doing what is
> correct. It may be too complex, but it is hard to know what might be yagni.

Well, all I can say is - it could have been a lot deeper. I had to 
restrict myself to limiting the scope as it was, as there are a lot of 
related issues that weren't covered.

> 4. The PEP still leaves a bit of wiggle room in the design, but since an
> implementation is underway, I think more experimentation would be better
> before locking down the design.
> 
> 5. The "strict mode" activation through a global state on the string object
> is a bad idea. I would prefer some sort of "flags" argument passed to each
> function. I would prefer the "strict" mode where exceptions are raised by
> default. But I do not want the strict behavior of requiring all
> arguments to be used.

Here's my primary issue with this: I wanted a way to enable strictness 
on an application-wide level, without having to go and individually 
revise the many (typically hundreds) of individual calls to the string 
formatting function.

A typical example of what I am talking about here is something like a 
web application server, where you have a "development" mode and a 
"production" mode. In the development mode, you want to find errors as 
quickly as possible, so you enable strict formatting. In production, 
however, you want the server to be as fault-tolerant as possible, so you 
would enable lenient mode.

Moreover, I would want this strict/lenient decision to apply to all of 
the code modules that I am using, including libraries that I didn't write.

The problem with a flags argument is that most people aren't going to 
bother using it, since it destroys some of the convenience and 
simplicity of the string formatting function. Thus, if I am calling a 
library function, and that library wasn't written using the flag 
argument, then I have no way to control the setting except through some 
kind of global.

A more ideal solution would be to allow some kind of 'security context', 
i.e. apply strictness to all code which is running within a given 
context. However, I don't know of a way to do that in Python.

If someone can think of a better way to accomplish this, I'd love to 
hear it.
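
One way the "context" idea could be sketched in today's Python is with the `contextvars` module (added in Python 3.7, so not available when this was written) and a `string.Formatter` subclass. Everything named here (`_strict`, `strictness`, `ContextFormatter`, `fmt`) is hypothetical, and the sketch only affects code that formats through `fmt`, not plain `str.format` calls inside third-party libraries.

```python
import contextvars
import string
from contextlib import contextmanager

# Hypothetical setting: True means formatting errors propagate.
_strict = contextvars.ContextVar("format_strict", default=True)


@contextmanager
def strictness(enabled):
    """Apply a strict/lenient setting to all code run inside the block."""
    token = _strict.set(enabled)
    try:
        yield
    finally:
        _strict.reset(token)


class ContextFormatter(string.Formatter):
    """Formatter that consults the current context instead of a global flag."""

    def get_field(self, field_name, args, kwargs):
        try:
            return super().get_field(field_name, args, kwargs)
        except (KeyError, IndexError, AttributeError):
            if _strict.get():
                raise
            # Lenient mode: leave a visible marker rather than failing.
            return "{" + field_name + "}", field_name


fmt = ContextFormatter().format

with strictness(False):
    lenient = fmt("hello {who}")  # "who" is missing, but no exception here
```

Outside the `with` block the default applies again, so `fmt("hello {who}")` raises `KeyError`.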

> 6. Security on the attribute lookups is probably an unending topic. A
> simple minimum would be to not allow attribute lookups on names starting
> with an underscore.
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/talin%40acm.org

From mike.verdone at gmail.com  Wed Feb 28 23:57:31 2007
From: mike.verdone at gmail.com (Mike Verdone)
Date: Wed, 28 Feb 2007 16:57:31 -0600
Subject: [Python-3000] unit test for advanced formatting
In-Reply-To: <45E5E7F9.2060004@acm.org>
References: <cfd22a7c0702272150w73ed4f5dl49c78ee7979c429@mail.gmail.com>
	<45E5E7F9.2060004@acm.org>
Message-ID: <5487f95e0702281457t7ff68deq73eeed7df5b58482@mail.gmail.com>

Hi Talin,

Some more thoughts...

> Here's my primary issue with this: I wanted a way to enable strictness
> on an application-wide level, without having to go and individually
> revise the many (typically hundreds) of individual calls to the string
> formatting function.
>
> A typical example of what I am talking about here is something like a
> web application server, where you have a "development" mode and a
> "production" mode. In the development mode, you want to find errors as
> quickly as possible, so you enable strict formatting. In production,
> however, you want the server to be as fault-tolerant as possible, so you
> would enable lenient mode.

Personally, I think this is something that application writers should
worry about. Like this:

def fmtString(string):
    return lenientformat(string) if applicationInProduction else string

fmtString("my format {0}").format(...)

Now the application can turn on and off strict mode from one place for
all strings that call fmtString. I don't think I'd ever want to change
between lenient and strict mode across the board. What if a module
writer writes code that depends upon format strings being strict and
raising exceptions? When you switch your application to lenient mode,
the entire functioning of the module could change. With an across the
board switch module writers will have to assume two completely
different failure modes for every use of format().
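
A runnable version of the wrapper above could look like the following sketch. `lenientformat` is not an existing function; here it is assumed to be a `str` subclass whose `format()` falls back to the raw template on error, and `applicationInProduction` is an assumed module-level switch.

```python
applicationInProduction = True  # hypothetical application-level switch


class lenientformat(str):
    """Hypothetical wrapper: a str whose format() never raises on bad fields."""

    def format(self, *args, **kwargs):
        try:
            return str.format(self, *args, **kwargs)
        except (KeyError, IndexError, AttributeError, ValueError):
            # Lenient mode: fall back to the unformatted template.
            return str(self)


def fmtString(string):
    return lenientformat(string) if applicationInProduction else string


# Argument 0 is missing; in production mode the template comes back unchanged.
result = fmtString("my format {0}").format()
```

Only strings routed through `fmtString` are affected, so a module that relies on strict exceptions keeps them.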

Mike.



From brett at python.org  Wed Feb 28 23:29:28 2007
From: brett at python.org (Brett Cannon)
Date: Wed, 28 Feb 2007 16:29:28 -0600
Subject: [Python-3000] unit test for advanced formatting
In-Reply-To: <45E5E7F9.2060004@acm.org>
References: <cfd22a7c0702272150w73ed4f5dl49c78ee7979c429@mail.gmail.com>
	<45E5E7F9.2060004@acm.org>
Message-ID: <bbaeab100702281429w4263b266t6934379257b785f6@mail.gmail.com>

On 2/28/07, Talin <talin at acm.org> wrote:
[SNIP]
> >
> > 5. The "strict mode" activation through a global state on the string object
> > is a bad idea. I would prefer some sort of "flags" argument passed to each
> > function. I would prefer the "strict" mode where exceptions are raised by
> > default. But I do not want the strict behavior of requiring all
> > arguments to
> > be used.
>
> Here's my primary issue with this: I wanted a way to enable strictness
> on an application-wide level, without having to go and individually
> revise the many (typically hundreds) of individual calls to the string
> formatting function.
>
> A typical example of what I am talking about here is something like a
> web application server, where you have a "development" mode and a
> "production" mode. In the development mode, you want to find errors as
> quickly as possible, so you enable strict formatting. In production,
> however, you want the server to be as fault-tolerant as possible, so you
> would enable lenient mode.
>
> Moreover, I would want this strict/lenient decision to apply to all of
> the code modules that I am using, including libraries that I didn't write.
>
> The problem with a flags argument is that most people aren't going to
> bother using it, since it destroys some of the convenience and
> simplicity of the string formatting function. Thus, if I am calling a
> library function, and that library wasn't written using the flag
> argument, then I have no way to control the setting except through some
> kind of global.
>
> A more ideal solution would be to allow some kind of 'security context',
> i.e. apply strictness to all code which is running within a given
> context. However, I don't know of a way to do that in Python.
>
> If someone can think of a better way to accomplish this, I'd love to
> hear it.

Insert a value into __builtin__ and reference that in all format
calls.  That way it is global to the application if you want it but
does not force this level of granularity on people who want a more
fine-grained configuration at a per-method level.
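
A minimal sketch of this suggestion, assuming Python 3 (where `__builtin__` is named `builtins`). The attribute name `FORMAT_STRICT` and the helper `fmt` are hypothetical; nothing in Python defines or reads them.

```python
import builtins

# Hypothetical application-wide default, visible from every module.
builtins.FORMAT_STRICT = True


def fmt(template, *args, strict=None, **kwargs):
    """Format `template`; a per-call `strict` overrides the global default."""
    if strict is None:
        strict = builtins.FORMAT_STRICT
    try:
        return template.format(*args, **kwargs)
    except (KeyError, IndexError):
        if strict:
            raise
        # Lenient: return the template unformatted instead of failing.
        return template


lenient_result = fmt("value: {0}", strict=False)  # argument 0 is missing
```

The application flips `builtins.FORMAT_STRICT` once, while individual call sites can still pass `strict=` explicitly.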

-Brett

From daniel.stutzbach at gmail.com  Wed Feb 28 17:00:58 2007
From: daniel.stutzbach at gmail.com (Daniel Stutzbach)
Date: Wed, 28 Feb 2007 10:00:58 -0600
Subject: [Python-3000] Draft PEP for New IO system
In-Reply-To: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
References: <5487f95e0702261335k64a8a278sdf628dd7baec4773@mail.gmail.com>
Message-ID: <eae285400702280800u578fbc3cj558d271ff7ce5b18@mail.gmail.com>

What should Buffered I/O .write() do for a non-blocking object?

It seems like the .write() should write as much as it can to the Raw
I/O object and buffer the rest, but then how do we tell the Buffered
I/O object to "write more data from the buffer but still don't block"?

Along the same lines, for a non-blocking Buffered I/O object, how do we
specify "Okay, I know I've been writing only one byte at a time so you
probably haven't bothered writing it to the raw object.  Write as much
data as you can now, but don't block"?

Option #1: On a non-blocking object, .flush() writes as much as it
can, but won't block.  It would need a return value then, to indicate
whether the flush completed or not.

Option #2: Calling .write() with no arguments causes the Buffered I/O
object to flush as much write data as it can to the raw object, but
won't block.  (For a blocking object, it would block until all data is
written to the raw object).

I prefer option #2 because a .flush() that doesn't flush is more surprising.

The goal of supporting non-blocking file-like objects is to be able to
use select() with buffered I/O objects (and other things like a
compressed socket stream).
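
The two options can be sketched side by side. This is not the draft PEP's API: `NonBlockingBufferedWriter` and `FakeRaw` are hypothetical, and the raw object's `write()` is assumed to return the number of bytes accepted, or None when it would block.

```python
class NonBlockingBufferedWriter:
    """Hypothetical buffered writer over a raw object with non-blocking write()."""

    def __init__(self, raw):
        self._raw = raw
        self._pending = bytearray()

    def write(self, data=b""):
        """Queue `data`, then push as much as the raw object accepts right now.

        Calling write() with no argument is Option #2: drain without blocking.
        Returns the number of bytes still buffered.
        """
        self._pending += data
        while self._pending:
            n = self._raw.write(bytes(self._pending))
            if not n:  # None or 0: the raw object would block
                break
            del self._pending[:n]
        return len(self._pending)

    def flush(self):
        """Option #1: same drain, but report whether the buffer emptied."""
        return self.write() == 0


class FakeRaw:
    """Stand-in raw object that accepts at most `room` bytes, then would block."""

    def __init__(self, room):
        self.room = room
        self.received = bytearray()

    def write(self, data):
        if self.room == 0:
            return None
        n = min(self.room, len(data))
        self.received += data[:n]
        self.room -= n
        return n


raw = FakeRaw(room=3)
writer = NonBlockingBufferedWriter(raw)
left = writer.write(b"hello")  # 3 bytes go out, 2 stay buffered
raw.room = 10                  # pretend select() reported writability
done = writer.flush()          # drains the remaining 2 bytes
```

Either entry point ends up needing a return value, which is the crux of the question: the caller has to learn whether data is still buffered in order to know when to select() again.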

-- 
Daniel Stutzbach, Ph.D.             President, Stutzbach Enterprises LLC